Interview Questions

Real questions from top companies · hard

700+ Easy · 450+ Medium · 650+ Hard

501

Spark Optimizations: skewed joins, broadcast joins, Catalyst Optimizer, repartition vs coalesce

Spark/Big Data · hard · join · optimization · partition · 0.3 min read

Walmart

502

SparkSession - how to create one

Spark/Big Data · hard · optimization · partition · spark · 0.3 min read

LTIMindtree

503

Spark Streaming - streaming data handling and file mounting techniques

Spark/Big Data · hard · optimization · partition · spark · 0.3 min read

Zen Data Shastra

504

Spark Submit - command syntax

Spark/Big Data · hard · optimization · partition · spark · 0.3 min read

LTIMindtree

505

Spark Tungsten & Catalyst Optimizer

Spark/Big Data · hard · join · optimization · partition · 0.8 min read

Walmart

506

Steps to link a Databricks notebook to an ADF pipeline

Spark/Big Data · hard · spark · 0.6 min read

Kaseya

507

Trade-offs between batch processing (Spark) vs. real-time streams (Kafka)

Spark/Big Data · hard · partition · spark · 0.7 min read

PayPal

508

Usage of UDFs?

Spark/Big Data · hard · optimization · python · sql · 0.6 min read

Citi

509

Walk through how you would debug the data ingestion process to identify slow stages.

Spark/Big Data · hard · partition · spark · 0.6 min read

Swiggy

510

Walk through Spark's architecture, focusing on the driver, executors, and DAGs

Spark/Big Data · hard · optimization · partition · spark · 2.5 min read

KPMG

511

What are Spark optimizations, and can you explain them?

Spark/Big Data · hard · join · optimization · partition · 0.6 min read

Cognizant

512

What are the challenges of implementing real-time analytics using Spark Streaming?

Spark/Big Data · hard · partition · spark · window · 0.5 min read

Goldman Sachs

513

What are the key properties of Delta Lake that differentiate it from traditional data lakes?

Spark/Big Data · hard · 0.5 min read

Puma

514

What happens if the checkpoint location is accidentally deleted?

Spark/Big Data · hard · 0.4 min read

TCS

515

What insights can you gather from the DAG visualization in Spark UI?

Spark/Big Data · hard · optimization · spark · 0.4 min read

PwC

516

What are predicate pushdown and AQE? Explain with an example.

Spark/Big Data · hard · join · optimization · partition · 0.6 min read

Nagarro

517

What is a serializer in Spark?

Spark/Big Data · hard · optimization · spark · 0.3 min read

Globant

518

What is data shuffling in Spark, and how do you minimize its impact on job performance?

Spark/Big Data · hard · join · optimization · partition · 0.4 min read

Coforge

519

What is the difference between Lazy Evaluation and Eager Execution in PySpark?

Spark/Big Data · hard · join · optimization · spark · 0.4 min read

Incedo

520

What is the difference between MapReduce and Spark?

Spark/Big Data · hard · spark · 0.5 min read

Globant
