Interview Questions

Real questions from top companies in Spark/Big Data

700+ Easy | 450+ Medium | 650+ Hard


361

What are the different delivery semantics in Kafka (at-least-once, at-most-once, exactly-once)?

Spark/Big Data | easy | 0.5 min read

Fragma Data Systems
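A toy simulation in plain Python (not the Kafka client API) of why the order of "commit offset" and "process message" decides the delivery guarantee. The function `run` and its crash injection are hypothetical: a crash is forced once between the two steps, and the consumer restarts from the last committed offset. Kafka's real exactly-once support relies on idempotent producers and transactions (`enable.idempotence`, `transactional.id`, consumers reading with `isolation.level=read_committed`); here it is only approximated by an idempotent sink keyed on offset.

```python
def run(messages, commit_first, crash_at):
    """Return (offset, message) pairs seen by the sink across one crash and restart.

    commit_first=True:  commit offset, then process  -> at-most-once
    commit_first=False: process, then commit offset  -> at-least-once
    """
    committed = 0   # offset a restarted consumer resumes from
    sink = []       # downstream side effects, in order
    crashed = False
    while committed < len(messages):
        for offset in range(committed, len(messages)):
            # Step 1 of 2
            if commit_first:
                committed = offset + 1
            else:
                sink.append((offset, messages[offset]))
            # Simulated crash between the two steps (happens only once)
            if offset == crash_at and not crashed:
                crashed = True
                break
            # Step 2 of 2
            if commit_first:
                sink.append((offset, messages[offset]))
            else:
                committed = offset + 1
    return sink


def idempotent_view(sink):
    # Deduplicate on offset, as an upsert-by-key (idempotent) sink would
    return [msg for _, msg in sorted(dict(sink).items())]


msgs = ["a", "b", "c"]
print(run(msgs, commit_first=True, crash_at=1))    # "b" is lost
print(run(msgs, commit_first=False, crash_at=1))   # "b" is delivered twice
print(idempotent_view(run(msgs, commit_first=False, crash_at=1)))
```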

362

What are the different modes in which you can submit Spark jobs? Explain each.

Spark/Big Data | easy | spark | 0.5 min read

Dunnhumby
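A small sketch of how the submission modes appear on the `spark-submit` command line. `--master` and `--deploy-mode` are real `spark-submit` options; the helper `spark_submit_cmd`, the script `job.py`, and the application argument are hypothetical. In client mode the driver runs on the machine that invokes `spark-submit`; in cluster mode the cluster manager launches the driver on a cluster node; local mode (`local[*]`) runs driver and executor in one JVM and does not support cluster deploy mode.

```python
def spark_submit_cmd(app, master="yarn", deploy_mode="client", extra_args=()):
    """Build (but do not run) a spark-submit argument list."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode}")
    # spark-submit rejects cluster deploy mode with a local master
    if master.startswith("local") and deploy_mode == "cluster":
        raise ValueError("cluster deploy mode is not compatible with a local master")
    # Arguments after the application file are passed to the application itself
    return ["spark-submit", "--master", master, "--deploy-mode", deploy_mode, app, *extra_args]


# Local mode: everything runs on this machine, useful for development
print(spark_submit_cmd("job.py", master="local[*]"))
# YARN cluster mode: the driver runs inside the cluster, suited to scheduled jobs
print(spark_submit_cmd("job.py", deploy_mode="cluster", extra_args=["2024-01-01"]))
```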

363

What are the key differences between Map and Reduce in Spark?

Spark/Big Data | medium | partition, spark | 0.4 min read

Nielsen
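A plain-Python sketch of the distinction using word count, with made-up input. The map step transforms each record on its own, so in Spark it stays within a partition; the reduce step combines values per key, which is why Spark must first shuffle all values for a key together. In PySpark the same pipeline is usually written with `flatMap`/`map` followed by `reduceByKey`, which additionally pre-combines values within each partition before the shuffle.

```python
from collections import defaultdict
from functools import reduce

lines = ["spark is fast", "spark is lazy"]  # sample input (made up)

# Map: one-record-at-a-time transformation, no data leaves its partition
pairs = [(word, 1) for line in lines for word in line.split()]

# Shuffle: bring every value for the same key together
grouped = defaultdict(list)
for word, one in pairs:
    grouped[word].append(one)

# Reduce: fold each key's values with an associative, commutative function
counts = {word: reduce(lambda a, b: a + b, ones) for word, ones in grouped.items()}
print(counts)
```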

364

What are the key performance tuning techniques you apply in Spark jobs to improve performance?

Spark/Big Data | medium | join, partition, spark | 0.4 min read

Coforge

365

What are the key properties of Delta Lake that differentiate it from traditional data lakes?

Spark/Big Data | hard | 0.5 min read

Puma

366

What are the limitations of the REORG command with respect to large datasets?

Spark/Big Data | medium | partition | 0.5 min read

PWC

367

What are the performance considerations when using Auto Loader?

Spark/Big Data | easy | 0.5 min read

TCS

368

What are the performance trade-offs of using salting to mitigate data skewness?

Spark/Big Data | medium | join, partition | 0.5 min read

PWC
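A plain-Python sketch of the core trade-off: salting splits one hot key into `SALT_BUCKETS` smaller groups on the large side, but the small side must be replicated once per salt value so every salted key still finds its match. The bucket count and sample data are hypothetical, and a round-robin salt is used for determinism where Spark jobs often use a random salt.

```python
from collections import Counter

SALT_BUCKETS = 4  # hypothetical; more buckets = smaller groups but more replication

# Skewed large side: key "hot" dominates; small side has one row per key
facts = [("hot", i) for i in range(1000)] + [("cold", i) for i in range(10)]
dims = [("hot", "H"), ("cold", "C")]

# Salt the large side (round-robin here for determinism)
salted_facts = [((key, i % SALT_BUCKETS), val) for i, (key, val) in enumerate(facts)]

# Replicate the small side once per salt value so the join stays correct
salted_dims = [((key, s), val) for key, val in dims for s in range(SALT_BUCKETS)]

# Join on the composite (key, salt)
lookup = dict(salted_dims)
joined = [(key[0], val, lookup[key]) for key, val in salted_facts]

largest_before = max(Counter(k for k, _ in facts).values())
largest_after = max(Counter(k for k, _ in salted_facts).values())
print(largest_before, largest_after)   # biggest join group shrinks
print(len(salted_dims) / len(dims))    # cost: small side grows by SALT_BUCKETS
print(len(joined) == len(facts))       # no rows lost or duplicated
```

The replication factor is the main cost: with N buckets the small side is shuffled N times, so salting pays off only when the hot key's group is much larger than the replicated side.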

369

What are the steps to connect to Salesforce?

Spark/Big Data | easy | spark | 0.4 min read

Hexaware

370

What are the steps to debug a failed workflow in Databricks?

Spark/Big Data | easy | 0.4 min read

TCS

371

What are the steps to efficiently process 1 TB of data in Spark?

Spark/Big Data | medium | partition, spark, sql | 0.5 min read

HashedIn
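A back-of-the-envelope sketch of the first sizing step. It assumes the default `spark.sql.files.maxPartitionBytes` of 128 MB, splittable input files, and treats 1 TB as 1 TiB; Spark's real split calculation also weighs `spark.sql.files.openCostInBytes` and available cores, so the numbers are estimates. The 200-core cluster is hypothetical.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def ceil_div(a, b):
    return -(-a // b)


def input_partitions(total_bytes, max_partition_bytes=128 * MIB):
    # Roughly one read task per max_partition_bytes of splittable input
    return ceil_div(total_bytes, max_partition_bytes)


def task_waves(num_partitions, total_cores):
    # Each core runs one task at a time, so tasks execute in "waves"
    return ceil_div(num_partitions, total_cores)


parts = input_partitions(1 * TIB)
print(parts)                    # 8192 input partitions
print(task_waves(parts, 200))   # 41 waves on a hypothetical 200-core cluster
```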

372

What are the steps to execute a Python file with PySpark code on an EC2 instance?

Spark/Big Data | easy | python, spark | 0.4 min read

Carelon

373

What are the trade-offs between using Glue Catalog vs. Hive Metastore for metadata management?

Spark/Big Data | easy | sql | 0.4 min read

Capco

374

What are transient clusters in EMR, and when would you use them?

Spark/Big Data | easy | etl | 0.5 min read

Persistent Systems

375

What causes out-of-memory (OOM) issues in Databricks, and how do you resolve them?

Spark/Big Data | medium | partition, spark | 0.5 min read

PWC
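A sketch of how an executor's memory is divided, using Spark's documented defaults: 300 MB reserved, `spark.memory.fraction=0.6`, `spark.memory.storageFraction=0.5`, and an overhead of max(384 MB, 10% of executor memory). It shows why large per-task state or too few partitions can exhaust execution memory even on a large executor. The 8 GB / 4-core executor and the helper name are hypothetical, and Databricks may configure different values. The per-task figure is an upper bound, since Spark lets each of N concurrent tasks take between 1/(2N) and 1/N of the execution pool.

```python
RESERVED_MB = 300  # heap reserved by Spark's unified memory manager


def executor_memory_mb(heap_mb, cores, memory_fraction=0.6,
                       storage_fraction=0.5, overhead_factor=0.10):
    usable = heap_mb - RESERVED_MB
    unified = usable * memory_fraction  # shared by execution and storage
    return {
        # Heap plus off-heap overhead requested from the cluster manager
        "container_request": heap_mb + max(384, int(heap_mb * overhead_factor)),
        "unified": unified,
        # Cached blocks below this size are not evicted by execution
        "protected_storage": unified * storage_fraction,
        # Left for user data structures and UDF objects
        "user": usable - unified,
        # Best case for one task when all cores run tasks and nothing is cached
        "max_execution_per_task": unified / cores,
    }


breakdown = executor_memory_mb(8192, cores=4)
print({k: round(v, 1) for k, v in breakdown.items()})
```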

376

What causes data skewness in Spark, and how can it be resolved?

Spark/Big Data | medium | join, partition, spark | 0.5 min read

PWC
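A plain-Python sketch of the most common cause: hash partitioning sends every row with the same key to the same partition, so one dominant key makes one task far larger than the rest no matter how many partitions are configured. `zlib.crc32` stands in for Spark's own hash function, and the key distribution is made up.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8


def partition_of(key):
    # Same key -> same partition, as with hash partitioning
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# Made-up key distribution: one country dominates
keys = ["us"] * 900 + ["uk"] * 50 + ["in"] * 30 + ["fr"] * 20

sizes = Counter(partition_of(k) for k in keys)
rows_per_partition = [sizes.get(p, 0) for p in range(NUM_PARTITIONS)]
mean = sum(rows_per_partition) / NUM_PARTITIONS
skew_ratio = max(rows_per_partition) / mean  # 1.0 would be perfectly even
print(rows_per_partition, round(skew_ratio, 1))
```

Fixes then target the hot key itself, for example broadcasting the small side of a join or salting the key, rather than simply raising the partition count.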

377

What configuration parameters are critical for enabling Adaptive Query Execution (AQE) effectively?

Spark/Big Data | medium | join, partition, spark | 0.4 min read

PWC
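A sketch of the AQE-related settings most often tuned, kept as a Python dict and applied to a session builder. The keys are real Spark 3.x properties; the values shown are the documented defaults in recent Spark 3 releases (AQE itself has been on by default since Spark 3.2), so check them against your version. `apply_conf` is a hypothetical helper that only relies on the builder's `config(key, value)` method.

```python
AQE_CONF = {
    # Master switch for adaptive query execution
    "spark.sql.adaptive.enabled": "true",
    # Merge small post-shuffle partitions toward the advisory size
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "64MB",
    # Split skewed partitions in sort-merge joins; a partition counts as skewed
    # when it exceeds both factor x median partition size and the threshold
    "spark.sql.adaptive.skewJoin.enabled": "true",
    "spark.sql.adaptive.skewJoin.skewedPartitionFactor": "5",
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes": "256MB",
    # Read shuffle output locally, e.g. after a join is converted to broadcast
    "spark.sql.adaptive.localShuffleReader.enabled": "true",
}


def apply_conf(builder, conf=AQE_CONF):
    """Apply each setting via builder.config(key, value), e.g. on SparkSession.builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```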

378

What configurations are needed to pass parameters to a Databricks notebook?

Spark/Big Data | easy | 0.3 min read

Virtusa

379

What determines the maximum parallelism achievable in Databricks?

Spark/Big Data | medium | partition, spark, sql | 0.4 min read

TCS
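A sketch of the arithmetic, assuming a multi-node cluster where only worker cores run tasks and the default `spark.task.cpus=1`. Concurrency is capped both by task slots and by the number of partitions in the stage. The helper and the cluster sizes are hypothetical.

```python
def max_concurrent_tasks(workers, cores_per_worker, num_partitions, task_cpus=1):
    # Task slots: total worker cores divided by cores each task reserves (spark.task.cpus)
    slots = (workers * cores_per_worker) // task_cpus
    # A stage never runs more tasks than it has partitions
    return min(slots, num_partitions)


print(max_concurrent_tasks(4, 8, 200))               # 32: limited by cores
print(max_concurrent_tasks(4, 8, 10))                # 10: limited by partitions
print(max_concurrent_tasks(4, 8, 200, task_cpus=2))  # 16: each task reserves 2 cores
```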

380

What do you understand by data shuffling in Spark? Why is it important?

Spark/Big Data | medium | join, partition, spark | 0.5 min read

Freecharge
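A plain-Python sketch of what a shuffle does: each record is routed to a downstream partition chosen by hashing its key, so all values for a key end up together, at the cost of moving data between executors (serialization, disk writes, and network transfer in real Spark). `zlib.crc32` stands in for Spark's hash function, and `NUM_PARTITIONS` plays the role of `spark.sql.shuffle.partitions` with a hypothetical value.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # stands in for spark.sql.shuffle.partitions


def target_partition(key):
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# Upstream (map-side) partitions before a groupBy or join on the key
map_side = [
    [("a", 1), ("b", 2), ("c", 3)],
    [("a", 4), ("c", 5), ("d", 6)],
]

# Shuffle: every record is sent to the partition that owns its key
reduce_side = defaultdict(list)
for partition in map_side:
    for key, value in partition:
        reduce_side[target_partition(key)].append((key, value))

# After the shuffle, each key lives in exactly one downstream partition
owners = defaultdict(set)
for pid, records in reduce_side.items():
    for key, _ in records:
        owners[key].add(pid)
print(dict(reduce_side))
```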
