DataEngPrep.tech

Interview Questions

Real questions from top companies

700+ Easy, 450+ Medium, 650+ Hard
1. Describe the cluster configuration used in your project, including memory allocation, number of nodes, and executor/driver settings.
   Spark/Big Data | easy

2. Describe projects in which you used Spark, Hadoop, or Azure for large-scale data processing.
   Spark/Big Data | hard

3. Describe the role of the DAG Scheduler in PySpark.
   Spark/Big Data | hard

4. Describe the role of a workflow orchestrator like Airflow in a data pipeline.
   Spark/Big Data | easy

5. Describe the stages of a Spark job and strategies to optimize Spark performance for large datasets.
   Spark/Big Data | hard

6. Describe your approach to managing offsets in Kafka.
   Spark/Big Data | easy

7. Design an ETL pipeline using Kafka and Spark Streaming.
   Spark/Big Data | hard

8. Explain the differences between the underlying architectures of Presto and Spark.
   Spark/Big Data | hard
