DataEngPrep.tech

Interview Questions

Real questions from top companies · hard

1

Explain the difference between TriggerDagRunOperator and ExternalTaskSensor in Airflow.

Spark/Big Data · hard
2

Explain the difference between coalescing and repartitioning in Spark.

Spark/Big Data · hard
3

Explain the differences between Spark's shuffle and broadcast join. When would you use each?

Spark/Big Data · hard
4

Explain the impact of Vacuum and Analyze operations on performance.

Spark/Big Data · hard
5

Explain the role of DAGs (Directed Acyclic Graphs) in Spark.

Spark/Big Data · hard
6

Explain your choice of streaming framework (Kafka, Spark Streaming, etc.).

Spark/Big Data · hard
7

How does fault tolerance in Spark compare with fault tolerance in Hadoop?

Spark/Big Data · hard
8

Given a DataFrame with columns id and name, add a new column department: assign "HR" when id < 100, and "admin" when id >= 100 and id < 200.

Spark/Big Data · hard
