Explain wide vs. narrow transformations and how they drive shuffle cost, failure domains, and pipeline design. When would you intentionally add a wide transformation, and how do you minimize its impact?
Spark/Big Data · hard
2
Architecturally, how do Job–Stage–Task boundaries in Spark's execution model affect cluster sizing and shuffle cost, and when would you deliberately collapse or split stages?
Spark/Big Data · hard
3
What are the key components of the Spark execution model (Job, Stage, Task)?
Spark/Big Data · hard
4
Create a SparkSession, read CSV files, join them, and write the result as a table. Provide example code.
SQL · hard
5
How do you grant other users permission to access a notebook in Databricks?
Spark/Big Data · hard
6
How does autoscaling work in Databricks, and what are its benefits?
Spark/Big Data · hard
7
Provide example code for dropping duplicates in PySpark.
Spark/Big Data · hard