What is the difference between repartition and coalesce in Apache Spark?
Spark/Big Data · medium
2
What is the difference between cache() and persist() in Spark? When would you use each?
Spark/Big Data · medium
3
What is the difference between groupByKey and reduceByKey in Spark?
Spark/Big Data · medium
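As a starting point for the groupByKey vs. reduceByKey question, here is a minimal plain-Python sketch of the usual explanation: reduceByKey combines values inside each partition before the shuffle, while groupByKey ships every record. It is a model, not Spark code. The `partitions` data and the functions `group_by_key_model` and `reduce_by_key_model` are hypothetical names made up for illustration, and "shuffled" here simply counts records that would leave their partition.

```python
from collections import defaultdict

# Two hypothetical partitions of (word, count) pairs, standing in for an RDD.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1)],
]

def group_by_key_model(parts):
    """Model of groupByKey: every record crosses the shuffle, then values are grouped."""
    shuffled = [pair for part in parts for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)

def reduce_by_key_model(parts, func):
    """Model of reduceByKey: combine within each partition, then merge the partial results."""
    shuffled = []
    for part in parts:
        partial = {}
        for key, value in part:
            partial[key] = func(partial[key], value) if key in partial else value
        # Only one record per key per partition leaves the partition.
        shuffled.extend(partial.items())
    totals = {}
    for key, value in shuffled:
        totals[key] = func(totals[key], value) if key in totals else value
    return totals, len(shuffled)

group_totals, group_shuffled = group_by_key_model(partitions)
reduce_totals, reduce_shuffled = reduce_by_key_model(partitions, lambda x, y: x + y)
```

Both models produce the same totals, but the second one moves fewer records, which is the point an answer to this question would typically make.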
4
What is the difference between narrow and wide transformations in Apache Spark? Explain with examples.
Spark/Big Data · medium
5
What strategies can you use to handle skewed data in Spark?
Spark/Big Data · medium
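One strategy commonly raised for the skew question is key salting with a two-stage aggregation. The sketch below models it in plain Python under stated assumptions: the `records` data and the `NUM_SALTS` value are hypothetical, and the salt is derived from the record index so the example stays deterministic (real jobs often use a random salt instead).

```python
from collections import Counter

NUM_SALTS = 4  # hypothetical number of sub-keys per original key

# Hypothetical skewed input: one "hot" key dominates.
records = [("hot", 1)] * 10 + [("cold", 1)] * 2

# Stage 1: attach a salt so the hot key is spread over NUM_SALTS sub-keys,
# then aggregate per salted key (this is the work that would run in parallel).
salted = [((key, i % NUM_SALTS), value) for i, (key, value) in enumerate(records)]
partial = Counter()
for salted_key, value in salted:
    partial[salted_key] += value

# Stage 2: drop the salt and merge the much smaller partial results.
final = Counter()
for (key, _salt), value in partial.items():
    final[key] += value

# Largest group any single task would handle for the hot key after salting.
largest_hot_group = max(v for (k, _s), v in partial.items() if k == "hot")
```

Without salting, one task would process all 10 "hot" records. With salting, the largest group shrinks while the final totals stay the same.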
6
Explain the difference between Spark's map() and flatMap() transformations.
Spark/Big Data · medium
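The map() vs. flatMap() distinction can be sketched without a cluster. The plain-Python model below assumes a hypothetical list `lines` in place of an RDD, and `map_model` and `flat_map_model` are illustrative helpers, not Spark APIs. It shows that map yields exactly one output per input, while flatMap flattens each returned sequence into zero or more outputs.

```python
from itertools import chain

# Hypothetical input records standing in for an RDD of text lines.
lines = ["hello world", "spark", ""]

def map_model(func, records):
    """One output element per input element."""
    return [func(record) for record in records]

def flat_map_model(func, records):
    """func returns an iterable per record; the results are flattened into one list."""
    return list(chain.from_iterable(func(record) for record in records))

mapped = map_model(lambda s: s.split(), lines)
flattened = flat_map_model(lambda s: s.split(), lines)
```

Here `mapped` keeps three elements (one list per line, including an empty list for the empty line), while `flattened` holds the three individual words.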
7
Explain the concept of Broadcast Join in Spark. When should it be used?
Spark/Big Data · medium
8
Convert complex SQL (CTEs, window functions, subqueries) to production-grade PySpark. Discuss when to use spark.sql() vs. DataFrame API, and the implications for testability, partitioning, and execution predictability.
Spark/Big Data · medium