What is the difference between SparkSession and SparkContext in Spark?
Spark/Big Data · hard
2
Architecturally, how would you justify or challenge choosing Hadoop over a cloud-native data lake (S3 + EMR/Databricks) for a greenfield enterprise data platform? Discuss scalability ceilings, cost-model trade-offs, and operational complexity.
Spark/Big Data · hard
3
Why is SparkSession used in Spark 2.0 and later versions?
Spark/Big Data · hard
4
What is the difference between a generator and a list in Python?
Python/Coding · hard
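A minimal sketch of the contrast this question is after, not a full answer: a list materializes every element up front and can be re-read, while a generator produces values lazily and can be consumed only once. The function names `squares_list` and `squares_gen` are hypothetical and exist only for illustration.

```python
import sys


def squares_list(n):
    """Build every value up front and keep them all in memory."""
    return [i * i for i in range(n)]


def squares_gen(n):
    """Yield one value at a time; nothing runs until iteration starts."""
    for i in range(n):
        yield i * i


eager = squares_list(1000)
lazy = squares_gen(1000)

# A list can be iterated as many times as needed.
first_pass = sum(eager)
second_pass = sum(eager)

# A generator is single-use: the second pass sees an exhausted iterator.
gen_first_pass = sum(lazy)
gen_second_pass = sum(lazy)

# The generator object stays small regardless of n; the list grows with n.
list_is_bigger = sys.getsizeof(eager) > sys.getsizeof(lazy)
```

The trade-off is the usual one: a list costs memory proportional to its length but supports `len()`, indexing, and repeated passes; a generator keeps memory roughly constant but gives up random access and reuse.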
5
Explain the architectural rationale for choosing between a left anti join (LEFT ANTI JOIN), NOT IN, and NOT EXISTS in a distributed context. When does a left anti join become a performance or scalability bottleneck, and how do broadcast vs. shuffle joins affect cost?
SQL · hard
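One part of this question can be shown on a single node: the three forms are not semantically identical once NULLs are involved. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine, with hypothetical tables `orders` and `blocked`; it illustrates only the NULL behavior and says nothing about shuffle or broadcast costs in a distributed engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (customer_id INTEGER);
    CREATE TABLE blocked (customer_id INTEGER);
    INSERT INTO orders VALUES (1), (2), (3);
    INSERT INTO blocked VALUES (2), (NULL);  -- note the NULL key
    """
)


def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# NOT IN: a NULL in the subquery makes every non-matching comparison
# evaluate to NULL (unknown), so no row passes the filter.
not_in = ids(
    "SELECT customer_id FROM orders "
    "WHERE customer_id NOT IN (SELECT customer_id FROM blocked)"
)

# NOT EXISTS: NULL keys simply never match, so unmatched rows are kept.
not_exists = ids(
    "SELECT o.customer_id FROM orders o "
    "WHERE NOT EXISTS ("
    "  SELECT 1 FROM blocked b WHERE b.customer_id = o.customer_id)"
)

# Anti join emulated as LEFT JOIN ... IS NULL: same rows as NOT EXISTS here.
anti_join = ids(
    "SELECT o.customer_id FROM orders o "
    "LEFT JOIN blocked b ON b.customer_id = o.customer_id "
    "WHERE b.customer_id IS NULL"
)
```

Here `not_in` comes back empty while `not_exists` and `anti_join` both return customers 1 and 3. A left anti join follows the NOT EXISTS behavior, which is one reason the two are usually treated as interchangeable while NOT IN needs extra NULL handling.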
6
How would you move a file to another path in Databricks File System (DBFS)?
Spark/Big Data · hard
7
How would you read data from an RDBMS using Spark? Provide the syntax.
Spark/Big Data · hard
8
Have you worked with Oozie? If yes, can you explain what it is and how it's used in data pipelines?
System Design/Architecture · hard