1
What is the difference between managed and external tables in Hive/Spark?
Spark/Big Data · easy
2
When would you architecturally choose Dataset[T] over DataFrame in a Scala Spark pipeline, and what are the scalability and portability trade-offs? Include type-safety benefits vs. operational constraints.
Spark/Big Data · easy
3
What is the difference between managed and external tables in Databricks?
Spark/Big Data · easy
4
A JSON file with an evolving schema needs to be ingested into a DataFrame. How would you handle new fields dynamically in PySpark without breaking the job for files that use earlier versions of the schema?
Spark/Big Data · easy
5
A task intermittently fails because of external API rate limits. How would you configure Airflow retries and alerts to manage this situation efficiently?
Spark/Big Data · easy
6
Explain accumulators and broadcast variables in Spark.
Spark/Big Data · easy
7
What approaches do you use to handle multiple tasks within a sprint?
Spark/Big Data · easy
8
cache() vs persist(): Explain the difference between caching and persisting data in Spark, including the available storage levels and the use cases for each.
Spark/Big Data · easy