Spark & Big Data questions from Incedo data engineering interviews.
These Spark and big data questions are sourced from Incedo data engineering interviews. Each includes an expert-level answer.
What is the difference between SparkSession and SparkContext in Spark?
How do you handle late-arriving data in Spark Structured Streaming?
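Below is a minimal plain-Python model of event-time watermarking. It assumes a simplified rule: any event older than the current watermark is dropped. In Structured Streaming itself you declare the watermark with `df.withWatermark("eventTime", "10 minutes")`. The function `process_micro_batches` and its (event_id, event_time) tuples are hypothetical illustrations, not Spark APIs. As in Spark, the watermark applied to a batch comes from the maximum event time seen in earlier batches minus the allowed delay.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

Event = Tuple[str, datetime]  # hypothetical shape: (event_id, event_time)


def process_micro_batches(
    batches: Iterable[List[Event]],
    delay: timedelta,
) -> Tuple[List[str], List[str]]:
    """Simplified event-time watermarking over a sequence of micro-batches.

    The watermark applied to a batch is (max event time seen in earlier
    batches) - delay. Events older than that watermark are treated as too late.
    Returns (accepted_ids, dropped_ids).
    """
    max_event_time: Optional[datetime] = None
    accepted: List[str] = []
    dropped: List[str] = []
    for batch in batches:
        # No watermark exists until at least one earlier batch has been seen.
        watermark = None if max_event_time is None else max_event_time - delay
        for event_id, event_time in batch:
            if watermark is not None and event_time < watermark:
                dropped.append(event_id)  # too late under this simplified model
            else:
                accepted.append(event_id)
        # Advance the watermark only after the whole batch is processed.
        batch_max = max((t for _, t in batch), default=None)
        if batch_max is not None and (max_event_time is None or batch_max > max_event_time):
            max_event_time = batch_max
    return accepted, dropped
```

Spark's real guarantee is one-sided. Data that arrives within the delay threshold is never dropped. Data older than the watermark may or may not be dropped, depending on the stateful operator. The sketch's "always drop" rule is a simplification of that behavior.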
What is the small-file problem in Spark, and how do you solve it?
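The usual fixes for small files both come down to choosing how many bytes each output file should hold. One fix is to repartition or coalesce before a write. The other is to compact afterwards, for example with Delta's `OPTIMIZE`.

The plain-Python sketch below assumes a hypothetical target file size and shows two pieces of that arithmetic:

- how many output files to aim for (a number you might pass to `df.repartition(n)`)
- a greedy bin-packing plan that groups existing small files into compaction jobs

It illustrates the idea only. It does not show how Spark or Delta plan compaction internally.

```python
import math
from typing import List


def target_file_count(total_bytes: int, target_file_bytes: int) -> int:
    """Number of output files to aim for, e.g. the n in df.repartition(n)."""
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    return max(1, math.ceil(total_bytes / target_file_bytes))


def plan_compaction(file_sizes: List[int], target_file_bytes: int) -> List[List[int]]:
    """Greedy bin-packing: group small files so each group stays within the target.

    Files already at or above the target size are left alone (not rewritten).
    """
    small = sorted(s for s in file_sizes if s < target_file_bytes)
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in small:
        # Close the current group if adding this file would exceed the target.
        if current and current_size + size > target_file_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    # Rewriting a single file on its own gains nothing, so skip those groups.
    return [g for g in groups if len(g) > 1]
```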
Design a Delta table layout for a mixed workload: point lookups by user_id, range scans by date, and full partition scans. Compare partitioning vs. Z-ordering: when to use each, and the rewrite-cost trade-off.
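Z-ordering clusters rows by interleaving the bits of several column values. Rows that are close in every column then tend to land in the same files, so data skipping can help filters on any of those columns. Below is a minimal sketch of the classic two-column Z-order (Morton) code. It assumes both columns have already been mapped to small non-negative integers, for example a hypothetical bucketed user_id and a day number. Delta's `OPTIMIZE ... ZORDER BY` implementation differs in its details.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton (Z-order) code for two non-negative ints.

    Bit i of x goes to position 2i and bit i of y to position 2i + 1.
    Bits of x or y above `bits` are ignored.
    """
    if x < 0 or y < 0:
        raise ValueError("inputs must be non-negative")
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z
```

Sorting rows by this key keeps each 2x2 block of (x, y) values adjacent. The same locality holds recursively for larger blocks. This locality is what Z-ordering buys. The cost is that the clustering must be re-run as new data arrives, which rewrites the affected files. Partitioning by date only needs each new row to be written into the right directory.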
Architect an incremental load in ADF + Databricks with idempotency and late-arrival handling, and weigh the cost and scalability implications of watermark-based loading vs. change data capture (CDC).
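Below is a minimal plain-Python sketch of the watermark (high-water-mark) side of this design. It assumes a hypothetical source whose rows carry an `updated_at` timestamp and a target keyed by `id`.

- The upsert mimics a MERGE. This is what makes re-running a batch idempotent.
- The optional `lookback` re-reads a window before the last watermark, so late-arriving rows with older timestamps are still picked up. The idempotent upsert makes that overlap safe.

CDC, which reads a change feed instead of filtering on a timestamp column, is not modeled here.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Row = Tuple[int, datetime, str]  # hypothetical schema: (id, updated_at, value)


def incremental_load(
    source: List[Row],
    target: Dict[int, Tuple[datetime, str]],
    last_watermark: datetime,
    lookback: timedelta = timedelta(0),
) -> datetime:
    """Watermark-based incremental load with an idempotent, MERGE-like upsert.

    Reads rows with updated_at > (last_watermark - lookback), upserts them
    into `target` by id (keeping the newest version), and returns the new
    watermark.
    """
    lower_bound = last_watermark - lookback
    new_watermark = last_watermark
    for row_id, updated_at, value in source:
        if updated_at <= lower_bound:
            continue  # assumed already loaded by an earlier run
        existing = target.get(row_id)
        # Apply only if the incoming version is not older than what is stored;
        # replaying the same row writes the same value, so reruns are harmless.
        if existing is None or updated_at >= existing[0]:
            target[row_id] = (updated_at, value)
        new_watermark = max(new_watermark, updated_at)
    return new_watermark
```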
What is the difference between Managed and External Tables in Databricks?
Explain PySpark's Catalyst Optimizer.
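Catalyst is the optimizer behind Spark SQL and PySpark DataFrames. It represents queries as trees and rewrites them with rules, many of which run repeatedly until the tree stops changing. The plain-Python toy below imitates that shape with two hypothetical rules, constant folding and identity simplification, over a tiny expression tree. It is not Catalyst's API. On a real DataFrame, `df.explain(True)` prints the parsed, analyzed, and optimized logical plans and the physical plan, which shows these rewrites in practice.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add, Mul]
Rule = Callable[[Expr], Expr]


def transform_up(expr: Expr, rule: Rule) -> Expr:
    """Apply `rule` to the children first, then to the node itself (post-order)."""
    if isinstance(expr, (Add, Mul)):
        expr = type(expr)(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)


def constant_folding(expr: Expr) -> Expr:
    """Replace arithmetic on two literals with a single literal."""
    if isinstance(expr, (Add, Mul)) and isinstance(expr.left, Lit) and isinstance(expr.right, Lit):
        if isinstance(expr, Add):
            return Lit(expr.left.value + expr.right.value)
        return Lit(expr.left.value * expr.right.value)
    return expr


def simplify_identity(expr: Expr) -> Expr:
    """Rewrite x + 0 and 0 + x to x, and x * 1 and 1 * x to x."""
    if isinstance(expr, Add):
        if expr.right == Lit(0):
            return expr.left
        if expr.left == Lit(0):
            return expr.right
    if isinstance(expr, Mul):
        if expr.right == Lit(1):
            return expr.left
        if expr.left == Lit(1):
            return expr.right
    return expr


def optimize(expr: Expr, rules: List[Rule], max_iterations: int = 100) -> Expr:
    """Run every rule over the tree until nothing changes (a fixed point)."""
    for _ in range(max_iterations):
        new = expr
        for rule in rules:
            new = transform_up(new, rule)
        if new == expr:
            return new
        expr = new
    return expr
```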
Explain caching techniques in Databricks.
What is the difference between Lazy Evaluation and Eager Execution in PySpark?
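In PySpark, transformations such as `filter` and `select` only build up a plan. Nothing runs until an action such as `count()` or `collect()` is called. Eager execution, as in a Python list comprehension, computes each step immediately. The toy class below is a plain-Python analogy, not PySpark's API. Its `map` and `filter` methods only record steps, and `collect` is the "action" that finally runs them.

```python
from typing import Any, Callable, Iterable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class LazyPipeline:
    """Toy analogue of lazy evaluation: transformations record steps, actions run them."""

    def __init__(self, source: Iterable[Any], steps: Tuple[Step, ...] = ()):
        self._source = list(source)
        self._steps = steps

    def map(self, fn: Callable[[Any], Any]) -> "LazyPipeline":
        # Transformation: returns a new pipeline and computes nothing yet.
        return LazyPipeline(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyPipeline":
        # Transformation: also only recorded.
        return LazyPipeline(self._source, self._steps + (("filter", pred),))

    def collect(self) -> List[Any]:
        # Action: only now is the recorded plan executed, element by element.
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out
```

Because the whole chain is known before anything runs, a lazy engine can inspect and optimize the plan first. Spark relies on this property when it hands DataFrame plans to Catalyst.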