Spark & Big Data questions from American Express data engineering interviews.
These Spark and big data questions come from American Express data engineering interviews. Each includes an expert-level answer.
What is the difference between SparkSession and SparkContext in Spark?
Code a simple PySpark job to read a JSON file, filter records, and write output in Parquet format.
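One possible answer to the question above, as a minimal sketch rather than a definitive implementation. The function name `json_to_parquet`, the app name, the file paths, and the `amount > 100` filter are all hypothetical placeholders. The sketch assumes the input is in JSON Lines format (one record per line), which is what `spark.read.json` expects by default.

```python
def json_to_parquet(spark, input_path, output_path, condition):
    """Read JSON, keep rows matching a SQL condition, write Parquet.

    `spark` is an existing SparkSession; `condition` is a SQL expression
    string such as "amount > 100" (a hypothetical column used for illustration).
    """
    # By default Spark expects JSON Lines; for one multi-line JSON document,
    # spark.read.option("multiLine", True).json(...) could be used instead.
    df = spark.read.json(input_path)

    # DataFrame.filter accepts a SQL expression string as well as a Column.
    filtered = df.filter(condition)

    # "overwrite" replaces any existing output; pick the mode your job needs.
    filtered.write.mode("overwrite").parquet(output_path)
    return filtered


def main():
    # Imported here so the helper above can be exercised without a cluster.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()
    try:
        # Hypothetical paths and filter, for illustration only.
        json_to_parquet(spark, "input/events.json", "output/events_parquet", "amount > 100")
    finally:
        spark.stop()
```

To run it as a job, call `main()` at the bottom of the script and submit it with `spark-submit`. Keeping the transformation in a function that takes the session as an argument makes it easier to unit test.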
Describe a scenario that required Spark optimization and explain how you would troubleshoot the performance issues.
Explain repartition vs. coalesce. Which one would you use to reduce shuffle operations?
How did you handle data ingestion and processing for large datasets?
How does Spark's Catalyst Optimizer improve query performance?
What is the salting technique, and when would you use it?
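To make the idea behind salting concrete, here is a small pure-Python simulation, not Spark code. It assumes rows are assigned to partitions by hashing the key, and uses `zlib.crc32` as a stand-in for Spark's own hash function. The helper names, the key names, and the counts are hypothetical. It shows how appending a salt to a heavily skewed key spreads its rows over several partitions.

```python
import zlib
from collections import Counter


def partition_of(key, num_partitions):
    # Deterministic stand-in for hash partitioning (an assumption of this sketch).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    # Count how many rows land in each partition.
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, num_salts):
    # Append a salt in [0, num_salts) to every key. A round-robin salt keeps
    # the demo reproducible; a real job would more likely use a random salt.
    return [f"{k}_{i % num_salts}" for i, k in enumerate(keys)]


# A skewed dataset: one "hot" key accounts for 99% of the rows.
keys = ["hot"] * 9900 + [f"user{i}" for i in range(100)]

plain = partition_sizes(keys, num_partitions=8)
salted = partition_sizes(salt_keys(keys, num_salts=16), num_partitions=8)

print("largest partition without salt:", max(plain))
print("largest partition with salt:   ", max(salted))
```

In a real join, the other side of the join would also have to be replicated once per salt value so that every salted key still finds its match. Salting therefore trades extra data volume for a more even distribution, which tends to pay off only when a few keys dominate.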