Spark & Big Data questions from American Express data engineering interviews.
These Spark & Big Data questions are sourced from American Express data engineering interviews, and each includes an expert-level answer. The set leans toward senior-level depth (6 of 7 are tagged hard). The recurring themes are partitioning, Spark, and optimization; these patterns appear most often in real interviews and reward the deepest preparation. Many of these questions also surface at Altimetrik and Citi, so the preparation transfers across companies. The average answer takes around 1 minute to read, but plan roughly 1 hour to work through the full set thoughtfully.
This collection contains 7 curated questions: 0 easy, 1 medium, and 6 hard. The distribution skews toward harder problems, reflecting the depth expected in senior-level interviews.
The most frequently tested areas in this set are partition (6), spark (5), optimization (4), sql (3), join (3), and python (2). Focusing on these topics will give you the highest return on your preparation time.
Medium-difficulty questions form the bulk of real interviews, so spend the most time on them and practice explaining your reasoning out loud. Hard questions often appear in senior and staff-level rounds; attempt them after you're comfortable with the basics. For each question, try answering before revealing the solution.
What is the difference between SparkSession and SparkContext in Spark?
Code a simple PySpark job to read a JSON file, filter records, and write output in Parquet format.
Explain a scenario-based question on Spark optimization and how you would troubleshoot performance issues.
Explain repartition vs. coalesce. Which one would you use to reduce shuffle operations?
How did you handle data ingestion and processing for large datasets?
How does Spark's Catalyst Optimizer improve query performance?
What is the salting technique, and when would you use it?
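As a warm-up for the salting question above, here is a minimal sketch in plain Python (no Spark required) of why salting helps with skewed keys. It is an illustration under stated assumptions, not a Spark implementation: `partition_for` is a hypothetical stand-in for a hash partitioner, and the key names, row counts, partition count, and salt count are all made up for the example.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical number of shuffle partitions
NUM_SALTS = 8       # hypothetical number of distinct salt values


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic stand-in for a hash partitioner (not Spark's actual hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def salted_key(key: str, rng: random.Random, num_salts: int = NUM_SALTS) -> str:
    """Append a random salt suffix so one hot key becomes several distinct keys."""
    return f"{key}_{rng.randrange(num_salts)}"


def partition_sizes(keys) -> Counter:
    """Count how many rows land in each partition."""
    return Counter(partition_for(k) for k in keys)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Skewed input (made-up data): 9,000 rows share one hot key, 1,000 rows have unique keys.
rows = ["merchant_hot"] * 9000 + [f"merchant_{i}" for i in range(1000)]

plain = partition_sizes(rows)
salted = partition_sizes(salted_key(k, rng) for k in rows)

print("largest partition without salt:", max(plain.values()))
print("largest partition with salt:   ", max(salted.values()))
```

Without the salt, every row for the hot key hashes to the same partition, so one task does most of the work. With the salt, the hot key is split into several keys that can land on different partitions. In a real join, the other side of the join would also need to be replicated once per salt value so that matching rows still meet, which is the cost you trade for the more even distribution.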