SQL questions from American Express data engineering interviews.
These SQL questions are sourced from American Express data engineering interviews. Each includes an expert-level answer. Three of the eight questions sit in the medium-difficulty band where most real interviews take place. Recurring themes are joins, window functions, and Spark; these patterns appear most often in real interviews and reward the deepest preparation. The average answer takes about 1 minute to read, but plan roughly 1 hour to work through the full set thoughtfully.
This collection contains 8 curated questions: 3 easy, 3 medium, and 2 hard. There's a strong foundation of fundamentals-focused questions — ideal for building confidence before tackling advanced topics.
The most frequently tested areas in this set are join (3), window (2), spark (2), partition (2), optimization (1), and sql (1). Focusing on these topics will give you the highest return on your preparation time.
Start with the easy questions to warm up and solidify fundamentals. Medium-difficulty questions form the bulk of real interviews — spend the most time here and practice explaining your reasoning out loud. Hard questions often appear in senior and staff-level rounds; attempt them after you're comfortable with the basics. For each question, try answering before revealing the solution. Use our AI Mock Interview to simulate real interview conditions and get instant feedback on your responses.
Describe a scenario where you used Databricks for real-time data processing.
Describe a cross-team data project where you had to align architectural boundaries, ownership, and SLAs. How did you handle conflicting priorities, technical debt, and the scalability of communication as the number of stakeholders grew?
Implement a recursive query for hierarchy (employee-manager). Explain the termination guarantees, depth limits, and when a recursive CTE becomes a scalability bottleneck. What alternatives exist for graph-scale hierarchies in Spark or a data lake?
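For the employee-manager hierarchy question, a minimal sketch of a recursive CTE with an explicit depth cap may help. It runs against an in-memory SQLite database through Python's sqlite3 module. The `employees` table, its columns, the sample rows, and the `max_depth` parameter are all assumptions made up for illustration. It is not a reference answer, and Spark SQL syntax and recursion support may differ.

```python
import sqlite3

# Hypothetical sample data: one root (Ada) and a small reporting chain.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
  (1, 'Ada', NULL),
  (2, 'Ben', 1),
  (3, 'Cy',  1),
  (4, 'Dee', 2),
  (5, 'Eli', 4);
""")

HIERARCHY_SQL = """
WITH RECURSIVE org(id, name, depth, path) AS (
    -- Anchor member: employees with no manager are the roots.
    SELECT id, name, 0, name
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: attach direct reports of rows found so far.
    -- The depth guard stops the recursion even if the data has a cycle.
    SELECT e.id, e.name, o.depth + 1, o.path || ' > ' || e.name
    FROM employees AS e
    JOIN org AS o ON e.manager_id = o.id
    WHERE o.depth < :max_depth
)
SELECT id, name, depth, path
FROM org
ORDER BY path;
"""

hierarchy = conn.execute(HIERARCHY_SQL, {"max_depth": 10}).fetchall()
for row in hierarchy:
    print(row)
```

On acyclic data the recursion ends on its own once a level has no direct reports. The depth guard is there as a safety net for dirty data, where a cycle would otherwise keep the CTE running indefinitely.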
Explain bloom filters in Spark: how they reduce I/O and when they introduce false positives that hurt performance. What are the scalability and cost implications of enabling dynamic partition pruning and bloom filter pushdown at petabyte scale?
Given a table of sales data, use window functions to calculate a running total.
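For the running-total question, one possible sketch uses `SUM(...) OVER` with an explicit row frame, again run through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The `sales` table, its columns, and the sample rows are assumptions for illustration only. The per-region partition is a design choice, not something the question requires.

```python
import sqlite3

# Hypothetical sample data: daily sales for two regions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, region TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('2024-01-01', 'East', 100),
  ('2024-01-02', 'East', 50),
  ('2024-01-03', 'East', 25),
  ('2024-01-01', 'West', 10),
  ('2024-01-02', 'West', 30);
""")

RUNNING_TOTAL_SQL = """
SELECT sale_date,
       region,
       amount,
       -- The explicit ROWS frame sums from the first row of the partition
       -- up to the current row, which avoids tie surprises from the
       -- default RANGE frame when several rows share a sale_date.
       SUM(amount) OVER (
           PARTITION BY region
           ORDER BY sale_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM sales
ORDER BY region, sale_date;
"""

running = conn.execute(RUNNING_TOTAL_SQL).fetchall()
for row in running:
    print(row)
```

Dropping the `PARTITION BY` clause would give a single running total across all regions instead of one per region.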
How do you handle schema evolution in data lakes or data warehouses?
How would you optimize a query with multiple joins and subqueries?
Write a query to find the first number repeating consecutively three times in a sequence.
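For the consecutive-repeat question, a minimal sketch compares each value with the two values before it using `LAG`. The `seq` table, its `id` ordering column, and the sample values are assumptions for illustration. The approach relies on `id` defining the sequence order, which the question does not state.

```python
import sqlite3

# Hypothetical sample data: 7 repeats four times, then 2 repeats three times.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE seq (id INTEGER PRIMARY KEY, num INTEGER);
INSERT INTO seq VALUES
  (1, 5), (2, 7), (3, 7), (4, 7), (5, 7), (6, 2), (7, 2), (8, 2);
""")

FIRST_TRIPLE_SQL = """
WITH flagged AS (
    SELECT id,
           num,
           LAG(num, 1) OVER (ORDER BY id) AS prev1,  -- value one row back
           LAG(num, 2) OVER (ORDER BY id) AS prev2   -- value two rows back
    FROM seq
)
SELECT num
FROM flagged
-- A row qualifies when it closes a run of three equal values.
-- NULL lags on the first two rows fail the comparison and drop out.
WHERE num = prev1 AND num = prev2
ORDER BY id
LIMIT 1;
"""

first_triple = conn.execute(FIRST_TRIPLE_SQL).fetchone()
print(first_triple)
```

Removing `LIMIT 1` and selecting `DISTINCT num` would return every number that appears three times in a row, which is a common variant of this question.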