SQL questions from Dunnhumby data engineering interviews.
These SQL questions are sourced from Dunnhumby data engineering interviews, and each includes an expert-level answer. The set leans toward the medium-difficulty band where most real interviews sit (9 of 14 questions). The recurring themes are partition, sql, and join; these patterns appear most often in real interviews and reward the deepest preparation. Many of these questions also surface at Aarete and Incedo, so the preparation transfers across companies. The average answer takes about 1 minute to read, but plan roughly 1 hour to work through the full set thoughtfully.
This collection contains 14 curated questions: 2 easy, 9 medium, and 3 hard. The distribution centers on medium-difficulty problems, with a few hard ones reflecting the depth expected in senior-level interviews.
The most frequently tested areas in this set are partition (10), sql (7), join (5), bigquery (2), optimization (2), and window (1). Focusing on these topics will give you the highest return on your preparation time.
Start with the easy questions to warm up and solidify fundamentals. Medium-difficulty questions form the bulk of real interviews, so spend the most time here and practice explaining your reasoning out loud. Hard questions often appear in senior and staff-level rounds; attempt them once you're comfortable with the basics. For each question, try answering before reading the solution.
Explain Common Table Expressions (CTEs) and their benefits.
Explain SQL Window Functions with examples.
Explain the use of the MERGE statement in SQL.
How do you handle NULL values in SQL? Mention functions like COALESCE and ISNULL.
How do you optimize a long-running SQL query?
How would you handle duplicate records in an SQL table?
Explain how you would use repartition or coalesce effectively to optimize processing when analyzing data only for a specific region.
How can you delete partitions from a table in Hive using a command?
If manual partitions are created in a Hive data-warehouse table directory, and you query records from those partitions, will you see the data? If not, how can this be fixed?
What is the difference between static and dynamic partitioning in Hive?
Write a SQL query to find distinct IDs from a table where the count is more than 1 and greater than 200.
You need to create a workflow where Task B runs only if Task A is successful, and Task C should always run regardless of Task A or B's status. How would you define this dependency using Airflow?
You need to design a Kafka topic for a logging service. How would you decide the number of partitions and the key for partitioning to balance throughput and ordering requirements?
Your Kafka consumer shows significant lag during peak hours. What strategies would you employ to reduce lag and ensure timely data processing?
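As an illustration of the query-writing question in the list above (distinct IDs where the count is more than 1 and greater than 200), here is a minimal sketch rather than a definitive answer. The question's wording is ambiguous, so this sketch assumes one reading: return each ID that occurs more than once and whose value is greater than 200. The table name `events`, its columns, and the sample rows are hypothetical, and SQLite is used only so the query can run locally.

```python
import sqlite3

# Hypothetical schema: the question does not name a table or its columns.
# Assumed reading: IDs with a value above 200 that appear more than once.
QUERY = """
SELECT id
FROM events
WHERE id > 200            -- row-level filter, applied before grouping
GROUP BY id               -- one output row per distinct id
HAVING COUNT(*) > 1       -- keep only ids that occur more than once
ORDER BY id
"""


def find_repeated_ids(conn):
    """Return the distinct ids matching the assumed reading of the question."""
    return [row[0] for row in conn.execute(QUERY)]


def demo():
    """Build a small in-memory table with made-up rows and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [
            (150, "a"), (150, "b"),              # repeated, but id <= 200
            (201, "c"), (201, "d"),              # repeated and id > 200
            (305, "e"),                          # id > 200, appears once
            (410, "f"), (410, "g"), (410, "h"),  # repeated and id > 200
        ],
    )
    try:
        return find_repeated_ids(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

If the interviewer instead means that the occurrence count itself must exceed 200, the `WHERE` clause would be dropped and the condition would become `HAVING COUNT(*) > 200`. Asking which reading is intended is a reasonable first step in the interview.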