SQL questions from Goldman Sachs data engineering interviews.
These SQL questions are sourced from Goldman Sachs data engineering interviews, and each includes an expert-level answer. Most of the set (11 of 16) sits in the medium-difficulty band where most real interviews land. The recurring themes are joins, partitioning, and window functions; these patterns appear most often in real interviews and reward the deepest preparation. Many of these questions also come up at Daniel Wellington and Swiggy, so the preparation transfers across companies. The average answer takes about 1 minute to read, so plan roughly 1 hour to work through the full set thoughtfully.
This collection contains 16 curated questions: 3 easy, 11 medium, and 2 hard. The easy questions focus on fundamentals and are a good way to build confidence before tackling the medium and hard ones.
The most frequently tested topics in this set are joins (8 questions), partitioning (8), window functions (4), SQL (4), BigQuery (4), and Snowflake (3). Focusing on these topics will give you the highest return on your preparation time.
Start with the easy questions to warm up and solidify fundamentals. Medium-difficulty questions form the bulk of real interviews, so spend the most time here and practice explaining your reasoning out loud. Hard questions often appear in senior and staff-level rounds; attempt them after you're comfortable with the basics. For each question, try answering before reading the solution.
Describe a scenario where partitioning and bucketing would improve query performance.
When would you choose a Snowflake schema over a Star schema?
Implement a query to find the top 5 customers by total sales amount.
Write an SQL query to find duplicate emails in a users table.
Compare OLTP and OLAP systems in the context of financial transactions.
Describe a challenging project where you optimized a complex ETL process.
Describe a scenario where you would use a CROSS JOIN vs. an INNER JOIN.
Explain indexing and its impact on database performance.
Explain your approach to optimizing a slow-running query on a table with billions of rows.
Given a complex nested query, how would you refactor it for better readability and efficiency?
How would you decide between using a CTE and a temporary table for a complex query?
Identify and remove duplicate records from a table, keeping the most recent record based on a timestamp column.
Share an example where you had to communicate technical concepts to a non-technical audience.
Simulate a producer-consumer model using multithreading.
What are the trade-offs between relational databases and NoSQL for financial data?
Write a query to find the median salary of employees in a table.
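For the top-5-customers question, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine. The sales table and its customer_id and amount columns are hypothetical; the real interview schema may differ, but the GROUP BY, ORDER BY, LIMIT shape carries over to Snowflake and BigQuery.

```python
import sqlite3

# Hypothetical schema: sales(customer_id INTEGER, amount REAL).
TOP_CUSTOMERS_SQL = """
SELECT customer_id, SUM(amount) AS total_sales
FROM sales
GROUP BY customer_id
ORDER BY total_sales DESC
LIMIT 5
"""


def top_customers(conn: sqlite3.Connection) -> list:
    """Return (customer_id, total_sales) rows for the 5 highest-spending customers."""
    return conn.execute(TOP_CUSTOMERS_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(1, 50.0), (1, 70.0), (2, 10.0), (3, 30.0), (4, 40.0), (5, 5.0), (6, 60.0)],
    )
    for customer_id, total in top_customers(conn):
        print(customer_id, total)
```

LIMIT 5 cuts ties arbitrarily at the fifth position. If tied customers should all be returned, a common variant ranks the totals with RANK() OVER (ORDER BY SUM(amount) DESC) and keeps rows whose rank is at most 5.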
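For the duplicate-emails question, a sketch of the usual GROUP BY ... HAVING COUNT(*) > 1 pattern, again run through sqlite3. The users table is hypothetical, and treating emails case-insensitively with LOWER() is an assumption worth confirming with the interviewer.

```python
import sqlite3

# Hypothetical schema: users(id INTEGER PRIMARY KEY, email TEXT).
# LOWER() treats "A@x.com" and "a@x.com" as the same address (an assumption;
# drop it if only exact matches count as duplicates).
DUPLICATE_EMAILS_SQL = """
SELECT LOWER(email) AS normalized_email, COUNT(*) AS occurrences
FROM users
WHERE email IS NOT NULL
GROUP BY LOWER(email)
HAVING COUNT(*) > 1
ORDER BY normalized_email
"""


def duplicate_emails(conn: sqlite3.Connection) -> list:
    """Return (normalized_email, occurrences) for every email seen more than once."""
    return conn.execute(DUPLICATE_EMAILS_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("a@x.com",), ("b@x.com",), ("A@x.com",), ("c@x.com",), ("c@x.com",)],
    )
    print(duplicate_emails(conn))
```

HAVING filters after aggregation, which is why the count condition cannot go in the WHERE clause.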
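For the CROSS JOIN vs. INNER JOIN question, one concrete scenario is a daily position report that needs a row for every account on every day, including days with no trades. The sketch below is illustrative only: the accounts, calendar, and txns tables and their data are hypothetical, and sqlite3 stands in for the real warehouse.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a tiny in-memory dataset: 2 accounts, 3 calendar days, 3 transactions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accounts (account_id INTEGER PRIMARY KEY);
        CREATE TABLE calendar (day TEXT PRIMARY KEY);
        CREATE TABLE txns (account_id INTEGER, day TEXT, amount REAL);
        INSERT INTO accounts VALUES (1), (2);
        INSERT INTO calendar VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-03');
        INSERT INTO txns VALUES
            (1, '2024-01-01', 100.0),
            (1, '2024-01-03', -40.0),
            (2, '2024-01-02', 25.0);
        """
    )
    return conn


# INNER JOIN: only rows where an account has a matching transaction.
INNER_SQL = """
SELECT a.account_id, t.day, t.amount
FROM accounts AS a
INNER JOIN txns AS t ON t.account_id = a.account_id
ORDER BY a.account_id, t.day
"""

# CROSS JOIN: every account paired with every calendar day (2 x 3 = 6 rows); the LEFT JOIN
# then attaches transactions, so days with no activity show 0.0 instead of disappearing.
CROSS_SQL = """
SELECT a.account_id, c.day, COALESCE(SUM(t.amount), 0.0) AS net_amount
FROM accounts AS a
CROSS JOIN calendar AS c
LEFT JOIN txns AS t ON t.account_id = a.account_id AND t.day = c.day
GROUP BY a.account_id, c.day
ORDER BY a.account_id, c.day
"""


def inner_rows(conn: sqlite3.Connection) -> list:
    return conn.execute(INNER_SQL).fetchall()


def cross_rows(conn: sqlite3.Connection) -> list:
    return conn.execute(CROSS_SQL).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print("inner:", inner_rows(conn))
    print("cross:", cross_rows(conn))
```

The point to articulate is intent: a CROSS JOIN is deliberate when every combination is needed, as in this dense account-by-day grid, whereas an accidental one from a missing join condition multiplies row counts and is a common cause of runaway queries on large tables.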
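For the indexing question, a quick way to show the impact is to compare query plans before and after creating an index. This sketch uses SQLite's EXPLAIN QUERY PLAN; the trades table, its synthetic rows, and the index name idx_trades_account are hypothetical, and the exact plan wording varies across SQLite versions.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Return the human-readable 'detail' text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def compare_plans() -> tuple:
    conn = sqlite3.connect(":memory:")
    # Hypothetical trades table: 10,000 synthetic rows spread over 100 accounts.
    conn.execute(
        "CREATE TABLE trades (trade_id INTEGER PRIMARY KEY, account_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO trades (account_id, amount) VALUES (?, ?)",
        [(i % 100, float(i)) for i in range(10_000)],
    )
    lookup = "SELECT trade_id, amount FROM trades WHERE account_id = ?"

    before = query_plan(conn, lookup, (42,))  # no index yet: full table scan
    conn.execute("CREATE INDEX idx_trades_account ON trades (account_id)")
    after = query_plan(conn, lookup, (42,))   # planner can now search the index
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("before:", before)
    print("after: ", after)
```

The trade-off to mention alongside the faster reads: every index adds storage and slows inserts and updates, because it must be maintained on each write. Columnar warehouses such as Snowflake and BigQuery lean mainly on partitioning and clustering rather than B-tree indexes of this kind.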
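For the deduplication question, a sketch of the ROW_NUMBER() OVER (PARTITION BY ... ORDER BY timestamp DESC) pattern: number the rows within each duplicate group, newest first, and delete everything after row 1. The events table, the choice of customer_id as the duplicate key, and ISO-8601 text timestamps are all assumptions; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical schema: events(id INTEGER PRIMARY KEY, customer_id INTEGER, payload TEXT, updated_at TEXT).
# Assumption: rows sharing customer_id are duplicates; ISO-8601 strings sort chronologically.
DEDUPE_SQL = """
DELETE FROM events
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY updated_at DESC, id DESC  -- id breaks timestamp ties
               ) AS rn
        FROM events
    ) AS ranked
    WHERE rn > 1
)
"""


def dedupe_keep_latest(conn: sqlite3.Connection) -> int:
    """Delete all but the newest row per customer_id; return how many rows were removed."""
    with conn:  # commits on success, rolls back if the DELETE fails
        cur = conn.execute(DEDUPE_SQL)
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, customer_id INTEGER, payload TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        [
            (1, 10, "a", "2024-01-01"),
            (2, 10, "b", "2024-03-01"),
            (3, 20, "c", "2024-02-01"),
            (4, 10, "d", "2024-02-01"),
            (5, 20, "e", "2024-02-01"),
        ],
    )
    print("deleted:", dedupe_keep_latest(conn))
    print(conn.execute("SELECT * FROM events ORDER BY id").fetchall())
```

Adding id as a secondary sort key makes the survivor deterministic when two rows share a timestamp. Snowflake and BigQuery also support a QUALIFY clause that filters directly on ROW_NUMBER() = 1, which is often used to rebuild a deduplicated table instead of deleting in place.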
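The producer-consumer question is a general concurrency exercise rather than SQL. A minimal Python sketch uses threading with the thread-safe queue.Queue: a bounded queue gives back-pressure, and one sentinel per consumer signals shutdown. The function names and the squaring "work" are placeholders.

```python
import queue
import threading

SENTINEL = None  # stop signal; assumes None is never a real work item


def producer(q: queue.Queue, items, n_consumers: int) -> None:
    for item in items:
        q.put(item)        # blocks while the bounded queue is full (back-pressure)
    for _ in range(n_consumers):
        q.put(SENTINEL)    # one stop signal per consumer


def consumer(q: queue.Queue, results: list, lock: threading.Lock) -> None:
    while True:
        item = q.get()     # blocks until an item is available
        try:
            if item is SENTINEL:
                return
            with lock:
                results.append(item * item)  # placeholder work: square the item
        finally:
            q.task_done()


def run(items, n_consumers: int = 3, maxsize: int = 4) -> list:
    q = queue.Queue(maxsize=maxsize)
    results: list = []
    lock = threading.Lock()
    consumers = [
        threading.Thread(target=consumer, args=(q, results, lock))
        for _ in range(n_consumers)
    ]
    for t in consumers:
        t.start()
    prod = threading.Thread(target=producer, args=(q, items, n_consumers))
    prod.start()
    prod.join()
    for t in consumers:
        t.join()
    return sorted(results)  # consumers finish in nondeterministic order


if __name__ == "__main__":
    print(run(range(10)))
```

In CPython, threads suit I/O-bound work such as network or database calls; CPU-bound work does not run in parallel under the GIL, so multiprocessing is the usual alternative there.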
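For the median-salary question, engines without a built-in median can compute it with window functions: number the sorted salaries, count them, and average the one or two middle rows. The employees table is hypothetical, sqlite3 (3.25 or newer, for window functions) stands in for the real engine, and NULL salaries are assumed to be excluded.

```python
import sqlite3

# Hypothetical schema: employees(id INTEGER PRIMARY KEY, salary REAL).
# For n rows, the middle positions are (n + 1) / 2 and (n + 2) / 2 with integer
# division: the same row when n is odd, the two central rows when n is even.
MEDIAN_SQL = """
WITH ranked AS (
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary) AS rn,
           COUNT(*) OVER () AS cnt
    FROM employees
    WHERE salary IS NOT NULL
)
SELECT AVG(salary)
FROM ranked
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
"""


def median_salary(conn: sqlite3.Connection):
    """Return the median salary, or None when there are no salaries."""
    return conn.execute(MEDIAN_SQL).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, salary REAL)")
    conn.executemany(
        "INSERT INTO employees (salary) VALUES (?)",
        [(100.0,), (300.0,), (200.0,), (400.0,)],
    )
    print(median_salary(conn))
```

Where the engine provides one, a built-in is simpler: Snowflake has MEDIAN(), Snowflake and PostgreSQL support PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary), and BigQuery offers PERCENTILE_CONT(salary, 0.5) OVER ().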