Data engineering interview questions · medium
What metrics would trigger an auto-scaling event?
What metrics would you analyze to determine if your partitioning strategy is effective?
What motivates you to join Morgan Stanley?
What strategies would you use to manage dynamic partitions efficiently?
What types of columns support PARTITION_BY in BigQuery?
When would you choose partitioning over bucketing, or vice versa?
Where is the PARTITION_BY option in the BigQuery UI?
Why is the HAVING clause used only after GROUP BY?
Why not use ROW_NUMBER() instead? Discuss pros and cons.
Why star schema? Compared with snowflake schema and normalized approaches.
Window functions: using ROW_NUMBER(), RANK(), and PARTITION BY, produce the expected output for a given dataset.
Write SQL to identify employees whose salary is higher than their manager's.
Write a PySpark job to find the top 3 employees of each department, where Age < 30 and Salary > department average salary.
Write a SQL query leveraging window functions and timestamps to identify updates over time.
Write a SQL query to calculate the highest salary in each department using a window function.
Write a SQL query to calculate the running total of sales per region, partitioned by year.
Write a SQL query to detect customers who have not placed a second order in 90 days.
Write a SQL query to find departments with more than 10 employees.
Write a SQL query to find employees earning the second-highest salary.
Write a SQL query to find houses with AVG(score) > 70.
SQL is the most tested topic in data engineering interviews. Most companies dedicate an entire round to SQL, typically asking 3-5 questions covering window functions, CTEs, joins, optimization, and platform-specific features.
Focus on: window functions (RANK, ROW_NUMBER, LAG/LEAD), CTEs and recursive queries, query optimization and execution plans, indexing strategies, and platform-specific features for BigQuery, Redshift, or Snowflake depending on the company.
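As a rough illustration of the window-function and CTE pattern named above, the sketch below runs a "highest salary per department" query against an in-memory SQLite database from Python. The `employees` table, its columns, and the sample rows are hypothetical, and SQLite stands in for whichever warehouse a given interview targets. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data: (name, department, salary).
ROWS = [
    ("Ana", "eng", 120),
    ("Ben", "eng", 150),
    ("Cy", "eng", 150),
    ("Dee", "ops", 90),
    ("Eli", "ops", 110),
]

# The CTE ranks employees within each department by salary.
# RANK() gives tied salaries the same rank, so both top earners
# in a tie are returned; ROW_NUMBER() would keep only one of them.
QUERY = """
WITH ranked AS (
    SELECT
        name,
        department,
        salary,
        RANK() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) AS salary_rank
    FROM employees
)
SELECT name, department, salary, salary_rank
FROM ranked
WHERE salary_rank = 1
ORDER BY department, name
"""


def top_earners(rows):
    """Load rows into an in-memory table and return the top earners per department."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_earners(ROWS):
        print(row)
```

Swapping `RANK()` for `ROW_NUMBER()` in the same query is a quick way to see the trade-off several of the questions above ask about: `ROW_NUMBER()` returns exactly one row per department, but which of the tied rows it picks depends on the ordering.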
Yes, data engineering SQL rounds differ from software engineering ones. They emphasize analytical queries (window functions, aggregations), large-scale optimization (partitioning, indexing), and data warehouse concepts (star schema, slowly changing dimensions), whereas software engineering SQL tends to focus on CRUD operations and basic joins.
For a mid-level data engineering role, plan 2-4 weeks of focused SQL practice. Cover window functions, CTEs, and optimization, and practice writing queries under time pressure. Use real interview questions from the companies you're targeting.