Data engineering interview questions
Write an SQL query to find the second-highest salary from an employee table.
Demonstrate the difference between DENSE_RANK() and RANK().
Discuss differences between ROW_NUMBER(), RANK(), and DENSE_RANK(), and provide examples from your projects.
Explain the differences between a Data Warehouse, a Data Lake, and Delta Lake.
Explain the differences between Repartition and Coalesce. When would you use each?
Explain the differences between a Data Lake and a Data Warehouse.
What is the difference between partitioning and bucketing in Spark, and when would you use bucketing?
Can you explain the difference between OLTP and OLAP?
Describe a time when you had to optimize a slow SQL query. What steps did you take?
Explain the concept of ACID properties in the context of databases.
Explain the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
How do you handle NULL values in SQL? Mention functions like COALESCE and NULLIF.
What is a Common Table Expression (CTE), and when would you use it?
What is the difference between a primary key and a unique key?
What is the difference between WHERE and HAVING clauses in SQL?
Describe a scenario where partitioning and bucketing would improve query performance.
Explain Fact and Dimension Tables with examples.
Explain the types of triggers in ADF, including schedule, tumbling window, and event-based triggers.
How do you remove duplicate rows in BigQuery?
Joins and window functions: INNER, LEFT, RIGHT, FULL OUTER, ROW_NUMBER(), RANK(), DENSE_RANK()
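Several of the questions above (second-highest salary, and ROW_NUMBER() vs RANK() vs DENSE_RANK()) can be explored with one small experiment. The following is a minimal sketch, not a model answer: the `employee` table, its columns, and the sample rows are hypothetical, and it runs the SQL through Python's built-in `sqlite3` module, assuming the bundled SQLite is version 3.25 or later (needed for window functions).

```python
import sqlite3

# Hypothetical employee table; names and salaries are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ana", 300), ("Bo", 300), ("Cy", 200), ("Di", 100)],
)

# Compare the three ranking functions on the same ordering.
# Ana and Bo tie on salary, which is where the functions diverge.
ranked = conn.execute(
    """
    SELECT name, salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk
    FROM employee
    ORDER BY salary DESC, name
    """
).fetchall()

for row in ranked:
    print(row)
# ('Ana', 300, 1, 1, 1)
# ('Bo', 300, 2, 1, 1)
# ('Cy', 200, 3, 3, 2)   <- RANK skips 2 after the tie, DENSE_RANK does not
# ('Di', 100, 4, 4, 3)

# One way to get the second-highest distinct salary: keep dense rank 2.
second_highest = conn.execute(
    """
    SELECT DISTINCT salary
    FROM (
        SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
        FROM employee
    ) AS t
    WHERE dr = 2
    """
).fetchone()[0]

print(second_highest)  # 200
```

DENSE_RANK() is used for the second-highest salary here because ties at the top should not hide the next distinct value. With RANK(), no row would have rank 2 in this sample data.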
SQL is the most tested topic in data engineering interviews. Most companies dedicate an entire round to SQL, typically asking 3-5 questions covering window functions, CTEs, joins, optimization, and platform-specific features.
Focus on: window functions (RANK, ROW_NUMBER, LAG/LEAD), CTEs and recursive queries, query optimization and execution plans, indexing strategies, and platform-specific features for BigQuery, Redshift, or Snowflake depending on the company.
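As a practice example for two of the focus areas above, a CTE can be combined with LAG to compare each row with the previous one. This is a minimal sketch under stated assumptions: the `daily_sales` table and its values are hypothetical, and the SQL is run through Python's built-in `sqlite3` module (SQLite 3.25 or later assumed). BigQuery, Redshift, and Snowflake each have their own dialect details that this sketch does not cover.

```python
import sqlite3

# Hypothetical daily sales data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [("2024-01-01", 100), ("2024-01-02", 150), ("2024-01-03", 120)],
)

# The CTE attaches the previous day's amount to each row with LAG;
# the outer query then computes the day-over-day change.
changes = conn.execute(
    """
    WITH with_prev AS (
        SELECT day, amount,
               LAG(amount) OVER (ORDER BY day) AS prev_amount
        FROM daily_sales
    )
    SELECT day, amount, amount - prev_amount AS change
    FROM with_prev
    ORDER BY day
    """
).fetchall()

for row in changes:
    print(row)
# ('2024-01-01', 100, None)   <- no previous row, so LAG returns NULL
# ('2024-01-02', 150, 50)
# ('2024-01-03', 120, -30)
```

The first row's change is NULL because there is no earlier day to compare against. Wrapping the subtraction in COALESCE would be one way to substitute a default if that is preferred.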
Data engineering SQL rounds differ from software engineering SQL rounds. They emphasize analytical queries (window functions, aggregations), large-scale optimization (partitioning, indexing), and data warehouse concepts (star schema, slowly changing dimensions), whereas software engineering SQL tends to focus on CRUD operations and basic joins.
For a mid-level data engineering role, plan 2-4 weeks of focused SQL practice. Cover window functions, CTEs, optimization, and practice writing queries under time pressure. Use real interview questions from companies you're targeting.