Medium-level SQL questions from real data engineering interviews.
These medium-level SQL questions are selected from real interviews at top companies. Each question includes a detailed expert answer and a pro tip to help you nail your interview.
Write an SQL query to find the second-highest salary from an employee table.
Demonstrate the difference between DENSE_RANK() and RANK()
Discuss differences between ROW_NUMBER(), RANK(), and DENSE_RANK(), and provide examples from your projects.
Explain the differences between Data Warehouse, Data Lake, and Delta Lake
Explain the differences between Repartition and Coalesce. When would you use each?
What is the difference between partitioning and bucketing in Spark, and when would you use bucketing?
Can you explain the difference between OLTP and OLAP?
Describe a time when you had to optimize a slow SQL query. What steps did you take?
Explain the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
How do you handle NULL values in SQL? Mention functions like COALESCE and NULLIF.
What is the difference between WHERE and HAVING clauses in SQL?
Describe a scenario where partitioning and bucketing would improve query performance.
Explain the types of triggers in ADF, including schedule, tumbling window, and event-based triggers.
How do you remove duplicate rows in BigQuery?
When would you choose a Snowflake schema over a Star schema?
Detail examples of inner, outer, left, and right joins.
Difference between ROW_NUMBER(), RANK(), and DENSE_RANK() with examples.
Difference between WHERE and HAVING clauses with examples.
Explain SQL Window Functions with examples.
Explain the use of the MERGE statement in SQL.
How do you handle NULL values in SQL? Mention functions like COALESCE and ISNULL.
How would you handle duplicate records in an SQL table?
Implement a query to find the top 5 customers by total sales amount.
Write an SQL query to find the second-highest salary in each department.
What are primary keys and foreign keys? Why are they important?
What is a self-join, and when would you use it?
What is normalization and denormalization? When would you use each?
What is the difference between a view and a materialized view?
Write an SQL query to find duplicate emails in a users table.
Triggers in ADF, especially tumbling window triggers.
What is a window function? Explain with an example.
What is the difference between OLTP and OLAP?
Write a SQL query to find top 3 earners in each department.
Write a query to find the top three highest-paid employees in each department using window functions.
Write complex SQL queries involving multiple joins, subqueries, and data aggregation logic.
Add row numbers using a window function in PySpark
Add a column to the Employees table that shows the name of the employee with the next higher employee_id.
Add a new column with manager names for each employee using a self-join.
Add a new column with the average salary by department.
Advanced SQL with CTEs and Conditional Joins
Analyze the output of various joins (LEFT, RIGHT, INNER, CROSS, FULL OUTER) on the given tables.
Calculate the cumulative transaction amount for each month using a transaction table.
Can you describe a project where you handled large volumes of data?
Can you modify a partitioned table into a non-partitioned one and vice-versa? How?
Check for duplicates in a table.
Explain the COALESCE function in SQL
Compare Airflow's @daily vs. @once schedule presets.
Compare OLTP and OLAP systems in the context of financial transactions.
Compare PostgreSQL vs Snowflake. How do they handle duplicate record errors?
Compare the star schema and snowflake schema. Which one would you use for reporting at Swiggy, and why?
Connecting BigQuery with Linux
Count records for INNER JOIN and LEFT JOIN
Create data models for storing users, artists, and related data for a music platform
Create a partitioned table
DELETE vs. TRUNCATE in Snowflake
Demonstrate how to use a LEFT JOIN to combine data from two tables and handle null values.
Describe a scenario where you disagreed with a product or business team. What did you do?
Describe a scenario where you would use a CROSS JOIN vs. an INNER JOIN.
Describe how Dataproc integrates with BigQuery for processing large datasets.
Describe how partitioning helps improve query performance in a large dataset.
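Several questions above ask about ROW_NUMBER(), RANK(), DENSE_RANK(), and the second-highest salary. The following is a minimal sketch of how those could be explored, not an official answer from the question set. It assumes a hypothetical `employee` table with `name` and `salary` columns and made-up rows, and it runs the SQL through Python's built-in `sqlite3` module, which needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data: two employees tie for the top salary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ann", 300), ("Bob", 300), ("Cid", 200), ("Dee", 100)],
)

# Compare the three ranking functions side by side.
# ROW_NUMBER() gets a tie-breaker (name) so its output is deterministic.
ranked = conn.execute(
    """
    SELECT name, salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk
    FROM employee
    ORDER BY salary DESC, name
    """
).fetchall()

for row in ranked:
    print(row)

# Second-highest distinct salary: DENSE_RANK() leaves no gap after a tie,
# so rank 2 always exists when there are at least two distinct salaries.
second_highest = conn.execute(
    """
    SELECT DISTINCT salary
    FROM (
        SELECT salary,
               DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
        FROM employee
    ) AS t
    WHERE dense_rnk = 2
    """
).fetchone()[0]

print(second_highest)
```

With the tie at the top, RANK() jumps from 1 to 3 while DENSE_RANK() continues with 2, which is why filtering on `RANK() = 2` would return no rows here and DENSE_RANK() is the safer choice for "second-highest" questions.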