SQL Window Functions & CTEs: The Complete Interview Guide for Data Engineers (2026)
Window functions and CTEs are the #1 tested SQL topics at Amazon, Google, and Databricks. This guide covers every pattern you'll face with production-ready answers.
Key Takeaways
- βWhy Window Functions & CTEs Dominate Data Engineering Interviews
- βROW_NUMBER vs RANK vs DENSE_RANK β The Most Asked Question
- βLAG and LEAD β Time-Series Pattern Questions
- βRunning Totals and Moving Averages with ROWS/RANGE
Why Window Functions & CTEs Dominate Data Engineering Interviews
If you're preparing for a data engineering interview at any top tech company in 2026, there are two SQL topics you absolutely cannot skip: window functions and Common Table Expressions (CTEs).
At Amazon, roughly 40% of SQL rounds involve at least one window function problem. Google's data engineering interviews test CTEs in nearly every coding round. Databricks, Snowflake, and Meta follow the same pattern.
The reason is simple: window functions and CTEs are how production data pipelines actually work. They're the bridge between "I can write SELECT statements" and "I can build the analytical layer for a data warehouse." Interviewers use them to separate candidates who write SQL from candidates who *think* in SQL.
This guide gives you the exact patterns, ranked by interview frequency, with production-grade answers you can use verbatim.
ROW_NUMBER vs RANK vs DENSE_RANK β The Most Asked Question
Question: Explain the differences between ROW_NUMBER(), RANK(), and DENSE_RANK(). When would you use each?
This is the single most frequently asked SQL window function question across all companies.
Answer:
All three are ranking functions, but they handle ties differently:
- ROW_NUMBER() β assigns a unique sequential integer to each row. No ties, ever. Even if two rows have identical values, they get different numbers. Use case: deduplication (
WHERE rn = 1), pagination.
- RANK() β assigns the same rank to ties, then *skips* numbers. If two rows tie for rank 1, the next row gets rank 3 (not 2). Use case: competitive ranking where gaps matter ("3rd place out of 5").
- DENSE_RANK() β assigns the same rank to ties but does *not* skip numbers. Ties at rank 1 mean the next row gets rank 2. Use case: "find the Nth highest salary" problems, top-N per group.
SELECT
employee_id,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rnk,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as drnk
FROM employees;Pro tip for interviews: Always mention the deduplication pattern β ROW_NUMBER() OVER (PARTITION BY key ORDER BY updated_at DESC) WHERE rn = 1. This is used in almost every production ETL pipeline and shows you think beyond textbook answers.
LAG and LEAD β Time-Series Pattern Questions
Question: Write a query to find the day-over-day change in revenue for each product.
Answer:
LAG and LEAD access rows at a fixed offset relative to the current row without a self-join.
SELECT
product_id,
sale_date,
revenue,
LAG(revenue, 1) OVER (PARTITION BY product_id ORDER BY sale_date) AS prev_day_revenue,
revenue - LAG(revenue, 1) OVER (PARTITION BY product_id ORDER BY sale_date) AS daily_change,
ROUND(
(revenue - LAG(revenue, 1) OVER (PARTITION BY product_id ORDER BY sale_date)) * 100.0 /
NULLIF(LAG(revenue, 1) OVER (PARTITION BY product_id ORDER BY sale_date), 0), 2
) AS pct_change
FROM daily_sales
ORDER BY product_id, sale_date;Follow-up interviewers love: "What if there are gaps in dates?" β You'd generate a date spine using generate_series or a calendar table, then LEFT JOIN your data to fill missing dates with 0 or NULL.
Question: Find users whose login count dropped by more than 50% compared to the previous month.
WITH monthly AS (
SELECT user_id, DATE_TRUNC('month', login_date) AS month, COUNT(*) AS logins
FROM user_logins
GROUP BY 1, 2
)
SELECT *
FROM (
SELECT *, LAG(logins) OVER (PARTITION BY user_id ORDER BY month) AS prev_logins
FROM monthly
) t
WHERE logins < prev_logins * 0.5;Running Totals and Moving Averages with ROWS/RANGE
Question: Calculate a 7-day moving average of daily orders.
Answer:
SELECT
order_date,
daily_orders,
AVG(daily_orders) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM (
SELECT order_date, COUNT(*) AS daily_orders
FROM orders
GROUP BY order_date
) daily;Key distinction interviewers test: ROWS BETWEEN counts physical rows. RANGE BETWEEN counts logical values. If your dates have gaps, ROWS BETWEEN 6 PRECEDING grabs 6 rows (which might span more than 7 days). For strict calendar windows, generate a date spine first.
Question: Calculate running total of sales per region, resetting each quarter.
SELECT
region,
sale_date,
amount,
SUM(amount) OVER (
PARTITION BY region, DATE_TRUNC('quarter', sale_date)
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS quarterly_running_total
FROM sales;CTEs: Recursive and Non-Recursive Patterns
Question: What is a CTE and when would you choose it over a subquery?
Answer:
A CTE (Common Table Expression) is a named temporary result set defined with WITH. Benefits over subqueries:
- Readability β Named blocks that read top-to-bottom instead of inside-out
- Reusability β Reference the same CTE multiple times without recomputing (in most engines)
- Recursion β Only CTEs support recursive queries in standard SQL
Question: Write a recursive CTE to find all subordinates of a given manager.
WITH RECURSIVE org_tree AS (
-- Base case: direct reports
SELECT employee_id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id = 100
UNION ALL
-- Recursive case: reports of reports
SELECT e.employee_id, e.name, e.manager_id, t.level + 1
FROM employees e
JOIN org_tree t ON e.manager_id = t.employee_id
WHERE t.level < 10 -- safety limit
)
SELECT * FROM org_tree ORDER BY level, name;Production pattern interviewers love: CTE + window function combo for deduplication:
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time DESC) AS rn
FROM raw_events
)
SELECT * FROM ranked WHERE rn = 1;This pattern appears in virtually every data pipeline that ingests CDC (Change Data Capture) events.
NTILE, PERCENT_RANK, and CUME_DIST β Distribution Functions
Question: Divide customers into quartiles based on their total spend.
SELECT
customer_id,
total_spend,
NTILE(4) OVER (ORDER BY total_spend DESC) AS spend_quartile
FROM (
SELECT customer_id, SUM(amount) AS total_spend
FROM orders
GROUP BY customer_id
) customer_spend;Question: Find what percentile each employee's salary falls in within their department.
SELECT
employee_id,
department,
salary,
ROUND(PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary) * 100, 1) AS percentile
FROM employees;Interview insight: PERCENT_RANK returns 0 for the first row, while CUME_DIST never returns 0. If an interviewer asks "find all employees above the 90th percentile," PERCENT_RANK is the safer choice because CUME_DIST can include ties at the boundary.
FIRST_VALUE, LAST_VALUE, NTH_VALUE
Question: For each department, show every employee alongside the highest-paid and lowest-paid colleague.
SELECT
department,
employee_name,
salary,
FIRST_VALUE(employee_name) OVER w AS highest_paid,
LAST_VALUE(employee_name) OVER (
PARTITION BY department ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS lowest_paid
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);Gotcha interviewers test: LAST_VALUE with the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) only looks at rows up to the current row β it doesn't actually give you the last value in the partition. You must explicitly set ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. This is a common trap.
The Second-Highest Salary Problem β 5 Different Ways
This is the most classic SQL interview question. Interviewers want to see multiple approaches:
Approach 1: DENSE_RANK
SELECT salary FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
FROM employees
) t WHERE dr = 2;Approach 2: OFFSET/LIMIT
SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1;Approach 3: Subquery
SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);Approach 4: CTE
WITH ranked AS (
SELECT DISTINCT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rn
FROM employees
)
SELECT salary FROM ranked WHERE rn = 2;Pro tip: Always ask the interviewer: "Should I handle NULL salaries? What if there's only one distinct salary?" This shows production awareness. Use COALESCE or return NULL explicitly in edge cases.
Top N Per Group β The Pattern Every Pipeline Uses
Question: Find the top 3 highest-selling products in each category.
WITH ranked_products AS (
SELECT
category,
product_name,
total_sales,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY total_sales DESC) AS rn
FROM product_sales
)
SELECT category, product_name, total_sales
FROM ranked_products
WHERE rn <= 3;Follow-up: "What if you want ties included?" β Use DENSE_RANK instead of ROW_NUMBER.
Follow-up: "This table has 10 billion rows. How do you optimize?" β Filter early with a WHERE clause before the window function, partition-prune if the table is partitioned by category, and consider a lateral join approach for very wide partitions.
Gap and Island Detection
Question: Find consecutive login streaks for each user.
This is the classic "islands" problem β one of the hardest window function patterns in interviews.
WITH numbered AS (
SELECT
user_id,
login_date,
login_date - INTERVAL '1 day' * ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY login_date
) AS grp
FROM (
SELECT DISTINCT user_id, login_date::date AS login_date
FROM user_logins
) deduped
)
SELECT
user_id,
MIN(login_date) AS streak_start,
MAX(login_date) AS streak_end,
COUNT(*) AS streak_length
FROM numbered
GROUP BY user_id, grp
HAVING COUNT(*) >= 3
ORDER BY streak_length DESC;How it works: By subtracting the ROW_NUMBER (as days) from each date, consecutive dates produce the same group value. Non-consecutive dates break the group. This is an elegant pattern that impresses interviewers every time.
Interview Preparation Checklist
Before your next interview, make sure you can write each of these from memory:
- Deduplication β ROW_NUMBER + PARTITION BY + WHERE rn = 1
- Top N per group β ROW_NUMBER or DENSE_RANK + PARTITION BY
- Running total β SUM OVER with ROWS frame
- Moving average β AVG OVER with ROWS BETWEEN N PRECEDING
- Day-over-day change β LAG with percent calculation
- Percentile buckets β NTILE(4) or NTILE(10)
- Recursive hierarchy β WITH RECURSIVE + UNION ALL
- Gap and island β date minus ROW_NUMBER trick
- Sessionization β LAG to detect gaps, then running SUM to assign session IDs
- Pivot with window β conditional aggregation + CASE WHEN inside windows
Practice each pattern until you can write it in under 5 minutes. That's the bar at FAANG companies.
Ready to practice? Try the AI Mock Interview to get instant feedback on your SQL answers, or browse our 1,800+ interview questions with expert-verified solutions.
Written by the DataEngPrep Team
Our editorial team consists of experienced data engineers who have worked at top tech companies and gone through hundreds of real interviews. Every article is reviewed for technical accuracy and practical relevance to help you prepare effectively.
Learn more about our team βRelated Articles
Practice These Questions
Ace Your Interview with AI Coaching
1,800+ expert answers, AI mock interviews, and personalized feedback to get you hired.