Red Flag: Reciting definitions without explaining when to use LEFT vs INNER or RANK vs ROW_NUMBER. Pro-Move: 'We use ROW_NUMBER for dedup (exactly one row per key, deterministic as long as the ORDER BY is unique within the partition) and RANK for leaderboards (ties share a rank); the business requirement drove the choice.' This connects the technique to the use case.
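The Pro-Move above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it runs the SQL through Python's built-in sqlite3 module as a stand-in engine (window functions need SQLite 3.25 or newer), and the tables customer_events and scores, their columns, and all rows are hypothetical.

```python
import sqlite3

# Hypothetical tables and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_events (customer_id INTEGER, email TEXT, updated_at TEXT);
    INSERT INTO customer_events VALUES
        (1, 'a@old.example', '2024-01-01'),
        (1, 'a@new.example', '2024-03-01'),
        (2, 'b@example.com', '2024-02-15');

    CREATE TABLE scores (player TEXT, points INTEGER);
    INSERT INTO scores VALUES ('ann', 90), ('bob', 90), ('cat', 80);
""")

# Dedup: keep the latest row per customer. ROW_NUMBER yields exactly one
# rn = 1 per partition; the pick is deterministic here because updated_at
# is unique within each customer_id.
latest = conn.execute("""
    SELECT customer_id, email
    FROM (
        SELECT customer_id, email,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY updated_at DESC
               ) AS rn
        FROM customer_events
    )
    WHERE rn = 1
    ORDER BY customer_id
""").fetchall()

# Leaderboard: RANK gives tied players the same rank and leaves a gap after.
leaderboard = conn.execute("""
    SELECT player, RANK() OVER (ORDER BY points DESC) AS rnk
    FROM scores
    ORDER BY rnk, player
""").fetchall()

print(latest)
print(leaderboard)
```

Swapping ROW_NUMBER for RANK in the dedup query would return several rows per customer whenever two events share the same updated_at, which is why the two functions are not interchangeable.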
This hard-level SQL question has been reported in data engineering interviews at companies such as Ford, KPMG, and Nihilent. It is less common than easier questions, but it tests the deeper understanding that distinguishes strong candidates. Mastering the underlying concepts (joins, partitioning, window functions) will help you answer variations of this question confidently.
This is a senior-level question that tests architectural thinking. Lead with the high-level design, then drill into specifics. Discuss trade-offs explicitly - there is rarely one correct answer. Show awareness of scale, fault tolerance, and operational complexity.
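Joins: INNER = only rows with a match on both sides; LEFT = all left rows plus matching right rows (NULL-filled where there is no match); RIGHT = mirror of LEFT; FULL OUTER = all rows from both sides, NULL-filled wherever one side has no match. Why it matters: Join choice affects result cardinality and NULL handling, so the wrong join means wrong business logic (e.g., LEFT to preserve all customers even without orders).
Window functions: ROW_NUMBER() = unique sequence 1, 2, 3 (ties are broken arbitrarily unless the ORDER BY is unique); RANK() = ties share a rank, gaps after; DENSE_RANK() = ties share a rank, no gaps. Why: ROW_NUMBER for dedup; RANK for 'top N per group with ties'; DENSE_RANK when rank gaps are meaningless.
Scalability: JOIN order and table sizes drive shuffle in distributed engines; put the smaller table on the broadcast side when it fits in memory.
Cost: A window with PARTITION BY shuffles rows by the partition key, but the work then runs in parallel per partition. A window with ORDER BY and no PARTITION BY is usually the more expensive case on large data, because engines such as Spark move all rows into a single partition to order them.
Example: SELECT *, ROW_NUMBER() OVER (PARTITION BY e.dept_id ORDER BY e.salary DESC) AS rn FROM employees e LEFT JOIN departments d ON e.dept_id = d.id;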
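To make the cardinality and tie-handling differences described above concrete, here is a small single-node sketch. It uses Python's built-in sqlite3 module purely as a convenient way to run the SQL (window functions need SQLite 3.25 or newer); the customers and orders tables and their rows are hypothetical, and RIGHT and FULL OUTER joins are left out because older SQLite builds do not support them.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 50), (11, 1, 70), (12, 2, 70);
""")

# INNER JOIN: only customers with at least one order appear.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# LEFT JOIN: every customer appears; 'Cy' has no orders, so amount is NULL.
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# The three ranking functions over the same ordering. Orders 11 and 12 tie
# on amount; ROW_NUMBER gets id as an explicit tie-breaker so its output
# is deterministic.
ranked = conn.execute("""
    SELECT id, amount,
           ROW_NUMBER() OVER (ORDER BY amount DESC, id) AS row_num,
           RANK()       OVER (ORDER BY amount DESC)     AS rnk,
           DENSE_RANK() OVER (ORDER BY amount DESC)     AS dense_rnk
    FROM orders
    ORDER BY row_num
""").fetchall()

print(inner_rows)
print(left_rows)
print(ranked)
```

The LEFT JOIN returns one more row than the INNER JOIN (the customer without orders), and the tied orders get row numbers 1 and 2 but share rank 1, after which RANK jumps to 3 while DENSE_RANK continues at 2.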
According to DataEngPrep.tech, this is one of the most frequently asked SQL interview questions, reported at 3 companies. DataEngPrep.tech maintains a curated database of 1,863+ real data engineering interview questions across 7 categories, verified by industry professionals.