Question 1

Discuss differences between ROW_NUMBER(), RANK(), and DENSE_RANK(), and provide examples from your projects.

Accepted Answer

**ROW_NUMBER()**: Unique sequential numbers (1, 2, 3...) with no ties; the result is deterministic only when the ORDER BY columns uniquely identify each row. **RANK()**: Same rank for ties, then skips (1, 2, 2, 4). **DENSE_RANK()**: Same rank for ties; no gaps (1, 2, 2, 3). **Project examples**: ROW_NUMBER() to deduplicate events by (user_id, event_time), keeping the first row per key, which is critical when upstream sends duplicates. DENSE_RANK() for 'top 10 products per category' reports, where it avoids gaps when filtering....
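The three functions can be compared side by side in a small sketch. It uses Python's built-in `sqlite3` module (window functions need SQLite 3.25 or later); the `scores` and `events` tables, and the `ingested_at` tie-breaker column, are hypothetical and not taken from any real project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data: players "b" and "c" tie on points.
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 100), ("b", 90), ("c", 90), ("d", 80)],
)

ranked = conn.execute(
    """
    SELECT player,
           ROW_NUMBER() OVER (ORDER BY points DESC, player) AS row_num,
           RANK()       OVER (ORDER BY points DESC)         AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC)         AS dense_rnk
    FROM scores
    ORDER BY row_num
    """
).fetchall()

# Deduplication: keep the first-ingested row per (user_id, event_time).
# ingested_at is an assumed column that makes the ordering unique.
conn.execute(
    "CREATE TABLE events (user_id INTEGER, event_time TEXT, ingested_at TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "2024-01-01T10:00", "2024-01-01T10:00:05"),
        (1, "2024-01-01T10:00", "2024-01-01T10:00:09"),  # duplicate delivery
        (2, "2024-01-01T11:00", "2024-01-01T11:00:01"),
    ],
)

deduped = conn.execute(
    """
    SELECT user_id, event_time, ingested_at
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id, event_time
                   ORDER BY ingested_at
               ) AS rn
        FROM events
    ) AS numbered
    WHERE rn = 1
    ORDER BY user_id
    """
).fetchall()

print(ranked)
print(deduped)
```

Adding `player` to the ROW_NUMBER() ordering is what makes the numbering repeatable for the tied rows; without a tie-breaker the engine may number them in either order.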

Question 2

Describe a time when you had to optimize a slow SQL query. What steps did you take?

Accepted Answer

**Situation**: A critical exec report was timing out at 30+ minutes; SLA was 5 minutes. **Task**: Diagnose and fix without changing business logic. **Action**: I ran EXPLAIN (ANALYZE) and found: (1) a missing index on the join key causing a full table scan, (2) a cross join that could be an inner join with a filter, (3) a filter in HAVING that could move to WHERE to reduce rows early, (4) an unnecessary ORDER BY on a subquery....
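Two of these findings, the missing index and the HAVING filter, can be reproduced in miniature. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in for the `EXPLAIN (ANALYZE)` mentioned above (the plan text differs between engines and versions); the `orders` table and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)


def plan(sql):
    """Return the human-readable plan steps for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


lookup = "SELECT * FROM orders WHERE customer_id = 42"

plan_before = plan(lookup)  # no index yet: a full scan of orders
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(lookup)   # the planner can now search via the index

# Filtering on a grouping column in HAVING aggregates every group first;
# the same predicate in WHERE discards rows before aggregation.
having_rows = conn.execute(
    """
    SELECT customer_id, SUM(amount) FROM orders
    GROUP BY customer_id
    HAVING customer_id = 42
    """
).fetchall()
where_rows = conn.execute(
    """
    SELECT customer_id, SUM(amount) FROM orders
    WHERE customer_id = 42
    GROUP BY customer_id
    """
).fetchall()

print(plan_before)
print(plan_after)
print(having_rows == where_rows)
```

The rewrite from HAVING to WHERE is only safe when the predicate refers to grouping columns rather than aggregate results, which is why the two queries here return identical rows.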

Question 3

Why are you leaving your current company?

Accepted Answer

I've grown significantly at [Current Company]—I led [concrete achievement, e.g., migration of X TB, reduction of pipeline cost by Y%]. I'm proud of what we built. However, I'm at a point where I want to deepen my impact: I'm looking for a role where I can own architecture for systems at a larger scale, work with [specific tech: e.g., real-time streaming at petabyte scale], and mentor other engineers. [Target Company]'s work in [specific area—cite a blog, product, or news] aligns with that....

Question 4

Have you worked on Data Warehousing projects?

Accepted Answer

**Architectural context**: A data warehouse is the semantic layer between raw data and business decisions. Design choices—star vs snowflake, SCD strategy, partitioning—directly impact query latency, storage cost, and maintenance burden. **Key responsibilities**: (1) **Schema design**: Star for BI simplicity, snowflake for normalized flexibility. SCD Type 2 for slowly changing dimensions (audit trail, point-in-time correctness)....
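As an illustration of the SCD Type 2 point, here is a minimal sketch of closing the current dimension row and inserting a new version, followed by a point-in-time lookup. The `dim_customer` layout (`valid_from`, `valid_to`, `is_current`) is one common convention and is assumed here, as is the `apply_change` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dim_customer (
        surrogate_key INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id   INTEGER,
        city          TEXT,
        valid_from    TEXT,
        valid_to      TEXT,     -- NULL means "still current"
        is_current    INTEGER
    )
    """
)
conn.execute(
    "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
    "VALUES (1, 'Austin', '2023-01-01', NULL, 1)"
)


def apply_change(customer_id, new_city, change_date):
    """Hypothetical helper: expire the current row, then add the new version."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
            "WHERE customer_id = ? AND is_current = 1",
            (change_date, customer_id),
        )
        conn.execute(
            "INSERT INTO dim_customer "
            "(customer_id, city, valid_from, valid_to, is_current) "
            "VALUES (?, ?, ?, NULL, 1)",
            (customer_id, new_city, change_date),
        )


apply_change(1, "Denver", "2024-06-01")

history = conn.execute(
    "SELECT city, valid_from, valid_to, is_current FROM dim_customer "
    "WHERE customer_id = 1 ORDER BY valid_from"
).fetchall()

# Point-in-time lookup: which city applied on 2023-09-15?
as_of = conn.execute(
    "SELECT city FROM dim_customer "
    "WHERE customer_id = 1 AND valid_from <= ? "
    "AND (valid_to IS NULL OR valid_to > ?)",
    ("2023-09-15", "2023-09-15"),
).fetchone()

print(history)
print(as_of)
```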

Question 5

What is the difference between OLTP and OLAP?

Accepted Answer

**Why the distinction exists**: They serve different access patterns. OLTP = many small, random writes and point reads. OLAP = few, large sequential scans and aggregations. Optimizing for one degrades the other. **OLTP**: Row-oriented storage (fast single-row access). Normalized schema (3NF) to avoid update anomalies. Indexes for lookup (B-tree). ACID for consistency. High concurrency via locking/MVCC. Examples: PostgreSQL, MySQL, Oracle....

Question 6

What is the difference between SQL and NoSQL databases?

Accepted Answer

**Why both exist**: SQL excels at structured data, complex joins, and strong consistency. NoSQL excels at unstructured/semi-structured data, horizontal scale, and flexible schema. **SQL (relational)**: Fixed schema, ACID, vertical scale (or managed horizontal via Citus, etc.). Optimized for joins and aggregations. Best for: transactional systems, reporting, anything requiring referential integrity. **NoSQL**: Schema-flexible, often BASE (eventual consistency by default, though some systems offer tunable or strong consistency), horizontal scale....

Question 7

Explain Common Table Expressions (CTEs) and their benefits.

Accepted Answer

**Architectural Logic**: CTEs are named subqueries defined in a WITH clause; depending on the engine, the planner either inlines them into the outer query or materializes their result. They provide logical decomposition without necessarily forcing physical materialization. **Why**: Readability and reuse: complex pipelines split into stages (raw → cleansed → aggregated). Recursion for hierarchies (org charts, bill-of-materials). Some engines inline CTEs; others can materialize them for reuse (e.g., PostgreSQL materialized every CTE before version 12 and still supports an explicit MATERIALIZED hint)....
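Both uses, staged pipelines and recursion, fit in a short sketch. It runs against an in-memory SQLite database; the `raw_orders` and `employees` tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staged pipeline: raw -> cleansed -> aggregated.
conn.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("acme", "10.5"), ("acme", " 4.5 "), ("zen", "n/a"), ("zen", "7")],
)

totals = conn.execute(
    """
    WITH cleansed AS (
        -- keep only values that start with a digit, then cast them
        SELECT customer, CAST(TRIM(amount) AS REAL) AS amount
        FROM raw_orders
        WHERE TRIM(amount) GLOB '[0-9]*'
    ),
    aggregated AS (
        SELECT customer, SUM(amount) AS total
        FROM cleansed
        GROUP BY customer
    )
    SELECT customer, total FROM aggregated ORDER BY customer
    """
).fetchall()

# Recursive CTE: walk an org chart from the root downwards.
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ada", None), (2, "Ben", 1), (3, "Cy", 1), (4, "Dee", 2)],
)

org_chart = conn.execute(
    """
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, c.depth + 1
        FROM employees AS e
        JOIN chain AS c ON e.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
    """
).fetchall()

print(totals)
print(org_chart)
```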

Question 8

Explain SQL Window Functions with examples.

Accepted Answer

**Architectural Logic**: Window functions compute over a "frame" of rows related to the current row without collapsing rows. Syntax: func() OVER (PARTITION BY ... ORDER BY ... [frame]). Categories: Ranking (ROW_NUMBER, RANK, DENSE_RANK), Aggregate (SUM, AVG over partitions), Value (LAG, LEAD, FIRST_VALUE). **Why**: Enable row-level analytics (running totals, moving averages, prior/next comparisons) without self-joins. Equivalent self-joins are more verbose, multiply intermediate rows, and are often slower....
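A running total and a prior-row comparison show the aggregate and value categories in practice. This is a sketch against a hypothetical `sales` table, run through SQLite (3.25 or later) from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-01-01", "east", 10),
        ("2024-01-02", "east", 20),
        ("2024-01-03", "east", 5),
        ("2024-01-01", "west", 7),
        ("2024-01-02", "west", 3),
    ],
)

windowed = conn.execute(
    """
    SELECT region,
           day,
           amount,
           -- aggregate window: running total within each region
           SUM(amount) OVER (
               PARTITION BY region
               ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total,
           -- value window: previous day's amount, NULL on the first row
           LAG(amount) OVER (
               PARTITION BY region
               ORDER BY day
           ) AS prev_amount
    FROM sales
    ORDER BY region, day
    """
).fetchall()

for row in windowed:
    print(row)
```

Every input row survives in the output, which is the practical difference from GROUP BY: the aggregate is attached to each row rather than replacing the rows.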

Question 9

Explain the use of the MERGE statement in SQL.

Accepted Answer

**Architectural Logic**: MERGE (often used for upserts) can perform INSERT, UPDATE, and DELETE in one atomic statement based on a join condition. Syntax: MERGE INTO target USING source ON (key) WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ... [WHEN NOT MATCHED BY SOURCE THEN DELETE] (the BY SOURCE clause is dialect-specific, e.g. SQL Server). **Why**: Single pass over target and source; avoids a separate read-then-write round trip in application code, although concurrent MERGEs can still conflict unless locking or isolation is set appropriately; efficient for SCD Type 1/2, incremental loads, CDC sync. **Scalability**: Join key should be indexed; large source scans can lock target....
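SQLite does not implement MERGE, so the sketch below swaps in its `INSERT ... ON CONFLICT DO UPDATE` upsert (SQLite 3.24 or later) to show the matched/not-matched behaviour of an SCD Type 1 style load. It is a stand-in, not MERGE itself, and it does not cover the DELETE branch; the `dim_customer` and `staging_customer` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, email TEXT)"
)
conn.execute("CREATE TABLE staging_customer (customer_id INTEGER, email TEXT)")

conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?)",
    [(1, "old@example.com"), (2, "keep@example.com")],
)
conn.executemany(
    "INSERT INTO staging_customer VALUES (?, ?)",
    [(1, "new@example.com"), (3, "added@example.com")],
)

# Roughly equivalent to:
#   MERGE INTO dim_customer USING staging_customer ON (customer_id)
#   WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...
with conn:  # one transaction around the whole load
    conn.execute(
        """
        INSERT INTO dim_customer (customer_id, email)
        SELECT customer_id, email
        FROM staging_customer
        WHERE true  -- SQLite needs a WHERE here to parse ON CONFLICT unambiguously
        ON CONFLICT (customer_id) DO UPDATE SET email = excluded.email
        """
    )

merged = conn.execute(
    "SELECT customer_id, email FROM dim_customer ORDER BY customer_id"
).fetchall()
print(merged)
```

Customer 1 is updated, customer 3 is inserted, and customer 2 is left alone because it has no row in the source.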

Question 10

How do you handle NULL values in SQL? Mention functions like COALESCE and ISNULL.

Accepted Answer

**Architectural Logic**: NULL represents unknown/missing; it propagates through expressions (NULL + 1 = NULL). Handling: IS NULL / IS NOT NULL for predicates; COALESCE(val1, val2, ...) for first non-NULL (portable); ISNULL (SQL Server) / IFNULL (MySQL, SQLite) for a dialect-specific default; NULLIF(val1, val2) to return NULL when val1 equals val2 (e.g. normalizing empty strings or sentinel values to NULL). **Why**: An equality JOIN on NULL keys yields no match, because NULL = NULL evaluates to UNKNOWN rather than TRUE. Aggregates such as SUM, AVG, and COUNT(col) ignore NULL; COUNT(*) counts every row. Explicit handling prevents silent exclusions....
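These rules are easy to check directly. The sketch below runs them through SQLite, where NULL arrives in Python as `None` and the dialect-specific default function is `IFNULL`; the table names `t`, `l`, and `r` are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

scalars = conn.execute(
    """
    SELECT NULL + 1,                          -- NULL propagates
           COALESCE(NULL, NULL, 'fallback'),  -- first non-NULL argument
           IFNULL(NULL, 0),                   -- two-argument default
           NULLIF('', ''),                    -- equal arguments become NULL
           NULL = NULL,                       -- UNKNOWN, surfaced as NULL
           NULL IS NULL                       -- the correct NULL test
    """
).fetchone()

# Aggregates skip NULL inputs; COUNT(*) counts rows regardless.
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])
aggregates = conn.execute(
    "SELECT COUNT(*), COUNT(v), SUM(v), AVG(v) FROM t"
).fetchone()

# An equality join never matches NULL keys to each other.
conn.execute("CREATE TABLE l (k INTEGER)")
conn.execute("CREATE TABLE r (k INTEGER)")
conn.executemany("INSERT INTO l VALUES (?)", [(1,), (None,)])
conn.executemany("INSERT INTO r VALUES (?)", [(1,), (None,)])
join_matches = conn.execute(
    "SELECT COUNT(*) FROM l JOIN r ON l.k = r.k"
).fetchone()[0]

print(scalars)
print(aggregates)
print(join_matches)
```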

Aarete Data Engineer Interview Questions
