Question 1

Discuss differences between ROW_NUMBER(), RANK(), and DENSE_RANK(), and provide examples from your projects.

Accepted Answer

**ROW_NUMBER()**: Unique sequential numbers (1, 2, 3...) with no ties; the numbering is deterministic only when the ORDER BY columns uniquely order the rows. **RANK()**: Same rank for ties, then skips the following ranks (1, 2, 2, 4). **DENSE_RANK()**: Same rank for ties, no gaps (1, 2, 2, 3). **Project examples**: ROW_NUMBER() to deduplicate events by (user_id, event_time), keeping the first row per key, which is critical when an upstream system sends duplicates. DENSE_RANK() for 'top 10 products per category' reports, so that ties do not leave gaps in the ranks being filtered....
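A minimal sketch of the three functions side by side, run through Python's built-in `sqlite3` module so it is self-contained (this assumes a bundled SQLite of 3.25 or later, which added window functions). The `scores` table and its rows are hypothetical, chosen so that two players tie on points.

```python
import sqlite3

# Hypothetical data: players "b" and "c" tie on 90 points.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 100), ("b", 90), ("c", 90), ("d", 80)],
)

ranking_rows = conn.execute(
    """
    SELECT player,
           -- "player" is added as a tiebreaker so ROW_NUMBER is deterministic
           ROW_NUMBER() OVER (ORDER BY points DESC, player) AS row_num,
           RANK()       OVER (ORDER BY points DESC)         AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC)         AS dense_rnk
    FROM scores
    ORDER BY points DESC, player
    """
).fetchall()

for row in ranking_rows:
    print(row)
# ('a', 1, 1, 1)
# ('b', 2, 2, 2)
# ('c', 3, 2, 2)
# ('d', 4, 4, 3)
```

The tied rows get distinct ROW_NUMBER values, the same RANK followed by a skipped rank, and the same DENSE_RANK with no gap.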

Question 2

Describe a time when you had to optimize a slow SQL query. What steps did you take?

Accepted Answer

**Situation**: A critical exec report was timing out after 30+ minutes; the SLA was 5 minutes. **Task**: Diagnose and fix it without changing business logic. **Action**: I ran EXPLAIN (ANALYZE) and found: (1) a missing index on a join key causing a full table scan, (2) a cross join that could be an inner join with a filter, (3) a filter in HAVING on a non-aggregated column that could move to WHERE to reduce rows early, (4) an unnecessary ORDER BY in a subquery....
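The HAVING-to-WHERE rewrite from finding (3) can be illustrated with a small sketch. The `sales` table is hypothetical, and the code only demonstrates that the two forms return the same rows when the filter is on a grouping column; it does not measure the speedup, which depends on the engine and data volume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 5), ("west", 7), ("north", 3)],
)

# Original shape: every region is aggregated, then unwanted groups are dropped.
having_rows = conn.execute(
    """
    SELECT region, SUM(amount)
    FROM sales
    GROUP BY region
    HAVING region IN ('east', 'west')
    ORDER BY region
    """
).fetchall()

# Rewritten shape: rows are filtered before aggregation, so less data is grouped.
where_rows = conn.execute(
    """
    SELECT region, SUM(amount)
    FROM sales
    WHERE region IN ('east', 'west')
    GROUP BY region
    ORDER BY region
    """
).fetchall()

print(having_rows == where_rows, where_rows)
# True [('east', 15), ('west', 7)]
```

The rewrite is only safe for predicates on non-aggregated columns; a condition such as `SUM(amount) > 10` has to stay in HAVING.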

Question 3

Explain Common Table Expressions (CTEs) and their benefits.

Accepted Answer

**Architectural Logic**: CTEs are named subqueries defined in a WITH clause. Depending on the engine, each CTE is either inlined into the outer query or materialized as an intermediate result, so CTEs provide logical decomposition without necessarily forcing physical materialization. **Why**: Readability and reuse: complex pipelines split into stages (raw → cleansed → aggregated). Recursion for hierarchies (org charts, bill-of-materials). Some engines inline CTEs; others can materialize them for reuse (e.g., PostgreSQL with the MATERIALIZED keyword), so check the execution plan on your engine....
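Both uses named above, staged pipelines and recursion, can be sketched with SQLite through Python's `sqlite3` module. The tables `raw_orders` and `employees` and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staged pipeline: raw -> cleansed -> aggregated.
conn.execute("CREATE TABLE raw_orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("ann", 10), ("ann", None), ("bob", 5), ("bob", 20)],
)
staged_rows = conn.execute(
    """
    WITH cleansed AS (
        SELECT customer, amount FROM raw_orders WHERE amount IS NOT NULL
    ),
    aggregated AS (
        SELECT customer, SUM(amount) AS total FROM cleansed GROUP BY customer
    )
    SELECT customer, total FROM aggregated ORDER BY customer
    """
).fetchall()

# Recursive CTE: walk an org chart from the root downward.
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "ceo", None), (2, "vp", 1), (3, "eng", 2), (4, "ops", 1)],
)
org_rows = conn.execute(
    """
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, c.depth + 1
        FROM employees e JOIN chain c ON e.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
    """
).fetchall()

print(staged_rows)  # [('ann', 10), ('bob', 25)]
print(org_rows)     # [('ceo', 0), ('ops', 1), ('vp', 1), ('eng', 2)]
```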

Question 4

Explain SQL Window Functions with examples.

Accepted Answer

**Architectural Logic**: Window functions compute over a "frame" of rows related to the current row without collapsing rows. Syntax: func() OVER (PARTITION BY ... ORDER BY ... [frame]). Categories: Ranking (ROW_NUMBER, RANK, DENSE_RANK), Aggregate (SUM, AVG over partitions), Value (LAG, LEAD, FIRST_VALUE). **Why**: Enable row-level analytics (running totals, moving averages, prior/next comparisons) without self-joins. The equivalent self-joins are harder to read and usually slower, because each row must be matched against the table again....
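As a small illustration of the aggregate and value categories, the sketch below computes a running total and a day-over-day change in one pass. The `daily` table and its figures are hypothetical, and SQLite 3.25 or later is assumed for window-function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-02", 20), ("2024-01-03", 15)],
)

window_rows = conn.execute(
    """
    SELECT day,
           revenue,
           -- aggregate window: cumulative sum up to the current row
           SUM(revenue) OVER (ORDER BY day) AS running_total,
           -- value window: compare with the previous row (NULL for the first)
           revenue - LAG(revenue) OVER (ORDER BY day) AS change
    FROM daily
    ORDER BY day
    """
).fetchall()

for row in window_rows:
    print(row)
# ('2024-01-01', 10, 10, None)
# ('2024-01-02', 20, 30, 10)
# ('2024-01-03', 15, 45, -5)
```

Every input row is still present in the output, which is the difference from a GROUP BY aggregate.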

Question 5

Explain the use of the MERGE statement in SQL.

Accepted Answer

**Architectural Logic**: MERGE (upsert) performs INSERT, UPDATE, and DELETE in one atomic statement based on a join condition. Syntax: MERGE INTO target USING source ON (key) WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ... [WHEN NOT MATCHED BY SOURCE THEN DELETE] (the last clause is dialect-specific, e.g., SQL Server). **Why**: Single pass over target and source; avoids the read-modify-write race conditions of separate SELECT, UPDATE, and INSERT statements; efficient for SCD Type 1/2, incremental loads, CDC sync. **Scalability**: The join key should be indexed; a MERGE driven by a large source scan can hold locks on the target for a long time....
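SQLite does not implement MERGE, so the sketch below swaps in its `INSERT ... ON CONFLICT DO UPDATE` upsert (available from SQLite 3.24) to show the same matched/not-matched behavior in runnable form. The `target` and `source` tables are hypothetical, and the MERGE statement in the comment is the shape described above, not something this code executes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, price INTEGER)")
conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, price INTEGER)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, 100), (2, 200)])
conn.executemany("INSERT INTO source VALUES (?, ?)", [(2, 250), (3, 300)])

# Roughly equivalent MERGE in engines that support it:
#   MERGE INTO target t USING source s ON (t.id = s.id)
#   WHEN MATCHED THEN UPDATE SET price = s.price
#   WHEN NOT MATCHED THEN INSERT (id, price) VALUES (s.id, s.price)
#
# SQLite stand-in. "WHERE true" is required by SQLite's parser when an
# upsert clause follows INSERT ... SELECT.
conn.execute(
    """
    INSERT INTO target (id, price)
    SELECT id, price FROM source WHERE true
    ON CONFLICT(id) DO UPDATE SET price = excluded.price
    """
)

merged_rows = conn.execute("SELECT id, price FROM target ORDER BY id").fetchall()
print(merged_rows)  # [(1, 100), (2, 250), (3, 300)]
```

Row 1 is untouched, row 2 is updated from the source, and row 3 is inserted. The upsert form has no counterpart to the optional DELETE branch.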

Question 6

How do you handle NULL values in SQL? Mention functions like COALESCE and ISNULL.

Accepted Answer

**Architectural Logic**: NULL represents unknown/missing; it propagates through expressions (NULL + 1 = NULL). Handling: IS NULL / IS NOT NULL for predicates; COALESCE(val1, val2, ...) returns the first non-NULL argument (portable); ISNULL/IFNULL are dialect-specific two-argument defaults; NULLIF(val1, val2) returns NULL when val1 equals val2, which normalizes sentinel values to NULL. **Why**: A JOIN on NULL keys yields no match, because NULL = NULL evaluates to unknown rather than true. Aggregates ignore NULL, except COUNT(*), which counts every row. Explicit handling prevents silent exclusions....
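These rules are easy to check directly. The sketch below uses SQLite, where the dialect-specific default function is `IFNULL` (SQL Server spells it `ISNULL`); the `readings` table is hypothetical. Python's `sqlite3` returns SQL NULL as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

scalar_row = conn.execute(
    """
    SELECT NULL + 1,                  -- NULL propagates through arithmetic
           COALESCE(NULL, NULL, 'x'), -- first non-NULL argument
           IFNULL(NULL, 0),           -- SQLite's two-argument default
           NULLIF(0, 0),              -- equal arguments normalize to NULL
           NULL = NULL,               -- unknown, not true
           NULL IS NULL               -- the correct NULL test
    """
).fetchone()
print(scalar_row)  # (None, 'x', 0, None, None, 1)

# Aggregates skip NULL, but COUNT(*) counts every row.
conn.execute("CREATE TABLE readings (v INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?)", [(1,), (None,), (3,)])
agg_row = conn.execute(
    "SELECT COUNT(*), COUNT(v), SUM(v), AVG(v) FROM readings"
).fetchone()
print(agg_row)  # (3, 2, 4, 2.0)
```

Note that AVG divides by the two non-NULL values, not by the three rows, which is a common source of silently skewed metrics.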

Question 7

How do you optimize a long-running SQL query?

Accepted Answer

**Architectural Logic**: Optimization is diagnostic-first. 1. Profile: EXPLAIN/EXPLAIN ANALYZE to find the bottleneck (scan, join, sort, spill). 2. Reduce input: Filter early (WHERE, partition pruning); SELECT only needed columns. 3. Indexing: B-tree on filter/join columns; avoid over-indexing (every extra index slows writes). 4. Partitioning: Date/tenant partitioning for pruning. 5. Join strategy: Broadcast small dims; avoid cross joins. 6. Statistics: Up-to-date stats for the planner. 7....
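Steps 1 and 3 can be sketched with SQLite's `EXPLAIN QUERY PLAN`, used here as a lightweight stand-in for EXPLAIN ANALYZE in other engines. The `orders` table, the index name `idx_orders_customer`, and the helper `plan_details` are all hypothetical. The exact plan wording varies between SQLite versions, so the code only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i) for i in range(1000)],
)

query = "SELECT amount FROM orders WHERE customer_id = 42"


def plan_details(connection, sql):
    """Return the human-readable 'detail' column of each plan step."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


# Step 1: profile. Without an index the plan is a full scan of orders.
plan_before = plan_details(conn, query)

# Step 3: add a B-tree index on the filter column, then profile again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan_details(conn, query)

print(plan_before)
print(plan_after)
```

On a typical build the first plan reports a scan of `orders` and the second a search using `idx_orders_customer`. Re-running the profile after each change is what keeps the process diagnostic-first.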

Question 8

How would you handle duplicate records in an SQL table?

Accepted Answer

**Architectural Logic**: 1. Identify: GROUP BY key HAVING COUNT(*) > 1. 2. Resolve: ROW_NUMBER() OVER (PARTITION BY key ORDER BY tiebreaker), then keep rn = 1. 3. Prevent: UNIQUE constraint or PK; MERGE/INSERT with conflict handling. **Why**: Duplicates indicate an ingestion bug, missing idempotency, or intentional multi-versioning (e.g., SCD2). Fix the root cause before ad-hoc dedup. **Scalability**: A DELETE driven by a CTE/subquery can lock the table; prefer INSERT INTO new_table SELECT ......
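The identify and resolve steps can be sketched as follows, writing the survivors to a new table instead of deleting in place. The `events` table, its rows, and the choice of `payload` as tiebreaker are hypothetical; SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_time TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "t1", "a"), (1, "t1", "b"), (2, "t2", "c")],
)

# 1. Identify: keys that occur more than once.
duplicate_keys = conn.execute(
    """
    SELECT user_id, event_time, COUNT(*)
    FROM events
    GROUP BY user_id, event_time
    HAVING COUNT(*) > 1
    """
).fetchall()

# 2. Resolve: number rows within each key and keep only the first,
#    materialized into a new table rather than deleted in place.
conn.execute(
    """
    CREATE TABLE events_dedup AS
    SELECT user_id, event_time, payload
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id, event_time
                   ORDER BY payload          -- deterministic tiebreaker
               ) AS rn
        FROM events
    )
    WHERE rn = 1
    """
)
dedup_rows = conn.execute(
    "SELECT user_id, event_time, payload FROM events_dedup ORDER BY user_id"
).fetchall()

print(duplicate_keys)  # [(1, 't1', 2)]
print(dedup_rows)      # [(1, 't1', 'a'), (2, 't2', 'c')]
```

The tiebreaker decides which duplicate survives, so it should be a column that makes the ordering unique, such as an ingestion timestamp.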

Question 9

Can you modify a partitioned table into a non-partitioned one and vice-versa? How?

Accepted Answer

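**Partitioned → Non-partitioned:** CREATE TABLE new AS SELECT * FROM old (partition columns become regular columns). **Non → Partitioned:** Create the new table with PARTITIONED BY (col), then load it with INSERT INTO new SELECT ... FROM old (or use a single CREATE TABLE ... PARTITIONED BY ... AS SELECT where the engine supports it).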

**Hive/Spark:** No in-place alter of partition scheme. Recreate table, migrate data....

Question 10

Describe how Dataproc integrates with BigQuery for processing large datasets.

Accepted Answer

**Architectural Logic**: Dataproc (managed Spark) + BigQuery Connector enables heavy transforms that exceed BigQuery SQL capabilities. **Integration**: `spark.read.format("bigquery").option("table", "project:dataset.table").load()`; write with the same format. The connector pushes column projections and filter predicates down to BigQuery when possible, so less data is transferred to Spark. **Why Use**: Large-scale transforms, ML prep, complex multi-table joins; some logic is impractical to express in BigQuery SQL alone. **Scalability**: Size clusters for the workload; keep the cluster and dataset in the same region to avoid egress....

Aarete SQL Interview Questions
