Question 1

Explain the differences between a Data Lake and a Data Warehouse.

Accepted Answer

**Data Lake**: Low-cost object storage (S3, ADLS) for raw, semi-structured, and unstructured data. Schema-on-read; used for exploratory analytics, ML, and archival. **Data Warehouse**: Structured, curated storage optimized for SQL; schema-on-write; used for BI and reporting. **Why both exist**: Lakes offer flexibility and low cost at scale; warehouses offer query performance and concurrency....

Question 2

Explain the concept of ACID properties in the context of databases.

Accepted Answer

**ACID**: Atomicity (all or nothing), Consistency (valid state transitions), Isolation (concurrent transactions don't interfere), Durability (committed data persists). **Why it matters**: Without ACID, financial and operational data become inconsistent; retries and failures create duplicates or lost updates. **Scalability trade-off**: Strict isolation (e.g., Serializable) limits throughput; most OLTP systems use Read Committed or Repeatable Read....
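To make atomicity concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `CHECK` constraint, and the `transfer` helper are assumptions made up for illustration; the point is only that a failed second statement also undoes the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: balances may never go negative.
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was rolled back too.
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds: balances become 70 and 80
failed = transfer(conn, 1, 2, 500)  # fails: would overdraw account 1
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits account 2 before the debit fails, yet the final balances show no trace of that credit, which is the "all or nothing" behavior described above.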

Question 3

Explain Common Table Expressions (CTEs) and their benefits.

Accepted Answer

**Architectural Logic**: CTEs are named subqueries defined in a WITH clause. Depending on the engine, the optimizer either inlines them into the outer query or materializes them. They provide logical decomposition without forcing physical materialization. **Why**: Readability and reuse: complex pipelines split into stages (raw → cleansed → aggregated). Recursive CTEs handle hierarchies (org charts, bill-of-materials). Some engines inline CTEs; others (e.g., BigQuery) can materialize for reuse....
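The recursion use case can be sketched with a small org chart. The example below runs on SQLite through Python's `sqlite3` module; the `employees` table and its rows are invented for illustration, and other engines may differ slightly in syntax (some omit the `RECURSIVE` keyword).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical org chart: manager_id points at the employee's manager.
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ada", None), (2, "Ben", 1), (3, "Cy", 1), (4, "Dee", 2)],
)

ORG_CHART_SQL = """
WITH RECURSIVE chain(id, name, depth) AS (
    -- Anchor member: the root of the hierarchy.
    SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: direct reports of rows already in the chain.
    SELECT e.id, e.name, c.depth + 1
    FROM employees AS e
    JOIN chain AS c ON e.manager_id = c.id
)
SELECT name, depth FROM chain ORDER BY depth, id
"""

org_rows = conn.execute(ORG_CHART_SQL).fetchall()
print(org_rows)  # [('Ada', 0), ('Ben', 1), ('Cy', 1), ('Dee', 2)]
```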

Question 4

Explain the difference between UNION and UNION ALL.

Accepted Answer

**Architectural Logic**: UNION concatenates result sets and deduplicates (implicit DISTINCT); UNION ALL concatenates and keeps all rows. **Why**: UNION must sort or hash the combined rows to detect duplicates: O(n log n) with a sort, or O(n) expected with a hash table. UNION ALL is a simple concatenation, O(n). **Scalability**: UNION adds a deduplication pass on top of reading both inputs (read both, write deduped). At scale, UNION on large inputs can spill to disk and become a bottleneck. **Cost**: UNION can process 2–5x the bytes of UNION ALL when inputs are large....
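The difference in results is easy to see on a toy dataset. This sketch uses SQLite via Python's `sqlite3`; the two signup tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical sources that share one email address.
conn.executescript("""
CREATE TABLE web_signups (email TEXT);
CREATE TABLE app_signups (email TEXT);
INSERT INTO web_signups VALUES ('a@example.com'), ('b@example.com');
INSERT INTO app_signups VALUES ('b@example.com'), ('c@example.com');
""")

# UNION removes the duplicate 'b@example.com'.
union_rows = conn.execute(
    "SELECT email FROM web_signups UNION "
    "SELECT email FROM app_signups ORDER BY email"
).fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute(
    "SELECT email FROM web_signups UNION ALL "
    "SELECT email FROM app_signups ORDER BY email"
).fetchall()

print(len(union_rows), len(union_all_rows))  # 3 4
```

When the inputs are known to be disjoint, or duplicates are acceptable downstream, UNION ALL returns the same or a superset of rows without paying for the deduplication step.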

Question 5

What is the difference between a clustered and non-clustered index?

Accepted Answer

**Architectural Logic**: Clustered: Data stored in index order; one per table (usually PK). Table = clustered index. Non-clustered: Separate structure with pointers to data; multiple per table. **Why**: Clustered optimizes range scans (e.g., ORDER BY PK); inserts may cause page splits. Non-clustered optimizes lookups and covering queries. **Scalability**: A clustered index on the wrong key can hurt insert performance. Each non-clustered index adds write amplification, since every insert or update must also maintain it....
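As a rough illustration, SQLite stores an ordinary table as a B-tree keyed by its `INTEGER PRIMARY KEY`, which behaves much like a clustered index, while `CREATE INDEX` builds a separate secondary structure. The sketch below asks the planner which structure it would use; the `orders` table and index name are assumptions, the exact plan text varies between SQLite versions, and other engines (SQL Server, MySQL/InnoDB) expose this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Rows live in a B-tree ordered by order_id (clustered-style storage in SQLite).
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
-- Separate secondary (non-clustered) index on customer_id.
CREATE INDEX idx_orders_customer ON orders (customer_id);
""")


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


# Range scan on the primary key walks the table's own B-tree.
range_plan = plan("SELECT * FROM orders WHERE order_id BETWEEN 10 AND 20")
# Equality lookup on customer_id goes through the secondary index first.
lookup_plan = plan("SELECT total FROM orders WHERE customer_id = 7")

print(range_plan)
print(lookup_plan)
```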

Question 6

What is the difference between DELETE and TRUNCATE?

Accepted Answer

**Architectural Logic**: DELETE: DML; row-by-row (or batch); logged per row; supports WHERE; triggers fire; slower. TRUNCATE: DDL; deallocates data pages; minimal logging; no WHERE; resets identity in some engines (e.g., SQL Server; PostgreSQL needs RESTART IDENTITY); faster. **Why**: DELETE for conditional removal, audit trails, cascade. TRUNCATE for full table clear (e.g., staging before load). **Scalability**: DELETE locks rows/pages; a long-running DELETE on a large table blocks other sessions. TRUNCATE is a metadata change plus page deallocation, which is near-instant....

Question 7

What is a CTE (Common Table Expression)? What are its uses?

Accepted Answer

**Architectural Logic:** A CTE is a named temporary result set (WITH ... AS) scoped to a single statement. **Why use CTEs:** Readability, modular logic, recursion (hierarchies), and avoiding repeated subquery evaluation in some engines. **Scalability:** CTEs may be inlined or materialized—behavior varies by engine. In BigQuery/Snowflake, CTEs are typically not materialized unless hinted; repeated references can re-execute. **Cost:** Materialized CTEs (e.g., WITH ......
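The "modular logic" benefit can be sketched as a staged query in which each CTE feeds the next. The example runs on SQLite through Python's `sqlite3`; the `raw_events` table, its messy values, and the stage names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw feed: amounts arrive as text, some padded, some missing.
conn.executescript("""
CREATE TABLE raw_events (user_id INTEGER, amount TEXT);
INSERT INTO raw_events VALUES (1, '10'), (1, ' 5 '), (2, NULL), (2, '7');
""")

STAGED_SQL = """
WITH cleansed AS (
    -- Stage 1: drop missing values and normalize the type.
    SELECT user_id, CAST(TRIM(amount) AS INTEGER) AS amount
    FROM raw_events
    WHERE amount IS NOT NULL
),
aggregated AS (
    -- Stage 2: aggregate the cleansed rows.
    SELECT user_id, SUM(amount) AS total
    FROM cleansed
    GROUP BY user_id
)
SELECT user_id, total FROM aggregated ORDER BY user_id
"""

totals = conn.execute(STAGED_SQL).fetchall()
print(totals)  # [(1, 15), (2, 7)]
```

Both CTEs exist only for the duration of this one statement; a second query would have to define them again or use a view or temporary table instead.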

Question 8

Aggregate surface areas and calculate cumulative surface area using the LAG function.

Accepted Answer

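**LAG:** Returns a value from the previous row. LAG(surface_area, 1, 0) OVER (ORDER BY id) yields the prior row's surface_area, or 0 for the first row. **Cumulative:** SUM(surface_area) OVER (ORDER BY id), optionally with an explicit frame (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).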

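SELECT id, surface_area,
       -- Running total of surface_area up to and including the current row
       SUM(surface_area) OVER (ORDER BY id) AS cumulative_area,
       -- Current row plus the previous row's value (0 when there is no previous row)
       surface_area + LAG(surface_area, 1, 0) OVER (ORDER BY id) AS prev_plus_curr
FROM surfaces;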
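To check the query against concrete numbers, the sketch below runs it on SQLite through Python's `sqlite3` (window functions need SQLite 3.25 or newer). The three sample rows are invented, and the running total uses the explicit ROWS frame.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surfaces (id INTEGER PRIMARY KEY, surface_area INTEGER)")
# Hypothetical sample data.
conn.executemany("INSERT INTO surfaces VALUES (?, ?)", [(1, 10), (2, 20), (3, 5)])

WINDOW_SQL = """
SELECT id, surface_area,
       SUM(surface_area) OVER (
           ORDER BY id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_area,
       surface_area + LAG(surface_area, 1, 0) OVER (ORDER BY id) AS prev_plus_curr
FROM surfaces
ORDER BY id
"""

area_rows = conn.execute(WINDOW_SQL).fetchall()
for row in area_rows:
    print(row)
# (1, 10, 10, 10)
# (2, 20, 30, 30)
# (3, 5, 35, 25)
```

Note that `prev_plus_curr` only ever adds two rows together, so it matches the cumulative total for the first two rows and diverges from the third row onward.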

Question 9

Are you comfortable with the variable pay structure, and what are your expectations for the base salary?

Accepted Answer

**Situation:** I've worked in roles with variable comp tied to team and company performance.

**Task:** I needed to understand the structure and align expectations with market and value.

**Action:** I researched market rates for this role/level in [location]. I'm open to variable pay when metrics are clear and attainable. I focus on total compensation—base + variable + equity + benefits....

Question 10

How do you build ETL pipelines that capture changes when new records are inserted into source tables?

Accepted Answer

**Patterns:** (1) Incremental: watermark (max id/updated_at), query WHERE id > watermark. (2) CDC: Debezium, AWS DMS—log-based, low latency. (3) Hash/checksum: compare batches. (4) MERGE: upsert by key.

**Snowflake:** Stage + MERGE. Store last watermark in state table. Use event time, not process time, for late arrivals....
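The watermark pattern with a state table can be sketched as follows. This is not Snowflake code: it uses SQLite through Python's `sqlite3`, with `INSERT ... ON CONFLICT DO UPDATE` (SQLite 3.24+) standing in for MERGE. The table names, the `etl_state` layout, and the `load_increment` helper are all assumptions for illustration, and it tracks an id watermark rather than event time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_orders (id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE target_orders (id INTEGER PRIMARY KEY, amount INTEGER);
-- State table holding the last id loaded per pipeline.
CREATE TABLE etl_state (pipeline TEXT PRIMARY KEY, last_id INTEGER NOT NULL);
INSERT INTO etl_state VALUES ('orders', 0);
""")


def load_increment(conn):
    """Copy source rows above the stored watermark, then advance the watermark."""
    with conn:  # the upsert and the watermark update commit together
        (watermark,) = conn.execute(
            "SELECT last_id FROM etl_state WHERE pipeline = 'orders'"
        ).fetchone()
        new_rows = conn.execute(
            "SELECT id, amount FROM source_orders WHERE id > ? ORDER BY id",
            (watermark,),
        ).fetchall()
        # Upsert by key so that a re-run does not create duplicates.
        conn.executemany(
            "INSERT INTO target_orders (id, amount) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount",
            new_rows,
        )
        if new_rows:
            conn.execute(
                "UPDATE etl_state SET last_id = ? WHERE pipeline = 'orders'",
                (new_rows[-1][0],),
            )
    return len(new_rows)


conn.executemany("INSERT INTO source_orders VALUES (?, ?)", [(1, 10), (2, 20)])
first_batch = load_increment(conn)   # loads ids 1 and 2
conn.execute("INSERT INTO source_orders VALUES (3, 30)")
second_batch = load_increment(conn)  # loads only id 3
third_batch = load_increment(conn)   # nothing new
target_count = conn.execute("SELECT COUNT(*) FROM target_orders").fetchone()[0]
print(first_batch, second_batch, third_batch, target_count)  # 2 1 0 3
```

Updating the watermark in the same transaction as the upsert means a crash mid-load leaves the state table pointing at the last fully loaded batch, so the next run simply repeats the incomplete one.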
