Question 1

What are your salary expectations for this role?

Accepted Answer

**Situation**: I was negotiating with a FAANG-tier company after multiple rounds. **Task**: Communicate compensation expectations without anchoring low or pricing myself out. **Action**: I researched Levels.fyi, Blind, and Glassdoor for the role, level, and geo. I framed my response: "Based on market data for Staff/Principal Data Engineering in [location], total comp typically ranges $X–$Y....

Question 2

Where do you see yourself in your career five years from now?

Accepted Answer

**Situation**: In a Principal-level loop, the hiring manager asked about long-term trajectory. **Task**: Align my vision with the company's needs while showing ambition and impact focus. **Action**: I said: "In five years, I aim to be a Staff/Principal data engineer driving architectural decisions and scaling data platforms. I want to own data strategy for a significant domain—perhaps a business unit or company-wide—and be the go-to person for complex, high-stakes problems....

Question 3

Briefly introduce yourself and walk us through your journey as a Data Engineer so far.

Accepted Answer

**Situation**: I joined as a software engineer and saw data as a bottleneck—pipelines broke, nobody trusted the numbers. **Task**: Transition into data engineering and build reliable, scalable systems. **Action**: I moved from ETL dev to owning cloud data platforms—designed data lakes on AWS/GCP, optimized Spark jobs (reduced costs 40% via partition pruning and skew fixes), implemented Kafka/Flink streaming, and led migrations to Delta Lake....

Question 4

Can you explain the difference between OLTP and OLAP?

Accepted Answer

**OLTP**: Optimized for many small transactions (inserts, updates, deletes). Row-oriented, normalized, high concurrency. Examples: MySQL, PostgreSQL. **OLAP**: Optimized for complex analytical queries and aggregations on large datasets. Column-oriented, denormalized (star/snowflake). Examples: Snowflake, BigQuery, Redshift. **Why the split**: Different access patterns; mixing them degrades both. OLTP needs low latency and ACID; OLAP needs scan throughput....
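
The row-oriented versus column-oriented distinction above can be sketched with plain Python containers. This is only a toy illustration with made-up order data; real engines add indexes, compression, and vectorized execution, and the names `row_store` and `column_store` are hypothetical.

```python
# Hypothetical sample data, used only to contrast the two layouts.
rows = [
    {"order_id": 1, "customer": "a", "amount": 10.0},
    {"order_id": 2, "customer": "b", "amount": 25.5},
    {"order_id": 3, "customer": "a", "amount": 4.5},
]

# Row-oriented (OLTP-style): each record is stored together, so reading or
# updating one whole order touches a single entry.
row_store = {r["order_id"]: r for r in rows}
order_2 = row_store[2]

# Column-oriented (OLAP-style): each column is stored contiguously, so an
# aggregate over one column reads only that column's values.
column_store = {
    "order_id": [r["order_id"] for r in rows],
    "customer": [r["customer"] for r in rows],
    "amount": [r["amount"] for r in rows],
}
total_amount = sum(column_store["amount"])

print(order_2, total_amount)
```

The point lookup never scans other orders, while the aggregate never touches the `customer` or `order_id` columns, which is the access-pattern difference the answer describes.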

Question 5

Explain the concept of ACID properties in the context of databases.

Accepted Answer

**ACID**: Atomicity (all or nothing), Consistency (valid state transitions), Isolation (concurrent transactions don't interfere), Durability (committed data persists). **Why it matters**: Without ACID, financial and operational data become inconsistent; retries and failures create duplicates or lost updates. **Scalability trade-off**: Strict isolation (e.g., Serializable) limits throughput; most OLTP systems default to a weaker level such as Read Committed (PostgreSQL, Oracle, SQL Server) or Repeatable Read (MySQL InnoDB)....
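
Atomicity can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are assumptions made up for this example; the CHECK constraint stands in for a business rule that makes the second transfer fail halfway through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # debit violates the CHECK constraint
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 has already executed when the debit is rejected, and the rollback undoes it, so no money appears from nowhere.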

Question 6

Explain the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.

Accepted Answer

**INNER JOIN**: Only rows with matches in both tables. **LEFT JOIN**: All rows from the left table plus matches from the right; NULLs where there is no match. **RIGHT JOIN**: All rows from the right table plus matches from the left; NULLs where there is no match. **FULL JOIN**: All rows from both tables; NULLs on whichever side has no match. **Why it matters**: Join choice affects result cardinality and semantics. Wrong join = wrong numbers. **Scalability**: Hash joins are common; broadcast joins suit a small dimension table. A FULL OUTER JOIN can be expensive because it often requires a large shuffle....
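
A small sketch of how join type changes result cardinality, using Python's `sqlite3` with hypothetical `customers` and `orders` tables. SQLite only supports RIGHT and FULL JOIN from version 3.39.0, so those two are guarded by a version check here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (id INTEGER, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
-- Order 12 points at a customer that does not exist.
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 4);
""")


def row_count(join_type):
    """Count rows produced by joining customers to orders with join_type."""
    sql = (
        f"SELECT COUNT(*) FROM customers c {join_type} JOIN orders o "
        "ON c.id = o.customer_id"
    )
    return conn.execute(sql).fetchone()[0]


counts = {"INNER": row_count("INNER"), "LEFT": row_count("LEFT")}
# RIGHT and FULL JOIN need SQLite 3.39.0 or newer.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    counts["RIGHT"] = row_count("RIGHT")
    counts["FULL"] = row_count("FULL")
print(counts)
```

With this data the same join condition yields 2 rows (INNER), 4 rows (LEFT), 3 rows (RIGHT), and 5 rows (FULL), which is why an accidental join swap silently changes downstream totals.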

Question 7

How do you handle NULL values in SQL? Mention functions like COALESCE and NULLIF.

Accepted Answer

**Approaches**: IS NULL / IS NOT NULL for filtering. **COALESCE(val1, val2, ...)**: Returns the first non-NULL argument; useful for defaults. **NULLIF(val1, val2)**: Returns NULL if the two arguments are equal, otherwise val1; e.g., NULLIF(divisor, 0) to avoid divide-by-zero. **Why it matters**: NULL propagates through expressions; aggregate functions ignore NULL (except COUNT(*), which counts rows). An equality join never matches NULL keys, because NULL = NULL evaluates to UNKNOWN rather than TRUE. **Scalability**: COALESCE in SELECT is cheap; wrapping a column in COALESCE inside WHERE or JOIN can prevent index use....
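
The NULL behaviors above can be checked with a short `sqlite3` sketch. The `metrics` table and its values are hypothetical. One caveat: SQLite itself returns NULL on division by zero, so the NULLIF guard matters mostly on engines such as PostgreSQL that raise an error instead; the query is written the portable way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metrics (id INTEGER, clicks INTEGER, impressions INTEGER);
INSERT INTO metrics VALUES (1, 5, 100), (2, 3, 0), (3, NULL, 50);
""")

# COALESCE supplies a default for missing clicks; NULLIF turns a zero
# divisor into NULL so the ratio becomes NULL rather than an error.
rows = conn.execute("""
    SELECT id,
           COALESCE(clicks, 0),
           1.0 * clicks / NULLIF(impressions, 0)
    FROM metrics
    ORDER BY id
""").fetchall()

# COUNT(*) counts rows; COUNT(col) and AVG(col) skip NULLs.
count_all, count_clicks, avg_clicks = conn.execute(
    "SELECT COUNT(*), COUNT(clicks), AVG(clicks) FROM metrics"
).fetchone()

# NULL = NULL is UNKNOWN (returned as None), while IS NULL is a real test.
null_equals_null, null_is_null = conn.execute(
    "SELECT NULL = NULL, NULL IS NULL"
).fetchone()

print(rows, count_all, count_clicks, avg_clicks, null_equals_null, null_is_null)
```

Note that `AVG(clicks)` divides by the two non-NULL values, not by the three rows, which is a common source of mismatched averages.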

Question 8

What is a Common Table Expression (CTE), and when would you use it?

Accepted Answer

**CTE**: A named temporary result set defined in a WITH clause and referenced in the main query. **Use cases**: Readability: break complex queries into steps. Reusability: reference the same CTE multiple times. Recursion: hierarchies (org chart, bill of materials). **Why it matters**: CTEs improve maintainability; deeply nested subqueries are hard to debug. **Scalability**: In some engines a CTE is an optimization fence that is materialized once; PostgreSQL behaved this way before version 12, and since 12 it inlines non-recursive, side-effect-free CTEs that are referenced once unless they are marked MATERIALIZED. Others (Snowflake, BigQuery) typically inline them....
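
The recursion use case can be sketched with a recursive CTE over a hypothetical `employees` table, run through Python's `sqlite3`. The table, the names, and the `chain` CTE are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
  (1, 'Dana', NULL), (2, 'Eli', 1), (3, 'Fay', 1), (4, 'Gus', 2);
""")

org_chart = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        -- Anchor member: the root of the hierarchy has no manager.
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: attach each employee one level below their manager.
        SELECT e.id, e.name, c.depth + 1
        FROM employees e
        JOIN chain c ON e.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
""").fetchall()

print(org_chart)
```

The anchor query seeds the result and the recursive member runs until it adds no new rows, so the same pattern works for bills of materials or any parent/child table without cycles.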

Question 9

What is the difference between a primary key and a unique key?

Accepted Answer

**Primary Key**: Unique identifier; NOT NULL; one per table; often clustered. **Unique Key**: Enforces uniqueness; can allow NULL (standard SQL and most engines, e.g., PostgreSQL and MySQL, allow multiple NULLs because NULLs are not considered equal to each other; SQL Server allows only one); multiple per table. **Why it matters**: PK defines identity and referential integrity; unique constrains alternate keys (e.g., email). **Scalability**: PK is often the clustering key; choice affects physical layout. Unique indexes enable lookups. **Cost**: Each constraint adds index overhead....
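
A minimal sketch of both constraints in SQLite via Python's `sqlite3`; the `users` table and the `try_insert` helper are hypothetical. NULL handling in unique columns is engine-specific: SQLite, like PostgreSQL and MySQL, accepts several NULLs in a UNIQUE column, whereas SQL Server permits only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO users VALUES (2, NULL)")
conn.execute("INSERT INTO users VALUES (3, NULL)")  # accepted: NULLs are distinct


def try_insert(row):
    """Return 'ok' or the constraint error message for an attempted insert."""
    try:
        conn.execute("INSERT INTO users VALUES (?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


dup_email = try_insert((4, "a@example.com"))  # violates the UNIQUE key
dup_pk = try_insert((1, "b@example.com"))     # violates the PRIMARY KEY
null_emails = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email IS NULL"
).fetchone()[0]

print(dup_email, dup_pk, null_emails)
```

Both duplicate inserts are rejected, while two rows with a NULL email coexist, so a unique key alone does not guarantee that every row has a value.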

Question 10

What is the difference between WHERE and HAVING clauses in SQL?

Accepted Answer

**WHERE**: Filters rows before grouping/aggregation; cannot use aggregate functions. **HAVING**: Filters groups after GROUP BY; used with aggregate functions. **Why it matters**: WHERE reduces rows early (cheaper); HAVING filters on computed values. Misplacing predicates causes wrong results or inefficiency. **Scalability**: Pushing filters into WHERE reduces data before aggregation; HAVING on a large grouped result can be expensive because every group must be aggregated before it is filtered....
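
The two filters can be seen side by side in one query. This sketch uses Python's `sqlite3` with a hypothetical `sales` table; the threshold of 100 is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, status TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('east', 'paid', 100), ('east', 'paid', 50), ('east', 'refunded', 500),
  ('west', 'paid', 40), ('west', 'paid', 30);
""")

big_regions = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE status = 'paid'        -- row filter, applied before grouping
    GROUP BY region
    HAVING SUM(amount) >= 100    -- group filter, applied after aggregation
    ORDER BY region
""").fetchall()

print(big_regions)
```

The refunded row is dropped by WHERE before it can inflate the east total, and the west group (total 70) is dropped by HAVING after aggregation, leaving only east with 150.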

EPAM Data Engineer Interview Questions


All 38 Questions
