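Red Flag: Stating a single NULL rule for unique keys as if it were universal. NULL handling depends on the SQL dialect: PostgreSQL, MySQL, SQLite, and Oracle accept multiple NULLs in a single-column unique constraint, while SQL Server accepts only one. Pro-Move: Clarify that in data lakes a PK is a logical concept. Delta Lake does not reject duplicate keys on write; pipelines typically maintain uniqueness with MERGE upserts or deduplication, so duplicates can exist until that logic runs.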
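To make the application-level enforcement idea concrete, here is a minimal sketch in plain Python of a merge-style upsert that keeps one row per logical key. It is only an analogy for what a MERGE-based pipeline does, not Delta Lake code; the function name `merge_by_key` and the fields `id`, `updated_at`, and `email` are hypothetical.

```python
def merge_by_key(target, updates, key="id", version="updated_at"):
    """Upsert `updates` into `target`, keeping one row per key.

    Assumption: the row with the highest `version` value wins.
    Nothing stops duplicates from existing before this function runs,
    which is why the key is "logical" rather than engine-enforced.
    """
    merged = {}
    for row in list(target) + list(updates):
        current = merged.get(row[key])
        if current is None or row[version] >= current[version]:
            merged[row[key]] = row
    return list(merged.values())


# Hypothetical data: the incoming batch contains a duplicate key (id=2).
table = [{"id": 1, "updated_at": 1, "email": "old@example.com"}]
batch = [
    {"id": 1, "updated_at": 2, "email": "new@example.com"},
    {"id": 2, "updated_at": 1, "email": "first@example.com"},
    {"id": 2, "updated_at": 3, "email": "latest@example.com"},
]
table = merge_by_key(table, batch)
emails_by_id = {row["id"]: row["email"] for row in table}
```

The point of the sketch is that uniqueness holds only after the merge step has run; any writer that bypasses it can still introduce duplicates.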
This hard-level SQL question appears in data engineering interviews at companies such as Accenture, Cognizant, EPAM, and one other. Although it comes up less often than basic SQL questions, it tests the deeper understanding that distinguishes strong candidates. Mastering the underlying concepts (Spark, SQL) will help you answer variations of this question confidently.
This is a senior-level question that tests architectural thinking. Lead with the high-level design, then drill into specifics. Discuss trade-offs explicitly - there is rarely one correct answer. Show awareness of scale, fault tolerance, and operational complexity.
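Primary Key: Unique identifier for each row; implicitly NOT NULL; one per table (it may span several columns); often the clustered index (the default in SQL Server, always in MySQL InnoDB). Unique Key: Enforces uniqueness on alternate keys; columns may be nullable, and NULL handling varies by dialect (PostgreSQL and MySQL accept multiple NULLs, SQL Server accepts one); multiple per table. Why it matters: The PK defines row identity and is the usual target of foreign keys; unique keys constrain alternate keys (e.g., email). Scalability: The PK is often the clustering key, so its choice affects physical layout; unique indexes enable fast lookups. Cost: Each constraint is backed by an index, which adds storage and write overhead. Production note: In distributed systems (Spark, Delta Lake), a PK is a logical concept; uniqueness is typically enforced at the application level.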
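The difference is easy to check against a real engine. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `users` table and the `try_insert` helper are illustrative names, and the NULL behaviour shown is SQLite's, so other engines (notably SQL Server) may differ. The explicit `NOT NULL` on the key column is deliberate, because SQLite does not always imply it for a PRIMARY KEY.

```python
import sqlite3


def try_insert(conn, user_id, email):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (user_id, email) VALUES (?, ?)",
                (user_id, email),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        user_id TEXT NOT NULL PRIMARY KEY,  -- identity: unique and never NULL
        email   TEXT UNIQUE                 -- alternate key: unique, nullable
    )
    """
)

results = {
    "first_row": try_insert(conn, "u1", "a@example.com"),
    "duplicate_pk": try_insert(conn, "u1", "b@example.com"),
    "null_pk": try_insert(conn, None, "c@example.com"),
    "duplicate_email": try_insert(conn, "u2", "a@example.com"),
    "first_null_email": try_insert(conn, "u3", None),
    "second_null_email": try_insert(conn, "u4", None),  # engine-dependent
}
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

In SQLite, both the duplicate and the NULL primary key are rejected, the duplicate email is rejected, and both rows with a NULL email are accepted. Running the same statements on another engine is a quick way to confirm its NULL rule before an interview or a schema review.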
According to DataEngPrep.tech, this is one of the most frequently asked SQL interview questions, reported at 4 companies. DataEngPrep.tech maintains a curated database of 1,863+ real data engineering interview questions across 7 categories, verified by industry professionals.