**Why Explode:** Nested JSON (items array per order) → one row per item for joins/aggregation. Normalization for star schema.
**Functions:** explode()—one-to-many; null/empty arrays drop row. explode_outer()—keeps row, null for empty. posexplode()—adds position index. For multiple array cols: arrays_zip then explode.
**Scalability Risk:** Large arrays cause row explosion—1 row with 10K items → 10K rows. Can cause skew. Use explode then repartition or broadcast the small side for joins....
The complete answer continues with detailed implementation patterns, architectural trade-offs, and production-grade considerations. It covers performance optimization strategies, common pitfalls to avoid, and real-world examples from companies like Lumiq. The answer also includes follow-up discussion points that interviewers commonly explore.
Continue Reading the Full Answer
Unlock the complete expert answer with code examples, trade-offs, and pro tips - plus 1,863+ more.
Or upgrade to Platform Pro - $39
Engineers who used these answers got offers at
AmazonDatabricksSnowflakeGoogleMeta
According to DataEngPrep.tech, this is one of the most frequently asked Python/Coding interview questions, reported at 1 company. DataEngPrep.tech maintains a curated database of 1,863+ real data engineering interview questions across 7 categories, verified by industry professionals.