**Pricing**: $5 per TB scanned. DDL statements, failed queries, and metadata operations are not charged.

**Implication**: Cost is directly proportional to bytes scanned. A full table scan on 10 TB costs $50 per query.

**Optimization**: Partition the data (by date, for example) so that queries prune the partitions they do not need; a query that reads 1% of the data costs 1% of a full scan. Use columnar formats (Parquet, ORC) so that only the referenced columns are read. Use partition projection to avoid metastore calls for partition lookups. Compress data to reduce the bytes scanned.

**Scalability**: At 100 TB scanned per month, the bill is $500; at 1 PB, $5,000. Optimization can cut 80%+....
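The arithmetic above can be sketched as a small estimator. This is only an illustration: the function names `query_cost` and `optimized_tb` and the parameters `partition_fraction`, `column_fraction`, and `compression_ratio` are hypothetical, not part of any AWS API. It assumes the flat $5 per TB rate quoted above and ignores details such as per-query minimums or regional price differences.

```python
# Rate quoted in the text; treat it as an assumption, since actual rates can vary.
PRICE_PER_TB_USD = 5.0


def query_cost(tb_scanned: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the estimated cost in USD for scanning `tb_scanned` terabytes."""
    if tb_scanned < 0:
        raise ValueError("tb_scanned must be non-negative")
    return tb_scanned * price_per_tb


def optimized_tb(
    table_tb: float,
    partition_fraction: float = 1.0,  # share of partitions a query touches (hypothetical knob)
    column_fraction: float = 1.0,     # share of column bytes a query reads (hypothetical knob)
    compression_ratio: float = 1.0,   # raw size / compressed size (hypothetical knob)
) -> float:
    """Estimate terabytes scanned after pruning, column selection, and compression."""
    if not 0.0 <= partition_fraction <= 1.0:
        raise ValueError("partition_fraction must be between 0 and 1")
    if not 0.0 <= column_fraction <= 1.0:
        raise ValueError("column_fraction must be between 0 and 1")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return table_tb * partition_fraction * column_fraction / compression_ratio


if __name__ == "__main__":
    # Figures taken from the text.
    print(f"Full scan of 10 TB: ${query_cost(10):.2f}")
    print(f"100 TB scanned per month: ${query_cost(100):.2f}")
    print(f"1 PB (1,000 TB) scanned per month: ${query_cost(1000):.2f}")

    # Partition pruning down to 1% of a 10 TB table.
    pruned = optimized_tb(10, partition_fraction=0.01)
    print(f"10 TB table, 1% of partitions read: ${query_cost(pruned):.2f}")
```

The three factors are multiplied together because each one independently reduces the bytes a query reads. Real savings depend on the data layout and the query mix, so the fractions should come from measured scan sizes and not from guesses.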