DataEngPrep.tech

Interview Questions

Real questions from top companies · hard

1. Explain BigQuery Architecture.
   SQL · hard

2. Explain Native vs. External Tables.
   SQL · hard

3. Articulate the architectural decisions, scalability trade-offs, and cost implications of designing an AWS data platform. How would you justify Glue vs. EMR, Redshift vs. Athena, and when would each choice become cost-prohibitive at scale?
   SQL · hard

4. Explain the architectural rationale for using LeftAntiJoin vs. NOT IN vs. NOT EXISTS in a distributed context. When does LeftAntiJoin become a performance or scalability bottleneck, and how do broadcast vs. shuffle joins affect cost?
   SQL · hard

5. Explain the architectural trade-offs when optimizing a query on 100M+ rows: indexing vs. partitioning vs. materialized views. When does each approach become cost-prohibitive or operationally burdensome, and how do you quantify impact?
   SQL · hard

6. Explain bloom filters in Spark: how they reduce I/O and when they introduce false positives that hurt performance. What are the scalability and cost implications of enabling dynamic partition pruning and bloom filter pushdown at petabyte scale?
   SQL · hard

7. Design a star schema for retail analytics (e.g., Adidas). Explain the dimensional modeling choices, SCD strategy, and how you would scale this schema for global multi-currency, multi-region deployments. What are the refresh and storage cost implications?
   SQL · hard

8. Explain peer code review and team lead review.
   SQL · hard

