Implement a recursive query over an employee-manager hierarchy. Explain the termination guarantees, depth limits, and when a recursive CTE becomes a scalability bottleneck. What alternatives exist for graph-scale hierarchies in Spark or a data lake?
SQL · medium
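As a starting point for this question, here is a minimal sketch of a recursive CTE with an explicit depth guard, run against an in-memory SQLite database so it is self-contained. The `employees` table, its columns, the `reporting_chain` helper, and the `max_depth` default are all hypothetical names chosen for illustration; other engines differ in syntax and in their built-in recursion limits.

```python
import sqlite3

def reporting_chain(rows, root_id, max_depth=10):
    """Return (id, name, depth) for root_id and everyone reporting under it.

    rows: iterable of (id, name, manager_id) tuples (hypothetical schema).
    max_depth: illustrative guard so a cyclic manager_id cannot recurse forever.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    query = """
    WITH RECURSIVE chain(id, name, depth) AS (
        -- anchor member: the starting employee
        SELECT id, name, 0 FROM employees WHERE id = ?
        UNION ALL
        -- recursive member: direct reports of rows found so far
        SELECT e.id, e.name, c.depth + 1
        FROM employees e
        JOIN chain c ON e.manager_id = c.id
        WHERE c.depth < ?          -- depth guard: bounds the recursion
    )
    SELECT id, name, depth FROM chain ORDER BY depth, id
    """
    result = conn.execute(query, (root_id, max_depth)).fetchall()
    conn.close()
    return result

rows = [(1, "Ada", None), (2, "Ben", 1), (3, "Cy", 1), (4, "Dee", 2)]
print(reporting_chain(rows, 1))
```

On acyclic data the recursion ends on its own once a level produces no new rows; the depth guard only matters when the data contains a cycle, in which case `UNION ALL` would otherwise keep emitting rows.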
982
Explain bloom filters in Spark: how they reduce I/O and when they introduce false positives that hurt performance. What are the scalability and cost implications of enabling dynamic partition pruning and bloom filter pushdown at petabyte scale?
SQL · hard
983
Design a star schema for retail analytics (e.g., Adidas). Explain the dimensional modeling choices, SCD strategy, and how you would scale this schema for global multi-currency, multi-region deployments. What are the refresh and storage cost implications?
SQL · hard
984
Compare Glue partition discovery with Hive MSCK/ADD PARTITION. Explain the operational and cost implications of crawler-based vs. partition-projection approaches. When does partition projection become necessary, and what are its limitations?
SQL · medium
985
Explain how partitioning and bucketing in Hive/Spark optimize queries. What are the trade-offs among bucket count, partition cardinality, and the small-file problem? When does over-partitioning or over-bucketing become counterproductive?
SQL · medium
986
Explain how to flatten a multi-level nested JSON file while loading it into BigQuery.
SQL · easy
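One possible approach, sketched under assumptions, is to flatten nested objects into underscore-joined column names before the load and write newline-delimited JSON, which is the JSON layout BigQuery load jobs accept. The `flatten` and `to_ndjson` helpers and the `_` separator are hypothetical choices; the alternative of loading the nested structure as-is and flattening at query time with `UNNEST` is not shown here.

```python
import json

def flatten(record, parent_key="", sep="_"):
    """Collapse nested dicts into a single-level dict with joined keys.

    Lists are left untouched in this sketch; they could be loaded as
    repeated fields and expanded later at query time.
    """
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # recurse into nested objects, carrying the key prefix along
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat

def to_ndjson(records):
    """Serialize flattened records as newline-delimited JSON (one object per line)."""
    return "\n".join(json.dumps(flatten(r)) for r in records)

sample = {"id": 1, "user": {"name": "a", "geo": {"city": "x"}}, "tags": ["p", "q"]}
print(to_ndjson([sample]))
```

Joined key names can collide (for example `user_name` at the top level versus `user.name` nested), so a real pipeline would need a rule for that case.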
987
Explain how to implement a cumulative sum (running total) in SQL.
SQL · medium
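A common way to approach this is a window function: `SUM(...) OVER (ORDER BY ...)` with an explicit row frame. The sketch below runs that query through Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer); the `sales` table, its columns, and the `running_totals` helper are hypothetical names used only for illustration.

```python
import sqlite3

def running_totals(sales):
    """Return (sale_date, amount, running_total) rows ordered by date.

    sales: iterable of (sale_date, amount) tuples (hypothetical schema).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sale_date TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
    query = """
    SELECT sale_date,
           amount,
           SUM(amount) OVER (
               ORDER BY sale_date
               -- explicit ROWS frame: each row adds only itself and earlier rows
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM sales
    ORDER BY sale_date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

data = [("2024-01-01", 10), ("2024-01-02", 5), ("2024-01-03", 20)]
print(running_totals(data))
```

The explicit `ROWS` frame is a deliberate choice: with only `ORDER BY`, the default frame is range-based, so rows that tie on `sale_date` would all receive the same total.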
988
Explain how you would implement partitioning and bucketing for data stored in S3 to improve query performance.
SQL · medium