Compare Glue partition discovery with Hive MSCK/ADD PARTITION. Explain the operational and cost implications of crawler-based vs. partition-projection approaches. When does partition projection become necessary, and what are its limitations?
SQL, medium
2
Explain how partitioning and bucketing in Hive/Spark optimize queries. What are the trade-offs among bucket count, partition cardinality, and the small-file problem? When does over-partitioning or over-bucketing become counterproductive?
SQL, medium
3
Explain how to implement cumulative sum in SQL.
SQL, medium
4
Explain how you would implement partitioning and bucketing for data stored in S3 to improve query performance.
SQL, medium
5
Explain how you would use repartition or coalesce effectively to optimize processing when analyzing data only for a specific region.
SQL, medium
6
Explain indexing and its impact on database performance.
SQL, medium
7
Explain offset management, sync vs. async commits, partition assignment strategies and consumer groups, and backpressure handling in Kafka streams.
SQL, medium
8
Explain row_number, rank, and dense_rank with examples.
SQL, medium