#partition

Questions tagged partition · hard

Explain wide vs. narrow transformations and how they drive shuffle cost, failure domains, and pipeline design. When would you intentionally add a wide transformation, and how do you minimize its impact?

Spark/Big Data · hard

Design a Delta table layout for a mixed workload: point lookups by user_id, range scans by date, and full partition scans. Compare partitioning vs. Z-ordering: when to use each, and the rewrite cost trade-off.

Spark/Big Data · hard

Architecturally, how do Job–Stage–Task boundaries in Spark's execution model affect cluster sizing and shuffle cost, and when would you deliberately collapse or split stages?

Spark/Big Data · hard

Design a fault-tolerant Spark Streaming checkpoint strategy: what to persist, recovery semantics, and cost/scalability trade-offs with checkpoint frequency.

Spark/Big Data · hard

Explain the Medallion Architecture (Bronze, Silver, Gold layers).

Spark/Big Data · hard

Explain the benefits of using DataFrames over RDDs.

Spark/Big Data · hard

Explain the difference between batch and streaming data processing in Data Fusion.

Spark/Big Data · hard

Given a streaming dataset from Kafka, how would you ingest the data in real time using Spark?

Spark/Big Data · hard

