Listing map/filter as narrow and groupBy as wide isn't enough. See the answer that demonstrates real Spark internals knowledge.
What is the difference between narrow and wide transformations in Apache Spark? Explain with examples.
Narrow transformations are those where each input partition contributes to at most one output partition. Examples: map, filter, flatMap. Wide transformations require data from multiple partitions to be shuffled across the network. Examples: groupByKey, reduceByKey, join. Wide transformations create new stages in the execution plan.
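The definitions above can be made concrete with a toy model in plain Python. This is a sketch, not the Spark API: a dataset is modeled as a list of partitions, and the names `narrow_map`, `wide_sum_by_key`, and the `ord`-based partitioner are hypothetical stand-ins chosen for illustration. Unlike Spark's reduceByKey, the toy model does no map-side combining.

```python
from collections import defaultdict

# Toy model: a "dataset" is a list of partitions, each a list of (key, value) records.
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6)],
]


def narrow_map(parts, fn):
    """Apply fn record by record; output partition i depends only on input partition i."""
    return [[fn(rec) for rec in part] for part in parts]


def wide_sum_by_key(parts, num_out):
    """Sum values per key; each record is routed to the partition that owns its key."""
    out = [defaultdict(int) for _ in range(num_out)]
    moved = 0
    for src, part in enumerate(parts):
        for key, value in part:
            dst = ord(key[0]) % num_out  # deterministic stand-in for a hash partitioner
            moved += dst != src  # record leaves its source partition: shuffle traffic
            out[dst][key] += value
    return [dict(d) for d in out], moved


doubled = narrow_map(partitions, lambda kv: (kv[0], kv[1] * 2))
summed, moved = wide_sum_by_key(partitions, num_out=3)

print(doubled)  # same partition layout as the input
print(summed, moved)  # keys regrouped; `moved` counts records that crossed partitions
```

The narrow step never looks outside its own partition, so it can run as one pipelined task per partition. The wide step cannot finish any output partition until it has seen records from every input partition, which is why Spark places a stage boundary there.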
Narrow transformations (no shuffle): Each partition is self-contained.
map, filter, flatMap, mapPartitions, union
Wide transformations (shuffle required): Data must cross partition boundaries.
groupBy, reduceByKey, join (sort-merge), repartition, distinct
```python
from pyspark.sql.functions import broadcast

# Wide join (sort-merge): shuffles both DataFrames
large_df.join(other_large_df, 'key')

# Narrow join (broadcast): no shuffle, can be 10-100x faster when small_lookup fits in memory
large_df.join(broadcast(small_lookup), 'key')
```
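- Add a broadcast hint when one side of a join is small enough to fit in memory; this eliminates the shuffle of the large side
- Pre-partition data with repartition(col), and persist the result, before repeated joins on the same key so the shuffle is paid once
- Use coalesce instead of repartition when reducing partitions, because coalesce merges existing partitions without a full shuffle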
Listing examples is a junior answer. The senior answer connects narrow/wide to the stage boundaries shown in the Spark UI, explains how to debug slow stages, and demonstrates optimization strategies such as broadcast joins and deferring shuffles to late in the pipeline.