DataEngPrep.tech

Interview Questions

Real questions from top companies in Spark/Big Data · medium

1

What is the difference between repartition and coalesce in Apache Spark?

Spark/Big Data · medium · partition, python, spark · 1 min read
BCG, Citi, Dunnhumby, Fragma Data Systems (+3 more)
2

What is the difference between cache() and persist() in Spark? When would you use each?

Spark/Big Data · medium · partition, spark · 0.7 min read
Accenture, Coforge, Freecharge, Impetus (+1 more)
3

What is the difference between groupByKey and reduceByKey in Spark?

Spark/Big Data · medium · partition, spark · 0.8 min read
Accenture, Capco, Coforge, Nagarro (+1 more)
4

What is the difference between narrow and wide transformations in Apache Spark? Explain with examples.

Spark/Big Data · medium · join, partition, python · 0.9 min read
Coforge, Delivery Hero, Dunnhumby, Fragma Data Systems (+1 more)
5

What strategies can you use to handle skewed data in Spark?

Spark/Big Data · medium · join, partition, spark · 0.5 min read
BCG, Bitwise, Citi, HashedIn
6

Explain the difference between Spark's map() and flatMap() transformations.

Spark/Big Data · medium · partition, spark · 0.4 min read
Delivery Hero, Dunnhumby, Fragma Data Systems
7

Explain the concept of Broadcast Join in Spark. When should it be used?

Spark/Big Data · medium · join, spark, sql · 0.4 min read
Delivery Hero, Dunnhumby, Fragma Data Systems
8

Convert complex SQL (CTEs, window functions, subqueries) to production-grade PySpark. Discuss when to use spark.sql() vs. DataFrame API, and the implications for testability, partitioning, and execution predictability.

Spark/Big Data · medium · partition, python, spark · 0.8 min read
Datametica, S&P Global
9

Explain how Adaptive Query Execution changes the economics of Spark tuning. What problems does it solve at runtime, and when might you still need manual intervention (e.g., salting, broadcast hints)?

Spark/Big Data · medium · join, partition, spark · 0.6 min read
FedEx Dataworks, PWC
10

Architect incremental load in ADF + Databricks with idempotency, late-arrival handling, and cost/scalability implications of watermark vs. change data capture.

Spark/Big Data · medium · partition · 1 min read
Deloitte, Incedo
11

Explain strategies for managing schema changes in PySpark over time.

Spark/Big Data · medium · partition, spark · 0.8 min read
Accenture, Yash Technologies
12

How do you drop columns with null values in PySpark?

Spark/Big Data · medium · partition, spark · 0.6 min read
Datametica, Globant
13

How do you handle data skewness in Spark?

Spark/Big Data · medium · join, partition, spark · 0.7 min read
Accenture, Bitwise
14

How would you read data from a web API using PySpark?

Spark/Big Data · medium · airflow, partition, spark · 0.7 min read
Altimetrik, Infosys
15

What is Adaptive Query Execution (AQE) in Spark 3.x, and how does it improve performance?

Spark/Big Data · medium · join, partition, spark · 0.6 min read
HashedIn, Snowflake
16

What is the difference between repartition and coalesce in Spark?

Spark/Big Data · medium · partition, spark · 0.6 min read
Accenture, FedEx Dataworks
17

When and how do you use Broadcast Join in Spark?

Spark/Big Data · medium · join, spark, sql · 0.6 min read
Delivery Hero, Fragma Data Systems
18

What is broadcasting in Spark, and why is it used? Can you give an example of its use?

Spark/Big Data · medium · join, spark, sql · 0.7 min read
Altimetrik, Infosys
19

What is the difference between map and flatMap in Spark, and when would you use each?

Spark/Big Data · medium · partition, spark · 0.6 min read
Altimetrik, Infosys
20

What is the purpose of the Bronze, Silver, and Gold layers in a data pipeline?

Spark/Big Data · medium · 0.6 min read
Capgemini, Infosys

© 2026 DataEngPrep.tech. All rights reserved.