DataEngPrep.tech

Interview Questions

Real questions from top companies · medium

441

What is Broadcast Join and Why is It Required?

Spark/Big Data · medium · join, spark, sql · 0.5 min read
Nagarro
442

What is Shuffle and How Do You Handle It in Spark?

Spark/Big Data · medium · join, partition, spark · 0.5 min read
Nagarro
443

What is offset management in Kafka?

Spark/Big Data · medium · partition, spark · 0.3 min read
Delivery Hero
444

What is the advantage of caching in PySpark? When and why would you use it?

Spark/Big Data · medium · join, spark · 0.5 min read
Tredence
445

What is the command to import data from HDFS to Hive?

Spark/Big Data · medium · partition · 0.4 min read
Coforge
446

What is the difference between partitions and repartitions in Spark, and when do you use each?

Spark/Big Data · medium · partition, spark · 0.5 min read
Coforge
447

What is the most common performance bottleneck in Spark jobs, and how would you resolve it?

Spark/Big Data · medium · join, partition, spark · 0.4 min read
Bristol Myers Squibb
448

What is the role of Zookeeper in Kafka?

Spark/Big Data · medium · partition · 0.4 min read
Fragma Data Systems
449

What is the usage of the OPTIMIZE and REORG commands in Databricks?

Spark/Big Data · medium · partition, window · 0.5 min read
PWC
450

What performance tuning techniques do you apply in both Sqoop and Spark to optimize their execution?

Spark/Big Data · medium · join, partition, spark · 0.4 min read
Infosys
451

What role does executor memory and CPU configuration play in maximizing parallelism?

Spark/Big Data · medium · partition · 0.5 min read
TCS
452

What strategies would you use to optimize Spark jobs for both performance and cost on AWS?

Spark/Big Data · medium · partition, spark · 0.4 min read
Meesho
453

What techniques ensure deduplication in large datasets?

Spark/Big Data · medium · partition, window · 0.4 min read
Virtusa
454

What's the difference between narrow and wide transformations?

Spark/Big Data · medium · join, partition · 0.3 min read
Microsoft
455

When would you choose a broadcast join over a shuffle join? Any memory risks?

Spark/Big Data · medium · join, spark, sql · 0.4 min read
Microsoft
456

Which Spark property controls the number of shuffle partitions?

Spark/Big Data · medium · join, partition, spark · 0.3 min read
Puma
457

Write PySpark code to extract data from a CSV and create a table.

Spark/Big Data · medium · partition, python, spark · 0.3 min read
HCL
458

Write PySpark code to save a DataFrame in Parquet format to an S3 bucket.

Spark/Big Data · medium · partition, python, spark · 0.3 min read
Carelon
459

Write a PySpark job that calculates the number of unique users who logged in per day, but exclude any logins from inactive users listed in a separate file.

Spark/Big Data · medium · join, partition, python · 0.3 min read
Dunnhumby
460

Write a PySpark script to filter out invalid records from a dataset and calculate the average for a specific column, ensuring the schema is strictly defined at runtime.

Spark/Big Data · medium · partition, spark · 0.7 min read
Bristol Myers Squibb


© 2026 DataEngPrep.tech. All rights reserved.