DataEngPrep.tech

Interview Questions

Real questions from top companies in Spark/Big Data

700+ Easy | 450+ Medium | 650+ Hard

What are the key components of the Spark execution model (Job, Stage, Task)?

Spark/Big Data | hard | join, optimization, partition | 0.7 min read

FedEx Dataworks, Freight Tiger

What is Adaptive Query Execution (AQE) in Spark 3.x, and how does it improve performance?

Spark/Big Data | medium | join, partition, spark | 0.6 min read

HashedIn, Snowflake

What is Spark's Catalyst Optimizer? Explain its stages.

Spark/Big Data | hard | join, optimization, spark | 0.7 min read

Dunnhumby, Fragma Data Systems

What is the difference between Spark RDDs, DataFrames, and Datasets?

Spark/Big Data | hard | optimization, partition, python | 0.6 min read

Accenture, Fragma Data Systems

What is the difference between repartition and coalesce in Spark?

Spark/Big Data | medium | partition, spark | 0.6 min read

Accenture, FedEx Dataworks

What is the small-file problem in Spark, and how do you solve it?

Spark/Big Data | hard | partition, spark, window | 0.7 min read

Daniel Wellington, Incedo

When and how do you use Broadcast Join in Spark?

Spark/Big Data | medium | join, spark, sql | 0.6 min read

Delivery Hero, Fragma Data Systems

What is broadcasting in Spark, and why is it used? Can you give an example of its use?

Spark/Big Data | medium | join, spark, sql | 0.7 min read

Altimetrik, Infosys

What is the difference between Managed and External Tables in Databricks?

Spark/Big Data | easy | snowflake, spark | 0.6 min read

Altimetrik, Incedo

What is the difference between map and flatMap in Spark, and when would you use each?

Spark/Big Data | medium | partition, spark | 0.6 min read

Altimetrik, Infosys

What is the purpose of the Bronze, Silver, and Gold layers in a data pipeline?

Spark/Big Data | medium | 0.6 min read

Capgemini, Infosys

What work is done by the executor memory in Spark?

Spark/Big Data | medium | join, partition, spark | 0.6 min read

Altimetrik, Infosys

When and how do you use Broadcast Join?

Spark/Big Data | medium | join, spark, sql | 0.6 min read

Altimetrik, Infosys

Why is SparkSession used in Spark 2.0 and later versions?

Spark/Big Data | hard | python, spark, sql | 0.5 min read

Altimetrik, Infosys

Write a Python script to find the count of each word in a text file using Spark.

Spark/Big Data | medium | partition, python, spark | 0.4 min read

Altimetrik, Infosys

Write the PySpark code to find the second highest salary in each department.

Spark/Big Data | medium | partition, spark, sql | 0.5 min read

Altimetrik, Infosys

A JSON file with evolving schema needs to be ingested into a DataFrame. How would you handle new fields dynamically in PySpark without breaking the job for previous structures?

Spark/Big Data | easy | spark | 0.3 min read

A data pipeline processes files for different clients stored in separate directories. Explain how you would use dynamic DAG creation to handle client-specific workflows in Airflow.

Spark/Big Data | hard | airflow | 0.3 min read

A task intermittently fails due to external API limitations. How would you configure Airflow retries and alerts to manage this situation efficiently?

Spark/Big Data | easy | airflow | 0.2 min read

Explain Accumulators and Broadcast Variables.

Spark/Big Data | easy | 0.2 min read

© 2026 DataEngPrep.tech. All rights reserved.