DataEngPrep.tech

Interview Questions

Real questions from top companies in Spark/Big Data · easy

700+ Easy · 450+ Medium · 650+ Hard
1

What is the difference between Managed and External tables in Hive/Spark?

Spark/Big Data · easy · spark · 0.4 min read
Citi, Dunnhumby, Fragma Data Systems
2

When would you architecturally choose Dataset[T] over DataFrame in a Scala Spark pipeline, and what are the scalability and portability trade-offs? Include type-safety benefits vs. operational constraints.

Spark/Big Data · easy · etl · python · spark · 0.6 min read
Coforge, LTIMindtree
3

What is the difference between Managed and External Tables in Databricks?

Spark/Big Data · easy · snowflake · spark · 0.6 min read
Altimetrik, Incedo
4

A JSON file with evolving schema needs to be ingested into a DataFrame. How would you handle new fields dynamically in PySpark without breaking the job for previous structures?

Spark/Big Data · easy · spark · 0.3 min read
Dunnhumby
5

A task intermittently fails due to external API limitations. How would you configure Airflow retries and alerts to manage this situation efficiently?

Spark/Big Data · easy · airflow · 0.2 min read
Dunnhumby
6

Explain Accumulators and Broadcast Variables.

Spark/Big Data · easy · 0.2 min read
LTIMindtree
7

Approaches to handling multiple tasks within a sprint?

Spark/Big Data · easy · 0.6 min read
Snowflake
8

cache() vs persist(): Explain the difference between caching and persisting data in Spark, including the available storage levels and the use cases for each.

Spark/Big Data · easy · spark · 0.5 min read
Capgemini
9

Can you explain dynamic resource allocation in Spark? How does it help optimize job performance?

Spark/Big Data · easy · spark · 0.5 min read
Coforge
10

Can you explain the concept of incremental loading in Sqoop and how to use it for job processing?

Spark/Big Data · easy · 0.5 min read
Infosys
11

Can you give a use case where Delta Live Tables would be ideal?

Spark/Big Data · easy · etl · lakehouse · spark · 0.5 min read
TCS
12

Can you share a time when you had to shift focus due to urgent tasks?

Spark/Big Data · easy · 0.5 min read
Moonfare
13

Cluster Resource Allocation in Spark

Spark/Big Data · easy · spark · 0.4 min read
Walmart
14

Compare HDFS and cloud-based storage systems in terms of scalability and performance.

Spark/Big Data · easy · 0.5 min read
Swiggy
15

Compare ORC and Parquet

Spark/Big Data · easy · bigquery · spark · 0.3 min read
KPMG
16

Compare Spark SQL vs. Hive Performance.

Spark/Big Data · easy · spark · sql · 0.4 min read
HCL
17

Compare Spark and MapReduce for iterative workloads

Spark/Big Data · easy · spark · 0.4 min read
Microsoft
18

Concatenate Columns in PySpark

Spark/Big Data · easy · spark · 0.4 min read
Presidio
19

Controlling mappers in MapReduce

Spark/Big Data · easy · 0.4 min read
JP Morgan
20

Create a DataFrame with default column types

Spark/Big Data · easy · python · spark · sql · 0.4 min read
KPMG

© 2026 DataEngPrep.tech. All rights reserved.