DataEngPrep.tech

Interview Questions

Real questions from top companies in Spark/Big Data

700+ Easy | 450+ Medium | 650+ Hard
421

What strategies would you use to optimize Spark jobs for both performance and cost on AWS?

Spark/Big Data | medium | partition, spark | 0.4 min read
Meesho
422

What strategies would you use to reduce latency in a streaming data pipeline?

Spark/Big Data | hard | partition | 0.4 min read
BCG
423

What techniques ensure deduplication in large datasets?

Spark/Big Data | medium | partition, window | 0.4 min read
Virtusa
424

What trade-offs would you consider when choosing between batch processing and real-time streaming?

Spark/Big Data | hard | partition | 0.4 min read
McKinsey
425

What's the difference between narrow and wide transformations?

Spark/Big Data | medium | join, partition | 0.3 min read
Microsoft
426

When submitting Spark jobs, how does the process work in the backend? Explain.

Spark/Big Data | hard | optimization, spark | 0.4 min read
Dunnhumby
427

When would you choose a broadcast join over a shuffle join? Any memory risks?

Spark/Big Data | medium | join, sparksql | 0.4 min read
Microsoft
428

Which Spark property controls the number of shuffle partitions?

Spark/Big Data | medium | join, partition, spark | 0.3 min read
Puma
429

Which Spark version are you using in your project, and why did you choose it?

Spark/Big Data | easy | python, spark | 0.3 min read
Capgemini
430

Why did you choose specific technologies (e.g., Spark over traditional ETL tools)?

Spark/Big Data | hard | etl, spark | 0.4 min read
Tiger Analytics
431

Why does Hive use Derby by default, and what alternatives are used in production?

Spark/Big Data | easy | sparksql | 0.3 min read
Chryselys
432

Have you worked with UDFs? Share examples.

Spark/Big Data | easy | python | 0.3 min read
LTIMindtree
433

Write PySpark code to extract data from a CSV and create a table.

Spark/Big Data | medium | partition, python, spark | 0.3 min read
HCL
434

Write PySpark code to filter and count records.

Spark/Big Data | easy | python, sparksql | 0.3 min read
Bitwise
435

Write PySpark code to filter records based on specific conditions and add a calculated column.

Spark/Big Data | easy | python, sparksql | 0.3 min read
Bristol Myers Squibb
436

Write PySpark code to save a DataFrame in Parquet format to an S3 bucket.

Spark/Big Data | medium | partition, python, spark | 0.3 min read
Carelon
437

Write a PySpark code snippet to filter rows with a specific condition.

Spark/Big Data | easy | python, sparksql | 0.3 min read
Fragma Data Systems
438

Write a PySpark job that calculates the number of unique users who logged in per day, but exclude any logins from inactive users listed in a separate file.

Spark/Big Data | medium | join, partition, python | 0.3 min read
Dunnhumby
439

Write a PySpark script to check for missing values and duplicate rows in a DataFrame. How would you ensure data quality before saving it to a storage system?

Spark/Big Data | hard | partition, spark | 0.9 min read
Dunnhumby
440

Write a PySpark script to filter out invalid records from a dataset and calculate the average for a specific column, ensuring the schema is strictly defined at runtime.

Spark/Big Data | medium | partition, spark | 0.7 min read
Bristol Myers Squibb

© 2026 DataEngPrep.tech. All rights reserved.