DataEngPrep.tech

Interview Questions

Real questions from top companies

700+ Easy · 450+ Medium · 650+ Hard
1681

What role do executor memory and CPU configuration play in maximizing parallelism?

Spark/Big Data · medium · partition · 0.5 min read
TCS
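
A rough sizing sketch, assuming one task per core (`spark.task.cpus` left at its default of 1) and Spark's unified memory model (about 300 MB of each executor heap reserved, `spark.memory.fraction` at its default of 0.6). The helper names are hypothetical. The idea it illustrates: cores decide how many tasks run at once, and memory divided across those cores decides whether each task can work without spilling.

```python
import math

RESERVED_MB = 300        # heap Spark sets aside per executor before the unified pool
MEMORY_FRACTION = 0.6    # default spark.memory.fraction

def task_slots(num_executors: int, executor_cores: int, task_cpus: int = 1) -> int:
    """How many tasks the cluster can run at the same time."""
    return num_executors * (executor_cores // task_cpus)

def waves(num_partitions: int, slots: int) -> int:
    """Rounds of tasks needed to finish a stage that has num_partitions tasks."""
    return math.ceil(num_partitions / slots)

def unified_mb_per_task(executor_memory_mb: int, executor_cores: int, task_cpus: int = 1) -> float:
    """Approximate share of the execution+storage pool per concurrently running task."""
    unified_mb = (executor_memory_mb - RESERVED_MB) * MEMORY_FRACTION
    return unified_mb / (executor_cores // task_cpus)

# 10 executors x 4 cores, 8 GB heap each, a stage with 200 partitions
slots = task_slots(10, 4)
print(slots, waves(200, slots), round(unified_mb_per_task(8192, 4), 1))  # 40 5 1183.8
```

Adding cores per executor raises the slot count but shrinks the memory each concurrent task can use, so the two settings are usually tuned together with the partition count.
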
1682

What role would Kafka or similar event-driven platforms play in your architecture?

Spark/Big Data · hard · etl · optimization · partition · 2.6 min read
Meesho
1683

What strategies would you use to optimize Spark jobs for both performance and cost on AWS?

Spark/Big Data · medium · partition · spark · 0.4 min read
Meesho
1684

What strategies would you use to reduce latency in a streaming data pipeline?

Spark/Big Data · hard · partition · 0.4 min read
BCG
1685

What techniques ensure deduplication in large datasets?

Spark/Big Data · medium · partition · window · 0.4 min read
Virtusa
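
One common technique keeps only the latest record per business key; in PySpark that is usually `row_number()` over a window partitioned by the key and ordered by timestamp descending, followed by a filter to row 1, while `dropDuplicates()` covers exact (or column-subset) duplicates. The plain-Python sketch below models both ideas on a single machine; the field names `id`, `ts`, and `v` are illustrative assumptions.

```python
from typing import Any, Dict, Hashable, Iterable, List

Record = Dict[str, Any]

def dedup_latest(records: Iterable[Record], key: str, ts: str) -> List[Record]:
    """Keep one record per key: the one with the greatest timestamp (first seen wins ties)."""
    latest: Dict[Hashable, Record] = {}
    for rec in records:
        current = latest.get(rec[key])
        if current is None or rec[ts] > current[ts]:
            latest[rec[key]] = rec
    return list(latest.values())

def dedup_exact(records: Iterable[Record]) -> List[Record]:
    """Drop rows whose full contents repeat an earlier row, preserving first occurrences."""
    seen = set()
    out: List[Record] = []
    for rec in records:
        fingerprint = tuple(sorted(rec.items()))  # order-independent, hashable row identity
        if fingerprint not in seen:
            seen.add(fingerprint)
            out.append(rec)
    return out

rows = [
    {"id": 1, "ts": 1, "v": "a"},
    {"id": 1, "ts": 3, "v": "b"},
    {"id": 2, "ts": 2, "v": "c"},
    {"id": 1, "ts": 2, "v": "d"},
]
print(dedup_latest(rows, "id", "ts"))
# [{'id': 1, 'ts': 3, 'v': 'b'}, {'id': 2, 'ts': 2, 'v': 'c'}]
```

On a cluster the same per-key logic runs after the data is shuffled by the window's partition column, so every record for a key is compared on the same executor.
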
1686

What trade-offs would you consider when choosing between batch processing and real-time streaming?

Spark/Big Data · hard · partition · 0.4 min read
McKinsey
1687

What's the difference between narrow and wide transformations?

Spark/Big Data · medium · join · partition · 0.3 min read
Microsoft
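
A narrow transformation (for example `map` or `filter`) builds each output partition from a single input partition, so no data moves between executors; a wide transformation (for example `groupByKey`) may need rows from every input partition, which forces a shuffle. The plain-Python model below represents partitions as lists and records which input partitions feed each output partition. It routes keys with Python's `hash` where Spark would use its own partitioner, so treat it as an illustration only.

```python
from typing import Callable, Dict, List, Set, Tuple

def narrow_map(partitions: List[list], fn: Callable) -> Tuple[List[list], List[Set[int]]]:
    """Each output partition i depends only on input partition i."""
    out = [[fn(x) for x in part] for part in partitions]
    deps = [{i} for i in range(len(partitions))]
    return out, deps

def wide_group_by_key(partitions: List[List[Tuple]], num_out: int) -> Tuple[List[Dict], List[Set[int]]]:
    """Rows are routed by key, so one output partition can depend on many input partitions."""
    out: List[Dict] = [{} for _ in range(num_out)]
    deps: List[Set[int]] = [set() for _ in range(num_out)]
    for src, part in enumerate(partitions):
        for key, value in part:
            dest = hash(key) % num_out
            out[dest].setdefault(key, []).append(value)
            deps[dest].add(src)
    return out, deps

parts = [[(1, 10), (2, 20)], [(1, 30), (3, 40)]]
grouped, deps = wide_group_by_key(parts, 2)
print(grouped)  # [{2: [20]}, {1: [10, 30], 3: [40]}]
print(deps)     # [{0}, {0, 1}]
```

Output partition 1 depends on both input partitions, which is exactly the cross-partition dependency that makes a transformation wide.
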
1688

When you submit a Spark job, how does the process work on the backend? Explain.

Spark/Big Data · hard · optimization · spark · 0.4 min read
Dunnhumby
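
At a high level, the driver turns the lineage behind an action into a DAG, the DAG scheduler cuts that DAG into stages at shuffle boundaries, and each stage becomes one task per partition that the task scheduler sends to executors obtained from the cluster manager. The sketch below models only the stage-cutting step for a linear chain of RDD operations. The `SHUFFLE_OPS` set is a simplifying assumption: real Spark decides from each RDD's dependency type, and an operation such as `reduceByKey` can skip the shuffle when its input is already partitioned the same way.

```python
from typing import List

# Assumption: in this model these operations always introduce a shuffle dependency
SHUFFLE_OPS = {"groupByKey", "reduceByKey", "sortByKey", "repartition"}

def split_into_stages(ops: List[str]) -> List[List[str]]:
    """Start a new stage at every shuffle boundary; narrow ops are pipelined into the current stage."""
    stages: List[List[str]] = [[]]
    for op in ops:
        if op in SHUFFLE_OPS and stages[-1]:
            stages.append([])  # the shuffle read begins a new stage
        stages[-1].append(op)
    return stages

word_count = ["textFile", "flatMap", "map", "reduceByKey", "saveAsTextFile"]
print(split_into_stages(word_count))
# [['textFile', 'flatMap', 'map'], ['reduceByKey', 'saveAsTextFile']]
```

The classic word count therefore runs as two stages: the first writes shuffle files, and the second reads them and writes the result.
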
1689

When would you choose a broadcast join over a shuffle join? Any memory risks?

Spark/Big Data · medium · join · spark · sql · 0.4 min read
Microsoft
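
A broadcast (hash) join ships a full copy of the small side to every executor and probes it with the large side's rows where they already sit, so the large side is never shuffled; a shuffle join instead repartitions both sides by the join key. The memory risk is that the broadcast side is collected to the driver and must fit in each executor's memory, which is why Spark only auto-broadcasts tables estimated below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). The plain-Python sketch below shows the build-and-probe logic of an inner hash join; the column names are illustrative.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]

def broadcast_hash_join(large: List[Row], small: List[Row], key: str) -> List[Row]:
    """Inner join: build a hash table from the small side, then probe it with each large-side row."""
    # Build phase: in Spark, this table is what every executor receives
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in small:
        table[row[key]].append(row)
    # Probe phase: each large-side row is matched locally, without moving the large side
    joined: List[Row] = []
    for row in large:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined

orders = [{"cid": 1, "amt": 5}, {"cid": 2, "amt": 7}, {"cid": 3, "amt": 9}]
customers = [{"cid": 1, "name": "a"}, {"cid": 2, "name": "b"}]
print(broadcast_hash_join(orders, customers, "cid"))
# [{'cid': 1, 'name': 'a', 'amt': 5}, {'cid': 2, 'name': 'b', 'amt': 7}]
```

In PySpark the join can be forced with the `pyspark.sql.functions.broadcast` hint; doing so on a side that is larger than expected is the usual cause of driver or executor out-of-memory errors.
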
1690

Which Spark property controls the number of shuffle partitions?

Spark/Big Data · medium · join · partition · spark · 0.3 min read
Puma
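
For DataFrame and SQL operations the property is `spark.sql.shuffle.partitions` (default 200), which can be changed at runtime with `spark.conf.set("spark.sql.shuffle.partitions", "64")`; RDD shuffles are typically governed by `spark.default.parallelism` instead, and with Adaptive Query Execution enabled Spark may coalesce shuffle partitions at runtime. The plain-Python sketch below models what the setting controls: rows are hashed on the key into that many buckets, so every row with the same key lands in the same partition. Spark actually uses a Murmur3-based hash rather than Python's `hash`, so the bucket numbers here are illustrative only.

```python
from typing import Any, Dict, List

DEFAULT_SHUFFLE_PARTITIONS = 200  # default value of spark.sql.shuffle.partitions

def hash_partition(rows: List[Dict[str, Any]], key: str,
                   num_partitions: int = DEFAULT_SHUFFLE_PARTITIONS) -> List[List[Dict[str, Any]]]:
    """Route each row to bucket hash(key) mod num_partitions (Python's % is non-negative for a positive divisor)."""
    buckets: List[List[Dict[str, Any]]] = [[] for _ in range(num_partitions)]
    for row in rows:
        buckets[hash(row[key]) % num_partitions].append(row)
    return buckets

rows = [{"k": k, "v": i} for i, k in enumerate("abcabcaa")]
parts = hash_partition(rows, "k", num_partitions=4)
print([len(p) for p in parts])  # bucket sizes vary between runs: str hashing is randomized per process
```

Too few partitions make each task large and prone to spilling; too many create scheduling overhead and tiny output files, which is why the default of 200 is often tuned per job.
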
1691

Which Spark version are you using in your project, and why did you choose it?

Spark/Big Data · easy · python · spark · 0.3 min read
Capgemini
1692

Why did you choose specific technologies (e.g., Spark over traditional ETL tools)?

Spark/Big Data · hard · etl · spark · 0.4 min read
Tiger Analytics
1693

Why does Hive use Derby by default, and what alternatives are used in production?

Spark/Big Data · easy · spark · sql · 0.3 min read
Chryselys
1694

Have you worked with UDFs? Share examples.

Spark/Big Data · easy · python · 0.3 min read
LTIMindtree
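
A typical answer describes a small Python function registered as a UDF, for example with `pyspark.sql.functions.udf(mask_email, StringType())`. The function below is a hypothetical email-masking helper; it handles `None` explicitly because Spark passes SQL NULLs to Python UDFs as `None`. Built-in functions are usually preferred where they exist, since a Python UDF moves rows between the JVM and a Python worker process and is opaque to the Catalyst optimizer.

```python
from typing import Optional

def mask_email(email: Optional[str]) -> Optional[str]:
    """Hypothetical UDF body: keep the first character and the domain, mask the rest."""
    if email is None or "@" not in email:
        return email  # pass NULLs and malformed values through unchanged
    local, domain = email.split("@", 1)
    return local[:1] + "***@" + domain

print(mask_email("alice@example.com"))  # a***@example.com
print(mask_email(None))                 # None
```

Keeping the UDF body a plain function like this also makes it unit-testable without starting a Spark session.
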
1695

Write PySpark code to extract data from a CSV and create a table.

Spark/Big Data · medium · partition · python · spark · 0.3 min read
HCL
1696

Write PySpark code to filter and count records.

Spark/Big Data · easy · python · spark · sql · 0.3 min read
Bitwise
1697

Write PySpark code to filter records based on specific conditions and add a calculated column.

Spark/Big Data · easy · python · spark · sql · 0.3 min read
Bristol Myers Squibb
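
In PySpark this usually combines `DataFrame.filter` with `DataFrame.withColumn`, for example `df.filter(F.col("quantity") > 0).withColumn("total", F.col("quantity") * F.col("unit_price"))`, with `.count()` appended when only the number of matching rows is needed. The plain-Python sketch below mirrors that logic on a list of dicts so the semantics can be checked without a cluster; the column names `quantity`, `unit_price`, and `total` are illustrative assumptions. In Spark a comparison against NULL is not true, so such rows are dropped by the filter, and the sketch mimics that by skipping `None`.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def filter_and_add_total(rows: List[Row]) -> List[Row]:
    """Keep rows with a positive quantity, then add total = quantity * unit_price."""
    result: List[Row] = []
    for row in rows:
        qty = row.get("quantity")
        if qty is None or qty <= 0:
            continue  # like Spark, a NULL comparison does not pass the filter
        result.append({**row, "total": qty * row["unit_price"]})
    return result

rows = [
    {"item": "pen", "quantity": 3, "unit_price": 2.0},
    {"item": "cap", "quantity": 0, "unit_price": 5.0},
    {"item": "mug", "quantity": None, "unit_price": 4.0},
]
print(filter_and_add_total(rows))
# [{'item': 'pen', 'quantity': 3, 'unit_price': 2.0, 'total': 6.0}]
```
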
1698

Write PySpark code to save a DataFrame in Parquet format to an S3 bucket.

Spark/Big Data · medium · partition · python · spark · 0.3 min read
Carelon
1699

Write a PySpark code snippet to filter rows with a specific condition.

Spark/Big Data · easy · python · spark · sql · 0.3 min read
Fragma Data Systems
1700

Write a PySpark job that calculates the number of unique users who logged in per day, but exclude any logins from inactive users listed in a separate file.

Spark/Big Data · medium · join · partition · python · 0.3 min read
Dunnhumby
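
One way to structure this job in PySpark is a left anti join against the inactive-user list (`logins.join(inactive, on="user_id", how="left_anti")`), followed by a `groupBy` on the login date with `F.countDistinct("user_id")`. The plain-Python sketch below reproduces that logic; the input shapes (a list of `(user_id, date)` pairs and a set of inactive IDs) are assumptions, since the question does not fix a schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def unique_active_users_per_day(logins: Iterable[Tuple[str, str]],
                                inactive: Set[str]) -> Dict[str, int]:
    """Count distinct users per day, ignoring any login by a user in `inactive`."""
    users_by_day: Dict[str, Set[str]] = defaultdict(set)
    for user_id, day in logins:
        if user_id in inactive:
            continue  # the anti-join step: drop logins from inactive users
        users_by_day[day].add(user_id)  # a set per day yields the distinct count
    return {day: len(users) for day, users in sorted(users_by_day.items())}

logins = [
    ("u1", "2024-01-01"), ("u1", "2024-01-01"), ("u2", "2024-01-01"),
    ("u3", "2024-01-01"), ("u2", "2024-01-02"), ("u3", "2024-01-02"),
]
print(unique_active_users_per_day(logins, inactive={"u3"}))
# {'2024-01-01': 2, '2024-01-02': 1}
```

The inactive-user file is usually small, so wrapping it in `broadcast()` before the anti join avoids shuffling the much larger login data.
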

© 2026 DataEngPrep.tech. All rights reserved.