DataEngPrep.tech

Interview Questions

Real questions from top companies in Spark/Big Data · easy

41

How do you compare the time investment and value of a task?

Spark/Big Data · easy · 0.5 min read
Delivery Hero
42

How do you handle bad data in Databricks?

Spark/Big Data · easy · 0.5 min read
PWC
43

How do you handle failures in Airflow tasks, and what retry strategies can you use?

Spark/Big Data · easy · airflow · python · 0.5 min read
Citi
44

How do you handle schema evolution in Spark, especially when reading data from sources like Parquet or Avro?

Spark/Big Data · easy · spark · 0.5 min read
Coforge
45

How do you prioritize your tasks in a multi-project environment?

Spark/Big Data · easy · 0.5 min read
PLEO
46

Sqoop Incremental Import?

Spark/Big Data · easy · sql · 0.6 min read
Altimetrik
47

Sqoop command for importing multiple tables

Spark/Big Data · easy · airflow · sql · 0.5 min read
Meesho
48

Suppose you have a DAG that ingests data from multiple databases. How would you increase task parallelism in Airflow to improve performance without overloading the system?

Spark/Big Data · easy · airflow · sql · 0.6 min read
Dunnhumby
49

Suppose you need to import 5 tables from an external RDBMS (like MySQL) into Hadoop HDFS. Write the Sqoop command

Spark/Big Data · easy · airflow · sql · 0.6 min read
Meesho
50

Task Dependencies in DAG

Spark/Big Data · easy · airflow · 0.5 min read
Verizon
51

What are Hadoop commands for Get and Merge?

Spark/Big Data · easy · spark · 0.4 min read
Altimetrik
52

What are the advantages of using Dataproc over a traditional Hadoop setup?

Spark/Big Data · easy · spark · 0.5 min read
Aarete
53

What are the advantages of using Delta Lake over Parquet?

Spark/Big Data · easy · 0.5 min read
Puma
54

What are the differences between %pip and %conda commands in Databricks?

Spark/Big Data · easy · python · 0.6 min read
TCS
55

What are the different delivery semantics in Kafka (at-least-once, at-most-once, exactly-once)?

Spark/Big Data · easy · 0.5 min read
Fragma Data Systems
56

What are the different modes in which you can submit Spark jobs? Explain each.

Spark/Big Data · easy · spark · 0.5 min read
Dunnhumby
57

What are the performance considerations when using Auto Loader?

Spark/Big Data · easy · 0.5 min read
TCS
58

What are the steps to connect to Salesforce?

Spark/Big Data · easy · spark · 0.4 min read
Hexaware
59

What are the steps to debug a failed workflow in Databricks?

Spark/Big Data · easy · 0.4 min read
TCS
60

What are the steps to execute a Python file with PySpark code on an EC2 environment?

Spark/Big Data · easy · python · spark · 0.4 min read
Carelon

© 2026 DataEngPrep.tech. All rights reserved.