DataEngPrep.tech

Interview Questions

Real questions from top companies

700+ Easy | 450+ Medium | 650+ Hard
1621

What are the challenges of implementing real-time analytics using Spark Streaming?

Spark/Big Data | hard | partition, spark, window | 0.5 min read
Goldman Sachs
1622

What are the differences between %pip and %conda commands in Databricks?

Spark/Big Data | easy | python | 0.6 min read
TCS
1623

What are the different delivery semantics in Kafka (at-least-once, at-most-once, exactly-once)?

Spark/Big Data | easy | 0.5 min read
Fragma Data Systems
1624

What are the different modes in which you can submit Spark jobs? Explain each.

Spark/Big Data | easy | spark | 0.5 min read
Dunnhumby
1625

What are the key differences between Map and Reduce in Spark?

Spark/Big Data | medium | partition, spark | 0.4 min read
Nielsen
1626

What are the key tuning techniques you apply to improve the performance of Spark jobs?

Spark/Big Data | medium | join, partition, spark | 0.4 min read
Coforge
1627

What are the key properties of Delta Lake that differentiate it from traditional data lakes?

Spark/Big Data | hard | 0.5 min read
Puma
1628

What are the limitations of the REORG command with respect to large datasets?

Spark/Big Data | medium | partition | 0.5 min read
PWC
1629

What are the performance considerations when using Auto Loader?

Spark/Big Data | easy | 0.5 min read
TCS
1630

What are the performance trade-offs of using salting to mitigate data skewness?

Spark/Big Data | medium | join, partition | 0.5 min read
PWC
1631

What are the steps to connect to Salesforce?

Spark/Big Data | easy | spark | 0.4 min read
Hexaware
1632

What are the steps to debug a failed workflow in Databricks?

Spark/Big Data | easy | 0.4 min read
TCS
1633

What are the steps to efficiently process 1 TB of data in Spark?

Spark/Big Data | medium | partition, spark, sql | 0.5 min read
HashedIn
1634

What are the steps to execute a Python file with PySpark code on an EC2 environment?

Spark/Big Data | easy | python, spark | 0.4 min read
Carelon
1635

What are the trade-offs between using Glue Catalog vs. Hive Metastore for metadata management?

Spark/Big Data | easy | sql | 0.4 min read
Capco
1636

What are transient clusters in EMR, and when would you use them?

Spark/Big Data | easy | etl | 0.5 min read
Persistent Systems
1637

What causes Out of Memory (OOM) issues in Databricks, and how do you resolve them?

Spark/Big Data | medium | partition, spark | 0.5 min read
PWC
1638

What causes data skewness in Spark, and how can it be resolved?

Spark/Big Data | medium | join, partition, spark | 0.5 min read
PWC
1639

What configuration parameters are critical for enabling AQE effectively?

Spark/Big Data | medium | join, partition, spark | 0.4 min read
PWC
1640

What configurations are needed to pass parameters to a Databricks notebook?

Spark/Big Data | easy | 0.3 min read
Virtusa

© 2026 DataEngPrep.tech. All rights reserved.