DataEngPrep.tech

Interview Questions

Real questions from top companies · medium

700+ Easy · 450+ Medium · 650+ Hard
401

Write a query to remove duplicate records from a table while retaining the earliest entry.

SQL · medium · partition · 0.3 min read
BCG
402

Write a query to retain only the latest record and delete others in case of duplicates.

SQL · medium · partition · 0.3 min read
Fossil Group
403

Write a query to select the latest record based on a time_of_insertion column.

SQL · medium · partition · snowflake · sql · 0.3 min read
Fossil Group
404

Write a self join query to get the manager's name for each employee.

SQL · medium · join · 0.2 min read
Gartner
405

Write an SQL query to find the top 3 performing products in each category.

SQL · medium · partition · sql · window · 0.3 min read
Kagina
406

Write code to find the third-highest salary in a dataset using Pandas.

SQL · medium · spark · 0.2 min read
Chryselys
407

Write optimized SQL queries involving window functions, CTEs, and joins.

SQL · medium · join · partition · sql · 0.3 min read
Apple
408

Write queries combining Joins and Group By operations.

SQL · medium · join · 0.3 min read
Expedia
409

Your Kafka consumer shows significant lag during peak hours. What strategies would you employ to reduce lag and ensure timely data processing?

SQL · medium · partition · 0.4 min read
Dunnhumby
410

map() vs mapPartitions(): Highlight the difference between map (row-level transformation) and mapPartitions (partition-level transformation).

SQL · medium · partition · 0.3 min read
Capgemini
411

repartition() vs coalesce(): Explain when to use repartition() (increases or decreases partitions with a full shuffle) vs coalesce() (reduces partitions without a full shuffle).

SQL · medium · partition · 0.3 min read
Capgemini
412

Accumulators - use as shared variable for write-only operations

Spark/Big Data · medium · partition · 0.2 min read
Nihilent
413

Broadcast Joins and Shuffle Merge Joins?

Spark/Big Data · medium · join · spark · sql · 0.5 min read
Snowflake
414

Broadcast join - how it optimizes joins

Spark/Big Data · medium · join · partition · spark · 0.4 min read
Nihilent
415

Can you explain the concept of mappers in Spark, and how are they used in data transformations?

Spark/Big Data · medium · partition · spark · 0.5 min read
Infosys
416

Code a simple PySpark job to read a JSON file, filter records, and write output in Parquet format.

Spark/Big Data · medium · partition · python · spark · 0.5 min read
American Express
417

Compare Spark's lineage recovery with Hadoop's block replication mechanism.

Spark/Big Data · medium · partition · spark · 0.5 min read
Impetus
418

Daily tasks of a Data Engineer?

Spark/Big Data · medium · partition · 0.3 min read
Comcast
419

Data-Related Issues Encountered - handling skewed data

Spark/Big Data · medium · partition · spark · 0.4 min read
Lumiq
420

Describe how you would use PySpark to aggregate and summarize large transaction datasets.

Spark/Big Data · medium · partition · spark · window · 0.3 min read
Swiggy

© 2026 DataEngPrep.tech. All rights reserved.