Interview Questions

Real questions from top companies · medium

401

Write a query to remove duplicate records from a table while retaining the earliest entry.

SQL · medium · partition · 0.3 min read

BCG

402

Write a query to retain only the latest record and delete others in case of duplicates.

SQL · medium · partition · 0.3 min read

Fossil Group

403

Write a query to select the latest record based on a time_of_insertion column.

SQL · medium · partition · snowflake · sql · 0.3 min read

Fossil Group

404

Write a self join query to get the manager's name for each employee.

SQL · medium · join · 0.2 min read

Gartner

405

Write an SQL query to find the top 3 performing products in each category.

SQL · medium · partition · sql · window · 0.3 min read

Kagina

406

Write code to find the third-highest salary in a dataset using Pandas.

SQL · medium · spark · 0.2 min read

Chryselys

407

Write optimized SQL queries involving window functions, CTEs, and joins.

SQL · medium · join · partition · sql · 0.3 min read

Apple

408

Write queries combining Joins and Group By operations.

SQL · medium · join · 0.3 min read

Expedia

409

Your Kafka consumer shows significant lag during peak hours. What strategies would you employ to reduce lag and ensure timely data processing?

SQL · medium · partition · 0.4 min read

Dunnhumby

410

map() vs mapPartitions(): Highlight the difference between map (row-level transformation) and mapPartitions (partition-level transformation).

SQL · medium · partition · 0.3 min read

Capgemini

411

repartition() vs coalesce(): Explain when to use repartition() (full shuffle; can increase or decrease the number of partitions) vs coalesce() (reduces partitions without a full shuffle).

SQL · medium · partition · 0.3 min read

Capgemini

412

Accumulators: use as a shared variable for write-only operations

Spark/Big Data · medium · partition · 0.2 min read

Nihilent

413

What are broadcast joins and shuffle sort-merge joins?

Spark/Big Data · medium · join · spark · sql · 0.5 min read

Snowflake

414

Broadcast join - how it optimizes joins

Spark/Big Data · medium · join · partition · spark · 0.4 min read

Nihilent

415

Can you explain the concept of mappers in Spark, and how are they used in data transformations?

Spark/Big Data · medium · partition · spark · 0.5 min read

Infosys

416

Code a simple PySpark job to read a JSON file, filter records, and write output in Parquet format.

Spark/Big Data · medium · partition · python · spark · 0.5 min read

American Express

417

Compare Spark's lineage recovery with Hadoop's block replication mechanism.

Spark/Big Data · medium · partition · spark · 0.5 min read

Impetus

418

What are the daily tasks of a data engineer?

Spark/Big Data · medium · partition · 0.3 min read

Comcast

419

Data-Related Issues Encountered - handling skewed data

Spark/Big Data · medium · partition · spark · 0.4 min read

Lumiq

420

Describe how you would use PySpark to aggregate and summarize large transaction datasets.

Spark/Big Data · medium · partition · spark · window · 0.3 min read

Swiggy
