DataEngPrep.tech

Interview Questions

Real questions from top companies · hard

700+ Easy · 450+ Medium · 650+ Hard
1

Tell me about yourself and your experience.

Behavioral · hard · join, partition · 0.7 min read
Altimetrik, Chryselys, Fossil Group, Globant +5
2

What is the difference between SparkSession and SparkContext in Spark?

Spark/Big Data · hard · optimization, python, spark · 0.7 min read
Altimetrik, American Express, Citi, Hexaware +3
3

What architecture are you following in your current project, and why?

System Design/Architecture · hard · airflow, etl, join · 3.5 min read
Cognizant, HCL, Nagarro, Thoughtworks +1
4

Briefly introduce yourself and walk us through your journey as a Data Engineer so far.

Behavioral · hard · etl, join, partition · 0.5 min read
Accenture, EPAM, Yash Technologies
5

What is a Common Table Expression (CTE), and when would you use it?

SQL · hard · bigquery, optimization, snowflake · 0.4 min read
Accenture, Cognizant, EPAM, Yash Technologies
6

What is the difference between a primary key and a unique key?

SQL · hard · spark, sql · 0.4 min read
Accenture, Cognizant, EPAM, Yash Technologies
7

Explain Fact and Dimension Tables with examples.

SQL · hard · join · 0.6 min read
Datametica, Deloitte, Incedo
8

Joins and window functions - INNER, LEFT, RIGHT, FULL OUTER, ROW_NUMBER(), RANK(), DENSE_RANK()

SQL · hard · join, partition, window · 0.7 min read
Ford, KPMG, Nihilent
9

Can you explain the architecture of Apache Spark and its components?

Spark/Big Data · hard · join, optimization, partition · 3.2 min read
Coforge, Freecharge, Nihilent
10

Describe the difference between Spark RDDs, DataFrames, and Datasets.

Spark/Big Data · hard · optimization, partition, spark · 0.5 min read
Accenture, Fragma Data Systems
11

How does Spark's Catalyst Optimizer work? Explain its stages.

Spark/Big Data · hard · join, optimization, spark · 0.5 min read
Dunnhumby, Fragma Data Systems, HashedIn
12

How do you handle late-arriving data in Spark Structured Streaming?

Spark/Big Data · hard · spark, window · 0.5 min read
Bitwise, Incedo, Swiggy
13

What is the small-file problem in Spark, and how do you solve it?

Spark/Big Data · hard · partition, spark · 0.5 min read
Daniel Wellington, Incedo, Swiggy
14

How do you optimize Spark jobs for better performance? Mention at least 5 techniques.

Spark/Big Data · hard · join, optimization, partition · 0.5 min read
Fragma Data Systems, Presidio, Swiggy
15

How do you handle conflicts within a team? Provide an example.

Behavioral · hard · 0.7 min read
EPAM, JIO
16

Why are you leaving your current company?

Behavioral · hard · 0.6 min read
Aarete, Incedo
17

What are the key components of AWS Glue, and how do they work together?

Cloud/Tools · hard · etl, spark · 0.6 min read
EY, Incedo, Tech Mahindra
18

What is Snowflake's architecture, and why is it unique?

Cloud/Tools · hard · bigquery, join, optimization · 3 min read
EY, Incedo, Tech Mahindra
19

What is the difference between S3 and HDFS?

Cloud/Tools · hard · 0.6 min read
EY, Incedo, Tech Mahindra
20

Briefly explain the architecture of Kafka.

System Design/Architecture · hard · join, optimization, partition · 3 min read
Delivery Hero, Grover