Citi Data Engineer Interview Questions

Preparing for a data engineering interview at Citi? This page lists 20 real interview questions sourced from verified Citi interview experiences. Questions are sorted by frequency, with the most frequently asked first.

Citi data engineering interviews typically focus on Spark/Big Data, SQL, and general topics such as shell commands. The difficulty mix skews hard (10 hard, 7 medium, and 3 easy questions), which suggests an emphasis on depth and system-level thinking.

To focus your preparation, work through the questions by difficulty. For each question, attempt your own answer first, then compare it with the expert solution.

Topics Covered

Spark/Big Data, SQL, General/Other, System Design/Architecture, Python/Coding

What is the difference between repartition and coalesce in Apache Spark?

Spark/Big Data · medium · partition, python, spark · 1 min read

BCG, Citi, Dunnhumby, Fragma Data Systems, and 3 more

What is the difference between SparkSession and SparkContext in Spark?

Spark/Big Data · hard · optimization, python, spark · 0.7 min read

Altimetrik, American Express, Citi, Hexaware, and 3 more

What is the difference between partitioning and bucketing in Spark, and when would you use bucketing?

SQL · medium · join, partition, spark · 2 min read

Citi, Coforge, HCL, LTIMindtree

What strategies can you use to handle skewed data in Spark?

Spark/Big Data · medium · join, partition, spark · 0.5 min read

BCG, Bitwise, Citi, HashedIn

What is the difference between Managed and External tables in Hive/Spark?

Spark/Big Data · easy · spark · 2 min read

Citi, Dunnhumby, Fragma Data Systems

What is a window function? Explain with an example.

SQL · medium · join, partition, window · 0.5 min read

Citi, Freecharge

Explain the concept of checkpointing in Spark and why it is important.

Spark/Big Data · hard · spark · 0.7 min read

Citi, Globant

Which shell commands can you use to rename a file?

General/Other · medium · window · 2 min read

Citi

Shell: how do you change file permissions?

General/Other · easy · 2 min read

Citi

Shell: which command lists processes running in the background?

General/Other · easy · python · 2 min read

Citi

Given a 1 TB file, how would you compute its word count?

Python/Coding · hard · partition, spark · 2 min read

Citi

How would you migrate from Teradata to Hadoop and handle data with SCD Type 2?

SQL · medium · join, partition, spark · 0.7 min read

Citi

What is a Kafka topic, and how do you choose the number of partitions for it?

SQL · medium · partition · 0.5 min read

Citi

Explain the concept of consumer groups in Kafka. How do they affect message processing?

Spark/Big Data · hard · optimization, partition · 0.5 min read

Citi

Explain the difference between TriggerDagRunOperator and ExternalTaskSensor in Airflow.

Spark/Big Data · hard · airflow, optimization, partition · 0.5 min read

Citi

How would you design a Kafka-based pipeline for processing streaming data in real-time?

Spark/Big Data · hard · optimization, partition, spark · 2.5 min read

Citi

How and when would you use UDFs (user-defined functions)?

Spark/Big Data · hard · optimization, python, sql · 0.6 min read

Citi

Describe an end-to-end data pipeline project you worked on, highlighting your role and the technologies used.

System Design/Architecture · hard · airflow, optimization, partition · 4 min read

Citi

Describe how Kafka ensures data durability and fault tolerance.

System Design/Architecture · hard · optimization, partition, spark · 4.1 min read

Citi

Introduce your recent project, explaining its goal, architecture, tools, and technologies.

System Design/Architecture · hard · join, partition, spark · 2.3 min read

Citi
