Interview questions
Preparing for a data engineering interview at Citi? This page contains 39 real interview questions sourced from verified Citi interview experiences. Questions are sorted by frequency, with the most frequently asked first.
Citi data engineering interviews typically focus on Spark/Big Data, SQL, and general topics such as shell scripting, workflow scheduling, and project process. The questions range from fundamental to advanced, so the list is useful to candidates at multiple experience levels.
What is the difference between repartition and coalesce in Apache Spark?
What is the difference between SparkSession and SparkContext in Spark?
What is the difference between partitioning and bucketing in Spark, and when would you use bucketing?
What strategies can you use to handle skewed data in Spark?
What is the difference between Managed and External tables in Hive/Spark?
What is a window function? Explain with an example.
Explain the concept of checkpointing in Spark and why it is important.
Which Agile methodologies have you used?
An existing job has suddenly started running longer than usual. How would you analyze the issue?
How is Oozie invoked?
How many Oozie workflow files did you use?
Which shell command renames a file?
Which shell command changes file permissions?
Which shell command lists processes running in the background?
Using the shell, how do you find the differences between two files?
What type of wrapper is used, or which language is used?
How have you used Amazon Deequ, and what kinds of data quality checks did you run with it?
Given a 1 TB file, how would you compute its word count?
How do you run jobs or scripts in the background from the shell?
How do you view Oozie jobs?
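As a starting point for the 1 TB word-count question above, here is a minimal Python sketch of the streaming idea: read the file one line at a time so the whole file never has to fit in memory. It is an illustration rather than a reference answer. The function names `count_words` and `count_words_in_file` are hypothetical, words are assumed to be whitespace-separated and case-insensitive, and the approach only works on a single machine if the set of distinct words fits in memory. At this scale an interviewer would more likely expect a distributed answer, such as a Spark job that splits lines into words and aggregates counts by key.

```python
from collections import Counter
from typing import Iterable


def count_words(lines: Iterable[str]) -> Counter:
    """Count whitespace-separated words across an iterable of lines.

    Only the running counts are kept in memory, not the lines themselves.
    """
    counts: Counter = Counter()
    for line in lines:
        # Assumption: words are case-insensitive and split on whitespace.
        counts.update(line.lower().split())
    return counts


def count_words_in_file(path: str) -> Counter:
    """Stream a text file line by line and count its words."""
    # Iterating over the file handle yields one line at a time.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_words(handle)
```

Calling `sum(count_words_in_file(path).values())` gives the total number of words, and `count_words_in_file(path).most_common(10)` gives the ten most frequent ones.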