Data engineering interview questions
Write a Singleton class implementation
Write a decorator function to log the execution time of a function
Write a function that replaces all characters in a list except for a given character
Write a function to detect anomalies in streaming data using a sliding window
Write a function to find the longest palindromic substring in a given string
Write a function to remove invalid parentheses from a string
Write a higher-order function to filter values greater than a threshold in a list
Write a script to automate daily ingestion of data from an API into a data lake
Write a simple service and controller class in Spring Boot for a REST API
Write a solution to efficiently search a rotated sorted array
Write a swap function without if-else
Write code for character frequency in a text file
Write code for palindrome generation
Write code to calculate the power of a given number in minimum time complexity
Write code to manually invoke garbage collection in Java
Write code to merge two sorted arrays without using extra space
Write code using Java's concurrent API (forEach, forEachEntry, forEachKey)
Write pseudocode for an ETL pipeline using Python and Pandas
Zigzag order traversal of a binary tree
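As an illustration of one item from the list above (the decorator that logs execution time), here is a minimal sketch in Python. It is one possible answer, not a reference solution; the names log_execution_time and row_count are hypothetical and chosen only for this example.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def log_execution_time(func):
    """Log how long each call to `func` takes, even if it raises."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logger.info("%s took %.6f s", func.__name__, elapsed)

    return wrapper


@log_execution_time
def row_count(rows):
    """Hypothetical example function: count the rows in a batch."""
    return len(rows)
```

Using time.perf_counter rather than time.time gives a monotonic, high-resolution clock, and the try/finally ensures the duration is logged for failed calls as well.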
Data engineering Python rounds focus on: PySpark DataFrame operations, pandas data manipulation, file I/O and JSON/CSV parsing, API integrations, basic algorithms and data structures, error handling patterns, and writing Airflow DAGs or pipeline code.
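To make the file I/O, JSON/CSV parsing, and error-handling topics concrete, the sketch below converts JSON Lines input to CSV and skips malformed records instead of failing the whole batch. It uses only the standard library. The function name json_lines_to_csv, the field names, and the sample data are assumptions made for illustration.

```python
import csv
import io
import json


def json_lines_to_csv(lines, out, fields):
    """Write valid JSON object records to `out` as CSV.

    Returns a (written, skipped) tuple so the caller can report bad input.
    """
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    written = skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are ignored, not counted as errors
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        if not isinstance(record, dict):
            skipped += 1  # valid JSON, but not an object with named fields
            continue
        # Missing fields become empty strings; extra fields are dropped.
        writer.writerow({name: record.get(name, "") for name in fields})
        written += 1
    return written, skipped


# Hypothetical sample input: two good records, one malformed line, one non-object.
sample = ['{"id": 1, "name": "a"}', "not json", "", '{"id": 2}', "[1, 2]"]
buffer = io.StringIO()
counts = json_lines_to_csv(sample, buffer, ["id", "name"])
```

Returning the counts rather than raising on the first bad line is a design choice: pipeline code often needs to load what it can and surface the number of rejected records for monitoring.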
Generally, data engineering Python rounds are lighter on algorithms than a standard coding round. They rarely include LeetCode-hard problems and instead test practical data manipulation, PySpark operations, and pipeline-oriented code. However, some FAANG companies still include a standard coding round.
Learn both PySpark and pandas. PySpark is tested for distributed processing scenarios (large datasets, Spark cluster operations), while pandas is tested for smaller-scale data manipulation and analysis. Most interviewers expect fluency in both, with PySpark being more critical for senior roles.