Data engineering interview questions · easy
Which data structure occupies more memory: list or tuple? Why?
Write Java code to read a file using FileInputStream
Write Python code to print even numbers from a list.
Write a Python program to find consecutive numbers in a list.
Write a Java program using FileInputStream and BufferedReader to read data from a local file and print the output to the console
Write Python code that determines whether everyone seated in a theatre can see the screen.
Write a Python function to reverse all strings in a list.
Write a Python program to remove duplicate elements from a list while preserving the original order
Write a Python script to process raw JSON files containing sales data and load them into a relational database.
Write Scala code to print prime numbers.
Write a Singleton class implementation
Write a decorator function to log the execution time of a function.
Write a function to find the longest palindromic substring in a given string.
Write a function to remove invalid parentheses from a string.
Write a higher-order function to filter values greater than a threshold in a list.
Write a script to automate daily ingestion of data from an API into a data lake.
Write a simple service and controller class in Spring Boot for REST API
Write a solution to efficiently search a rotated sorted array.
Write a swap function without if-else.
Write code for character frequency in a text file.
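To illustrate the level these questions aim at, a few of the Python ones above can be sketched with the standard library alone. The function names (`even_numbers`, `dedupe_preserving_order`, `log_execution_time`, `reverse_strings`) are hypothetical, and this is one possible answer under those assumptions, not a model solution. The size comparison at the end relies on CPython behaviour.

```python
import functools
import sys
import time


def even_numbers(values):
    """Return the even numbers from a list, in their original order."""
    return [v for v in values if v % 2 == 0]


def dedupe_preserving_order(values):
    """Remove duplicates while keeping the first occurrence of each item."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(values))


def log_execution_time(func):
    """Decorator that prints how long each call to func took."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.6f}s")
    return wrapper


@log_execution_time
def reverse_strings(strings):
    """Reverse every string in a list."""
    return [s[::-1] for s in strings]


# In CPython a list keeps extra bookkeeping so it can be resized, which
# makes it larger than a tuple holding the same items.
list_size = sys.getsizeof([1, 2, 3])
tuple_size = sys.getsizeof((1, 2, 3))
```

The `dict.fromkeys` approach only works for hashable items; for unhashable items such as nested lists, a loop with an explicit "seen" list would be needed instead.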
Data engineering Python rounds focus on PySpark DataFrame operations, pandas data manipulation, file I/O and JSON/CSV parsing, API integrations, basic algorithms and data structures, error handling patterns, and writing Airflow DAGs or pipeline code.
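The JSON parsing and error handling topics can be combined in one small sketch: read JSON lines of sales data, skip malformed rows, and load the rest into a relational table. Everything specific here is an assumption made for illustration: the `order_id` and `amount` fields, the `sales` table, the helper names, and the use of an in-memory SQLite database in place of a production database.

```python
import json
import sqlite3


def parse_sales_records(lines):
    """Yield (order_id, amount) tuples from JSON lines, skipping bad rows."""
    for line_number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
            yield str(record["order_id"]), float(record["amount"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
            # A real pipeline might route these rows to a dead-letter location.
            print(f"skipping line {line_number}: {exc!r}")


def load_sales(connection, rows):
    """Insert parsed rows into the hypothetical sales table; return the count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id TEXT PRIMARY KEY, amount REAL)"
    )
    rows = list(rows)
    with connection:  # commits on success, rolls back on error
        connection.executemany(
            "INSERT OR REPLACE INTO sales (order_id, amount) VALUES (?, ?)", rows
        )
    return len(rows)


# Hypothetical input: two valid rows, one non-JSON line, one missing a field.
raw_lines = [
    '{"order_id": "A1", "amount": 19.99}',
    'not valid json',
    '{"order_id": "A2"}',
    '{"order_id": "A3", "amount": "5"}',
]
connection = sqlite3.connect(":memory:")
loaded = load_sales(connection, parse_sales_records(raw_lines))
total = connection.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Using `INSERT OR REPLACE` keyed on `order_id` makes a rerun over the same input idempotent, which is a common talking point when interviewers ask how the load behaves after a retry.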
Generally yes. Data engineering Python rounds rarely include LeetCode-hard algorithm problems. Instead, they test practical data manipulation, PySpark operations, and pipeline-oriented code. However, some FAANG companies still include a standard coding round.
Learn both PySpark and pandas. PySpark is tested for distributed processing scenarios (large datasets, Spark cluster operations). Pandas is tested for smaller-scale data manipulation and analysis. Most interviewers expect fluency in both, with PySpark being more critical for senior roles.