Medium-level Python questions from real data engineering interviews.
These medium-difficulty Python questions are selected from real interviews at top companies. Each question includes a detailed expert answer and a pro tip to help you prepare. All 24 questions in this set sit in the medium-difficulty band, where most real interviews are pitched. Recurring themes are join, partition, and Python; these patterns appear most often in real interviews and reward the deepest preparation. The questions have been reported across 21 companies, including Capco and American Express. The average answer takes about 1 minute to read, so plan roughly 1 hour to work through the full set thoughtfully.
This collection contains 24 curated questions: 0 easy and 24 medium, so the set is uniform in difficulty rather than mixed.
The most frequently tested areas in this set are join (15), partition (12), Python (9), Spark (6), SQL (3), and Snowflake (2). Focusing on these topics will give you the highest return on your preparation time.
Medium-difficulty questions form the bulk of real interviews, so spend the most time here and practice explaining your reasoning out loud. For each question, try answering before revealing the solution. Use our AI Mock Interview to simulate real interview conditions and get instant feedback on your responses.
Write a Python function to check if a string is a palindrome.
Create a Python program to demonstrate the use of set operations (union, intersection).
Describe Spark's memory management model. How do you handle heap memory overhead issues?
Differentiate SORT BY, ORDER BY, DISTRIBUTE BY, and CLUSTER BY
Extend the solution to determine the nth largest element in an array.
GeoPandas - definition and features
Grouping and aggregation functions?
How many cities does each department operate in? List the top 3 departments in terms of the most number of cities. In case of a tie, order by dept_id.
How would you decide between using DISTKEY and SORTKEY?
Implement an algorithm to find the longest common prefix among an array of strings.
List customers with more than 5 orders.
List every combination of dept_name, employee_name, and city such that the employee belongs to the department and the same city in which the department is located.
Multithreading and Synchronization in Java - write code to manage synchronized threads
Replace words and perform string operations in Python (replace, vowel removal, word count, pattern check).
Reverse a string with special characters preserved.
Sort and merge arrays
Spark Coding: Using explode() Function to flatten nested arrays
STUFF function with FOR XML: usage
What role does the executor heap size play in preventing OOM errors?
Write Python code to remove duplicates from a string.
Write a Python program to calculate total spending, identify top 5 users by spending, and find the most purchased product
Write a Python program to reverse words in a string.
Write a function that replaces all characters in a list except for a given character
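As a warm-up before revealing the expert answers, here is a minimal sketch of how three of the Python string questions above might be approached. The function names are hypothetical, and the behavior rests on assumptions the questions do not state: the palindrome check ignores case and non-alphanumeric characters, word reversal collapses repeated whitespace, and duplicate removal keeps the first occurrence of each character.

```python
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards.

    Assumption: case and non-alphanumeric characters are ignored.
    """
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    return cleaned == cleaned[::-1]


def reverse_words(sentence: str) -> str:
    """Reverse the order of words in a sentence.

    Assumption: words are separated by whitespace, and runs of
    whitespace collapse to a single space in the result.
    """
    return " ".join(reversed(sentence.split()))


def remove_duplicates(text: str) -> str:
    """Drop repeated characters, keeping the first occurrence of each.

    dict.fromkeys preserves insertion order, so the original
    character order is retained.
    """
    return "".join(dict.fromkeys(text))


if __name__ == "__main__":
    print(is_palindrome("A man, a plan, a canal: Panama"))
    print(reverse_words("data engineering interviews"))
    print(remove_duplicates("programming"))
```

In an interview, it is worth stating these assumptions out loud and asking whether, for example, punctuation should count in the palindrome check before writing any code.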