Medium-level general questions from real data engineering interviews.
These medium-level general questions are selected from real interviews at top companies. Each question includes a detailed expert answer and a pro tip to help you nail your interview.
How would you read data from a web API? What steps would you follow after reading the data?
What is the difference between SQL and NoSQL databases?
APPLY Operator - CROSS APPLY and OUTER APPLY
An existing job is suddenly running longer than usual: how would you analyze the issue?
Calculate a 7-day moving average of clicks for each user_id
Calculate a 7-day moving average of orders for each city in the Swiggy database.
Calculate cumulative sales for each product in each store, ordered by sale_date
Calculate the total number of transactions (units sold) for each product.
Calculate the total sales amount for customers born between 1998-01-15 and 2000-01-15.
Compute the moving average of daily transactions over a 7-day window.
Data Shuffling: Causes and Techniques
Describe a time when you had to deal with a major data quality issue. How did you handle it?
Describe the concept of data sharding and when to use it.
Describe your approach to managing data deduplication.
Discuss Primary, Foreign, and Composite Keys.
Discuss the average data volume handled and strategies used for efficient processing.
Explain how you would implement a caching mechanism for frequently accessed video metadata.
Extract insights from given JSON data using your preferred framework.
Fetch the rows with the highest scores for each student in a year.
Find All Numbers that Appear at Least Three Times Consecutively
Find the names of managers who have at least 7 employees directly reporting to them.
Find the top 3 products by total quantity sold.
Given USD-to-INR exchange rates with timestamps, find the ticket price in rupees for various dates, using the latest exchange rate (by timestamp) for each date.
Given the data with id, name, and department, how would you calculate how many employees are in each department?
Explain graph databases.
Handling node failures
How can technology improve private equity investments?
How does Z-ordering enhance data retrieval performance?
How does cluster size impact parallelism limits?
How would you optimize a slow-running SQL query?
How would you implement custom alarms for data delays or job failures?
Identify the top 5 customers with the highest purchases in the last quarter.
Match countries in a pairwise format
Reverse operation for splitting values back to original format
Which shell command would you use to rename a file?
Solve Minimum Remove to Make Valid Parentheses.
Steps to Verify Source and Target Data Match After Load
Write a query for the desired output (node-parent relationship).
What are the benefits of the COPY command's MANIFEST option?
What steps do you take to troubleshoot a slow-running Spark job?
What strategies do you use to handle network bottlenecks?
What would you do if a job misses its SLA? How would you handle such situations?
What would you do if the files are stored in multiple folders with varying retention policies?
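Several of the questions above ask for a 7-day moving average per key (clicks per user_id, orders per city, daily transactions). In SQL this is usually a window function with a date-based frame; the minimal Python sketch below shows the same sliding-window logic under stated assumptions. It assumes input rows of `(key, day, value)` with at most one row per key per day, and it averages only the rows that exist inside the 7-day window ending on each day, so missing days are not counted as zero. The function name `moving_average_7d` and the row layout are hypothetical, chosen for illustration.

```python
from collections import defaultdict, deque
from datetime import date, timedelta


def moving_average_7d(rows):
    """Return (key, day, avg) for each input row.

    Assumption: rows are (key, day, value) tuples with at most one row per
    key per day. avg covers rows whose day falls within the 7 days ending
    at `day` (inclusive); days with no row are skipped, not treated as 0.
    """
    by_key = defaultdict(list)
    for key, day, value in rows:
        by_key[key].append((day, value))

    out = []
    for key, series in by_key.items():
        series.sort()              # process each key in date order
        window = deque()           # (day, value) pairs currently in the window
        total = 0.0
        for day, value in series:
            window.append((day, value))
            total += value
            # Evict rows that are 7 or more days older than the current day.
            while window[0][0] <= day - timedelta(days=7):
                total -= window.popleft()[1]
            out.append((key, day, total / len(window)))
    return out


if __name__ == "__main__":
    sample = [("u1", date(2024, 1, d), float(d)) for d in range(1, 9)]
    for row in moving_average_7d(sample):
        print(row)
```

With values 1 through 8 on consecutive days, the result for day 8 averages days 2 through 8 (5.0), because day 1 has dropped out of the window.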
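For "Find All Numbers that Appear at Least Three Times Consecutively", the classic SQL answer compares each row with its neighbors (for example with LAG/LEAD ordered by id). The minimal Python sketch below shows the same run-length idea, assuming the values are already ordered by the table's id column with no gaps. The function name `consecutive_numbers` and the `min_run` parameter are hypothetical.

```python
from typing import List


def consecutive_numbers(nums: List[int], min_run: int = 3) -> List[int]:
    """Return distinct values that appear at least `min_run` times in a row.

    Assumption: `nums` is already ordered by id, with no gaps in id.
    """
    result = []
    seen = set()
    run_value, run_length = None, 0
    for value in nums:
        if value == run_value:
            run_length += 1        # the current run continues
        else:
            run_value, run_length = value, 1  # a new run starts
        # Record each qualifying value once, the first time its run hits min_run.
        if run_length == min_run and value not in seen:
            seen.add(value)
            result.append(value)
    return result


if __name__ == "__main__":
    print(consecutive_numbers([1, 1, 1, 2, 1, 2, 2]))
```

Tracking only the current run keeps this a single pass with O(1) state besides the output, which mirrors why the SQL version only needs to look at adjacent rows.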
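For "Solve Minimum Remove to Make Valid Parentheses", a common approach is a single pass with a stack of indices of unmatched `(`: an unmatched `)` is dropped immediately, and any `(` left on the stack at the end is dropped too. The Python sketch below is one way to write it; the function name `min_remove_to_make_valid` is hypothetical.

```python
def min_remove_to_make_valid(s: str) -> str:
    """Return s with the fewest parentheses removed so the result is balanced."""
    chars = list(s)
    open_stack = []  # indices of '(' not yet matched
    for i, ch in enumerate(chars):
        if ch == "(":
            open_stack.append(i)
        elif ch == ")":
            if open_stack:
                open_stack.pop()   # matches the most recent unmatched '('
            else:
                chars[i] = ""      # unmatched ')': drop it
    for i in open_stack:
        chars[i] = ""              # '(' that was never closed: drop it
    return "".join(chars)


if __name__ == "__main__":
    print(min_remove_to_make_valid("lee(t(c)o)de)"))
    print(min_remove_to_make_valid("a)b(c)d"))
```

Blanking characters in place and joining once at the end keeps the solution O(n) time, instead of repeatedly slicing the string.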