The most frequently asked questions on window functions and windowing in data engineering interviews.
Master window functions and windowing for your next data engineering interview. These questions cover core concepts, advanced patterns, and the real-world scenarios that interviewers test.
Write an SQL query to find the second-highest salary from an employee table.
Demonstrate the difference between DENSE_RANK() and RANK().
Discuss the differences between ROW_NUMBER(), RANK(), and DENSE_RANK(), and provide examples from your projects.
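A minimal sketch that puts the three ranking functions side by side. It runs SQL through Python's built-in `sqlite3` module so it is self-contained, assuming the bundled SQLite is 3.25 or newer (the first release with window functions). The `employees` table and its rows are hypothetical sample data; the tie at 800 is what exposes the differences.

```python
import sqlite3

def rank_comparison():
    """Return (name, salary, row_num, rnk, dense_rnk) for a small hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [("Asha", 900), ("Ben", 800), ("Chen", 800), ("Dev", 700)],
    )
    rows = conn.execute(
        """
        SELECT name,
               salary,
               -- unique 1..n sequence; name breaks the salary tie deterministically
               ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
               -- ties share a rank and the next rank is skipped (1, 2, 2, 4)
               RANK()       OVER (ORDER BY salary DESC)       AS rnk,
               -- ties share a rank with no gap afterwards (1, 2, 2, 3)
               DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk
        FROM employees
        ORDER BY salary DESC, name
        """
    ).fetchall()
    conn.close()
    return rows

for row in rank_comparison():
    print(row)
```

For Ben and Chen, `ROW_NUMBER()` yields 2 and 3, `RANK()` yields 2 and 2 and then jumps to 4 for Dev, and `DENSE_RANK()` yields 2 and 2 and then 3.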
Explain the types of triggers in ADF, including schedule, tumbling window, and event-based triggers.
Joins and window functions: INNER, LEFT, RIGHT, and FULL OUTER joins; ROW_NUMBER(), RANK(), and DENSE_RANK().
How do you handle late-arriving data in Spark Structured Streaming?
Briefly explain the architecture of Kafka.
Retrieve the most recent sale_timestamp for each product (Latest Transaction).
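One common answer ranks each product's sales newest-first with `ROW_NUMBER()` and keeps rank 1. This sketch assumes a hypothetical `sales(product_id, sale_timestamp, amount)` table with ISO-8601 text timestamps (which sort chronologically) and runs on SQLite 3.25+ through Python's `sqlite3`.

```python
import sqlite3

def latest_sale_per_product(conn):
    """Return (product_id, sale_timestamp, amount) for each product's newest sale."""
    query = """
        SELECT product_id, sale_timestamp, amount
        FROM (
            SELECT product_id, sale_timestamp, amount,
                   -- newest sale per product gets rn = 1
                   ROW_NUMBER() OVER (
                       PARTITION BY product_id
                       ORDER BY sale_timestamp DESC
                   ) AS rn
            FROM sales
        ) AS ranked
        WHERE rn = 1
        ORDER BY product_id
    """
    return conn.execute(query).fetchall()

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id TEXT, sale_timestamp TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("p1", "2024-01-01 09:00:00", 10.0),
        ("p1", "2024-01-03 12:30:00", 15.0),
        ("p2", "2024-01-02 08:00:00", 7.5),
        ("p2", "2024-01-01 18:45:00", 5.0),
    ],
)
print(latest_sale_per_product(conn))
```

If only the timestamp is needed, `SELECT product_id, MAX(sale_timestamp) FROM sales GROUP BY product_id` is enough; the window version also returns the other columns of that row. Swap in `RANK()` to keep every sale tied at the latest timestamp.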
Explain SQL Window Functions with examples.
Implement a query to find the top 5 customers by total sales amount.
Write an SQL query to find the second-highest salary in each department.
Write an SQL query to find duplicate emails in a users table.
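A sketch of two usual answers against a hypothetical `users(id, email)` table, run on SQLite 3.25+ via Python's `sqlite3`: `GROUP BY ... HAVING` lists each duplicated address once, while `COUNT(*) OVER (PARTITION BY email)` returns every duplicated row with its `id`. Matching here is exact; normalizing with `LOWER`/`TRIM` first would be an extra assumption about the data.

```python
import sqlite3

# Hypothetical users table with repeated addresses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@x.com"), (2, "b@x.com"), (3, "a@x.com"),
     (4, "c@x.com"), (5, "b@x.com"), (6, "a@x.com")],
)

# Aggregate answer: one row per duplicated email, with its count.
duplicate_summary = conn.execute(
    """
    SELECT email, COUNT(*) AS occurrences
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
    ORDER BY email
    """
).fetchall()

# Window-function variant: every duplicated row keeps its id.
duplicate_rows = conn.execute(
    """
    SELECT id, email
    FROM (
        SELECT id, email, COUNT(*) OVER (PARTITION BY email) AS cnt
        FROM users
    ) AS counted
    WHERE cnt > 1
    ORDER BY email, id
    """
).fetchall()

print(duplicate_summary)
print(duplicate_rows)
```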
Triggers in ADF, especially tumbling window triggers.
What is a window function? Explain with an example.
Write a query to find the top three highest-paid employees in each department using window functions.
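A minimal sketch, assuming a hypothetical `employees(name, department, salary)` table and SQLite 3.25+ through Python's `sqlite3`. The ranking restarts in each department via `PARTITION BY`, and the outer query filters on the rank because window results cannot be referenced in the same `WHERE` clause.

```python
import sqlite3

def top_paid_per_department(conn, n=3):
    """Return (department, name, salary, salary_rank) for the top-n salary levels per department."""
    return conn.execute(
        """
        SELECT department, name, salary, salary_rank
        FROM (
            SELECT department, name, salary,
                   DENSE_RANK() OVER (
                       PARTITION BY department      -- restart ranking per department
                       ORDER BY salary DESC         -- highest salary ranks first
                   ) AS salary_rank
            FROM employees
        ) AS ranked
        WHERE salary_rank <= ?
        ORDER BY department, salary_rank, name
        """,
        (n,),
    ).fetchall()

# Hypothetical sample data with a tie in Eng.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Eng", 150), ("Bo", "Eng", 140), ("Cy", "Eng", 140),
     ("Di", "Eng", 120), ("Ed", "Eng", 100), ("Fay", "Ops", 90), ("Gus", "Ops", 80)],
)
for row in top_paid_per_department(conn):
    print(row)
```

Because Bo and Cy tie, `DENSE_RANK()` returns four Eng rows (three distinct salary levels). Use `ROW_NUMBER()` for exactly three people per department, or `RANK()` for the top three positions with ties included.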
Write complex SQL queries involving multiple joins, subqueries, and data aggregation logic.
Convert complex SQL (CTEs, window functions, subqueries) to production-grade PySpark. Discuss when to use spark.sql() vs. DataFrame API, and the implications for testability, partitioning, and execution predictability.
Explain the difference between batch and streaming data processing in Data Fusion.
How would you implement a sliding window aggregation in Spark Structured Streaming?
Implement a Spark job to find the top 10 most frequent words in a large text file.
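In Spark the usual shape is: split lines into words, pair each word with a count, reduce by key, and take the 10 largest (for example `flatMap`, `reduceByKey`, and `takeOrdered` on an RDD). The sketch below is not Spark code; it is a plain-Python, single-machine analogue of that pipeline, useful for explaining the logic. The regex tokenizer and the `top_words` name are assumptions.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

WORD_RE = re.compile(r"[a-z']+")  # assumed tokenization: lowercase letters and apostrophes

def top_words(lines: Iterable[str], k: int = 10) -> List[Tuple[str, int]]:
    """Return the k most frequent words as (word, count) pairs."""
    counts = Counter()
    for line in lines:  # iterate lazily so the file never has to fit in memory
        # tokenize (the flatMap step) and add to running counts (the reduceByKey step)
        counts.update(WORD_RE.findall(line.lower()))
    return counts.most_common(k)  # the takeOrdered step: highest counts first

# Usage with a file (the path is hypothetical):
# with open("corpus.txt", encoding="utf-8") as f:
#     print(top_words(f))
```

On a cluster, the counting is distributed across partitions and only the per-word totals are shuffled, but the final top-k selection is the same idea.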
What is the small-file problem in Spark, and how do you solve it?
Write PySpark code to find the second-highest salary in each department.
Describe a time you had to make a difficult decision with limited information.
Copy Large Files from On-Premises to Azure in ADF
Explain how you debug failed pipelines in ADF.
Explain the key components of Apache Beam in the context of Google Dataflow.
How do you merge data from different sources in ADF while maintaining data quality?
On-Premises to Cloud Integration Runtime
Calculate a 7-day moving average of clicks for each user_id.
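A sketch using an `AVG` window with a `ROWS` frame, assuming a hypothetical `daily_clicks(user_id, click_date, clicks)` table with exactly one row per user per day, run on SQLite 3.25+ via Python's `sqlite3`. The same pattern applies to the city-orders and daily-transactions variants below.

```python
import sqlite3

def seven_day_moving_average(conn):
    """Return (user_id, click_date, clicks, moving_avg_7d) ordered by user and date."""
    return conn.execute(
        """
        SELECT user_id, click_date, clicks,
               AVG(clicks) OVER (
                   PARTITION BY user_id
                   ORDER BY click_date
                   -- current row plus the 6 before it; assumes one row per user per day
                   ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
               ) AS moving_avg_7d
        FROM daily_clicks
        ORDER BY user_id, click_date
        """
    ).fetchall()

# Hypothetical data: user 1 has 9 consecutive days, user 2 has 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_clicks (user_id INTEGER, click_date TEXT, clicks INTEGER)")
rows = [(1, f"2024-01-{d:02d}", d * 10) for d in range(1, 10)]
rows += [(2, "2024-01-01", 5), (2, "2024-01-02", 15)]
conn.executemany("INSERT INTO daily_clicks VALUES (?, ?, ?)", rows)

for row in seven_day_moving_average(conn):
    print(row)
```

The first six days average over fewer than seven rows. If days can be missing, `ROWS` counts rows rather than calendar days, so either fill in the missing dates first or use a `RANGE` frame over a day number where the engine supports numeric `RANGE` offsets.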
Calculate a 7-day moving average of orders for each city in the Swiggy database.
Calculate cumulative sales for each product in each store, ordered by sale_date.
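A sketch of a running total with `SUM(...) OVER`, assuming a hypothetical `sales(store_id, product_id, sale_date, amount)` table and SQLite 3.25+ via Python's `sqlite3`. Partitioning by both store and product gives each pair its own independent total.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id TEXT, product_id TEXT, sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("s1", "p1", "2024-01-01", 10), ("s1", "p1", "2024-01-02", 5),
     ("s1", "p1", "2024-01-04", 20), ("s1", "p2", "2024-01-01", 3),
     ("s2", "p1", "2024-01-01", 7), ("s2", "p1", "2024-01-03", 1)],
)

cumulative = conn.execute(
    """
    SELECT store_id, product_id, sale_date, amount,
           SUM(amount) OVER (
               PARTITION BY store_id, product_id   -- independent total per store/product
               ORDER BY sale_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cumulative_sales
    FROM sales
    ORDER BY store_id, product_id, sale_date
    """
).fetchall()

for row in cumulative:
    print(row)
```

The explicit `ROWS` frame is a deliberate choice: with only `ORDER BY`, the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, which gives rows sharing the same `sale_date` the same combined total. The same pattern answers the cumulative-sum and running-total questions.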
Compute the moving average of daily transactions over a 7-day window.
Describe your approach to managing data deduplication.
Fetch the rows with the highest scores for each student in a year.
Find orders exceeding $1,000 in the last 30 days.
Given timestamped USD-to-INR exchange rates, find the ticket price in rupees for various dates, using the latest exchange rate (by timestamp) for each date.
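One way to read this is as an "as-of" join: for each ticket, take the newest rate stamped on or before the ticket's date. This sketch assumes hypothetical `exchange_rates(rate_ts, usd_to_inr)` and `tickets(ticket_id, travel_date, price_usd)` tables and that interpretation of "latest for each date"; it runs on SQLite 3.25+ via Python's `sqlite3`.

```python
import sqlite3

def ticket_prices_in_inr(conn):
    """Price each ticket with the latest rate whose timestamp falls on or before its date."""
    return conn.execute(
        """
        SELECT ticket_id, travel_date, price_usd, usd_to_inr,
               ROUND(price_usd * usd_to_inr, 2) AS price_inr
        FROM (
            SELECT t.ticket_id, t.travel_date, t.price_usd, r.usd_to_inr,
                   ROW_NUMBER() OVER (
                       PARTITION BY t.ticket_id
                       ORDER BY r.rate_ts DESC      -- newest eligible rate first
                   ) AS rn
            FROM tickets AS t
            JOIN exchange_rates AS r
              ON DATE(r.rate_ts) <= t.travel_date   -- only rates known by that date
        ) AS candidates
        WHERE rn = 1
        ORDER BY ticket_id
        """
    ).fetchall()

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exchange_rates (rate_ts TEXT, usd_to_inr REAL)")
conn.execute("CREATE TABLE tickets (ticket_id INTEGER, travel_date TEXT, price_usd REAL)")
conn.executemany("INSERT INTO exchange_rates VALUES (?, ?)", [
    ("2024-01-01 09:00:00", 83.0),
    ("2024-01-01 17:00:00", 83.2),
    ("2024-01-03 10:00:00", 83.5),
])
conn.executemany("INSERT INTO tickets VALUES (?, ?, ?)", [
    (1, "2024-01-01", 100.0),
    (2, "2024-01-02", 50.0),   # no new rate that day, so 83.2 carries forward
    (3, "2024-01-03", 10.0),
])
for row in ticket_prices_in_inr(conn):
    print(row)
```

The inner join drops tickets dated before the first known rate; switch to a `LEFT JOIN` to keep them with a NULL price.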
Given the data with id, name, and department, how would you calculate how many employees are in each department?
How would you monitor and reduce disk-based queries (disk spilling)?
Identify the top 5 customers with the highest purchases in the last quarter.
Implement a rate limiter to control API requests per user.
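A minimal sketch using the token-bucket algorithm, which is one common choice (sliding-window logs and fixed-window counters are alternatives). It is in-memory, single-process, and not thread-safe; a multi-instance service would need shared state. The class and parameter names are hypothetical, and the clock is injectable so behavior can be tested without sleeping.

```python
import time
from typing import Callable, Dict, Tuple

class PerUserTokenBucket:
    """In-memory token-bucket rate limiter keyed by user id (single process, not thread-safe)."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained requests per second
        self.clock = clock                    # injectable so tests can control time
        self._buckets: Dict[str, Tuple[float, float]] = {}  # user -> (tokens, last_seen)

    def allow(self, user_id: str) -> bool:
        """Consume one token for user_id if available; return whether the request may proceed."""
        now = self.clock()
        tokens, last = self._buckets.get(user_id, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[user_id] = (tokens, now)
        return allowed

fake_now = [0.0]
limiter = PerUserTokenBucket(capacity=2, refill_per_sec=1.0, clock=lambda: fake_now[0])
print([limiter.allow("alice") for _ in range(3)])  # [True, True, False]
```

Capacity controls how large a burst each user may send, while the refill rate sets the long-run limit.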
Shell commands for renaming a file?
Grouping and aggregation functions?
Multiprocessing in Python: explain with an example.
Optimize a function to calculate moving averages of user engagement.
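The usual optimization replaces re-summing every window (O(n·k) for n points and window k) with a running sum that adds the new value and subtracts the one leaving the window, for O(n) overall. A minimal sketch; the function name is hypothetical, and partial windows at the start are averaged over the values available, mirroring `ROWS BETWEEN k-1 PRECEDING AND CURRENT ROW` in SQL.

```python
from collections import deque
from typing import Iterable, List

def moving_average(values: Iterable[float], window: int) -> List[float]:
    """Trailing moving average in O(n) using a running sum over a fixed-size window."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque()        # values currently inside the window
    running_sum = 0.0
    out = []
    for v in values:
        buf.append(v)
        running_sum += v
        if len(buf) > window:
            running_sum -= buf.popleft()  # drop the value that slid out
        out.append(running_sum / len(buf))
    return out

print(moving_average([1, 2, 3, 4, 5], 3))  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

Over very long float streams the running sum can drift slightly; periodically recomputing it from `buf` bounds that error.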
TCP Protocol Functionality
What programming languages are you proficient in?
Write a function to detect anomalies in streaming data using a sliding window.
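A sketch assuming a simple z-score rule: a point is anomalous if it lies more than `threshold` standard deviations from the mean of the previous `window` points. The rule, the defaults, and the function name are assumptions; a generator keeps memory bounded to the window, which suits unbounded streams.

```python
import statistics
from collections import deque
from typing import Iterable, Iterator, Tuple

def detect_anomalies(stream: Iterable[float], window: int = 20,
                     threshold: float = 3.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for points far from the trailing-window mean."""
    recent = deque(maxlen=window)  # oldest value drops off automatically
    for i, x in enumerate(stream):
        if len(recent) == window:  # only score once the window is full
            mean = statistics.fmean(recent)
            stdev = statistics.pstdev(recent)
            # Skip flat windows (stdev 0) to avoid dividing by zero.
            if stdev > 0 and abs(x - mean) / stdev > threshold:
                yield i, x
        recent.append(x)

print(list(detect_anomalies([10, 11, 9, 10, 10, 40, 10], window=4)))
```

Here flagged points still enter the window, so a spike temporarily widens the baseline; excluding anomalies from `recent` is an equally valid design choice.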
Add row numbers using a window function in PySpark.
Add a new column with the average salary by department.
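Unlike `GROUP BY`, an aggregate used as a window function keeps every row and attaches the group value to each one. A sketch against a hypothetical `employees(name, department, salary)` table on SQLite 3.25+ via Python's `sqlite3`; in PySpark the equivalent is `F.avg("salary").over(Window.partitionBy("department"))` inside `withColumn`.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Eng", 100), ("Bo", "Eng", 200), ("Cy", "Ops", 90)],
)

with_avg = conn.execute(
    """
    SELECT name, department, salary,
           -- no ORDER BY, so the frame is the whole department partition
           AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
    FROM employees
    ORDER BY department, name
    """
).fetchall()

for row in with_avg:
    print(row)
```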
Describe a scenario where you used Databricks for real-time data processing.
Discuss strategies for handling schema evolution in data warehouses.
Explain the architectural trade-offs when optimizing a query on 100M+ rows: indexing vs. partitioning vs. materialized views. When does each approach become cost-prohibitive or operationally burdensome, and how do you quantify impact?
Explain how to implement cumulative sum in SQL.
Explain the concept of window functions in SQL and provide an example.
Explain the process you would follow for optimizing a database query that is running slow.
Explain the purpose of windowing and triggering in streaming data pipelines.
Find Employees with Maximum Salary in Each Department
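A sketch that compares each salary with `MAX(salary) OVER (PARTITION BY department)`, which naturally keeps every employee tied at the top. It assumes a hypothetical `employees(name, department, salary)` table on SQLite 3.25+ via Python's `sqlite3`; filtering on `RANK() = 1` gives the same result.

```python
import sqlite3

# Hypothetical sample data with a tie at the top of Eng.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Eng", 150), ("Bo", "Eng", 150), ("Cy", "Eng", 120),
     ("Di", "Ops", 90), ("Ed", "Ops", 80)],
)

top_earners = conn.execute(
    """
    SELECT department, name, salary
    FROM (
        SELECT department, name, salary,
               MAX(salary) OVER (PARTITION BY department) AS dept_max
        FROM employees
    ) AS with_max
    WHERE salary = dept_max          -- keeps every employee tied at the maximum
    ORDER BY department, name
    """
).fetchall()

print(top_earners)
```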
Find the second-highest salary in the employees table using three different methods.
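A sketch of three common methods, run on SQLite 3.25+ via Python's `sqlite3` against a hypothetical `employees(name, salary)` table. The duplicate top salary checks that each method returns the second *distinct* value. The dictionary keys and helper name are illustrative only.

```python
import sqlite3

# Three interchangeable answers; each returns a single value or no row.
SECOND_HIGHEST_QUERIES = {
    # 1. Largest salary strictly below the overall maximum.
    "max_subquery": """
        SELECT MAX(salary) FROM employees
        WHERE salary < (SELECT MAX(salary) FROM employees)
    """,
    # 2. DENSE_RANK, so duplicate top salaries still leave rank 2 for the next value.
    "dense_rank": """
        SELECT DISTINCT salary
        FROM (
            SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
            FROM employees
        ) AS ranked
        WHERE salary_rank = 2
    """,
    # 3. Distinct salaries sorted descending: skip one, take one.
    "limit_offset": """
        SELECT DISTINCT salary FROM employees
        ORDER BY salary DESC
        LIMIT 1 OFFSET 1
    """,
}

def second_highest(conn, method):
    """Run one query above; return the salary, or None if there is no second value."""
    row = conn.execute(SECOND_HIGHEST_QUERIES[method]).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 300), ("Ben", 300), ("Chen", 200), ("Dev", 100)],
)
for method in SECOND_HIGHEST_QUERIES:
    print(method, second_highest(conn, method))
```

When no second salary exists, method 1 returns a single NULL while methods 2 and 3 return no row. `LIMIT ... OFFSET` syntax also varies by engine (SQL Server, for example, uses `OFFSET ... FETCH`).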
Given a complex nested query, how would you refactor it for better readability and efficiency?
Given a table of sales data, use window functions to calculate a running total.