Real questions from top companies · medium
What is a self-join, and when would you use it?
What is normalization and denormalization? When would you use each?
What is the difference between a view and a materialized view?
Write an SQL query to find duplicate emails in a users table.
Explain triggers in Azure Data Factory (ADF), especially tumbling window triggers.
What is a window function? Explain with an example.
What is the difference between OLTP and OLAP?
Write a SQL query to find the top 3 earners in each department.
Write a query to find the top three highest-paid employees in each department using window functions.
Write complex SQL queries involving multiple joins, subqueries, and data aggregation logic.
Convert complex SQL (CTEs, window functions, subqueries) to production-grade PySpark. Discuss when to use spark.sql() vs. DataFrame API, and the implications for testability, partitioning, and execution predictability.
Explain how Adaptive Query Execution changes the economics of Spark tuning. What problems does it solve at runtime, and when might you still need manual intervention (e.g., salting, broadcast hints)?
Architect incremental load in ADF + Databricks with idempotency, late-arrival handling, and cost/scalability implications of watermark vs. change data capture.
Explain strategies for managing schema changes in PySpark over time.
How do you drop columns with null values in PySpark?
How do you handle data skewness in Spark?
How would you read data from a web API using PySpark?
What is Adaptive Query Execution (AQE) in Spark 3.x, and how does it improve performance?
What is the difference between repartition and coalesce in Spark?
When and how do you use Broadcast Join in Spark?
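
As an illustration of the two query-writing questions above (duplicate emails, and the top three earners per department using a window function), here is a minimal sketch rather than a model answer. The table layouts, column names, and sample rows are assumptions made up for the example, and it runs against an in-memory SQLite database (window functions need SQLite 3.25 or later), so syntax may differ slightly on other engines.

```python
import sqlite3


def run_examples():
    """Run two of the listed SQL exercises against hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schemas and rows, invented purely for illustration.
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
        INSERT INTO users (email) VALUES
            ('a@example.com'), ('b@example.com'), ('a@example.com');
        CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
        INSERT INTO employees VALUES
            ('Ann', 'Eng', 150), ('Bob', 'Eng', 140), ('Cid', 'Eng', 130),
            ('Dee', 'Eng', 120), ('Eve', 'Ops', 100), ('Fay', 'Ops', 90);
    """)

    # Duplicate emails: group by the value and keep groups with more than one row.
    duplicates = conn.execute("""
        SELECT email, COUNT(*) AS n
        FROM users
        GROUP BY email
        HAVING COUNT(*) > 1
    """).fetchall()

    # Top three earners per department: rank salaries within each department.
    # DENSE_RANK keeps ties together; ROW_NUMBER would return exactly three rows.
    top3 = conn.execute("""
        SELECT department, name, salary
        FROM (
            SELECT department, name, salary,
                   DENSE_RANK() OVER (
                       PARTITION BY department ORDER BY salary DESC
                   ) AS rnk
            FROM employees
        )
        WHERE rnk <= 3
        ORDER BY department, salary DESC
    """).fetchall()

    conn.close()
    return duplicates, top3


if __name__ == "__main__":
    dupes, top = run_examples()
    print(dupes)
    print(top)
```

Whether to use DENSE_RANK, RANK, or ROW_NUMBER depends on how ties should be treated, which is worth stating explicitly when answering.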