Real questions from top companies
Write a query to get the latest rule_id and rule_status.
Write a query to get the names of all employees who are managers with five or more direct reports.
Write a query to identify duplicate customer entries based on email and phone number.
Write a query to identify unique user sessions.
Write a query to remove duplicate records from a table while retaining the earliest entry.
Write a query to retain only the latest record and delete others in case of duplicates.
Write a query to select the latest record based on a time_of_insertion column.
Write a query to switch values in the Gender column (M to F and F to M).
Write a self-join query to get the manager's name for each employee.
Write an SQL query to find the top 3 performing products in each category.
Write code to find the third-highest salary in a dataset using Pandas.
Write optimized SQL queries involving window functions, CTEs, and joins.
Write queries combining Joins and Group By operations.
You need to create a workflow where Task B runs only if Task A succeeds, and Task C always runs regardless of the status of Task A or Task B. How would you define these dependencies using Airflow?
You need to design a Kafka topic for a logging service. How would you decide the number of partitions and the key for partitioning to balance throughput and ordering requirements?
Your Kafka consumer shows significant lag during peak hours. What strategies would you employ to reduce lag and ensure timely data processing?
map() vs mapPartitions(): Highlight the difference between map (row-level transformation) and mapPartitions (partition-level transformation).
repartition() vs coalesce(): Explain when to use repartition() (full shuffle; can increase or decrease the number of partitions) vs coalesce() (reduces the number of partitions while avoiding a full shuffle).
A JSON file with an evolving schema needs to be ingested into a DataFrame. How would you handle new fields dynamically in PySpark without breaking the job for previous structures?
A data pipeline processes files for different clients stored in separate directories. Explain how you would use dynamic DAG creation to handle client-specific workflows in Airflow.
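Several of the SQL questions above (latest rule_id and rule_status, latest record by time_of_insertion, keeping only the latest duplicate) can be approached with one pattern: number the rows within each group using a window function, then keep or delete by that number. The following is a minimal sketch of that pattern, not a definitive answer. It assumes a hypothetical `rules` table with columns `rule_id`, `rule_status`, and `time_of_insertion`, and it uses Python's built-in sqlite3 module so it runs without a database server. The `rowid` column is specific to SQLite, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data: two versions of each rule, inserted at different times.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rules (rule_id INTEGER, rule_status TEXT, time_of_insertion TEXT);
    INSERT INTO rules VALUES
        (1, 'draft',   '2024-01-01 09:00:00'),
        (1, 'active',  '2024-01-02 09:00:00'),
        (2, 'active',  '2024-01-01 10:00:00'),
        (2, 'retired', '2024-01-03 10:00:00');
""")

# Select the latest row per rule_id: rank rows newest-first within each rule_id,
# then keep only the row ranked 1.
latest = conn.execute("""
    WITH ranked AS (
        SELECT rule_id,
               rule_status,
               ROW_NUMBER() OVER (
                   PARTITION BY rule_id
                   ORDER BY time_of_insertion DESC
               ) AS rn
        FROM rules
    )
    SELECT rule_id, rule_status
    FROM ranked
    WHERE rn = 1
    ORDER BY rule_id
""").fetchall()

# Retain only the latest row per rule_id and delete the others.
# rowid is SQLite's implicit row identifier; other engines need a real key column.
conn.execute("""
    DELETE FROM rules
    WHERE rowid NOT IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY rule_id
                       ORDER BY time_of_insertion DESC
                   ) AS rn
            FROM rules
        )
        WHERE rn = 1
    )
""")

remaining = conn.execute(
    "SELECT rule_id, rule_status, time_of_insertion FROM rules ORDER BY rule_id"
).fetchall()

print(latest)
print(remaining)
```

Changing the sort to `ORDER BY time_of_insertion ASC` keeps the earliest entry instead, and partitioning by a category column with a filter such as `rn <= 3` gives a top-3-per-group query. If ties should share a rank, `DENSE_RANK()` is a common substitute for `ROW_NUMBER()`.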