Real questions from top companies · medium
What is the difference between repartition and coalesce in Apache Spark?
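For the repartition vs coalesce question, here is a minimal pure-Python simulation. It is not the Spark API: each "partition" is just a list, and `coalesce`/`repartition` here are hypothetical helpers that mimic the behaviour. Coalesce merges neighbouring partitions without a full shuffle and cannot increase the partition count. Repartition moves every record through a shuffle, so it can rebalance skewed partitions or raise parallelism.

```python
# Pure-Python simulation of Spark's coalesce vs repartition (not the Spark API).
# Each "partition" is a list of records.

def coalesce(partitions: list[list], n: int) -> list[list]:
    """Merge neighbouring partitions into at most n groups without splitting
    any input partition (mimics a no-shuffle coalesce)."""
    if n >= len(partitions):
        # coalesce cannot increase the partition count; keep the layout as-is
        return [list(p) for p in partitions]
    merged: list[list] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # contiguous input partitions land in the same output partition
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: list[list], n: int) -> list[list]:
    """Full shuffle: deal every record round-robin into exactly n partitions."""
    out: list[list] = [[] for _ in range(n)]
    records = [r for part in partitions for r in part]
    for i, record in enumerate(records):
        out[i % n].append(record)
    return out


if __name__ == "__main__":
    skewed = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10]]
    print([len(p) for p in coalesce(skewed, 2)])     # [7, 3]
    print([len(p) for p in repartition(skewed, 2)])  # [5, 5]
```

The output shows the trade-off: coalesce is cheap but keeps any existing imbalance, which is why it is typically used to reduce partitions (for example, before writing fewer output files), while repartition pays for a shuffle to get evenly sized partitions.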
Write an SQL query to find the second-highest salary from an employee table.
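A minimal sketch of one common answer, assuming a hypothetical `employee(name, salary)` table and using Python's built-in `sqlite3` so it runs without a database server. The subquery returns the second-highest *distinct* salary and yields NULL (`None` in Python) when no second value exists; a `DENSE_RANK()` query is a frequent alternative.

```python
import sqlite3

# Hypothetical schema: employee(name TEXT, salary INTEGER).
SECOND_HIGHEST_SQL = """
SELECT MAX(salary) AS second_highest
FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee)
"""


def second_highest_salary(rows: list[tuple[str, int]]):
    """Load (name, salary) rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    (result,) = conn.execute(SECOND_HIGHEST_SQL).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    data = [("a", 300), ("b", 300), ("c", 200), ("d", 100)]
    print(second_highest_salary(data))  # 200 (the tie at 300 is skipped)
```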
What is the difference between cache() and persist() in Spark? When would you use each?
What is the difference between groupByKey and reduceByKey in Spark?
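For groupByKey vs reduceByKey, a pure-Python simulation (not the Spark API) makes the shuffle cost visible. The function names are hypothetical, partitions are plain lists, and the count of records that "cross the shuffle" stands in for network traffic. reduceByKey combines values inside each partition first, so it ships fewer records; this relies on the reduce function being associative and commutative.

```python
from collections import defaultdict
from typing import Callable

Pair = tuple[str, int]


def group_by_key_then_sum(partitions: list[list[Pair]]) -> tuple[dict[str, int], int]:
    """Mimic groupByKey followed by a sum: every pair crosses the shuffle."""
    grouped: dict[str, list[int]] = defaultdict(list)
    shuffled = 0
    for part in partitions:
        for key, value in part:
            grouped[key].append(value)  # raw record sent to the reducer
            shuffled += 1
    return {k: sum(vs) for k, vs in grouped.items()}, shuffled


def reduce_by_key(
    partitions: list[list[Pair]], func: Callable[[int, int], int]
) -> tuple[dict[str, int], int]:
    """Mimic reduceByKey: combine inside each partition first (map-side
    combine), then shuffle one partial result per key per partition."""
    merged: dict[str, int] = {}
    shuffled = 0
    for part in partitions:
        local: dict[str, int] = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        for key, partial in local.items():
            merged[key] = func(merged[key], partial) if key in merged else partial
            shuffled += 1
    return merged, shuffled


if __name__ == "__main__":
    parts = [[("a", 1), ("b", 1), ("a", 1)], [("a", 1), ("b", 1), ("b", 1)]]
    print(group_by_key_then_sum(parts))              # ({'a': 3, 'b': 3}, 6)
    print(reduce_by_key(parts, lambda x, y: x + y))  # ({'a': 3, 'b': 3}, 4)
```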
What is the difference between narrow and wide transformations in Apache Spark? Explain with examples.
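The narrow vs wide distinction is about which input partitions each output partition depends on. The sketch below is a pure-Python illustration, not Spark code: the helper names are hypothetical, keys are integers, and `key % num_out` stands in for Spark's hash partitioner. A narrow transformation such as `map` or `filter` reads one parent partition per output partition; a wide one such as `groupByKey`, `reduceByKey`, or a non-co-partitioned join may need data from every parent, which is what forces a shuffle and a new stage.

```python
# Pure-Python sketch of partition dependencies (not the Spark API).

def narrow_map_deps(partitions: list[list[int]]) -> dict[int, set[int]]:
    """map/filter-style: output partition i is computed from input partition i only."""
    return {i: {i} for i in range(len(partitions))}


def wide_shuffle_deps(partitions: list[list[int]], num_out: int) -> dict[int, set[int]]:
    """groupByKey-style: output partition j needs every input partition that
    holds at least one key the partitioner sends to j."""
    deps: dict[int, set[int]] = {j: set() for j in range(num_out)}
    for i, part in enumerate(partitions):
        for key in part:
            deps[key % num_out].add(i)  # stand-in for hash partitioning
    return deps


if __name__ == "__main__":
    parts = [[1, 2], [3, 4]]
    print(narrow_map_deps(parts))       # {0: {0}, 1: {1}}
    print(wide_shuffle_deps(parts, 2))  # {0: {0, 1}, 1: {0, 1}}
```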
Demonstrate the difference between DENSE_RANK() and RANK().
Discuss the differences between ROW_NUMBER(), RANK(), and DENSE_RANK(), and provide examples from your projects.
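For the ranking-function questions, a minimal sketch using Python's built-in `sqlite3` (window functions need SQLite 3.25 or later, which recent Python builds bundle). The `scores` table and its data are hypothetical. Two rows tie at 90 so the three functions diverge.

```python
import sqlite3

# Hypothetical table scores(name, score); ROW_NUMBER gets a tie-breaker on name
# so its output is deterministic.
RANKING_SQL = """
SELECT name,
       score,
       ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
       RANK()       OVER (ORDER BY score DESC)       AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk
FROM scores
ORDER BY score DESC, name
"""


def rank_scores(rows: list[tuple[str, int]]) -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)
    result = conn.execute(RANKING_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rank_scores([("a", 100), ("b", 90), ("c", 90), ("d", 80)]):
        print(row)
    # ('a', 100, 1, 1, 1)
    # ('b', 90, 2, 2, 2)
    # ('c', 90, 3, 2, 2)
    # ('d', 80, 4, 4, 3)
```

ROW_NUMBER() always assigns unique numbers, RANK() gives ties the same rank and then skips (2, 2, 4), and DENSE_RANK() gives ties the same rank without gaps (2, 2, 3).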
Explain the differences between a data warehouse, a data lake, and Delta Lake.
Explain the differences between repartition and coalesce. When would you use each?
What is the difference between partitioning and bucketing in Spark, and when would you use bucketing?
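To contrast partitioning and bucketing, here is a pure-Python sketch of the resulting data layout, not Spark's implementation. The column names `event_date` and `user_id` are hypothetical, and `value % num_buckets` stands in for the Murmur3 hash Spark actually uses. Partitioning creates one directory per distinct value, which suits low-cardinality filter columns such as dates (enabling partition pruning). Bucketing fixes the number of buckets regardless of cardinality, which suits high-cardinality join or aggregation keys: if two tables are bucketed the same way on the join key, matching rows already sit in the same bucket and the join can avoid a shuffle.

```python
from collections import defaultdict

# Pure-Python sketch of on-disk layout (not Spark's implementation).


def partition_layout(rows: list[dict], column: str) -> dict[str, list[dict]]:
    """Partitioning: one directory per distinct value, e.g. 'event_date=2024-01-01'."""
    dirs: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        dirs[f"{column}={row[column]}"].append(row)
    return dict(dirs)


def bucket_layout(rows: list[dict], column: str, num_buckets: int) -> dict[int, list[dict]]:
    """Bucketing: a fixed number of buckets chosen by hashing the column value
    (simple modulo here as a stand-in for Spark's hash function)."""
    buckets: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        buckets[row[column] % num_buckets].append(row)
    return dict(buckets)


if __name__ == "__main__":
    rows = [{"event_date": d, "user_id": u}
            for d in ("2024-01-01", "2024-01-02") for u in range(10)]
    print(sorted(partition_layout(rows, "event_date")))  # two directories
    print(sorted(bucket_layout(rows, "user_id", 4)))     # [0, 1, 2, 3]
```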
What strategies can you use to handle skewed data in Spark?
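Salting is one widely used strategy for skewed keys (others include broadcasting the small side of a join and Spark 3's adaptive query execution skew-join handling). The sketch below is a pure-Python illustration, not Spark code: the helper names are hypothetical, the `#` separator is assumed never to appear in real keys, and it uses a round-robin salt for determinism where Spark jobs commonly use a random one. A hot key is split into several sub-keys so its work spreads across tasks, then a second aggregation strips the salt and merges the partial results.

```python
from collections import Counter, defaultdict

SEP = "#"  # assumption: real keys never contain this separator


def salt_keys(records: list[tuple[str, int]], hot_keys: set[str],
              num_salts: int) -> list[tuple[str, int]]:
    """Split each hot key into num_salts sub-keys ('hot#0' ... 'hot#N-1') with a
    round-robin salt; other keys are left untouched."""
    seen: Counter = Counter()
    salted = []
    for key, value in records:
        if key in hot_keys:
            salted.append((f"{key}{SEP}{seen[key] % num_salts}", value))
            seen[key] += 1
        else:
            salted.append((key, value))
    return salted


def two_stage_sum(salted: list[tuple[str, int]]) -> dict[str, int]:
    """Stage 1 sums per salted key; stage 2 strips the salt and merges partial sums."""
    partial: dict[str, int] = defaultdict(int)
    for key, value in salted:
        partial[key] += value
    final: dict[str, int] = defaultdict(int)
    for salted_key, subtotal in partial.items():
        final[salted_key.split(SEP, 1)[0]] += subtotal
    return dict(final)


if __name__ == "__main__":
    records = [("hot", 1)] * 1000 + [("cold", 1)] * 10
    salted = salt_keys(records, {"hot"}, num_salts=8)
    print(max(Counter(k for k, _ in salted).values()))  # 125 (largest group after salting)
    print(two_stage_sum(salted))                          # {'hot': 1000, 'cold': 10}
```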
Can you explain the difference between OLTP and OLAP?
Describe a time when you had to optimize a slow SQL query. What steps did you take?
Explain the differences between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
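A minimal sketch of the four join types on two tiny hypothetical tables `a` and `b`, run with Python's built-in `sqlite3`. Because SQLite only gained native RIGHT and FULL OUTER JOIN in version 3.39.0, the sketch uses the portable rewrites: RIGHT JOIN as a LEFT JOIN with the tables swapped, and FULL JOIN as a LEFT JOIN plus the unmatched rows from the other side.

```python
import sqlite3

SETUP = """
CREATE TABLE a (id INTEGER, x TEXT);
CREATE TABLE b (id INTEGER, y TEXT);
INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
"""

QUERIES = {
    # only ids present in both tables
    "inner": "SELECT a.id, a.x, b.y FROM a JOIN b ON a.id = b.id ORDER BY a.id",
    # every row of a; b columns are NULL where there is no match
    "left": "SELECT a.id, a.x, b.y FROM a LEFT JOIN b ON a.id = b.id ORDER BY a.id",
    # RIGHT JOIN written as a LEFT JOIN with the tables swapped: every row of b
    "right": "SELECT b.id, a.x, b.y FROM b LEFT JOIN a ON a.id = b.id ORDER BY b.id",
    # FULL JOIN emulated as LEFT JOIN plus the rows of b that have no match in a
    "full": """
        SELECT a.id, a.x, b.y FROM a LEFT JOIN b ON a.id = b.id
        UNION ALL
        SELECT b.id, NULL, b.y FROM b LEFT JOIN a ON a.id = b.id WHERE a.id IS NULL
        ORDER BY 1
    """,
}


def run_joins() -> dict[str, list[tuple]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
    conn.close()
    return results


if __name__ == "__main__":
    for name, rows in run_joins().items():
        print(name, rows)
```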
How do you handle NULL values in SQL? Mention functions like COALESCE and NULLIF.
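A minimal sketch of common NULL-handling patterns, using Python's built-in `sqlite3` and a hypothetical `customers(id, email)` table that mixes a real value, an empty-string placeholder, and a true NULL. `NULLIF(email, '')` converts the placeholder into NULL, and `COALESCE` then substitutes a default. The sketch also shows that `= NULL` never matches (use `IS NULL`) and that `COUNT(column)` skips NULLs.

```python
import sqlite3


def null_handling_demo() -> dict[str, object]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
    # hypothetical data: a real value, an empty string, and a true NULL
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "a@example.com"), (2, ""), (3, None)],
    )

    def query(sql: str) -> list[tuple]:
        return conn.execute(sql).fetchall()

    results = {
        # NULLIF turns '' into NULL; COALESCE then supplies a default
        "cleaned": query(
            "SELECT id, COALESCE(NULLIF(email, ''), 'unknown') FROM customers ORDER BY id"
        ),
        # comparing with '= NULL' yields NULL, never true
        "eq_null": query("SELECT COUNT(*) FROM customers WHERE email = NULL")[0][0],
        "is_null": query("SELECT COUNT(*) FROM customers WHERE email IS NULL")[0][0],
        # COUNT(column) ignores NULLs; COUNT(*) counts every row
        "count_email": query("SELECT COUNT(email) FROM customers")[0][0],
        "count_all": query("SELECT COUNT(*) FROM customers")[0][0],
    }
    conn.close()
    return results


if __name__ == "__main__":
    print(null_handling_demo())
```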
What is the difference between WHERE and HAVING clauses in SQL?
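A minimal sketch of WHERE vs HAVING with Python's built-in `sqlite3` and a hypothetical `sales(region, amount)` table. WHERE removes individual rows before grouping; HAVING filters whole groups after the aggregates are computed.

```python
import sqlite3

SQL = """
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount > 10            -- row filter, applied before grouping
GROUP BY region
HAVING SUM(amount) > 100     -- group filter, applied to the aggregates
ORDER BY region
"""


def filtered_totals(rows: list[tuple[str, int]]) -> list[tuple[str, int]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    rows = [("east", 50), ("east", 60), ("east", 5), ("west", 40), ("west", 30)]
    print(filtered_totals(rows))  # [('east', 110)]
```

Here WHERE drops the 5 before summing (so east totals 110, not 115), and HAVING then discards west because its total of 70 does not exceed 100.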
Write a Python function to check if a string is a palindrome.
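A minimal sketch using two pointers that walk inward from both ends. It assumes that, by default, case and non-alphanumeric characters should be ignored (so "A man, a plan, a canal: Panama" counts); the `normalize` flag is a hypothetical parameter for switching to an exact character comparison.

```python
def is_palindrome(text: str, normalize: bool = True) -> bool:
    """Return True if text reads the same forwards and backwards.

    With normalize=True, case and non-alphanumeric characters are ignored.
    """
    chars = [c.lower() for c in text if c.isalnum()] if normalize else list(text)
    left, right = 0, len(chars) - 1
    while left < right:
        if chars[left] != chars[right]:
            return False
        left += 1
        right -= 1
    return True


if __name__ == "__main__":
    print(is_palindrome("racecar"))                         # True
    print(is_palindrome("A man, a plan, a canal: Panama"))  # True
    print(is_palindrome("hello"))                           # False
```

The two-pointer loop stops at the first mismatch and uses O(1) extra space beyond the normalized copy; `chars == chars[::-1]` is a shorter alternative with the same result.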
Describe a scenario where partitioning and bucketing would improve query performance.
Explain the types of triggers in Azure Data Factory (ADF), including schedule, tumbling window, and event-based triggers.
How do you remove duplicate rows in BigQuery?
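BigQuery cannot be run locally, so this sketch demonstrates the standard ROW_NUMBER() deduplication pattern with Python's built-in `sqlite3` as a stand-in. The `events(id, payload, updated_at)` table is hypothetical. Rows are numbered within each `id` by recency and only row 1 is kept, which collapses both exact duplicates and older versions of a record. BigQuery accepts the same window function and also supports a `QUALIFY` clause for filtering on it directly; `SELECT DISTINCT *` is a simpler option when rows are identical in every column.

```python
import sqlite3

# Keep the most recent row per id; exact duplicates collapse to one row as well.
DEDUP_SQL = """
SELECT id, payload, updated_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
    FROM events
)
WHERE rn = 1
ORDER BY id
"""


def dedupe(rows: list[tuple[int, str, str]]) -> list[tuple[int, str, str]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, payload TEXT, updated_at TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(DEDUP_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    rows = [
        (1, "old", "2024-01-01"),
        (1, "new", "2024-02-01"),
        (2, "only", "2024-01-15"),
        (2, "only", "2024-01-15"),  # exact duplicate
    ]
    print(dedupe(rows))  # [(1, 'new', '2024-02-01'), (2, 'only', '2024-01-15')]
```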