Real interview questions from top companies
What strategies can you use to handle skewed data in Spark?
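Salting is one common answer. You append a random suffix to a hot key so its rows spread over several partitions. You aggregate on the salted key, then aggregate again on the original key. The sketch below simulates that two-stage aggregation in plain Python rather than Spark. The partition count, salt factor, and `#` separator are hypothetical choices. In Spark the same idea is usually expressed as two successive `groupBy` aggregations.

```python
import random
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical cluster parallelism
NUM_SALTS = 4       # hypothetical salt factor; raise it for heavier skew


def partition_of(key: str) -> int:
    # Deterministic hash partitioner (crc32), standing in for Spark's hash partitioning
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def salted_key(key: str, rng: random.Random) -> str:
    # Append a random salt so one hot key becomes up to NUM_SALTS distinct keys
    return f"{key}#{rng.randrange(NUM_SALTS)}"


def two_stage_sum(records, rng: random.Random) -> dict:
    # Stage 1: partial sums per salted key; a hot key's rows can now land on several partitions
    partial = defaultdict(int)
    for key, value in records:
        partial[salted_key(key, rng)] += value
    # Stage 2: strip the salt and combine the few partial sums per original key
    final = defaultdict(int)
    for skey, subtotal in partial.items():
        final[skey.rsplit("#", 1)[0]] += subtotal
    return dict(final)


rng = random.Random(42)
records = [("US", 1)] * 1000 + [("FR", 1)] * 10  # "US" is the skewed hot key
print(two_stage_sum(records, rng))  # {'US': 1000, 'FR': 10}
# Partitions reached by the salted variants of "US" (unsalted, all rows hit one partition)
print(sorted({partition_of(f"US#{s}") for s in range(NUM_SALTS)}))
```

The trade-off is an extra aggregation step. In return, no single task has to process every row of the hot key.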
Briefly introduce yourself and walk us through your journey as a Data Engineer so far.
Can you explain the difference between OLTP and OLAP?
Describe a time when you had to optimize a slow SQL query. What steps did you take?
Explain the concept of ACID properties in the context of databases.
Explain the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
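Here is a small sketch that uses Python's built-in `sqlite3` as a stand-in database. The tables `emp` and `dept` are hypothetical. Older SQLite versions lack RIGHT and FULL joins. The sketch therefore writes RIGHT as a LEFT JOIN with the tables swapped. It emulates FULL OUTER as a LEFT JOIN plus the unmatched right-side rows. On engines with native support you would write `RIGHT JOIN` / `FULL OUTER JOIN` directly.

```python
import sqlite3
from contextlib import closing


def join_demo() -> dict:
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE emp (name TEXT, dept_id INTEGER)")
        conn.execute("CREATE TABLE dept (dept_id INTEGER, dept_name TEXT)")
        conn.executemany("INSERT INTO emp VALUES (?, ?)",
                         [("ana", 1), ("ben", 2), ("cid", None)])
        conn.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "eng"), (3, "ops")])

        def q(sql: str) -> list:
            return conn.execute(sql).fetchall()

        return {
            # INNER: only rows with a matching dept_id on both sides
            "inner": q("""SELECT e.name, d.dept_name FROM emp e
                          INNER JOIN dept d ON e.dept_id = d.dept_id ORDER BY e.name"""),
            # LEFT: every emp row; dept columns are NULL when nothing matches
            "left": q("""SELECT e.name, d.dept_name FROM emp e
                         LEFT JOIN dept d ON e.dept_id = d.dept_id ORDER BY e.name"""),
            # RIGHT: every dept row, written as a LEFT JOIN with the tables swapped
            "right": q("""SELECT e.name, d.dept_name FROM dept d
                          LEFT JOIN emp e ON e.dept_id = d.dept_id ORDER BY d.dept_name"""),
            # FULL OUTER: all rows from both sides (LEFT JOIN + unmatched dept rows)
            "full": q("""SELECT e.name, d.dept_name FROM emp e
                         LEFT JOIN dept d ON e.dept_id = d.dept_id
                         UNION ALL
                         SELECT NULL, d.dept_name FROM dept d
                         WHERE NOT EXISTS (SELECT 1 FROM emp e WHERE e.dept_id = d.dept_id)
                         ORDER BY 1, 2"""),
        }


for kind, rows in join_demo().items():
    print(kind, rows)
```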
How do you handle NULL values in SQL? Mention functions like COALESCE and NULLIF.
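A minimal sketch of both functions, run through Python's built-in `sqlite3` on a hypothetical `orders` table. `COALESCE` returns its first non-NULL argument. `NULLIF(a, b)` returns NULL when `a = b`, which is the usual guard against division by zero. SQLite itself already returns NULL for division by zero. The guard matters most on engines that raise an error instead.

```python
import sqlite3
from contextlib import closing


def null_handling_demo() -> list:
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE orders (id INTEGER, discount REAL, qty INTEGER, total REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
            (1, 5.0, 2, 100.0),
            (2, None, 4, 80.0),  # missing discount
            (3, 0.0, 0, 0.0),    # zero quantity would divide by zero
        ])
        return conn.execute("""
            SELECT id,
                   COALESCE(discount, 0) AS discount_or_zero,  -- first non-NULL argument
                   total / NULLIF(qty, 0) AS unit_price         -- NULL instead of x / 0
            FROM orders
            ORDER BY id
        """).fetchall()


print(null_handling_demo())  # [(1, 5.0, 50.0), (2, 0, 20.0), (3, 0.0, None)]
```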
What is a Common Table Expression (CTE), and when would you use it?
What is the difference between a primary key and a unique key?
What is the difference between WHERE and HAVING clauses in SQL?
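The difference is easiest to see when both clauses appear in one query. `WHERE` filters individual rows before `GROUP BY`, while `HAVING` filters the groups after aggregation. This sketch uses Python's built-in `sqlite3` and a hypothetical `sales` table.

```python
import sqlite3
from contextlib import closing


def where_vs_having() -> list:
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", [
            ("east", 50), ("east", 200), ("west", 30), ("west", 40), ("north", 500),
        ])
        return conn.execute("""
            SELECT region, SUM(amount) AS total
            FROM sales
            WHERE amount >= 40        -- row filter: drops west's 30 before grouping
            GROUP BY region
            HAVING SUM(amount) > 100  -- group filter: drops west (total 40)
            ORDER BY region
        """).fetchall()


print(where_vs_having())  # [('east', 250), ('north', 500)]
```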
Write a Python function to check if a string is a palindrome.
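One possible answer, assuming that case and non-alphanumeric characters should be ignored (interviewers often specify this, so confirm it first). It uses a two-pointer scan, which runs in O(n) time.

```python
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards.

    Assumption: case and non-alphanumeric characters are ignored,
    so "A man, a plan, a canal: Panama" counts as a palindrome.
    """
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    left, right = 0, len(cleaned) - 1
    # Compare characters from both ends, moving toward the middle
    while left < right:
        if cleaned[left] != cleaned[right]:
            return False
        left += 1
        right -= 1
    return True


print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("hello"))                           # False
```

A shorter alternative is `cleaned == cleaned[::-1]`. It is equally correct, but it builds a reversed copy instead of comparing in place.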
Describe a scenario where partitioning and bucketing would improve query performance.
Explain fact and dimension tables, with examples.
Explain the types of triggers in Azure Data Factory (ADF), including schedule, tumbling window, and event-based triggers.
How do you remove duplicate rows in BigQuery?
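A common pattern keeps one row per key with `ROW_NUMBER()` and discards the rest. BigQuery can't be run here, so this sketch uses Python's built-in `sqlite3` as a stand-in. The table and column names are hypothetical. The `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` pattern carries over to BigQuery, which also offers `QUALIFY` for filtering on window results. For rows that are exact duplicates, `SELECT DISTINCT` is enough.

```python
import sqlite3
from contextlib import closing


def latest_row_per_key() -> list:
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, updated_at TEXT)")
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
            (1, "a@old.com", "2024-01-01"),
            (1, "a@new.com", "2024-03-01"),  # newer version of id 1
            (2, "b@x.com", "2024-02-01"),
            (2, "b@x.com", "2024-02-01"),    # exact duplicate
        ])
        return conn.execute("""
            WITH ranked AS (
                SELECT id, email, updated_at,
                       -- number rows within each id, newest first
                       ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
                FROM customers
            )
            SELECT id, email, updated_at FROM ranked
            WHERE rn = 1  -- keep only the first (latest) row per id
            ORDER BY id
        """).fetchall()


print(latest_row_per_key())  # [(1, 'a@new.com', '2024-03-01'), (2, 'b@x.com', '2024-02-01')]
```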
Joins and window functions: INNER, LEFT, RIGHT, and FULL OUTER joins; ROW_NUMBER(), RANK(), and DENSE_RANK().
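The three ranking functions differ only in how they treat ties. This sketch runs them through Python's built-in `sqlite3` on a hypothetical `scores` table with one tie. `ROW_NUMBER()` always gives distinct numbers; a tie-breaker column is added so the result is deterministic. `RANK()` gives tied rows the same rank and then skips ahead. `DENSE_RANK()` gives tied rows the same rank without leaving gaps.

```python
import sqlite3
from contextlib import closing


def ranking_demo() -> list:
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE scores (player TEXT, score INTEGER)")
        conn.executemany("INSERT INTO scores VALUES (?, ?)",
                         [("ana", 90), ("ben", 85), ("cid", 85), ("dee", 70)])
        return conn.execute("""
            SELECT player, score,
                   ROW_NUMBER() OVER (ORDER BY score DESC, player) AS row_num,
                   RANK()       OVER (ORDER BY score DESC)         AS rnk,
                   DENSE_RANK() OVER (ORDER BY score DESC)         AS dense_rnk
            FROM scores
            ORDER BY score DESC, player
        """).fetchall()


for row in ranking_demo():
    print(row)
# ('ana', 90, 1, 1, 1)
# ('ben', 85, 2, 2, 2)
# ('cid', 85, 3, 2, 2)
# ('dee', 70, 4, 4, 3)
```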
When would you choose a snowflake schema over a star schema?
Can you explain the architecture of Apache Spark and its components?
Describe the difference between Spark RDDs, DataFrames, and Datasets.
Explain the difference between Spark's map() and flatMap() transformations.
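Here is a plain-Python analogy, with lists standing in for RDDs. `map()` produces exactly one output per input element. `flatMap()` lets the function return zero or more outputs per element and flattens them into a single collection. The helper names `map_like` and `flat_map_like` are hypothetical. In PySpark the corresponding calls are `rdd.map(f)` and `rdd.flatMap(f)`.

```python
from itertools import chain
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_like(data: Iterable[T], f: Callable[[T], U]) -> List[U]:
    # One output element per input element, like RDD.map
    return [f(x) for x in data]


def flat_map_like(data: Iterable[T], f: Callable[[T], Iterable[U]]) -> List[U]:
    # f returns an iterable per element; results are flattened, like RDD.flatMap
    return list(chain.from_iterable(f(x) for x in data))


lines = ["hello world", "spark is fast"]
print(map_like(lines, str.split))       # [['hello', 'world'], ['spark', 'is', 'fast']]
print(flat_map_like(lines, str.split))  # ['hello', 'world', 'spark', 'is', 'fast']
```

This is why word-count examples split lines with `flatMap()`. The result is a flat collection of words rather than a collection of lists.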