Real interview questions on Snowflake: warehouses, materialized views, clustering, time travel, and cloud data platform architecture.
Snowflake has become a dominant cloud data warehouse. These questions cover Snowflake-specific concepts: virtual warehouses, compute scaling, materialized views vs streams, clustering keys, time travel, data sharing, and query optimization. Prepare for Snowflake-focused roles at enterprises and tech companies.
What is the difference between repartition and coalesce in Apache Spark?
How would you handle Change Data Capture (CDC) during a migration? Explain approaches for real-time CDC.
Explain the differences between a Data Lake and a Data Warehouse.
Explain the differences between a data warehouse, a data lake, and Delta Lake.
Can you explain the difference between OLTP and OLAP?
What is a Common Table Expression (CTE), and when would you use it?
What is the difference between a primary key and a unique key?
When would you choose a Snowflake schema over a Star schema?
Describe the data pipeline architecture you've worked with.
What is the difference between internal and external tables in BigQuery?
Have you worked on Data Warehousing projects?
Prioritize Spark optimizations by impact and effort. Discuss partitioning strategy, caching policy, join selection, shuffle reduction, and when each becomes a scalability or cost bottleneck.
Retrieve the most recent sale_timestamp for each product (Latest Transaction).
Walk through the three AQE features in Spark 3.x (coalescing shuffle partitions, switching join strategies, and skew join handling): how they operate at shuffle boundaries, which configs enable them, and what happens when AQE cannot help.
What are primary keys and foreign keys? Why are they important?
What is a CTE (Common Table Expression)? What are its uses?
What is Adaptive Query Execution (AQE) in Spark 3.x, and how does it improve performance?
What is Snowflake's architecture, and why is it unique?
What is the difference between a view and a materialized view?
What is the difference between Managed and External Tables in Databricks?
What is the difference between OLTP and OLAP?
What is the most difficult task you've ever worked on?
Why should we hire you for this role?
Explain Airflow operators, hooks, and how the scheduler works.
How do you call an external API from Airflow?
How do you approach handling multiple tasks within a sprint?
Briefly introduce yourself and walk us through your journey as a Data Engineer so far.
What are broadcast joins and shuffle merge joins?
Build an executive dashboard for reporting.
How would you build ETL pipelines that capture changes when new records are inserted into source tables?
What is the business role of a data pipeline?
What is the difference between caching and persisting data in Spark?
Can Presto work with near-real-time data from a streaming data source?
Can you describe a situation where you had to work with a difficult stakeholder? How did you manage the situation and what was the outcome?
Can you provide a use case where Assert Transformations helped maintain data quality?
What challenges have you faced in translating requirements into technical solutions?
Compare native and cloud database systems.
Compare OLTP and OLAP systems in the context of financial transactions.
Compare PostgreSQL and Snowflake. How does each handle duplicate records?
Compare Redshift, BigQuery, and Snowflake in terms of cost, performance, and scalability.
Compare the star schema and snowflake schema. Which one would you use for reporting at Swiggy, and why?
What are the core AWS services used in data engineering?
Could you describe a specific cost optimization strategy you implemented in the cloud and its results?
Count the occurrences of each character in a string.
Create a partitioned table.
Design a data warehouse from scratch.
Database vs Data Warehouse vs Data Mart vs Data Lake
Databricks job clusters and SQL endpoints: discuss Photon.
What is the difference between DELETE and TRUNCATE in Snowflake?
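For the "Latest Transaction" question (most recent sale_timestamp for each product), here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in SQL engine. The table name `sales` and the columns `product_id` and `amount` are assumptions, since the question names only `sale_timestamp`. The GROUP BY / MAX pattern itself is standard SQL and should carry over to Snowflake unchanged.

```python
import sqlite3

# Hypothetical schema: only sale_timestamp comes from the question;
# the table name "sales" and columns "product_id"/"amount" are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id TEXT, sale_timestamp TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("A", "2024-01-01 09:00:00", 10.0),
        ("A", "2024-01-03 12:30:00", 12.5),
        ("B", "2024-01-02 08:15:00", 7.0),
        ("B", "2024-01-02 18:45:00", 9.0),
    ],
)

# One row per product with its latest timestamp.
# ISO-8601 strings sort chronologically, so MAX works on TEXT here.
LATEST_SQL = """
SELECT product_id, MAX(sale_timestamp) AS latest_sale_timestamp
FROM sales
GROUP BY product_id
ORDER BY product_id
"""

latest = conn.execute(LATEST_SQL).fetchall()
for product_id, ts in latest:
    print(product_id, ts)
```

If the interviewer wants the whole latest row rather than just the timestamp, the usual alternative is ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY sale_timestamp DESC) filtered to 1. Snowflake can apply that filter directly with QUALIFY.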
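For the "Count occurrences of each character in a string" question, here is a minimal Python sketch. It shows the standard-library collections.Counter approach and a manual dictionary loop, because interviewers often ask for the version without helpers. The function names count_chars and count_chars_manual are hypothetical.

```python
from collections import Counter
from typing import Dict


def count_chars(text: str) -> Dict[str, int]:
    """Count each character using the standard-library Counter."""
    # Spaces and punctuation are counted too; case is preserved ('A' != 'a').
    return dict(Counter(text))


def count_chars_manual(text: str) -> Dict[str, int]:
    """Same result with a plain dict, for when library helpers are off-limits."""
    counts: Dict[str, int] = {}
    for ch in text:
        # dict.get returns 0 the first time a character is seen.
        counts[ch] = counts.get(ch, 0) + 1
    return counts


print(count_chars("data lake"))
# {'d': 1, 'a': 3, 't': 1, ' ': 1, 'l': 1, 'k': 1, 'e': 1}
```

Both versions make a single pass over the string, so each runs in O(n) time for a string of length n.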