Real questions from top companies · hard
Snowflake: Types of Caching, Time Travel vs. Fail-safe, Snowpipe, Materialized Views
Spark optimizations: Partitioning, caching, tuning parallelism
Stored Procedure Optimization
Time and cost comparisons for executing the same query in Snowflake and Spark.
What are the benefits of using BigQuery as a data warehouse?
What are the benefits of using a cloud data warehouse (e.g., Redshift, Snowflake) for analytics?
What are the key design principles for a cloud-based data warehouse?
What are the trade-offs between relational databases and NoSQL for financial data?
What considerations are important when designing a dimensional model for a ridesharing app?
What is a CTE (common table expression) in SQL?
What is a Data Warehouse, and can you explain its Tier-1 and Tier-2 architecture?
What optimizations would you apply for partitioning strategies?
What strategies and technologies would you consider when designing a data warehouse architecture for efficient data storage and retrieval?
What technologies are you most comfortable with?
What's the role of surrogate keys in dimensional modeling?
Write a query to generate the specified output using advanced SQL skills with joins, aggregations, and window functions.
You need to create a workflow where Task B runs only if Task A is successful, and Task C should always run regardless of Task A or B's status. How would you define this dependency using Airflow?
You need to design a Kafka topic for a logging service. How would you decide the number of partitions and the key for partitioning to balance throughput and ordering requirements?
A data pipeline processes files for different clients stored in separate directories. Explain how you would use dynamic DAG creation to handle client-specific workflows in Airflow.
Adaptive Query Execution (AQE): Discuss how AQE optimizes query execution in Spark dynamically based on runtime stats.