Real questions from top companies
How would you handle data type changes for an existing column?
How would you handle duplicate or corrupted data in a batch ETL job?
How would you handle null values in a dataset, especially in a single column?
How would you handle nulls in a SQL join? Provide examples using COALESCE.
How would you identify duplicate records based on a composite key in SQL?
How would you optimize a SQL query for better performance when working with large datasets?
How would you optimize a query fetching sales data across multiple countries with billions of rows?
How would you optimize a query with multiple joins and subqueries?
How would you prevent small file problems in S3 when loading data into Redshift?
How would you retrieve the first and last order for each customer from a sales table?
Identify and remove duplicate records from a table, keeping the most recent record based on a timestamp column.
Identify consecutive numbers in a column (at least 3 consecutive).
If partition directories are created manually inside a Hive table's data-warehouse directory and you query records from those partitions, will you see the data? If not, how can this be fixed?
Implement a CASE WHEN condition - medium difficulty
In Python, process a large CSV in chunks and remove duplicate records based on email and timestamp.
Indexing - True/False question on indexes and query optimization
Indexing - Types and Benefits?
Indexing: When to Use and Avoid
How does Snowflake integrate with external data sources such as S3, GCS, and Azure Blob Storage?
Joins: Different types and their use cases