Interview questions · medium
What is the difference between groupByKey and reduceByKey in Spark?
Demonstrate the difference between DENSE_RANK() and RANK().
Write a Python function to check if a string is a palindrome.
Explain how using a staging area in S3 can help.
Explain the role of Glue Catalog in Athena.
Explain how you would use AWS Glue for ETL. What challenges might you face with large datasets?
How do you ensure message ordering in Kinesis Data Streams?
What alternatives to Kinesis would you consider for real-time data ingestion?
What integration challenges might you face with Glue Catalog in non-AWS environments?
How would you implement custom alarms for data delays or job failures?
How would you monitor and reduce disk-based queries (disk spilling)?
What are the benefits of the COPY command's MANIFEST option?
How would you decide which columns to use as the DISTKEY and the SORTKEY?
Compare Glue partition discovery with Hive MSCK/ADD PARTITION. Explain the operational and cost implications of crawler-based vs. partition-projection approaches. When does partition projection become necessary, and what are its limitations?
Explain how you would optimize Redshift query performance for a reporting system with large fact tables.
Explain the differences between table re-creation and ALTER TABLE operations.
How does partitioning in S3 affect Athena query performance?
How would you handle data type changes for an existing column?
How would you prevent small file problems in S3 when loading data into Redshift?
What metrics would trigger an auto-scaling event?