Interview questions · hard
Design an end-to-end data pipeline using Glue, Lambda, EC2, S3, Redshift, and Athena.
Compare the time and cost of executing the same query in Snowflake and Spark.
Write a query to generate the specified output using advanced SQL skills with joins, aggregations, and window functions.
Explain how Spark processes a 500GB file, covering memory allocation, shuffles, and spillovers to disk.
Explain how to overwrite a file stored in S3 using PySpark.
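For the last question above, an answer usually centers on the DataFrame writer's save mode. The following is a minimal sketch, not a definitive answer: the helper name `overwrite_parquet`, the bucket `example-bucket`, and the choice of Parquet are all assumptions made for illustration, and the `s3a://` scheme assumes the Hadoop S3A connector is configured (some managed platforms use `s3://` instead).

```python
def overwrite_parquet(df, s3_path):
    """Replace the data stored under s3_path with the contents of df.

    `df` is expected to be a pyspark.sql.DataFrame. The helper name and
    the Parquet format are illustrative assumptions.
    """
    # mode("overwrite") asks Spark to replace any existing data at the
    # target path instead of failing because the path already exists.
    df.write.mode("overwrite").parquet(s3_path)


def run_example():
    """Hypothetical usage; needs pyspark and S3 credentials to run."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("overwrite-sketch").getOrCreate()
    # Both paths are placeholders, not real locations.
    df = spark.read.parquet("s3a://example-bucket/input/")
    overwrite_parquet(df, "s3a://example-bucket/output/")
    spark.stop()
```

Two points are worth raising alongside the code. Spark writes a directory of part files rather than a single object, so "overwriting a file" in practice means replacing everything under a prefix. And because Spark evaluates lazily, reading from a path and overwriting that same path in one job is unsafe; writing to a separate location first is the usual workaround.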