Interview questions · hard
What is the difference between SparkSession and SparkContext in Spark?
Explain Fact and Dimension Tables with examples.
How do you handle late-arriving data in Spark Structured Streaming?
What is the small-file problem in Spark, and how do you solve it?
Why are you leaving your current company?
What are the key components of AWS Glue, and how do they work together?
What is Snowflake's architecture, and why is it unique?
What is the difference between S3 and HDFS?
What is the difference between internal and external tables in BigQuery?
How do you optimize a long-running SQL query?
Design a Delta table layout for a mixed workload: point lookups by user_id, range scans by date, and full partition scans. Compare partitioning vs. Z-ordering: when to use each, and the rewrite cost trade-off.
Scenario: Query optimization for a large dataset.
Explain PySpark's Catalyst Optimizer.
Explain caching techniques in Databricks.
What is the difference between Lazy Evaluation and Eager Execution in PySpark?