Interview questions
Optimize a query fetching customer data with a rolling 6-month sales sum.
What is your notice period, and are you interviewing elsewhere?
What optimizations would you apply when choosing a partitioning strategy?
What technologies are you most comfortable with?
Write a SQL query to find employees earning the second-highest salary.
Write a SQL query to find the top 5 products by sales per region.
Describe your approach to managing offsets in Kafka.
Explain how you would design a partition strategy for a large dataset in HDFS.
Explain the architecture of Kafka and its core components.
Explain your choice of streaming framework (Kafka, Spark Streaming, etc.).
How do you handle out-of-memory errors in Spark jobs?
How do you reduce shuffle operations in Spark?
How does Kafka ensure message durability and reliability?
How does Spark execute a job? Explain the DAG and stages.
How does lazy evaluation work in Spark?
Implement a Kafka consumer that writes streaming data into a database.
Implement a PySpark job to read CSV data, perform joins, and store output as partitioned Parquet.
What are the different delivery semantics in Kafka (at-least-once, at-most-once, exactly-once)?
What is the role of ZooKeeper in Kafka?
Write a PySpark code snippet that filters rows on a specific condition.
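As a reference point for the second-highest salary question above, here is one possible sketch rather than a definitive answer. It runs the SQL through Python's built-in sqlite3 module so it can be tried without a database server. The employees table, its name and salary columns, and the sample rows are all assumptions made for illustration. A nested MAX subquery is used so that ties at the second-highest salary are all returned.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", 120), ("Ben", 100), ("Cy", 100), ("Dee", 90)],
)

# The inner subquery finds the top salary; the middle one finds the
# largest salary strictly below it; the outer query returns every
# employee who earns that amount.
SECOND_HIGHEST = """
SELECT name, salary
FROM employees
WHERE salary = (
    SELECT MAX(salary)
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
)
ORDER BY name
"""

second_highest = conn.execute(SECOND_HIGHEST).fetchall()
print(second_highest)  # [('Ben', 100), ('Cy', 100)]
```

An interviewer may also expect a window-function variant (for example DENSE_RANK over salary in descending order), which generalizes more easily to the Nth-highest salary.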