Interview questions · hard
Tell me about yourself and your experience.
Design a fault-tolerant Spark Streaming checkpoint strategy: what to persist, recovery semantics, and cost/scalability trade-offs with checkpoint frequency.
Given a streaming dataset from Kafka, how would you ingest the data in real-time using Spark?
Describe the ZS projects you worked on
Can you explain the concept of polymorphism and inheritance in Java with examples?
Design a custom API that queries a backend server and returns customer data, such as the number of orders a user has placed, given their user ID.
Design the data model for an ETL pipeline that ingests data from a database and loads it into Snowflake
After cleaning, how would you store the transformed data into Delta Lake?
Compare Kafka Streams and Spark Structured Streaming for real-time processing
Design an ETL pipeline using Kafka and Spark Streaming
Explain how spark.read.format("delta").load() works
Explain the architecture and role of the Hive Metastore in a data pipeline
Explain the architecture of Kafka
Explain the architecture of Spark Streaming
How do you handle skewed data, for example with salting or a broadcast join?
Have you worked with data compaction in Delta Lake?
How do you store streaming data in Delta Lake and handle schema evolution?
How does Databricks create clusters for running Spark jobs?
How does Delta Lake store the transaction history in S3 buckets?
How to optimize mappers using properties in MapReduce?
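For the skew-handling question above, it may help to have the idea of salting in concrete form. The following is only a minimal sketch in plain Python, not Spark code: the large, skewed side gets a random salt appended to its join key, the small side is replicated once per salt value, and the join runs on the composite (key, salt) pair. The names `NUM_SALTS`, `orders`, `users`, and the helper functions are hypothetical and chosen for illustration.

```python
import random
from collections import Counter

NUM_SALTS = 4  # hypothetical number of salt buckets; tune to the degree of skew


def salt_large_side(rows, num_salts, rng):
    """Attach a random salt to the key of every row on the skewed (large) side."""
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]


def explode_small_side(rows, num_salts):
    """Replicate every row of the small side once per possible salt value."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]


def hash_join(left, right):
    """Plain hash join on the composite (key, salt) join key."""
    index = {}
    for composite_key, value in right:
        index.setdefault(composite_key, []).append(value)
    return [
        (composite_key[0], left_value, right_value)
        for composite_key, left_value in left
        for right_value in index.get(composite_key, [])
    ]


rng = random.Random(0)

# One "hot" key dominates the large side; this is the skew.
orders = [("hot_user", i) for i in range(1000)] + [("cold_user", 1000)]
users = [("hot_user", "Alice"), ("cold_user", "Bob")]

salted_orders = salt_large_side(orders, NUM_SALTS, rng)
salted_users = explode_small_side(users, NUM_SALTS)
joined = hash_join(salted_orders, salted_users)

# Rows per composite key: the hot key is spread over several buckets
# instead of landing in a single partition.
bucket_sizes = Counter(composite_key for composite_key, _ in salted_orders)
print(len(joined), dict(bucket_sizes))
```

The trade-off is that the small side grows by a factor of `NUM_SALTS`. When the small side fits in memory on every worker, a broadcast join avoids the shuffle altogether and salting is usually unnecessary.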