Spark & Big Data questions from Meesho data engineering interviews.
These Spark & Big Data questions are sourced from Meesho data engineering interviews. Each includes an expert-level answer.
Design a fault-tolerant Spark Streaming checkpoint strategy: what to persist, recovery semantics, and cost/scalability trade-offs with checkpoint frequency.
Given a streaming dataset from Kafka, how would you ingest the data in real-time using Spark?
After cleaning, how would you store the transformed data into Delta Lake?
Compare Kafka Streams and Spark Structured Streaming for real-time processing
Databricks Cluster Management - standalone vs YARN mode
Design an ETL pipeline using Kafka and Spark Streaming
Explain how spark.read.format("delta").load() works
Explain the architecture and role of the Hive Metastore in a data pipeline
Explain the architecture of Kafka
Explain the architecture of Spark Streaming
Handling Skewness in Data - salting, broadcast join
Have you worked with data compaction in Delta Lake?
How do you store streaming data in Delta Lake and handle schema evolution?
How does Databricks create clusters for running Spark jobs?
How does Delta Lake store the transaction history in S3 buckets?
How to optimize mappers using properties in MapReduce?
How would you ensure exactly-once processing for Kafka consumers in your Spark job?
How would you handle memory management in Spark?
How would you manage the streaming data schema and handle schema evolution in Delta Lake?
How would you optimize your Spark Streaming ETL pipeline for high throughput and low latency?
Spark Executor Management: 10 workers, 100 GB RAM, 25 cores - number of executors, executor size, OOM in the driver
Sqoop command for importing multiple tables
Suppose you need to import 5 tables from an external RDBMS (like MySQL) into Hadoop HDFS. Write the Sqoop command.
What role would Kafka or similar event-driven platforms play in your architecture?
What strategies would you use to optimize Spark jobs for both performance and cost on AWS?
You are given 10 worker machines with 100 GB RAM and 25 CPU cores. How would you determine the number of executors and the size of each executor?
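The executor-sizing question above comes down to a few lines of arithmetic, sketched here in Python. This is one common rule of thumb and not a definitive answer. It assumes the 100 GB and 25 cores are per machine, reserves 1 core and 1 GB per node for the OS and daemons, targets 5 cores per executor, gives up one executor slot to the driver or application master, and sets aside roughly 10% of each container for memory overhead. The function name, the class name, and every default are illustrative assumptions, not values taken from the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorPlan:
    executors_per_node: int
    total_executors: int
    cores_per_executor: int
    heap_gb_per_executor: int


def plan_executors(nodes, cores_per_node, mem_gb_per_node,
                   cores_per_executor=5, reserved_cores=1,
                   reserved_mem_gb=1, overhead_fraction=0.10):
    """Rule-of-thumb executor sizing (all defaults are assumptions)."""
    # Leave some cores on every node for the OS and cluster daemons.
    usable_cores = cores_per_node - reserved_cores
    per_node = usable_cores // cores_per_executor
    if per_node < 1:
        raise ValueError("not enough cores per node for one executor")

    # Give up one executor slot cluster-wide for the driver / application master.
    total = nodes * per_node - 1

    # Split the remaining node memory evenly across the executors on that node,
    # then carve the off-heap overhead out of each container.
    container_gb = (mem_gb_per_node - reserved_mem_gb) / per_node
    heap_gb = int(container_gb / (1 + overhead_fraction))

    return ExecutorPlan(per_node, total, cores_per_executor, heap_gb)


plan = plan_executors(nodes=10, cores_per_node=25, mem_gb_per_node=100)
print(plan)
```

Under these assumptions the plan is 4 executors per node, 39 executors in total, 5 cores each, and about 22 GB of heap per executor. If the 100 GB and 25 cores describe the whole cluster instead, the same function applies with smaller per-node inputs. Driver OOM is a separate concern, usually addressed by avoiding large `collect()` calls or by raising the driver's memory.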