Question 1

Tell me about yourself and your experience.

Accepted Answer

**Situation**: I joined the data org when our pipelines were monolithic, causing 4+ hour delays and frequent outages affecting downstream dashboards and ML models.

**Task**: I was tasked with redesigning the data platform to support real-time decisioning while improving reliability and cost efficiency.

**Action**: I led a cross-functional team of 5 engineers to architect a medallion (Bronze/Silver/Gold) architecture on Delta Lake....

Question 2

Tell me about your family background.

Accepted Answer

**Situation**: Growing up, my family emphasized education and hard work. My parents ran a small business, so I saw firsthand how data and decisions affect outcomes.

**Task**: I learned to connect personal discipline with professional reliability—hitting deadlines, owning failures, and iterating.

**Action**: I channeled that into engineering: building pipelines with clear SLAs, documenting runbooks, and mentoring juniors....

Question 3

Explain the differences between Data Warehouse, Data Lake, and Delta Lake.

Accepted Answer

**Data Warehouse**: Structured, schema-on-write storage optimized for SQL analytics (Snowflake, BigQuery). Queries are fast, but compute is comparatively expensive.

**Data Lake**: Raw or semi-structured data in object storage (S3, ADLS); schema-on-read; cheap and flexible.

**Delta Lake**: An open-source storage layer on top of a data lake that adds ACID transactions, schema enforcement, time travel, and upserts.

**Why the distinction**: Traditional warehouses scale compute and storage together (cloud warehouses such as Snowflake and BigQuery separate them); lakes decouple them by design....
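The schema-on-write versus schema-on-read contrast can be illustrated with a small pure-Python toy. This is only a sketch: `SCHEMA`, `warehouse_write`, `lake_write`, and `lake_read` are hypothetical names invented for the example, and the in-memory lists stand in for real tables and object storage. Delta Lake's schema enforcement behaves like the write-side check here, applied to files that live in a lake.

```python
import json

# Hypothetical schema for illustration: column name -> required Python type.
SCHEMA = {"order_id": int, "amount": float}


def conforms(record: dict) -> bool:
    """True if the record has exactly the schema's columns with matching types."""
    return record.keys() == SCHEMA.keys() and all(
        isinstance(record[col], typ) for col, typ in SCHEMA.items()
    )


def warehouse_write(table: list, record: dict) -> None:
    """Schema-on-write: reject a bad record before it is stored."""
    if not conforms(record):
        raise ValueError(f"schema violation: {record}")
    table.append(record)


def lake_write(files: list, raw_line: str) -> None:
    """Schema-on-read: store the raw line as-is, with no validation."""
    files.append(raw_line)


def lake_read(files: list) -> list:
    """Apply the schema only at read time, skipping lines that do not fit.

    Assumes every stored line is a valid JSON object.
    """
    parsed = (json.loads(line) for line in files)
    return [rec for rec in parsed if conforms(rec)]
```

The lake accepts every line cheaply and defers the cost of bad data to each reader; the warehouse-style write pays that cost once, up front.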

Question 4

Design a fault-tolerant Spark Streaming checkpoint strategy: what to persist, recovery semantics, and cost/scalability trade-offs with checkpoint frequency.

Accepted Answer

**Section 1 — The Context (The 'Why')**
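Fault tolerance in Spark Streaming depends on checkpointing both source offsets and operator state. If the checkpoint is corrupted, the job can no longer replay from its last committed position; if checkpoints are written too often, the extra writes add latency and storage cost.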

**Section 2 — The Diagram**
```
[Source] --> [Stream] --> [Sink]
  Checkpoint:S3  State:RocksDB
```

**Section 3 — Component Logic**
**Checkpoint**: persists offsets and metadata to S3/HDFS. On restart, the driver replays from the last checkpointed offset. **State store** (RocksDB): holds aggregation state....
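A minimal sketch of how the checkpoint and state store might be wired up, assuming Structured Streaming on Spark 3.2 or later (where the built-in RocksDB provider is available). In that API, progress is committed once per micro-batch, so the trigger interval is the practical knob for checkpoint frequency. The function names, the sink path, and the choice of a Parquet sink are assumptions made for illustration, not part of the original answer.

```python
def checkpoint_commits_per_day(trigger_seconds: float) -> int:
    """Upper bound on micro-batches (and so checkpoint commits) per day.

    Slow batches or idle periods produce fewer commits than this.
    """
    if trigger_seconds <= 0:
        raise ValueError("trigger_seconds must be positive")
    return int(86_400 // trigger_seconds)


def start_checkpointed_query(spark, events, checkpoint_path: str, trigger_seconds: int):
    """Start a streaming query whose progress is checkpointed to durable storage.

    `spark` is a SparkSession and `events` a streaming DataFrame, both supplied
    by the caller. The sink path below is a hypothetical placeholder.
    """
    # Keep aggregation state in RocksDB rather than the default JVM-heap store.
    spark.conf.set(
        "spark.sql.streaming.stateStore.providerClass",
        "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider",
    )
    return (
        events.writeStream
        .format("parquet")
        .option("path", "s3a://example-bucket/output/")  # hypothetical sink
        .option("checkpointLocation", checkpoint_path)  # offsets, commits, state
        .trigger(processingTime=f"{trigger_seconds} seconds")
        .start()
    )
```

For a sense of the trade-off, `checkpoint_commits_per_day(10)` gives 8640 commits against 1440 for a 60-second trigger: a longer interval means fewer small writes to object storage, at the price of more data to reprocess after a failure.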

Question 5

Given a streaming dataset from Kafka, how would you ingest the data in real-time using Spark?

Accepted Answer

Kafka ingestion with Spark Structured Streaming follows a standard pattern: readStream → parse → writeStream with checkpoint. **Architectural decisions**: (1) **startingOffsets**: 'earliest' for backfill, 'latest' for tail-only; pass a JSON string of per-partition offsets to replay from an exact position. The option applies only when a query first starts; a restarted query resumes from its checkpoint. (2) **Checkpoint**: Mandatory for exactly-once; stores offsets and write metadata; without it, a restart causes duplicates or data loss....
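The readStream → parse → writeStream pattern could look roughly like the following. This is a sketch under stated assumptions: it needs the Spark Kafka connector on the classpath, the broker address and topic come from the caller, and the `ingest` and `starting_offsets` helpers and the sink path are hypothetical names chosen for the example. The JSON shape and the special offsets (`-2` for earliest, `-1` for latest) follow the Kafka source's `startingOffsets` option.

```python
import json


def starting_offsets(topic: str, offsets: dict) -> str:
    """Build the JSON string the Kafka source accepts for startingOffsets.

    `offsets` maps partition number -> offset; -2 means earliest, -1 latest.
    """
    return json.dumps({topic: {str(p): o for p, o in sorted(offsets.items())}})


def ingest(spark, brokers: str, topic: str, offsets_json: str, checkpoint_path: str):
    """readStream -> parse -> writeStream with a checkpoint."""
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", brokers)
        .option("subscribe", topic)
        .option("startingOffsets", offsets_json)  # ignored once a checkpoint exists
        .load()
    )
    # Kafka delivers key and value as binary; cast them before further parsing.
    parsed = raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "topic",
        "partition",
        "offset",
        "timestamp",
    )
    return (
        parsed.writeStream.format("parquet")
        .option("path", "s3a://example-bucket/bronze/events/")  # hypothetical sink
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

A call such as `starting_offsets("events", {0: 23, 1: -2})` would start partition 0 at offset 23 and partition 1 from its earliest available record.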

Question 6

What are your hobbies or activities you enjoy outside of work?

Accepted Answer

**Situation**: Personal interest question. **Task**: Brief, authentic. **Action**: 'I [hike/run/yoga] to stay active. I experiment with new tools and contribute to open source. I enjoy [photography/cooking]. These help me recharge and bring perspective.' **Result**: Relatable; can tie to problem-solving.

Question 7

What are your key achievements in your career so far?

Accepted Answer

**Situation**: Achievement showcase. **Task**: Quantified impact. **Action**: '(1) Platform: 80% fewer pipeline failures. (2) Migration: 50+ pipelines, 40% cost cut. (3) Mentored 3 to promotion. (4) Real-time: hours to minutes....

Question 8

What database would you choose for handling transactional and non-transactional data? Why?

Accepted Answer

**Situation**: Mixed OLTP and OLAP. **Task**: Recommend with rationale. **Action**: Why separate: OLTP needs ACID, low latency; OLAP needs scans, columnar. Polyglot: PostgreSQL/Aurora for OLTP; Snowflake/BigQuery or Delta for OLAP. Sync via CDC. Single-store (CockroachDB) trades off....
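To make "sync via CDC" concrete, here is a toy sketch of replaying change events from the OLTP side onto an analytical copy. The event shape (`op`, `key`, `row`) and the function names are hypothetical, invented for this example; real CDC tools emit richer envelopes, and a production sink would be a table rather than a dict.

```python
def apply_change(replica: dict, event: dict) -> None:
    """Apply one CDC event to an analytical copy keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["row"]  # upsert: the latest row version wins
    elif op == "delete":
        replica.pop(key, None)  # idempotent: deleting a missing key is a no-op
    else:
        raise ValueError(f"unknown op: {op}")


def sync(replica: dict, events: list) -> dict:
    """Replay events in commit order so the copy converges on the source."""
    for event in events:
        apply_change(replica, event)
    return replica
```

Treating inserts and updates as the same upsert, and deletes as idempotent, means a replayed batch leaves the copy unchanged, which matters when the CDC stream is delivered at least once.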

Meesho Data Engineer Interview Questions
