Apache Kafka Interview Questions for Data Engineers: 15 Questions That Actually Get Asked (2026)
Kafka is in every data engineering job description, but most candidates only know 'producers and consumers.' Master these 15 questions covering partitioning strategy, exactly-once semantics, and Kafka Connect patterns.
Key Takeaways
- Why Kafka Questions Trip Up Even Experienced Engineers
- Q1: How do you decide the number of partitions for a Kafka topic?
- Q2: Explain exactly-once semantics in Kafka
- Q3: What happens during a consumer group rebalance?
Why Kafka Questions Trip Up Even Experienced Engineers
Apache Kafka appears in 70%+ of data engineering job descriptions, yet most candidates can only explain the basics: producers publish messages, consumers read them, topics have partitions. That's a 5-minute tutorial, not interview-ready knowledge.
What separates candidates who get offers: understanding partition strategy trade-offs, exactly-once semantics (and when it actually works), consumer group rebalancing, and Kafka Connect vs custom consumers.
This guide covers 15 real questions from interviews at companies like Netflix, Uber, LinkedIn, and Walmart — with weak answers that get rejected and strong answers that get offers.
Q1: How do you decide the number of partitions for a Kafka topic?
Weak answer: "More partitions = more parallelism. I'd use a high number like 100."
Why it fails: Shows no understanding of the trade-offs. More partitions means more open file handles, longer leader election on broker failure, and higher end-to-end latency.
Strong answer: Partition count is a function of target throughput and consumer parallelism:
- Start with throughput math: If each partition can handle 10MB/s and you need 100MB/s, you need at least 10 partitions.
- Match consumer count: Maximum parallelism = number of partitions. If you have 20 consumers, you need at least 20 partitions.
- Leave room to grow: Increasing partition count is easy (just add more), but decreasing is impossible without recreating the topic.
My production guidelines:
- Start with 2x your expected consumer count
- Cap at 50 partitions per topic unless throughput demands more
- Monitor consumer lag — if it's consistently growing, add partitions + consumers
- For key-based partitioning (e.g., by user_id), changing partition count redistributes keys and breaks ordering guarantees. Plan the initial count carefully (a quick sizing sketch follows this list).
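To make the throughput math concrete, here's a rough sizing helper — a sketch, not a formula to follow blindly. The per-partition throughput figure in particular should come from your own benchmarks, not this example:

```python
import math

# Back-of-the-envelope partition sizing. All inputs are assumptions you
# should replace with measured numbers from your own cluster.
def suggest_partitions(target_mb_s: float,
                       per_partition_mb_s: float,
                       expected_consumers: int,
                       headroom: float = 2.0) -> int:
    """Partition count = max(throughput need, consumer parallelism) x headroom."""
    by_throughput = math.ceil(target_mb_s / per_partition_mb_s)
    return math.ceil(max(by_throughput, expected_consumers) * headroom)

# e.g. 100 MB/s target at ~10 MB/s per partition with 6 consumers -> 20
print(suggest_partitions(target_mb_s=100, per_partition_mb_s=10, expected_consumers=6))
```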
Q2: Explain exactly-once semantics in Kafka
Weak answer: "Kafka supports exactly-once delivery using transactions."
Strong answer: Exactly-once in Kafka has three different scopes — most candidates only know one:
- Idempotent producer (`enable.idempotence=true`): Prevents duplicates within a single producer session. The broker deduplicates using a sequence number per partition. Limitation: Only works within one producer instance. If the producer restarts, duplicates are possible unless you also use transactions.
- Transactional producer: Wraps multiple writes (across topics/partitions) in an atomic transaction. Either ALL messages are committed or NONE. Used by Kafka Streams for exactly-once stream processing.
- Consumer exactly-once: The hardest part. Kafka itself only guarantees at-least-once delivery to consumers. For exactly-once END-TO-END:
- Option A: Transactional producer + read_committed consumers (Kafka Streams does this)
- Option B: Idempotent writes on the consumer side (e.g., MERGE INTO with dedup key)
Production reality: Most teams use at-least-once + idempotent consumers because transactional exactly-once adds latency and complexity. Exceptions: financial systems where duplicate processing causes real money problems.
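To make the mechanics concrete, here's a minimal transactional sketch using the confluent-kafka Python client — the broker address, topic names, ids, and payloads are all illustrative:

```python
from confluent_kafka import Producer, Consumer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "orders-etl-1",  # must be stable across restarts
    # enable.idempotence is implied once transactional.id is set
})
producer.init_transactions()

producer.begin_transaction()
try:
    # Writes to multiple topics commit or abort as one unit.
    producer.produce("orders-clean", key="user-42", value='{"amount": 10}')
    producer.produce("orders-audit", key="user-42", value='{"seen": true}')
    producer.commit_transaction()  # both messages or neither
except Exception:
    producer.abort_transaction()
    raise

# Downstream readers must skip aborted/in-flight transactions:
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "orders-readers",
    "isolation.level": "read_committed",
})
```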
Q3: What happens during a consumer group rebalance?
Weak answer: "Partitions get redistributed among consumers in the group."
Strong answer: A rebalance is triggered when a consumer joins, leaves, or fails a heartbeat. During rebalance:
- Stop-the-world (eager rebalance): ALL consumers in the group stop processing. The group coordinator revokes all partition assignments and reassigns them. This causes processing gaps — no messages are consumed during rebalance.
- Cooperative rebalance (modern approach): Only the affected partitions are revoked and reassigned. Other consumers keep processing. Enabled via `partition.assignment.strategy` (set to the `CooperativeStickyAssignor` class in the Java client, or the string `cooperative-sticky` in librdkafka-based clients).
Why this matters in production:
- A deployment that rolls 10 consumers sequentially triggers 10 rebalances
- Each eager rebalance pauses ALL consumers for seconds
- Fix: Use cooperative-sticky assignment plus `session.timeout.ms=45000` and `heartbeat.interval.ms=15000`
- Better fix: Use static group membership (`group.instance.id`) — consumers that restart within `session.timeout.ms` get their old partitions back without triggering a rebalance (see the config sketch below)
Red flag answer: Saying "just increase the number of consumers" without mentioning rebalance overhead.
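For reference, a minimal consumer configuration along these lines, sketched with the confluent-kafka Python client — the broker address, group id, and instance id are illustrative:

```python
from confluent_kafka import Consumer

# Cooperative rebalancing + static membership (librdkafka-based client).
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "clickstream-etl",                          # illustrative
    "group.instance.id": "clickstream-etl-pod-0",           # static membership: unique per instance
    "partition.assignment.strategy": "cooperative-sticky",  # incremental rebalances
    "session.timeout.ms": 45000,    # restarts within this window keep old partitions
    "heartbeat.interval.ms": 15000,
})
consumer.subscribe(["clickstream"])
```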
Q4: Kafka Connect vs custom consumers — when do you use each?
Weak answer: "Kafka Connect is easier, custom consumers give more control."
Strong answer: The decision depends on your sink system and transformation needs:
Use Kafka Connect when:
- Sinking to a standard system (S3, HDFS, JDBC, Elasticsearch, BigQuery)
- Transformations are simple (field renaming, type conversion, filtering)
- You want reliable sink delivery without writing offset-management code (Connect tracks offsets for you; some connectors, like the S3 sink, can achieve exactly-once)
- You don't want to maintain consumer code
Use custom consumers when:
- Complex business logic per message (enrichment, API calls, conditional routing)
- Custom error handling (dead-letter queues with specific retry policies)
- Non-standard sink systems with no existing connector
- You need fine-grained control over batching and parallelism
Production pattern I use most often:
Kafka → Connect (S3 Sink) → Bronze layer
Bronze → Spark Structured Streaming → Silver layer

Kafka Connect handles the reliable landing. Spark handles the complex transformations. This avoids writing any custom consumer code.
Gotcha: Kafka Connect connectors vary wildly in quality. Always test the connector's failure recovery before production. The Confluent-maintained connectors are reliable; community connectors often aren't.
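As an illustration of the Connect side of that pattern, registering an S3 sink through the Connect REST API might look like this — the endpoint, bucket, and tuning values are all assumptions; check your connector's docs:

```python
import requests

# Hypothetical S3 sink registration against a Connect worker's REST API.
connector = {
    "name": "events-s3-sink",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": "events",
        "tasks.max": "4",
        "s3.bucket.name": "my-bronze-bucket",     # illustrative
        "s3.region": "us-east-1",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
        "flush.size": "10000",                    # records per file; tune for your volume
    },
}
resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
```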
Q5: How would you handle Kafka message ordering across partitions?
Weak answer: "Kafka guarantees ordering within a partition. Use a single partition for global ordering."
Why it fails: A single partition destroys throughput and isn't practical.
Strong answer: Ordering requirements come in three levels:
- Per-key ordering (most common): Produce with a key. All messages with the same key go to the same partition → ordering guaranteed within that key: `producer.send('events', key=user_id, value=event)`
- Global ordering: Rare requirement. Solutions:
- Single partition (low throughput, not recommended)
- Sequence numbers in messages + consumer-side reordering buffer
- Use a different system (Redis Streams, Kinesis) that supports global ordering natively
- Causal ordering: "Event A must be processed before Event B." Use the same key for causally related events, or embed a causal dependency chain in the message schema.
Production gotcha: Key-based partitioning breaks if you change partition count. The hash(key) → partition mapping changes, so the same user_id may land on a different partition. Solution: Use a custom partitioner with consistent hashing, or never change partition count for ordering-sensitive topics.
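Here's a sketch of a custom partitioner, using kafka-python since it accepts a partitioner callable. Note this only makes the key → partition mapping deterministic and library-independent; plain modulo still remaps keys when the partition count changes, so a true consistent-hash ring is more work:

```python
import hashlib
from kafka import KafkaProducer

def stable_partitioner(key_bytes, all_partitions, available_partitions):
    # md5 of the key gives the same mapping in every language/process.
    # Plain modulo (shown here) still remaps keys if the partition count
    # changes -- avoid resizing ordering-sensitive topics.
    idx = int(hashlib.md5(key_bytes).hexdigest(), 16) % len(all_partitions)
    return all_partitions[idx]

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # illustrative
    partitioner=stable_partitioner,
)
producer.send("events", key=b"user-42", value=b'{"type": "click"}')
producer.flush()
```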
Advanced: Schema Registry, Compaction, and Backpressure
Q6: Why use a Schema Registry with Kafka?
Without a schema registry, any producer can send any format. One bad producer breaks all consumers. The Schema Registry enforces compatibility rules:
- Backward compatible: New schema can read old data (safe for consumers to upgrade first)
- Forward compatible: Old schema can read new data (safe for producers to upgrade first)
- Full compatible: Both directions (safest, most restrictive)
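If you're on Confluent Schema Registry, compatibility mode is set per subject. A sketch of pinning it via the REST API — the registry URL and subject name are illustrative:

```python
import requests

registry = "http://localhost:8081"   # illustrative
subject = "orders-value"

# Enforce backward compatibility for all future schema versions of this subject.
resp = requests.put(
    f"{registry}/config/{subject}",
    json={"compatibility": "BACKWARD"},  # or FORWARD / FULL
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
)
resp.raise_for_status()
```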
Q7: What is log compaction and when do you use it?
Normal retention deletes old segments by time/size. Compaction keeps only the latest value per key, deleting older versions. Use cases: changelog topics (database CDC), configuration topics, KTable backing topics. Not suitable for: event streams where you need full history.
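Compaction is a topic-level setting. A sketch of creating a compacted changelog topic with the confluent-kafka admin client — topic name and tuning values are illustrative:

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # illustrative
topic = NewTopic(
    "user-profile-changelog",               # illustrative name
    num_partitions=12,
    replication_factor=3,
    config={
        "cleanup.policy": "compact",         # keep only the latest value per key
        "min.cleanable.dirty.ratio": "0.1",  # compact sooner than the 0.5 default
        "delete.retention.ms": "86400000",   # keep tombstones 24h so readers see deletes
    },
)
futures = admin.create_topics([topic])
futures["user-profile-changelog"].result()   # blocks; raises on failure
```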
Q8: How do you handle backpressure in Kafka consumers?
- Pause/resume: `consumer.pause(partitions)` stops fetching from specific partitions while processing catches up (see the sketch after this list)
- Reduce `max.poll.records`: Process fewer messages per poll cycle
- Scale horizontally: Add consumers (up to partition count)
- Consumer lag alerting: Monitor `kafka_consumer_group_lag` — alert when lag exceeds SLA threshold
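A rough pause/resume sketch with kafka-python: a bounded queue feeds a slow sink, and the consumer pauses fetching whenever the queue fills. The topic, thresholds, and `slow_write` handler are all hypothetical:

```python
import queue
import threading
from kafka import KafkaConsumer

work = queue.Queue(maxsize=1000)    # bounded buffer in front of the slow sink

def writer():
    while True:
        msg = work.get()
        slow_write(msg)             # hypothetical slow downstream write
        work.task_done()

threading.Thread(target=writer, daemon=True).start()

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",  # illustrative
    group_id="events-etl",
    max_poll_records=200,
)

while True:
    # Pausing stops fetches but keeps group membership (heartbeats) alive.
    if work.qsize() > 800 and not consumer.paused():
        consumer.pause(*consumer.assignment())
    elif work.qsize() < 200 and consumer.paused():
        consumer.resume(*consumer.paused())

    for tp, messages in consumer.poll(timeout_ms=500).items():
        for msg in messages:
            work.put(msg)           # blocks if the buffer is full
```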
Test Your Kafka Knowledge with DataEngPrep
Reading about Kafka is easy. Articulating these concepts clearly under interview pressure is hard.
Use DataEngPrep's Answer Analyzer to practice:
- Type your answer to any Kafka question
- Get scored on completeness, accuracy, and communication
- See the improved FAANG-level version
Over 1,800 real interview questions covering Kafka, Spark, SQL, Airflow, and more.
Written by the DataEngPrep Team
Our editorial team consists of experienced data engineers who have worked at top tech companies and gone through hundreds of real interviews. Every article is reviewed for technical accuracy and practical relevance to help you prepare effectively.
Learn more about our team →