Question 1

Describe a time when you had to deal with a difficult coworker.

Accepted Answer

Situation: A senior engineer was dismissive of my code reviews and suggestions for pipeline optimization, creating friction in the team. Task: Earn respect and align on technical direction without escalating conflict. Action: I focused on technical evidence rather than opinion—ran benchmarks (e.g., 30% faster shuffle with partition pruning), shared results in design docs, and asked for their feedback on the methodology....

Question 2

Describe a Data Lakehouse architecture in Azure.

Accepted Answer

**Section 1 — The Context (The 'Why')**
The primary challenge for a lakehouse design is balancing scale, cost, and reliability. At scale, naive approaches fail: single points of failure cause cascades, schema evolution breaks consumers, and over-provisioning explodes cost. Failure modes include silent data loss from non-idempotent writes, cascading job failures from tight coupling, and operational burden from manual intervention....

Question 3

Describe Amazon Athena and how it interacts with S3.

Accepted Answer

Architectural logic: Athena is a serverless query engine built on Presto/Trino: there is no cluster to manage, and you pay per TB scanned. Why this model: It suits ad-hoc analytics on data lakes and decouples storage (S3) from compute. S3 interaction: Athena reads directly from S3; no ETL load step. Glue Catalog provides schema; queries scan only relevant partitions and columnar bytes. Scalability: Compute is allocated by the service from shared capacity rather than a dedicated cluster per query; concurrency is high but bounded by per-account service quotas on active queries....
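To make the pay-per-TB-scanned point concrete, here is a rough back-of-the-envelope sketch. The function names, the uniform-partition assumption, and the price passed in are all illustrative assumptions, not Athena's billing logic; check the current pricing page for the real rate and billing unit.

```python
TB = 10**12  # assumption: decimal terabyte, for illustration only


def estimate_scan_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Cost of one query given the bytes it scans and a caller-supplied price."""
    return bytes_scanned / TB * price_per_tb


def scanned_after_pruning(total_bytes: int, partitions_total: int,
                          partitions_hit: int, column_fraction: float) -> int:
    """Bytes scanned when only some partitions and some columns are read.

    Assumes equally sized partitions and that columnar storage lets the
    engine read only `column_fraction` of each partition's bytes.
    """
    return int(total_bytes * partitions_hit / partitions_total * column_fraction)


if __name__ == "__main__":
    table_bytes = 10 * TB
    price = 5.0  # hypothetical price per TB
    pruned = scanned_after_pruning(table_bytes, partitions_total=365,
                                   partitions_hit=7, column_fraction=0.2)
    print("full scan:", estimate_scan_cost(table_bytes, price))
    print("pruned scan:", estimate_scan_cost(pruned, price))
```

The takeaway for an interview answer: partitioning and columnar formats reduce cost directly, because they reduce the bytes Athena has to read from S3.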

Question 4

Describe step scaling policies vs. target tracking policies in AWS Auto Scaling.

Accepted Answer

Architectural logic: Step scaling = magnitude-based (add 2 if CPU>70%, add 4 if >85%); Target tracking = maintain metric at target (e.g., 70% CPU). Why choose: Target = simpler, steady-state; Step = tiered response to severity. Scalability: Both scale out; Step can over-provision during spikes. Cost: Target tracking tends to right-size; Step can over-scale....
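The difference between the two policies can be sketched with a simplified model. This is not the Auto Scaling service's actual algorithm; the thresholds mirror the example in the answer, and the proportional formula for target tracking is an approximation that assumes the metric scales inversely with capacity.

```python
import math

# (CPU threshold %, instances to add), checked from highest to lowest.
STEPS = [(85.0, 4), (70.0, 2)]


def step_scaling_delta(cpu_pct: float, steps=STEPS) -> int:
    """Step scaling: a fixed adjustment chosen by how far the metric breached."""
    for threshold, add in steps:
        if cpu_pct > threshold:
            return add
    return 0


def target_tracking_desired(current_capacity: int, cpu_pct: float,
                            target_pct: float = 70.0) -> int:
    """Target tracking (simplified): size the fleet so the metric returns to target."""
    return max(1, math.ceil(current_capacity * cpu_pct / target_pct))


if __name__ == "__main__":
    for cpu in (50, 75, 90):
        print(cpu, step_scaling_delta(cpu), target_tracking_desired(10, cpu))
```

In this model, step scaling adds the same number of instances whether the fleet has 4 or 400 members, while target tracking scales the adjustment with fleet size, which is one reason it tends to right-size.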

Question 5

Explain AWS Lake Formation and its benefits.

Accepted Answer

Architectural logic: Lake Formation = governance layer over data lake. Benefits: Centralized table/column permissions; Glue Catalog integration; row/column-level security; audit. Why: IAM alone is coarse; Lake Formation enables fine-grained access (e.g., analyst sees only non-PII columns). Cost: No additional charge for Lake Formation permissions themselves; Glue/catalog costs and the cost of the underlying storage and query services apply....
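As an illustration of the "analyst sees only non-PII columns" case, the sketch below builds a column-level grant request. The helper name, ARN, database, table, and column names are hypothetical; the dictionary shape follows the Lake Formation `GrantPermissions` request as I understand it, so verify it against the API reference before use.

```python
def non_pii_select_grant(principal_arn: str, database: str, table: str,
                         pii_columns: list) -> dict:
    """Build a SELECT grant on every column except the listed PII columns."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Grant all columns except the excluded ones.
                "ColumnWildcard": {"ExcludedColumnNames": list(pii_columns)},
            }
        },
        "Permissions": ["SELECT"],
    }


if __name__ == "__main__":
    request = non_pii_select_grant(
        "arn:aws:iam::123456789012:role/analyst",  # hypothetical role
        "sales", "customers", ["email", "ssn"],     # hypothetical names
    )
    print(request)
```

With AWS credentials configured, a request like this would be passed as `boto3.client("lakeformation").grant_permissions(**request)`.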

Question 6

Explain the difference between S3 One Zone-IA and S3 Standard-IA.

Accepted Answer

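Architectural difference: Standard-IA = 3+ AZs, 99.9% availability; One Zone-IA = 1 AZ, 99.5% availability, lower cost. Both are designed for the same 11 nines of object durability, but One Zone-IA data can be lost if its single AZ is destroyed. Use Standard-IA when: data must survive an AZ failure. Use One Zone-IA when: data reproducible, cost critical. Trade-off: AZ resilience and availability vs cost....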
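The availability figures are easier to compare as downtime. A small worked calculation, using the percentages quoted in the answer (these are design targets, not measured uptime):

```python
HOURS_PER_YEAR = 365 * 24


def max_downtime_hours(availability_pct: float) -> float:
    """Hours per year a service may be unavailable at a given availability."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for name, pct in (("Standard-IA", 99.9), ("One Zone-IA", 99.5)):
        print(f"{name}: {max_downtime_hours(pct):.2f} h/year")
```

That is roughly 8.8 hours per year at 99.9% versus roughly 43.8 hours per year at 99.5%, a fivefold difference.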

Question 7

How do you handle cost optimization in AWS EMR clusters?

Accepted Answer

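Strategies: (1) Spot for task nodes; On-Demand for master/core if needed. (2) Right-size instance types. (3) Auto-scaling. (4) Transient clusters: run the job, then terminate. (5) For short jobs, diversify Spot instance types (e.g., with instance fleets); the defined-duration Spot blocks (1–6h) once used for this are no longer offered. (6) Savings Plans for steady baseline usage. Trade-off: Spot interruptions; design for fault tolerance....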
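Strategies (1) and (4) can be sketched as the `Instances` section of an EMR cluster request. The helper name, instance type, and counts are assumptions for illustration; the keys follow the EMR `RunJobFlow` request shape as I understand it.

```python
def transient_cluster_instances(core_count: int, task_count: int,
                                instance_type: str = "m5.xlarge") -> dict:
    """On-Demand primary/core nodes, Spot task nodes, auto-terminating cluster."""
    def group(name, role, market, count):
        return {"Name": name, "InstanceRole": role, "Market": market,
                "InstanceType": instance_type, "InstanceCount": count}

    return {
        "InstanceGroups": [
            group("primary", "MASTER", "ON_DEMAND", 1),
            # Core nodes hold HDFS data, so losing them is costly: keep On-Demand.
            group("core", "CORE", "ON_DEMAND", core_count),
            # Task nodes are compute-only, so Spot interruptions are tolerable.
            group("task", "TASK", "SPOT", task_count),
        ],
        # Transient cluster: shut down once all steps have finished.
        "KeepJobFlowAliveWhenNoSteps": False,
    }


if __name__ == "__main__":
    print(transient_cluster_instances(core_count=2, task_count=6))
```

With credentials configured, this dictionary would be passed as the `Instances` argument of `boto3.client("emr").run_job_flow(...)` alongside the cluster name, release label, roles, and steps.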

Question 8

How do you secure data at rest and in transit for AWS RDS?

Accepted Answer

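At rest: Encryption with an AWS-managed or customer-managed KMS key, enabled when the instance is created (an existing unencrypted instance must be restored from an encrypted snapshot copy). In transit: SSL/TLS; enforce it in the parameter group, e.g., rds.force_ssl=1 (PostgreSQL, SQL Server) or require_secure_transport=ON (MySQL); clients verify the server against the RDS CA certificate bundle. Best practice: Encrypt new instances; enforce SSL in parameter group; VPC + security groups; IAM auth optional; rotate credentials; CloudTrail for API auditing and Enhanced Monitoring for OS-level metrics.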
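On the client side, encryption in transit only helps if the client verifies the server certificate. A minimal sketch for a PostgreSQL instance, building a libpq-style connection string; the helper name, host, and CA bundle path are hypothetical, and the password is deliberately left out (it would come from a secrets store or IAM auth token).

```python
def build_pg_dsn(host: str, dbname: str, user: str,
                 ca_bundle_path: str, port: int = 5432) -> str:
    """Connection string that requires TLS and verifies the server's identity.

    sslmode=verify-full checks both the certificate chain (against
    sslrootcert) and that the hostname matches the certificate.
    Values containing spaces would need quoting; not handled here.
    """
    return (f"host={host} port={port} dbname={dbname} user={user} "
            f"sslmode=verify-full sslrootcert={ca_bundle_path}")


if __name__ == "__main__":
    print(build_pg_dsn("mydb.example.us-east-1.rds.amazonaws.com",  # hypothetical
                       "appdb", "app_user", "/etc/ssl/rds-ca-bundle.pem"))
```

Server-side enforcement rejects unencrypted connections; `verify-full` on the client additionally protects against connecting to an impostor endpoint.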

Question 9

How does IAM role chaining work?

Accepted Answer

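Architectural flow: Role A assumes Role B, which assumes Role C. Each AssumeRole call returns temporary credentials. Role B's trust policy must allow Role A, and Role C's trust policy must allow Role B. Limit: A chained session lasts at most 1h, regardless of the role's configured maximum session duration (the 12h maximum applies only to non-chained sessions). Use: Cross-account, delegated access....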
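The chaining flow can be sketched as a loop in which each hop's temporary credentials are used to make the next `AssumeRole` call. The function name and the injectable `client_factory` parameter are assumptions added so the logic can be exercised without AWS; by default it falls back to `boto3.client`.

```python
def assume_role_chain(role_arns, session_name="chained-session",
                      client_factory=None):
    """Assume each role in order, using the previous hop's credentials.

    Returns the temporary credentials of the last role, or None if
    role_arns is empty.
    """
    if client_factory is None:
        import boto3  # imported lazily so the sketch loads without boto3
        client_factory = boto3.client

    creds = None
    for arn in role_arns:
        kwargs = {}
        if creds is not None:
            # Sign this call with the credentials from the previous hop.
            kwargs = {
                "aws_access_key_id": creds["AccessKeyId"],
                "aws_secret_access_key": creds["SecretAccessKey"],
                "aws_session_token": creds["SessionToken"],
            }
        sts = client_factory("sts", **kwargs)
        response = sts.assume_role(
            RoleArn=arn,
            RoleSessionName=session_name,
            DurationSeconds=3600,  # chained sessions are capped at one hour
        )
        creds = response["Credentials"]
    return creds
```

The first call uses whatever credentials the caller already has (for example, Role A's instance profile); every later call is signed with the session credentials returned by the hop before it.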

Question 10

How would you optimize an ADF pipeline for high performance?

Accepted Answer

Strategies: (1) Increase parallelCopies, ForEach batchCount. (2) Right-size DIUs. (3) Staging for bulk loads. (4) Partition sources and sinks. (5) Azure IR vs SHIR; scale SHIR. (6) Data Flow partition tuning....
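Strategies (1) to (3) map onto a handful of pipeline properties. The sketch below builds the relevant fragments as Python dictionaries mirroring ADF pipeline JSON; the helper names, values, and linked service name are hypothetical, sources and sinks are omitted, and the property names should be checked against the Copy activity and ForEach documentation.

```python
import json


def copy_activity_tuning(parallel_copies: int, dius: int,
                         staging_linked_service: str) -> dict:
    """typeProperties fragment for a Copy activity (source/sink omitted)."""
    return {
        "parallelCopies": parallel_copies,   # concurrent reads/writes per run
        "dataIntegrationUnits": dius,        # compute allotted on the Azure IR
        "enableStaging": True,               # stage data for bulk loading
        "stagingSettings": {
            "linkedServiceName": {
                "referenceName": staging_linked_service,
                "type": "LinkedServiceReference",
            }
        },
    }


def foreach_tuning(batch_count: int) -> dict:
    """typeProperties fragment for a ForEach activity running in parallel."""
    return {"isSequential": False, "batchCount": batch_count}


if __name__ == "__main__":
    fragment = {
        "copy": copy_activity_tuning(8, 16, "StagingBlobStorage"),  # hypothetical
        "forEach": foreach_tuning(20),
    }
    print(json.dumps(fragment, indent=2))
```

Raising these values only helps until the source or sink becomes the bottleneck, so they are usually tuned together with partitioning, as in strategy (4).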

Persistent Systems Data Engineer Interview Questions
