Data engineering interview questions · easy
Explain the Terraform lifecycle for deploying a new cluster on AWS.
Explain the difference between S3 One Zone-IA and S3 Standard-IA.
Explain the difference between Service Principal and Managed Identity in Azure.
Explain the differences between Azure IR, Self-hosted IR, and Azure-SSIS IR.
Explain the differences between Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse.
Explain the role of Airflow DAGs in Cloud Composer.
Explain the use of Web Activity in ADF.
Explain using IAM roles for secure cross-account access to an S3 bucket.
Explain when you would use Glue instead of Lambda for a data ingestion use case.
Fabric dataflows vs. ADF dataflows
Fabric pipelines vs. ADF pipelines
GCP Authentication with Jenkins
How Airflow operates in a Kubernetes environment
How Airflow stores logs and the role of its backend database
How are Logic Apps used in ADF projects?
How do Logic Apps enhance notification workflows for monitoring pipelines?
How do you copy all files from a source path to a target path in ADF?
How do you delete files older than 30 days using ADF?
How do you handle API rate limits in ADF?
How do you monitor and log data pipelines in AWS?
Learn the platform used by your target companies. AWS is most common overall (Glue, Redshift, S3, Kinesis). GCP is preferred by Google and startups (BigQuery, Dataflow, Pub/Sub). Azure is dominant in enterprise (Synapse, Data Factory). Learn one deeply and understand the equivalents on others.
Core tools: SQL, Python, Spark, Airflow (or equivalent orchestrator), one cloud platform. Increasingly important: dbt, Kafka, Terraform, Docker/Kubernetes, Delta Lake or Apache Iceberg, a data observability tool. The specific stack varies by company.
Expect Airflow questions. Apache Airflow is one of the most widely used orchestration tools, and questions about DAG design, task dependencies, XComs, operators, and failure handling are common. If the company uses a different orchestrator, expect similar questions adapted to that tool.
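To make the orchestration concepts above concrete, here is a minimal sketch in plain Python. It does not use Airflow itself: it only illustrates dependency ordering, retry-on-failure, and passing results downstream (roughly what XComs do) using the standard library's `graphlib`. The task names, the `run_pipeline` helper, and the `max_retries` parameter are all hypothetical and chosen for illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(dependencies, tasks, max_retries=2):
    """Run tasks in dependency order, retrying failures.

    dependencies: task name -> set of upstream task names
    tasks: task name -> callable that receives the results dict
    """
    results = {}  # stands in for XComs: task name -> return value
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        # One initial attempt plus up to max_retries retries.
        for attempt in range(1, max_retries + 2):
            try:
                results[name] = tasks[name](results)
                break
            except Exception:
                if attempt == max_retries + 1:
                    raise  # retries exhausted: fail the whole run
    return order, results


# Hypothetical three-step pipeline; transform fails once, then succeeds.
attempts = {"transform": 0}


def extract(results):
    return [1, 2, 3]


def transform(results):
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("transient failure")
    return [x * 10 for x in results["extract"]]


def load(results):
    return sum(results["transform"])


dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}
tasks = {"extract": extract, "transform": transform, "load": load}

order, results = run_pipeline(dependencies, tasks)
print(order)
print(results["load"])
```

In a real Airflow deployment the scheduler, operators, and metadata database handle these responsibilities; the sketch only mirrors the ideas an interviewer is likely to probe, such as why a downstream task must wait for its upstream task and what happens when retries run out.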