Data engineering interview questions · hard
What are the key components of AWS Glue, and how do they work together?
What is Snowflake's architecture, and why is it unique?
What is the difference between S3 and HDFS?
What optimization techniques would you apply in Azure Data Factory (ADF)?
What role does Azure Fabric play in a cloud architecture?
A business generates terabytes of data daily. How would you design its data pipeline in Azure?
What are the core AWS services used in data engineering?
How would you design a data lakehouse architecture in Azure?
Describe AWS Glue components and their functions.
Could you describe a specific cost optimization strategy you implemented in the cloud and its results?
Describe how Adidas could use S3 and Athena to analyze clickstream data.
Design an end-to-end data pipeline using Glue, Lambda, EC2, S3, Redshift, and Athena.
Design: migrate data from multiple sources (Hadoop, S3, Oracle DB) into a final S3 bucket.
Explain Snowpipe as a continuous data ingestion service.
Explain the steps to optimize data read performance from cloud storage (S3 or Azure Blob Storage).
Explain the components of ADF: Pipelines, Activities, Linked Services, Datasets, Triggers, and Integration Runtimes.
Explain the difference between Azure Event Hubs and Azure Service Bus.
Explain the process of setting up an ETL pipeline using AWS services.
Explain the purpose and architecture of Azure Synapse Analytics.
Explain your cloud-based data pipeline on AWS.
Learn the platform your target companies use. AWS is the most common overall (Glue, Redshift, S3, Kinesis). GCP is favored at Google and many startups (BigQuery, Dataflow, Pub/Sub). Azure dominates in the enterprise (Synapse, Data Factory). Learn one platform deeply and understand the equivalent services on the others.
Core tools: SQL, Python, Spark, Airflow (or an equivalent orchestrator), and one cloud platform. Increasingly important: dbt, Kafka, Terraform, Docker/Kubernetes, Delta Lake or Apache Iceberg, and a data observability tool. The specific stack varies by company.
Expect orchestration questions. Apache Airflow is the most widely used orchestration tool, and questions about DAG design, task dependencies, XComs, operators, and failure handling are common. If the company uses a different orchestrator, expect similar questions adapted to its tool.
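The orchestration concepts mentioned above (task dependencies, passing small values between tasks, retries, and failure handling) can be illustrated without Airflow itself. The following is a hypothetical, framework-free toy in plain Python (3.9+, using the standard library's `graphlib`). `run_dag`, its `retries` parameter, and the `results` dict standing in for XComs are illustrative assumptions, not Airflow's API.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set, Tuple

# Hypothetical toy orchestrator for illustration only; this is NOT Airflow's API.
# A task is a callable that receives its upstream tasks' return values,
# keyed by task name (a rough stand-in for Airflow XComs).
Task = Callable[[Dict[str, object]], object]


def run_dag(
    tasks: Dict[str, Task],
    deps: Dict[str, Set[str]],
    retries: int = 1,
) -> Tuple[Dict[str, str], Dict[str, object]]:
    """Run tasks in dependency order and return (states, results)."""
    # Every task is a node; deps maps a task to the tasks it waits on.
    graph = {name: deps.get(name, set()) for name in tasks}
    states: Dict[str, str] = {}
    results: Dict[str, object] = {}
    # static_order() yields each node only after all of its predecessors
    # and raises graphlib.CycleError if the graph contains a cycle.
    for name in TopologicalSorter(graph).static_order():
        upstream = graph[name]
        # Skip the task if any upstream task did not succeed
        # (similar in spirit to Airflow's "upstream_failed" state).
        if any(states.get(u) != "success" for u in upstream):
            states[name] = "upstream_failed"
            continue
        # Try once, then retry up to `retries` more times on any exception.
        for _attempt in range(retries + 1):
            try:
                results[name] = tasks[name]({u: results[u] for u in upstream})
                states[name] = "success"
                break
            except Exception:
                states[name] = "failed"  # overwritten if a later retry succeeds
    return states, results


# Usage: extract fails once and succeeds on retry; validate always fails.
attempts = {"extract": 0}


def extract(_: Dict[str, object]) -> object:
    attempts["extract"] += 1
    if attempts["extract"] == 1:
        raise RuntimeError("transient source error")
    return [3, 1, 2]


def validate(_: Dict[str, object]) -> object:
    raise ValueError("schema check failed")


tasks: Dict[str, Task] = {
    "extract": extract,
    "transform": lambda up: sorted(up["extract"]),
    "load": lambda up: len(up["transform"]),
    "validate": validate,
    "publish": lambda up: "published",
}
deps = {
    "transform": {"extract"},
    "load": {"transform"},
    "validate": {"extract"},
    "publish": {"validate"},
}

states, results = run_dag(tasks, deps, retries=1)
print(states)
```

In this run, `extract` succeeds on its second attempt, `transform` and `load` succeed, `validate` fails after exhausting its retry, and `publish` is skipped as `upstream_failed`. A real Airflow DAG expresses the same ideas declaratively (operators, `>>` dependencies, per-task `retries`), and the scheduler handles execution. The toy only mirrors the control flow.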