Data engineering interview questions · easy
What are the types of Integration Runtime (IR): Azure, self-hosted, and Azure-SSIS?
What are Azure Blueprints, and how are they different from Azure Policies?
What are Managed Identities in Azure, and how are they used in securing resources?
What are common issues faced with REST APIs in ADF, and how do you resolve them?
What are provisioned throughput and auto-scaling in DynamoDB?
What are the differences between %run and dbutils.notebook.run?
What are the differences between Azure Key Vault-backed and Databricks-backed Secret Scopes?
What are the differences between SSE-S3, SSE-KMS, and SSE-C encryption?
What are the limitations of AWS Glue and Lambda?
What are the limitations of using Azure Hybrid Connections?
What are the methods to copy files to S3 without using the bucket upload feature?
What is Integration Runtime?
What is Secret Scope, and how is it used in Databricks?
What is Unity Catalog, and how is it implemented in your project?
What is XCom in Airflow?
What is the difference between S3 and EFS? When would you use each?
What is the role of AWS KMS in securing sensitive data?
What metrics would you track in CloudWatch for a Kinesis-based pipeline?
What role does Amazon Macie play in securing sensitive data in S3?
What steps would you take to secure data stored in S3?
Learn the platform used by your target companies. AWS is the most common overall (Glue, Redshift, S3, Kinesis). GCP is preferred by Google and startups (BigQuery, Dataflow, Pub/Sub). Azure is dominant in enterprise environments (Synapse, Data Factory). Learn one platform deeply and know the equivalent services on the others.
Core tools: SQL, Python, Spark, Airflow (or equivalent orchestrator), one cloud platform. Increasingly important: dbt, Kafka, Terraform, Docker/Kubernetes, Delta Lake or Apache Iceberg, a data observability tool. The specific stack varies by company.
Apache Airflow is the most widely used orchestration tool, so questions about DAG design, task dependencies, XComs, operators, and failure handling are common. If the company uses a different orchestrator, expect similar questions adapted to that tool.
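To make the orchestration concepts named above concrete, here is a minimal plain-Python sketch of dependency ordering, XCom-style value passing between tasks, and retry on failure. It does not use Airflow's API: `run_dag`, the `xcom` dict, and the three task functions are hypothetical names invented for illustration, and the standard-library `graphlib` module stands in for a real scheduler.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps, max_retries=1):
    """Run callables in dependency order, sharing results through a dict.

    tasks: task name -> callable that receives the shared dict
    deps:  task name -> set of upstream task names
    The shared dict is only a stand-in for XComs, not Airflow's implementation.
    """
    xcom = {}
    # Include every task, even those with no upstream dependencies.
    graph = {name: deps.get(name, set()) for name in tasks}
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(max_retries + 1):
            try:
                xcom[name] = tasks[name](xcom)
                break
            except Exception:
                # Re-raise once the retry budget is exhausted.
                if attempt == max_retries:
                    raise
    return xcom


attempts = {"extract": 0}


def extract(xcom):
    # Fail on the first call to simulate a transient source error.
    attempts["extract"] += 1
    if attempts["extract"] == 1:
        raise RuntimeError("transient source error")
    return [1, 2, 3]


def transform(xcom):
    # Read the upstream task's result, as a downstream task would pull an XCom.
    return [value * 10 for value in xcom["extract"]]


def load(xcom):
    return sum(xcom["transform"])


results = run_dag(
    tasks={"extract": extract, "transform": transform, "load": load},
    deps={"transform": {"extract"}, "load": {"transform"}},
    max_retries=2,
)
```

In an interview answer, the same three ideas map onto Airflow's own features: dependencies between tasks, XComs for small values passed downstream, and per-task retry settings.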