Data engineering interview questions · easy
How do you move ADF pipelines from development to production using ARM templates?
How do you secure data at rest and in transit for AWS RDS?
How do you use Azure Databricks notebooks within ADF pipelines?
How does ADF help streamline data movement in your project?
How does Azure Kubernetes Service (AKS) manage scaling and updates for containerized applications?
How does IAM role chaining work?
How does the trust relationship policy in IAM roles work?
How is Azure Key Vault used to manage encryption keys in Databricks?
How do you copy all 1,000 tables from a source to a target in ADF?
How do you manage AWS IAM roles and policies for data security?
How would you configure Spot Instances for a resilient EMR cluster?
How would you copy files and folders present in Azure Data Lake Storage (ADLS)?
How would you handle a situation where an EMR cluster fails mid-job?
How would you implement a secure data lake on AWS?
In ADF, how do you handle a scenario where you need to process only new or changed files from a source?
How would you pass data between Lambda functions in Step Functions?
How do you run multiple notebooks with dbutils.notebook.run()?
S3 Storage Options: Describe Standard, Intelligent-Tiering, and Glacier.
How do you use secret scopes to manage credentials securely?
How do you secure AWS Lambda using IAM roles, VPC integration, and other security measures?
Learn the platform used by your target companies. AWS is the most common overall (Glue, Redshift, S3, Kinesis). GCP is preferred by Google and startups (BigQuery, Dataflow, Pub/Sub). Azure is dominant in enterprise (Synapse, Data Factory). Learn one deeply and understand the equivalent services on the others.
Core tools: SQL, Python, Spark, Airflow (or an equivalent orchestrator), and one cloud platform. Increasingly important: dbt, Kafka, Terraform, Docker/Kubernetes, Delta Lake or Apache Iceberg, and a data observability tool. The specific stack varies by company.
Yes. Apache Airflow is the most widely used orchestration tool, and questions about DAG design, task dependencies, XComs, operators, and failure handling are common. If the company uses a different orchestrator, expect similar questions adapted to that tool.