Interview questions
Preparing for a data engineering interview at PwC? This page contains 41 real interview questions sourced from verified PwC interview experiences. Questions are sorted by frequency, so the ones asked most often appear first.
PwC data engineering interviews typically focus on Spark/Big Data, System Design/Architecture, and SQL. The interview bar skews toward harder problems (18 hard vs. 12 easy), which suggests an emphasis on depth and system-level thinking.
Design a cost-aware resource strategy for a Databricks workload that mixes spiky and batch jobs. Explain Dynamic Resource Allocation, when to disable it, and how min/max executors and spot instances affect cost and SLAs.
Explain how Adaptive Query Execution changes the economics of Spark tuning. What problems does it solve at runtime, and when might you still need manual intervention (e.g., salting, broadcast hints)?
What challenges do you face when managing multiple notebooks in Git?
What are the differences between Azure Key Vault-backed and Databricks-backed Secret Scopes?
What is a Secret Scope, and how is it used in Databricks?
How do you handle expired secrets in a production environment?
How does resource allocation adjust when a job experiences a sudden load increase?
What are the best practices for logging and monitoring bad data?
What are the implications of enabling schema auto-detection?
What are the potential downsides of enabling dynamic resource allocation?
What role does the executor heap size play in preventing OOM errors?
How do quarantine tables ensure data quality in downstream pipelines?
How does AQE optimize join operations dynamically?
How does improper partitioning affect Spark job performance?
What metrics would you analyze to determine if your partitioning strategy is effective?
Explain Delta Time Travel and the purpose of the VACUUM command.
Explain the architecture of Spark, including the roles of driver, executors, DAGs, and SparkContext.
How do Delta Tables handle large-scale data updates efficiently?
How do caching strategies impact memory management in Databricks?
How do you configure retention periods for Delta tables?