Real questions from top companies
Write a PySpark script to check for missing values and duplicate rows in a DataFrame. How would you ensure data quality before saving it to a storage system?
Write a PySpark script to filter out invalid records from a dataset and calculate the average for a specific column, ensuring the schema is strictly defined at runtime.
Write a PySpark script to process data stored in Delta format and transform it into Parquet.
Write a PySpark script to read a CSV file, filter rows where the age column is less than 18, and write the result to a new CSV file.
Write a Spark job to count word occurrences from an S3 dataset.
Write a complete PySpark program from import statements to the stop statement, covering transformations and actions.
Write a transformation in PySpark to join and clean multiple raw input sources.
Write code to read data from a Delta Lake table in S3 and perform an upsert based on the primary key.
Write maintainable, efficient Pandas or PySpark code.
Write the Spark command to rename an existing column in a DataFrame.
Writing Excel sheets to Delta tables in Databricks
You are given 10 worker machines with 100 GB RAM and 25 CPU cores. How would you determine the number of executors and the size of each executor?
Your Kafka producer schema has changed, and the new data includes additional fields. How would you ensure backward compatibility using Schema Registry while consuming data from the same topic?
Z-Ordering - use cases for partitioned Delta tables
Architect a solution to handle notifications for millions of users with varying preferences.
Build a banking system architecture from scratch, highlighting critical workflows, scalability, and data management strategies.
Business Role of Data Pipeline
CAP Theorem
CI/CD implementation across environments (DEV, QA, UAT, PreProd, PROD)
Can Schema Evolution lead to data inconsistencies? If so, how do you manage them?
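The executor-sizing question above (10 worker machines, 100 GB RAM, 25 CPU cores) comes down to arithmetic, so a short worked sketch may help. It assumes the RAM and core figures are per machine, which the question does not state. It also uses a common rule of thumb rather than an official formula: reserve 1 core and 1 GB per node for the OS and daemons, use about 5 cores per executor, leave one executor slot for the driver or application master, and set aside roughly 10% of each container for memory overhead. The function name `plan_executors` and all of its parameters are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExecutorPlan:
    executors_per_node: int
    total_executors: int
    cores_per_executor: int
    heap_gb_per_executor: int


def plan_executors(nodes, cores_per_node, mem_gb_per_node,
                   cores_per_executor=5, reserved_cores=1,
                   reserved_mem_gb=1, overhead_fraction=0.10):
    """Rule-of-thumb executor sizing (assumed heuristic, not a Spark API)."""
    # Leave some cores on every node for the OS and cluster daemons.
    usable_cores = cores_per_node - reserved_cores
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")

    # Keep one executor slot free for the driver / application master.
    total_executors = nodes * executors_per_node - 1

    # Split the usable memory of a node evenly across its executors.
    container_gb = (mem_gb_per_node - reserved_mem_gb) / executors_per_node

    # The container holds heap plus overhead, so heap = container / (1 + overhead).
    heap_gb = int(container_gb / (1 + overhead_fraction))

    return ExecutorPlan(executors_per_node, total_executors,
                        cores_per_executor, heap_gb)


plan = plan_executors(nodes=10, cores_per_node=25, mem_gb_per_node=100)
print(plan)
```

Under these assumptions the plan is 4 executors per node, 39 executors in total, 5 cores each, and about 22 GB of heap per executor, which would map to `--num-executors 39 --executor-cores 5 --executor-memory 22g` on `spark-submit`. In an interview, the reasoning behind each reservation matters more than the exact numbers, and the values should be tuned to the actual workload.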