Real questions from top companies · medium
Write a PySpark script to process data stored in Delta format and transform it into Parquet.
Write a PySpark script to read a CSV file, filter rows where the age column is less than 18, and write the result to a new CSV file.
Write a complete PySpark program from import statements to the stop statement, covering transformations and actions.
Write a transformation in PySpark to join and clean multiple raw input sources
Write code to read data from Delta Lake in S3 and perform upsert based on primary key
Write maintainable, efficient Pandas or PySpark code.
Your Kafka producer schema has changed, and the new data includes additional fields. How would you ensure backward compatibility using Schema Registry while consuming data from the same topic?
Z-Ordering - use cases for partitioned Delta tables
How do you ensure the scalability of a data pipeline handling rapidly growing data volumes?
How do you handle pipeline failures or delays?
Type or paste your answer to any of these questions and our AI Coach scores it, highlights gaps, and rewrites it at FAANG quality. Free to try.