Real questions from top companies in Spark/Big Data · medium
What is the role of ZooKeeper in Kafka?
What are the OPTIMIZE and REORG commands used for in Databricks?
What performance tuning techniques do you apply in both Sqoop and Spark to optimize their execution?
What role do executor memory and CPU configuration play in maximizing parallelism?
What strategies would you use to optimize Spark jobs for both performance and cost on AWS?
What techniques ensure deduplication in large datasets?
What's the difference between narrow and wide transformations?
When would you choose a broadcast join over a shuffle join? Any memory risks?
Which Spark property controls the number of shuffle partitions?
Write PySpark code to extract data from a CSV and create a table.
Write PySpark code to save a DataFrame in Parquet format to an S3 bucket.
Write a PySpark job that calculates the number of unique users who logged in per day, but exclude any logins from inactive users listed in a separate file.
Write a PySpark script to filter out invalid records from a dataset and calculate the average for a specific column, ensuring the schema is strictly defined at runtime.
Write a PySpark script to process data stored in Delta format and transform it into Parquet.
Write a PySpark script to read a CSV file, filter rows where the age column is less than 18, and write the result to a new CSV file.
Write a complete PySpark program from import statements to the stop statement, covering transformations and actions.
Write a transformation in PySpark to join and clean multiple raw input sources.
Write code to read data from Delta Lake in S3 and perform an upsert based on a primary key.
Write maintainable, efficient Pandas or PySpark code.
Your Kafka producer schema has changed, and the new data includes additional fields. How would you ensure backward compatibility using Schema Registry while consuming data from the same topic?
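As a starting point for the Schema Registry question above, here is a minimal sketch of one rule behind backward compatibility: a field added to the new schema needs a default, so that a consumer reading with the new schema can still decode records written with the old one. The schemas, field names, and the helper added_fields_missing_default are hypothetical and exist only for illustration. The check is deliberately simplified (no type promotion, aliases, or nested records) and is not a substitute for the compatibility check that Schema Registry itself performs.

```python
# Simplified sketch: checks only the "added fields need a default" rule.
# Schemas are plain dicts in Avro record form. All names are hypothetical.


def added_fields_missing_default(old_schema, new_schema):
    """Return names of fields added in new_schema that have no default."""
    old_names = {field["name"] for field in old_schema["fields"]}
    return [
        field["name"]
        for field in new_schema["fields"]
        if field["name"] not in old_names and "default" not in field
    ]


OLD_SCHEMA = {
    "type": "record",
    "name": "UserEvent",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "event_time", "type": "long"},
    ],
}

# The added field is nullable and has a default, so old records still decode.
NEW_SCHEMA_WITH_DEFAULT = {
    "type": "record",
    "name": "UserEvent",
    "fields": OLD_SCHEMA["fields"]
    + [{"name": "country", "type": ["null", "string"], "default": None}],
}

# The added field has no default, so old records cannot be decoded with it.
NEW_SCHEMA_WITHOUT_DEFAULT = {
    "type": "record",
    "name": "UserEvent",
    "fields": OLD_SCHEMA["fields"] + [{"name": "country", "type": "string"}],
}

if __name__ == "__main__":
    print(added_fields_missing_default(OLD_SCHEMA, NEW_SCHEMA_WITH_DEFAULT))
    print(added_fields_missing_default(OLD_SCHEMA, NEW_SCHEMA_WITHOUT_DEFAULT))
```

An empty result means every added field carries a default. A non-empty result lists the fields that would need a default before the new schema could be registered under a backward-compatible setting.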