Interview questions
What are the different modes in which you can submit Spark jobs? Explain each.
What is the difference between a pandas DataFrame and a Spark DataFrame? When would you prefer each?
What is the difference between external and internal (managed) tables in Hive?
What happens behind the scenes when you submit a Spark job? Explain the process.
Write a PySpark job that calculates the number of unique users who logged in per day, but exclude any logins from inactive users listed in a separate file.
Write a PySpark script to check for missing values and duplicate rows in a DataFrame. How would you ensure data quality before saving it to a storage system?
Write the Spark command to rename an existing column in a DataFrame.
Your Kafka producer schema has changed, and the new data includes additional fields. How would you ensure backward compatibility using Schema Registry while consuming data from the same topic?