What are the different modes in which you can submit Spark jobs? Explain each.
Spark/Big Data | easy
2
What is the difference between a pandas DataFrame and a Spark DataFrame? When would you prefer each?
Spark/Big Data | hard
3
What is the difference between external and managed (internal) tables in Hive?
Spark/Big Data | easy
4
What happens behind the scenes when you submit a Spark job? Explain the process.
Spark/Big Data | hard
5
Write a PySpark job that calculates the number of unique users who logged in per day, but exclude any logins from inactive users listed in a separate file.
Spark/Big Data | medium
6
Write a PySpark script to check for missing values and duplicate rows in a DataFrame. How would you ensure data quality before saving it to a storage system?
Spark/Big Data | hard
7
Write the Spark command to rename an existing column in a DataFrame.
Spark/Big Data | easy
8
Your Kafka producer schema has changed, and the new data includes additional fields. How would you ensure backward compatibility using Schema Registry while consuming data from the same topic?
Spark/Big Data | medium