What is the difference between SparkSession and SparkContext in Spark?
Spark/Big Data · hard
2
Discuss the data size challenges in your previous projects. How did you optimize storage and processing?
Behavioral · hard
3
What are your strengths, and how do they align with the Data Engineer role?
General/Other · hard
4
Implement a Python function to count unique words from a file and write them to another file.
Python/Coding · hard
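A minimal sketch of one possible answer. It assumes "count unique words" means counting how often each distinct word occurs and writing one `word<TAB>count` line per word; the tokenization rule (lowercase, runs of letters, digits, and apostrophes) and the output format are assumptions, not part of the question.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, digits, or apostrophes, compared case-insensitively.
WORD_RE = re.compile(r"[a-z0-9']+")


def count_unique_words(input_path: str, output_path: str) -> int:
    """Count occurrences of each distinct word in input_path and write
    "word<TAB>count" lines to output_path, most frequent first.
    Returns the number of distinct words."""
    counts = Counter()
    # Stream line by line so a large file is never loaded into memory at once.
    with open(input_path, encoding="utf-8") as src:
        for line in src:
            counts.update(WORD_RE.findall(line.lower()))

    with open(output_path, "w", encoding="utf-8") as dst:
        # Sort by descending count, then alphabetically, for deterministic output.
        for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            dst.write(f"{word}\t{n}\n")
    return len(counts)
```

Streaming the input keeps memory proportional to the vocabulary rather than the file size; for input too large for one machine, the same logic maps onto a Spark `flatMap` plus `reduceByKey` word count.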
5
Describe a scenario where you used Databricks for real-time data processing.
SQL · hard
6
Explain Bloom filters in Spark: how they reduce I/O, and when their false positives hurt performance instead of helping. What are the scalability and cost implications of enabling dynamic partition pruning and Bloom filter pushdown at petabyte scale?
SQL · hard
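A toy illustration, not Spark's implementation: a small pure-Python Bloom filter (the `BloomFilter` class and `expected_fpp` helper are hypothetical names). A "no" answer is always correct, which is what lets a scan or join skip data safely; a "yes" only means "maybe". An undersized filter, or one built over too many keys, lets most non-matching rows through, so you pay to build, ship, and probe the filter while skipping little. `expected_fpp` uses the standard approximation (1 - e^(-kn/m))^k for m bits, k hash functions, and n inserted keys.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter: m bits, k positions per item via double hashing on SHA-256."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> list:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        # Double hashing: position_i = (h1 + i * h2) mod m
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False => definitely absent (safe to skip); True => possibly present.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def expected_fpp(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Approximate false-positive probability (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes
```

For example, a filter over the join keys of a small table can be probed with keys from a large table: rows that get "no" are dropped before the shuffle. With only 64 bits for 100 keys, `expected_fpp(64, 3, 100)` is above 0.9, so nearly every probe passes and the filter is pure overhead.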
7
Walk through a scenario-based Spark optimization problem and explain how you would troubleshoot its performance issues.
Spark/Big Data · hard
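One common troubleshooting step is checking for data skew: a few tasks in a stage reading far more shuffle data than the rest (the Spark UI's stage page shows min/median/max task metrics). The hypothetical helper below flags skewed partitions with a rule shaped like the one Spark's adaptive query execution uses for skew joins: a partition counts as skewed only if it is larger than a multiple of the median and also larger than an absolute size threshold. The factor and threshold defaults here are illustrative assumptions, and the per-partition sizes are assumed to have been collected already.

```python
from statistics import median
from typing import List, Sequence

MB = 1024 * 1024


def find_skewed_partitions(
    sizes_bytes: Sequence[int],
    factor: float = 5.0,          # assumption: illustrative multiple of the median
    min_bytes: int = 256 * MB,    # assumption: illustrative absolute threshold
) -> List[int]:
    """Return indices of partitions that are both > factor * median and > min_bytes.

    The absolute threshold keeps uniformly small partitions from being flagged
    just because one of them is a few times larger than the others.
    """
    if not sizes_bytes:
        return []
    med = median(sizes_bytes)
    return [i for i, s in enumerate(sizes_bytes) if s > factor * med and s > min_bytes]
```

Once skew is confirmed, typical remedies include enabling AQE skew-join handling, salting the hot keys, or broadcasting the smaller side of the join.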
8
Explain repartition vs. coalesce. Which one would you use to reduce shuffle operations?
Spark/Big Data · hard
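A conceptual simulation, not Spark code: partitions are plain Python lists, and `coalesce`, `repartition`, and `fan_out` are hypothetical helpers. `coalesce` only merges whole existing partitions (a narrow dependency, so no shuffle), while `repartition` redistributes every row, modeled here as simplified round-robin, which in Spark requires a full shuffle. `fan_out` counts how many output partitions each input partition's rows land in: 1 means no rows crossed partition boundaries, more than 1 means data was shuffled.

```python
from itertools import chain
from typing import List


def coalesce(partitions: List[list], n: int) -> List[list]:
    """Merge contiguous runs of whole input partitions into n outputs (no row is split off)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if n >= len(partitions):
        # Without a shuffle, coalesce cannot increase the partition count.
        return [list(p) for p in partitions]
    groups = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        groups[i * n // len(partitions)].extend(part)
    return groups


def repartition(partitions: List[list], n: int) -> List[list]:
    """Send every row round-robin to one of n outputs (models a full shuffle)."""
    if n <= 0:
        raise ValueError("n must be positive")
    out = [[] for _ in range(n)]
    for j, row in enumerate(chain.from_iterable(partitions)):
        out[j % n].append(row)
    return out


def fan_out(before: List[list], after: List[list]) -> List[int]:
    """For each input partition, count the output partitions its rows ended up in.
    Assumes every row value is unique."""
    where = {row: idx for idx, part in enumerate(after) for row in part}
    return [len({where[row] for row in part}) for part in before]
```

So to reduce the partition count while avoiding a shuffle, `coalesce` is the usual choice. The trade-off is that it cannot rebalance uneven partitions, and a drastic coalesce (for example to 1) can shrink the parallelism of the upstream computation; `repartition` is the choice when you need more partitions or evenly sized ones.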