1
Architecturally, how do Job–Stage–Task boundaries in Spark's execution model affect cluster sizing and shuffle cost, and when would you deliberately collapse or split stages?
Spark/Big Data · hard
2
What are the key components of the Spark execution model (Job, Stage, Task)?
Spark/Big Data · hard
3
How many cities does each department operate in? List the top 3 departments by number of cities. In case of a tie, order by dept_id.
Python/Coding · medium
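One possible approach, sketched in plain Python. The question does not give a schema, so the input shape here (an iterable of `(dept_id, city)` pairs), the function name `top_departments_by_cities`, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict


def top_departments_by_cities(rows, n=3):
    """Return up to n (dept_id, city_count) pairs, most cities first.

    rows is assumed to be an iterable of (dept_id, city) pairs; duplicate
    pairs are counted once because cities are collected into a set.
    """
    cities_by_dept = defaultdict(set)
    for dept_id, city in rows:
        cities_by_dept[dept_id].add(city)

    # Sort by city count descending, then dept_id ascending to break ties.
    ranked = sorted(cities_by_dept.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(dept_id, len(cities)) for dept_id, cities in ranked[:n]]


# Hypothetical sample data, not taken from the question.
sample_rows = [
    (1, "Pune"), (1, "Delhi"), (1, "Pune"),
    (2, "Pune"),
    (3, "Delhi"), (3, "Mumbai"),
    (4, "Pune"),
]
top_three = top_departments_by_cities(sample_rows)
```

Departments 1 and 3 both have two distinct cities in this sample, so the tie is broken by `dept_id`; the same ordering would be expressed in SQL as `ORDER BY COUNT(DISTINCT city) DESC, dept_id`.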
4
List every combination of dept_name, employee_name, and city such that the employee belongs to the department and is in the same city in which the department is located.
Python/Coding · medium
5
Add a column to the Employees table that shows the name of the employee with the next higher employee_id.
SQL · medium
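A minimal sketch of one way to answer this with the `LEAD` window function, run here through Python's built-in `sqlite3` module so it is self-contained. The column name `employee_name`, the output alias `next_employee_name`, and the sample rows are assumptions; window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; the question only names the table and employee_id.
conn.execute(
    "CREATE TABLE Employees (employee_id INTEGER PRIMARY KEY, employee_name TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben"), (4, "Chen")],  # hypothetical rows with a gap in ids
)

# LEAD looks one row ahead in employee_id order, so gaps in the ids are fine.
next_name_rows = conn.execute(
    """
    SELECT employee_id,
           employee_name,
           LEAD(employee_name) OVER (ORDER BY employee_id) AS next_employee_name
    FROM Employees
    ORDER BY employee_id
    """
).fetchall()
conn.close()
```

The employee with the highest `employee_id` has no successor, so `LEAD` returns NULL for that row (`None` in Python).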
6
Find the third-highest salary for each department.
SQL · medium
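One common way to approach this is `DENSE_RANK` partitioned by department, sketched below with Python's built-in `sqlite3` module. The table and column names (`Employees`, `dept_id`, `salary`) and the sample rows are assumptions. The sketch also assumes that tied salaries share a rank, so "third-highest" means the third distinct salary; if ties should count separately, `ROW_NUMBER` would be the alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema; the question does not specify one.
conn.execute("CREATE TABLE Employees (employee_id INTEGER, dept_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        (1, 10, 100), (2, 10, 90), (3, 10, 90), (4, 10, 80), (5, 10, 70),
        (6, 20, 50), (7, 20, 40),            # only two salaries: no third-highest
        (8, 30, 60), (9, 30, 55), (10, 30, 50),
    ],
)

# DENSE_RANK gives tied salaries the same rank with no gaps afterwards,
# so rank 3 is the third distinct salary within each department.
third_highest = conn.execute(
    """
    SELECT DISTINCT dept_id, salary
    FROM (
        SELECT dept_id,
               salary,
               DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS salary_rank
        FROM Employees
    ) AS ranked
    WHERE salary_rank = 3
    ORDER BY dept_id
    """
).fetchall()
conn.close()
```

Departments with fewer than three distinct salaries (department 20 in this sample) simply do not appear in the result.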
7
Write a PySpark job to find the top 3 employees of each department, where Age < 30 and Salary > department average salary.
SQL · medium
8
Write a PySpark script to read a CSV file, filter rows where the age column is less than 18, and write the result to a new CSV file.
Spark/Big Data · medium