Real questions from top companies
How does Spark's Catalyst Optimizer work? Explain its stages.
How do you handle late-arriving data in Spark Structured Streaming?
What is the difference between Managed and External tables in Hive/Spark?
What is the small-file problem in Spark, and how do you solve it?
Explain the concept of a broadcast join in Spark. When should it be used?
How do you optimize Spark jobs for better performance? Mention at least 5 techniques.
What is the difference between a list and a tuple in Python?
Explain the difference between shallow copy and deep copy in Python.
Write a Python function to find the first non-repeating character in a string.
What are decorators in Python, and how do they work?
Explain the difference between *args and **kwargs in Python.
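For the coding question in the Python group above (first non-repeating character in a string), one possible answer is sketched below. The function name first_non_repeating_char and the choice to return None when every character repeats are assumptions for illustration; the question does not specify either, and an interviewer may expect a different convention (for example, returning an index or an empty string).

```python
from collections import Counter
from typing import Optional


def first_non_repeating_char(text: str) -> Optional[str]:
    """Return the first character that occurs exactly once in text.

    Returns None if the string is empty or every character repeats
    (an assumed convention, not part of the original question).
    """
    # First pass: count how many times each character occurs.
    counts = Counter(text)

    # Second pass: scan in original order and return the first
    # character whose count is exactly one.
    for ch in text:
        if counts[ch] == 1:
            return ch
    return None


if __name__ == "__main__":
    print(first_non_repeating_char("swiss"))
    print(first_non_repeating_char("aabb"))
```

This version makes two linear passes, so it runs in O(n) time with extra space proportional to the number of distinct characters. It is case-sensitive and treats whitespace as an ordinary character; it is worth confirming with the interviewer whether that is the intended behavior.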
How do you ensure smooth communication between data scientists, business teams, and developers?
How do you handle conflicts within a team? Provide an example.
How do you handle disagreements within a team?
Tell me about a time when you faced a challenging situation at work and how you handled it.
What challenges did you face, and how did you tackle them?
What is the most difficult task you've ever worked on?
What would you do if a pipeline failed and you couldn't find the reason?
Why are you leaving your current company?
Why do you want to join this company?