Questions tagged sql
Explain the configuration of a Spark cluster for optimal performance
Explain the differences between Spark's shuffle and broadcast join. When would you use each?
Given a DataFrame with columns id and name, add a new column department: if id < 100, assign HR; if id >= 100 and id < 200, assign admin.
Have you worked with UDFs in Spark? When do you use them, and how do they differ from built-in functions?
How do you convert an array column to multiple columns in PySpark?
How do you decide the number of partitions for repartitioning data in Spark?
How do you help stakeholders query Delta Lake tables? Which tools and approaches do you use?
How do you optimize long-running PySpark scripts on EMR?