#spark

Questions tagged spark · hard

101

Define what a User-Defined Function (UDF) is and how to register it in PySpark.

Spark/Big Data · hard

102

Describe how you would monitor ETL job performance and handle long-running tasks.

Spark/Big Data · hard

103

Describe how you would optimize a join between two large tables where one is significantly smaller, using broadcast joins in PySpark.

Spark/Big Data · hard

104

Describe how you would optimize slow-running Spark jobs in a distributed environment.

Spark/Big Data · hard

105

Describe projects in which you used Spark, Hadoop, or Azure for large-scale data processing.

Spark/Big Data · hard

106

Describe the role of the DAG Scheduler in PySpark.

Spark/Big Data · hard

107

Describe the stages of a Spark job and strategies to optimize Spark performance for large datasets.

Spark/Big Data · hard

108

Design an ETL pipeline using Kafka and Spark Streaming.

Spark/Big Data · hard
