Still explaining SparkContext like it's 2018? See the modern FAANG-level answer that demonstrates real Spark expertise and gets offers.
What is the difference between SparkSession and SparkContext in Spark?
SparkContext is the entry point for Spark functionality in Spark 1.x. It connects to the cluster and manages resources. SparkSession was introduced in Spark 2.0 as a unified entry point that combines SparkContext, SQLContext, and HiveContext. You create it using SparkSession.builder(). SparkSession is the recommended way now.
SparkContext (Spark 1.x era): Low-level entry point for RDD operations. You had to manage separate contexts for SQL (SQLContext) and Hive (HiveContext). In production, this meant juggling two or three entry points in the same job, duplicating configuration across them, and deciding which context a given operation belonged to — see the sketch below.
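A minimal sketch of that Spark 1.x pattern, for contrast. The class names are the real 1.x APIs; the app name, file paths, and table name are made-up placeholders.

```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, HiveContext

conf = SparkConf().setAppName('pipeline')
sc = SparkContext(conf=conf)           # entry point for RDDs
sql_context = SQLContext(sc)           # separate context for DataFrames/SQL
hive_context = HiveContext(sc)         # yet another context for Hive tables

rdd = sc.textFile('events.log')                   # RDD API goes through sc
df = sql_context.read.json('events.json')         # SQL API goes through sql_context
hive_df = hive_context.table('warehouse.events')  # Hive tables need hive_context
```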
SparkSession (Spark 2.0+): Unified entry point that wraps SparkContext and adds unified runtime configuration, DataFrame/Dataset creation, the catalog API, and Hive support via enableHiveSupport() instead of a separate HiveContext. For example:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('pipeline') \
    .config('spark.sql.adaptive.enabled', 'true') \
    .enableHiveSupport() \
    .getOrCreate()

# Still access SparkContext when needed (e.g., broadcast variables)
sc = spark.sparkContext
broadcast_lookup = sc.broadcast(lookup_dict)  # lookup_dict: a plain dict built on the driver
```
The catalog API is part of that unification: spark.catalog.listDatabases() and spark.catalog.tableExists() replace raw HiveContext calls.

When you still touch SparkContext directly: broadcast variables, accumulators, custom RDD operations, and setting job-level properties like sc.setLocalProperty('spark.scheduler.pool', 'priority') — a sketch of these cases follows.
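A minimal sketch of those lower-level touchpoints alongside the catalog calls, reusing the spark session from the snippet above. The table, database, and pool names are made-up placeholders, and spark.catalog.tableExists requires PySpark 3.3 or later.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('pipeline').getOrCreate()
sc = spark.sparkContext

# Catalog checks go through the session -- no HiveContext needed
if spark.catalog.tableExists('warehouse.events'):
    print([db.name for db in spark.catalog.listDatabases()])

# Accumulator: count bad records inside an RDD transformation
bad_records = sc.accumulator(0)

def parse(line):
    try:
        return int(line)
    except ValueError:
        bad_records.add(1)
        return 0

total = sc.parallelize(['1', '2', 'oops']).map(parse).sum()
print(total, bad_records.value)  # accumulator value is reliable only after an action

# Job-level property: route subsequent jobs to a specific scheduler pool
sc.setLocalProperty('spark.scheduler.pool', 'priority')
```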
A weak answer recites the documentation. A strong answer explains the production pain points that motivated the change and shows you know when you still need the lower-level API.