Interview Questions

Real questions from top companies in Spark/Big Data · easy

700+ Easy · 450+ Medium · 650+ Hard

Data locality in Hadoop - explain

Spark/Big Data · easy · spark · 0.4 min read

JP Morgan

Databricks Cluster Management - standalone vs YARN mode

Spark/Big Data · easy · spark · 0.3 min read

Meesho

Databricks Job Cluster and SQL Endpoint - discuss Photon

Spark/Big Data · easy · etl · spark · sql · 0.5 min read

JP Morgan

Databricks notebooks vs. Fabric notebooks - differences

Spark/Big Data · easy · lakehouse · spark · 0.3 min read

Nihilent

Databricks vs. PySpark?

Spark/Big Data · easy · python · spark · 0.3 min read

Comcast

Define Airflow and explain it as a workflow orchestration tool.

Spark/Big Data · easy · airflow · 0.3 min read

Fossil Group

Defining Tasks in DAG

Spark/Big Data · easy · airflow · python · 0.3 min read

Verizon

Delta vs Parquet - explain

Spark/Big Data · easy · lakehouse · 0.3 min read

Myntra

Deploying DAGs

Spark/Big Data · easy · airflow · python · 0.3 min read

Verizon

Describe a custom EMR cluster configuration for Spark-based ETL with minimal cost.

Spark/Big Data · easy · etl · spark · 0.3 min read

Capco

Describe building custom JARs for Spark jobs

Spark/Big Data · easy · spark · 0.3 min read

LTIMindtree

Describe how to pass data between tasks in Airflow using XComs.

Spark/Big Data · easy · airflow · 0.4 min read

Citi

Describe the cluster configuration used in your project, including memory allocation, number of nodes, and executor/driver settings.

Spark/Big Data · easy · spark · 0.3 min read

Capgemini

Describe the role of a workflow orchestrator like Airflow in a data pipeline.

Spark/Big Data · easy · airflow · 0.3 min read

Swiggy

Describe your approach to managing offsets in Kafka.

Spark/Big Data · easy · spark · 0.3 min read

Fragma Data Systems

Discuss Delta Logs file format and its significance.

Spark/Big Data · easy · 0.4 min read

Hexaware

Discuss the process of moving files in Databricks File System (DBFS).

Spark/Big Data · easy · spark · 0.3 min read

Capgemini

Executor vs Driver in Spark

Spark/Big Data · easy · spark · 0.4 min read

Presidio

Explain Bronze/Silver/Gold Layers.

Spark/Big Data · easy · 0.4 min read

Altimetrik

Explain your approach to monitoring and logging Spark jobs in AWS. What tools would you use to identify performance bottlenecks?

Spark/Big Data · easy · spark · 0.6 min read

EPAM
