Interview Questions

Real questions from top companies · easy

700+ Easy · 450+ Medium · 650+ Hard


641

Compare ORC and Parquet

Spark/Big Data · easy · bigquery, spark · 0.3 min read

KPMG

642

Compare Spark SQL and Hive performance.

Spark/Big Data · easy · spark, sql · 0.4 min read

HCL

643

Compare Spark and MapReduce for iterative workloads

Spark/Big Data · easy · spark · 0.4 min read

Microsoft

644

Concatenate Columns in PySpark

Spark/Big Data · easy · spark · 0.4 min read

Presidio

645

Controlling mappers in MapReduce

Spark/Big Data · easy · 0.4 min read

JP Morgan

646

Create a DataFrame with default column types

Spark/Big Data · easy · python, spark, sql · 0.4 min read

KPMG

647

Data locality in Hadoop - explain

Spark/Big Data · easy · spark · 0.4 min read

JP Morgan

648

Databricks Cluster Management - standalone vs YARN mode

Spark/Big Data · easy · spark · 0.3 min read

Meesho

649

Databricks Job Cluster and SQL Endpoint - discuss Photon

Spark/Big Data · easy · etl, spark, sql · 0.5 min read

JP Morgan

650

Databricks notebooks vs. Fabric notebooks - differences

Spark/Big Data · easy · lakehouse, spark · 0.3 min read

Nihilent

651

Databricks vs. PySpark?

Spark/Big Data · easy · python, spark · 0.3 min read

Comcast

652

Define Airflow and explain it as a workflow orchestration tool.

Spark/Big Data · easy · airflow · 0.3 min read

Fossil Group

653

Defining Tasks in DAG

Spark/Big Data · easy · airflow, python · 0.3 min read

Verizon

654

Delta vs Parquet - explain

Spark/Big Data · easy · lakehouse · 0.3 min read

Myntra

655

Deploying DAGs

Spark/Big Data · easy · airflow, python · 0.3 min read

Verizon

656

Describe a custom EMR cluster configuration for Spark-based ETL with minimal cost.

Spark/Big Data · easy · etl, spark · 0.3 min read

Capco

657

Describe building custom JARs for Spark jobs

Spark/Big Data · easy · spark · 0.3 min read

LTIMindtree

658

Describe how to pass data between tasks in Airflow using XComs.

Spark/Big Data · easy · airflow · 0.4 min read

Citi

659

Describe the cluster configuration used in your project, including memory allocation, number of nodes, and executor/driver settings.

Spark/Big Data · easy · spark · 0.3 min read

Capgemini

660

Describe the role of a workflow orchestrator like Airflow in a data pipeline.

Spark/Big Data · easy · airflow · 0.3 min read

Swiggy
