DataEngPrep.tech

Interview Questions

Real questions from top companies in SQL

700+ Easy | 450+ Medium | 650+ Hard
141

Explain Time Travel in Snowflake.

SQL | easy | snowflake | 0.4 min read
Cognizant
142

Explain Triggers in SQL with examples and scenarios for use.

SQL | easy | etl | sql | 0.4 min read
Fractal
143

Explain Union vs Union All in SQL.

SQL | medium | join | sql | 0.4 min read
Gartner
144

Explain a project where you had to influence stakeholders without having authority.

SQL | easy | 0.4 min read
Amazon
145

Articulate the architectural decisions, scalability trade-offs, and cost implications of designing an AWS data platform. How would you justify Glue vs. EMR, Redshift vs. Athena, and when would each choice become cost-prohibitive at scale?

SQL | hard | join | optimization | partition | 3.6 min read
Freecharge
146

Explain the architectural rationale for using LeftAntiJoin vs. NOT IN vs. NOT EXISTS in a distributed context. When does LeftAntiJoin become a performance or scalability bottleneck, and how do broadcast vs. shuffle joins affect cost?

SQL | hard | join | partition | 0.6 min read
Infosys
147

Describe a cross-team data project where you had to align architectural boundaries, ownership, and SLAs. How did you handle conflicting priorities, technical debt, and the scalability of communication as the number of stakeholders grew?

SQL | easy | 0.5 min read
American Express
148

Walk through a production incident where data freshness or correctness was at risk. How did you balance immediate mitigation vs. root-cause remediation? What architectural changes would prevent recurrence, and what are the cost vs. reliability trade-offs?

SQL | easy | 0.5 min read
Adidas
149

Explain the architectural trade-offs when optimizing a query on 100M+ rows: indexing vs. partitioning vs. materialized views. When does each approach become cost-prohibitive or operationally burdensome, and how do you quantify impact?

SQL | hard | optimization | partition | window | 0.5 min read
Bristol Myers Squibb
150

Implement a recursive query for hierarchy (employee-manager). Explain the termination guarantees, depth limits, and when a recursive CTE becomes a scalability bottleneck. What alternatives exist for graph-scale hierarchies in Spark or a data lake?

SQL | medium | join | spark | 0.6 min read
American Express
151

Explain bloom filters in Spark: how they reduce I/O and when they introduce false positives that hurt performance. What are the scalability and cost implications of enabling dynamic partition pruning and bloom filter pushdown at petabyte scale?

SQL | hard | join | optimization | partition | 0.5 min read
American Express
152

Design a star schema for retail analytics (e.g., Adidas). Explain the dimensional modeling choices, SCD strategy, and how you would scale this schema for global multi-currency, multi-region deployments. What are the refresh and storage cost implications?

SQL | hard | join | optimization | partition | 3.6 min read
Adidas
153

Compare Glue partition discovery with Hive MSCK/ADD PARTITION. Explain the operational and cost implications of crawler-based vs. partition-projection approaches. When does partition projection become necessary, and what are its limitations?

SQL | medium | partition | 0.5 min read
Capco
154

Explain how partitioning and bucketing in Hive/Spark optimize queries. What are the trade-offs in bucket count, partition cardinality, and the small-file problem? When does over-partitioning or over-bucketing become counterproductive?

SQL | medium | join | partition | spark | 0.6 min read
Adidas
155

Explain how to flatten a multi-level nested JSON file while loading it into BigQuery.

SQL | easy | bigquery | etl | 0.4 min read
Aarete
156

Explain how to implement cumulative sum in SQL.

SQL | medium | partition | spark | sql | 0.3 min read
Hexaware
157

Explain how you would implement partitioning and bucketing for data stored in S3 to improve query performance.

SQL | medium | join | partition | spark | 0.3 min read
EPAM
158

Explain how you would optimize Redshift query performance for a reporting system with large fact tables.

SQL | medium | join | 0.4 min read
Capco
159

Explain how you would use repartition or coalesce effectively to optimize processing when analyzing data only for a specific region.

SQL | medium | partition | 0.4 min read
Dunnhumby
160

Explain indexing and its impact on database performance.

SQL | medium | bigquery | join | partition | 0.3 min read
Goldman Sachs
