This hard-level Python/Coding question comes up in data engineering interviews at companies like Disney+ Hotstar. While less common, it tests deeper understanding that distinguishes strong candidates. Mastering the underlying concepts (optimization, Spark, windowing) will help you answer variations of this question confidently.
This is a senior-level question that tests architectural thinking. Lead with the high-level design, then drill into specifics. Discuss trade-offs explicitly - there is rarely one correct answer. Show awareness of scale, fault tolerance, and operational complexity.
Why Moving Averages in Engagement: DAU/MAU curves, session duration trends, and retention metrics all use rolling windows. Naively re-summing each window costs O(window) per row, or O(n * window) overall, which is unacceptable at scale.
Optimization: (1) Cumulative sum: precompute cumsum with a leading 0, then avg[i] = (cumsum[i] - cumsum[i-w]) / w for i >= w. O(n) total. (2) Pandas: rolling(window=7).mean() is vectorized and runs in compiled code. (3) Streaming: maintain a deque of size w and a running sum, O(1) per update. (4) Exponential: EMA = α * new + (1 - α) * prev, O(1) per update with no window storage.
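Options (3) and (4) can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: the class names `StreamingMovingAverage` and `ExponentialMovingAverage` are hypothetical, and seeding the EMA with the first observation is an assumption (one common convention among several).

```python
from collections import deque


class StreamingMovingAverage:
    """Windowed mean with O(1) work per update (hypothetical helper)."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.values = deque()
        self.total = 0.0

    def update(self, x):
        # Add the new value, then evict the oldest once the window overflows.
        self.values.append(x)
        self.total += x
        if len(self.values) > self.window:
            self.total -= self.values.popleft()
        # Until the window fills, this averages over the values seen so far.
        return self.total / len(self.values)


class ExponentialMovingAverage:
    """EMA = alpha * new + (1 - alpha) * prev; stores only the last value."""

    def __init__(self, alpha):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None

    def update(self, x):
        # Assumption: seed the EMA with the first observation.
        if self.value is None:
            self.value = float(x)
        else:
            self.value = self.alpha * x + (1 - self.alpha) * self.value
        return self.value
```

Feeding 1, 2, 3, 4, 5 into `StreamingMovingAverage(3)` yields 1.0, 1.5, 2.0, 3.0, 4.0. With float inputs over very long streams, the running sum can accumulate rounding error, so periodically recomputing the sum from the deque is a reasonable safeguard.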
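Cost: At Hotstar, 10M users * 30 days = 300M rows. With window=7, naive = 300M * 7 = 2.1B ops; cumsum = 300M. For distributed computation, use Spark window functions partitioned by user and ordered by date with rowsBetween(-6, 0), which covers the current row and the 6 preceding rows.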
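To make the frame semantics concrete, here is a dependency-free sketch of what an average over `rowsBetween(-6, 0)` computes within each user partition. In PySpark this would typically be written as `F.avg("value").over(Window.partitionBy("user_id").orderBy("day").rowsBetween(-6, 0))`; the column names `user_id`, `day`, and `value` and the function name `trailing_avg_per_user` are assumptions for illustration.

```python
from collections import defaultdict, deque


def trailing_avg_per_user(rows, window=7):
    """rows: iterable of (user_id, day, value). Returns {(user_id, day): avg}."""
    # Group rows by user, mirroring partitionBy(user_id).
    by_user = defaultdict(list)
    for user_id, day, value in rows:
        by_user[user_id].append((day, value))

    result = {}
    for user_id, series in by_user.items():
        # Order within the partition, mirroring orderBy(day).
        series.sort(key=lambda pair: pair[0])
        recent = deque()
        total = 0.0
        for day, value in series:
            recent.append(value)
            total += value
            if len(recent) > window:
                total -= recent.popleft()
            # Early rows have fewer than `window` rows available,
            # so the average is taken over a partial frame.
            result[(user_id, day)] = total / len(recent)
    return result
```

Note that a row-based frame counts rows, not calendar days: if a user has no row for some days, the 7-row frame spans more than 7 days. Filling missing days before windowing, or using a range-based frame, is one way to handle that.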
import numpy as np

def opt_moving_avg(arr, w):
    # Prepend 0 so that cumsum[i] is the sum of the first i elements
    cumsum = np.cumsum(np.insert(arr, 0, 0))
    return (cumsum[w:] - cumsum[:-w]) / w
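The same prefix-sum idea as `opt_moving_avg` can be written without NumPy and checked against the naive approach. This is a small sketch for verification purposes; the function names `moving_avg_prefix` and `moving_avg_naive` are hypothetical.

```python
from itertools import accumulate


def moving_avg_prefix(values, w):
    """O(n) moving average via prefix sums; one output per full window."""
    if w <= 0 or w > len(values):
        return []
    # prefix[i] is the sum of the first i values, so prefix[0] == 0.
    prefix = [0] + list(accumulate(values))
    return [(prefix[i] - prefix[i - w]) / w for i in range(w, len(prefix))]


def moving_avg_naive(values, w):
    """O(n * w) reference: re-sum every window from scratch."""
    return [sum(values[i:i + w]) / w for i in range(len(values) - w + 1)]
```

For `[1, 2, 3, 4, 5]` with `w = 3`, both functions return `[2.0, 3.0, 4.0]`. Only full windows are emitted, so the output has `len(values) - w + 1` entries.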
According to DataEngPrep.tech, this is one of the most frequently asked Python/Coding interview questions, reported at 1 company. DataEngPrep.tech maintains a curated database of 1,863+ real data engineering interview questions across 7 categories, verified by industry professionals.