Question: User Watchlist Behavior
"Analyze the behavior of users on the Watchlist feature: how many items they add, how many they actually watch, and how long they keep them unwatched."

Step-by-Step Explanation:

You have two tables:

  1. watchlist_activity
    • user_id
    • content_id
    • added_time (timestamp when item was added to the watchlist)
  2. watch_history
    • user_id
    • content_id
    • watch_time (timestamp when the content was actually watched)

You need to:

  1. Find out for each user:
    • How many items they added to their watchlist.
    • How many items they actually watched.
    • Average time difference between when they added and watched (for items that were watched).

Sample Data:

watchlist_activity

user_idcontent_idadded_time
11012024-02-01 08:00:00
11022024-02-02 09:00:00
21032024-02-01 10:00:00
21042024-02-03 11:00:00

watch_history

user_idcontent_idwatch_time
11012024-02-05 08:00:00
21042024-02-10 11:00:00

Step 1: How Many Items Added

Simply count rows in watchlist_activity per user.


Step 2: How Many Items Watched

Join watchlist_activity and watch_history on user_id and content_id.

Count how many matches (watched after adding).


Step 3: How Long Items Stayed Unwatched (Average Delay)

For each matched item:

  • Calculate the difference between watch_time and added_time (in days).
  • Find the average delay per user.

SQL Query:

WITH watchlist_counts AS (
  SELECT
    user_id,
    COUNT(*) AS items_added
  FROM
    watchlist_activity
  GROUP BY
    user_id
),

watched_items AS (
  SELECT
    w.user_id,
    w.content_id,
    w.added_time,
    h.watch_time,
    DATEDIFF(h.watch_time, w.added_time) AS days_to_watch
  FROM
    watchlist_activity w
  JOIN
    watch_history h
  ON
    w.user_id = h.user_id
    AND w.content_id = h.content_id
)

, watched_counts AS (
  SELECT
    user_id,
    COUNT(*) AS items_watched,
    AVG(days_to_watch) AS average_days_to_watch
  FROM
    watched_items
  GROUP BY
    user_id
)

SELECT
  a.user_id,
  a.items_added,
  COALESCE(b.items_watched, 0) AS items_watched,
  COALESCE(b.average_days_to_watch, 0) AS average_days_to_watch
FROM
  watchlist_counts a
LEFT JOIN
  watched_counts b
ON
  a.user_id = b.user_id;


Explanation of SQL:

  • watchlist_counts:
    • Counts how many items each user added.
  • watched_items:
    • Joins added and watched data.
    • Calculates how many days it took to watch.
  • watched_counts:
    • Counts how many items each user actually watched.
    • Averages the days taken to watch.
  • Final SELECT:
    • Combines added and watched data.
    • COALESCE() is used to show 0 when a user has no watched items.

Expected Output:

user_iditems_addeditems_watchedaverage_days_to_watch
1214.0
2217.0

Explanation:

  • User 1 added 2 items, watched 1 item in 4 days.
  • User 2 added 2 items, watched 1 item in 7 days.

Important Points:

  • DATEDIFF(watch_time, added_time) measures delay in days.
  • Users who haven't watched any items will show items_watched = 0.
  • Always use a LEFT JOIN when combining counts to not lose users who watched 0 items.