Q: User Shopping Sprees
"Given a purchase history table with timestamps, identify users who made multiple purchases within a short span (e.g., within 30 minutes). List these users and how many sprees they had."

Step By Step Answer:

You have a purchase_history table with:

  • user_id
  • purchase_time (timestamp)

You need to:

  1. Find users who made multiple purchases within 30 minutes.
  2. Count how many such "shopping sprees" each user had.

Sample Data (purchase_history):

user_idpurchase_time
12024-01-01 10:00:00
12024-01-01 10:20:00
12024-01-01 12:00:00
22024-01-01 09:00:00
22024-01-01 09:40:00
32024-01-01 15:00:00
32024-01-01 15:25:00
32024-01-01 15:50:00

Step-by-Step Explanation:

Step 1: Compare Each Purchase to the Previous Purchase

We want to know:

  • "How many minutes passed between this purchase and the last one?"

We can use the SQL LAG() function to look at the previous purchase.

Example:

For User 1:

  • 10:00 → no previous purchase
  • 10:20 → 20 minutes after 10:00 → within 30 minutes → 1 spree
  • 12:00 → 100 minutes after 10:20 → not within 30 minutes

For User 2:

  • 9:00 → no previous purchase
  • 9:40 → 40 minutes after 9:00 → not within 30 minutes

For User 3:

  • 15:00 → no previous purchase
  • 15:25 → 25 minutes after 15:00 → within 30 minutes → 1 spree
  • 15:50 → 25 minutes after 15:25 → within 30 minutes → another spree

Step 2: Identify Purchases Within 30 Minutes

We will calculate the time difference between current and previous purchases, and keep only those where the gap is <= 30 minutes.


Step 3: Count How Many Sprees Each User Had

After identifying purchases made within 30 minutes, count how many times it happened for each user.


SQL Query:

WITH purchase_diffs AS (
  SELECT
    user_id,
    purchase_time,
    LAG(purchase_time) OVER (PARTITION BY user_id ORDER BY purchase_time) AS previous_purchase_time
  FROM
    purchase_history
)

SELECT
  user_id,
  COUNT(*) AS spree_count
FROM
  purchase_diffs
WHERE
  previous_purchase_time IS NOT NULL
  AND TIMESTAMPDIFF(MINUTE, previous_purchase_time, purchase_time) <= 30
GROUP BY
  user_id;


Explanation of SQL:

  • LAG(purchase_time) OVER (PARTITION BY user_id ORDER BY purchase_time):
    For each user, find the previous purchase.
  • In the WHERE:
    • Make sure previous_purchase_time exists (not NULL for the very first purchase).
    • Calculate the minutes difference. If it's <= 30 minutes, it’s a spree.
  • COUNT(*) the number of sprees for each user.

Expected Final Output:

user_idspree_count
11
32

User 2 is not shown because they didn't have any purchases within 30 minutes.


Important Points:

  • We must partition by user_id because we want to compare each user's purchases separately.
  • Only purchases within 30 minutes from the previous purchase count as a spree.
  • First purchase per user will have no previous purchase to compare to (NULL).