Q: User Shopping Sprees
"Given a purchase history table with timestamps, identify users who made multiple purchases within a short span (e.g., within 30 minutes). List these users and how many sprees they had."
Step By Step Answer:
You have a purchase_history
table with:
user_id
purchase_time
(timestamp)
You need to:
- Find users who made multiple purchases within 30 minutes.
- Count how many such "shopping sprees" each user had.
Sample Data (purchase_history
):
user_id | purchase_time |
---|---|
1 | 2024-01-01 10:00:00 |
1 | 2024-01-01 10:20:00 |
1 | 2024-01-01 12:00:00 |
2 | 2024-01-01 09:00:00 |
2 | 2024-01-01 09:40:00 |
3 | 2024-01-01 15:00:00 |
3 | 2024-01-01 15:25:00 |
3 | 2024-01-01 15:50:00 |
Step-by-Step Explanation:
Step 1: Compare Each Purchase to the Previous Purchase
We want to know:
- "How many minutes passed between this purchase and the last one?"
We can use the SQL LAG()
function to look at the previous purchase.
Example:
For User 1:
- 10:00 → no previous purchase
- 10:20 → 20 minutes after 10:00 → within 30 minutes → 1 spree
- 12:00 → 100 minutes after 10:20 → not within 30 minutes
For User 2:
- 9:00 → no previous purchase
- 9:40 → 40 minutes after 9:00 → not within 30 minutes
For User 3:
- 15:00 → no previous purchase
- 15:25 → 25 minutes after 15:00 → within 30 minutes → 1 spree
- 15:50 → 25 minutes after 15:25 → within 30 minutes → another spree
Step 2: Identify Purchases Within 30 Minutes
We will calculate the time difference between current and previous purchases, and keep only those where the gap is <= 30 minutes.
Step 3: Count How Many Sprees Each User Had
After identifying purchases made within 30 minutes, count how many times it happened for each user.
SQL Query:
WITH purchase_diffs AS (
SELECT
user_id,
purchase_time,
LAG(purchase_time) OVER (PARTITION BY user_id ORDER BY purchase_time) AS previous_purchase_time
FROM
purchase_history
)
SELECT
user_id,
COUNT(*) AS spree_count
FROM
purchase_diffs
WHERE
previous_purchase_time IS NOT NULL
AND TIMESTAMPDIFF(MINUTE, previous_purchase_time, purchase_time) <= 30
GROUP BY
user_id;
Explanation of SQL:
LAG(purchase_time) OVER (PARTITION BY user_id ORDER BY purchase_time)
:
For each user, find the previous purchase.- In the
WHERE
:- Make sure previous_purchase_time exists (not NULL for the very first purchase).
- Calculate the minutes difference. If it's <= 30 minutes, it’s a spree.
COUNT(*)
the number of sprees for each user.
Expected Final Output:
user_id | spree_count |
---|---|
1 | 1 |
3 | 2 |
User 2 is not shown because they didn't have any purchases within 30 minutes.
Important Points:
- We must partition by
user_id
because we want to compare each user's purchases separately. - Only purchases within 30 minutes from the previous purchase count as a spree.
- First purchase per user will have no previous purchase to compare to (NULL).