Question: Median Google Search Frequency
"Given a table summarizing how many searches each user made per day, write a query to find the median number of searches."
Step-by-Step Explanation:
You have a table, let's call it user_searches
, with:
user_id
search_count
(number of searches made by that user in one day)
You need to:
- Find the median of all search counts.
Sample Data (user_searches
):
user_id | search_count |
---|---|
1 | 5 |
2 | 10 |
3 | 8 |
4 | 7 |
5 | 12 |
Step 1: Understand What Median Means
- Median = the middle number when all numbers are sorted.
- If odd number of rows → median is the middle value.
- If even number of rows → median is the average of the two middle values.
Example (sorted search counts):
- [5, 7, 8, 10, 12]
- Middle value = 8 → so, median is 8.
Step 2: Sort All Search Counts
We need to sort the search_count
in ascending order.
Step 3: Rank All Search Counts
We can assign a row number to each record after sorting.
Use the SQL ROW_NUMBER()
or RANK()
function.
Example with row numbers:
Row Number | user_id | search_count |
---|---|---|
1 | 1 | 5 |
2 | 4 | 7 |
3 | 3 | 8 |
4 | 2 | 10 |
5 | 5 | 12 |
Now:
- Total rows = 5 (odd number)
- Middle row = row 3 → median = 8
Step 4: Pick the Middle Row (or Rows)
If:
- Odd rows → Pick the row at position
(total_rows + 1) ÷ 2
. - Even rows → Take average of two middle rows.
SQL Query:
The logic slightly depends on your SQL database system.
Here’s a general method using Common Table Expressions (CTEs):
WITH ordered_searches AS (
SELECT
search_count,
ROW_NUMBER() OVER (ORDER BY search_count) AS rn
FROM
user_searches
),
counts AS (
SELECT COUNT(*) AS total_rows FROM user_searches
)
SELECT
AVG(search_count) AS median_search_count
FROM
ordered_searches, counts
WHERE
rn IN (FLOOR((total_rows + 1)/2), CEIL((total_rows + 1)/2));
Explanation of SQL:
ROW_NUMBER() OVER (ORDER BY search_count)
:
Assigns a row number to each record when ordered by search_count.COUNT(*)
:
Counts total number of rows.FLOOR((total_rows + 1)/2)
andCEIL((total_rows + 1)/2)
:- These pick the middle rows.
- If odd rows, both floor and ceil give same number.
- If even rows, they pick the two middle rows to average.
AVG(search_count)
:- If one row selected → it just returns that number.
- If two rows selected → it calculates the average.
Expected Final Output:
median_search_count |
---|
8 |
Important Points:
- If the number of rows is odd, the median is one value.
- If the number of rows is even, you average the two middle values.
- Always order by search_count before assigning row numbers.
- If your SQL database has a direct
PERCENTILE_CONT()
function (PostgreSQL, Redshift), you can use that instead for easier calculation.