Question: Median Google Search Frequency
"Given a table summarizing how many searches each user made per day, write a query to find the median number of searches."

Step-by-Step Explanation:

You have a table, let's call it user_searches, with:

  • user_id
  • search_count (number of searches made by that user in one day)

You need to:

  1. Find the median of all search counts.

Sample Data (user_searches):

user_idsearch_count
15
210
38
47
512

Step 1: Understand What Median Means

  • Median = the middle number when all numbers are sorted.
  • If odd number of rows → median is the middle value.
  • If even number of rows → median is the average of the two middle values.

Example (sorted search counts):

  • [5, 7, 8, 10, 12]
  • Middle value = 8 → so, median is 8.

Step 2: Sort All Search Counts

We need to sort the search_count in ascending order.


Step 3: Rank All Search Counts

We can assign a row number to each record after sorting.
Use the SQL ROW_NUMBER() or RANK() function.

Example with row numbers:

Row Numberuser_idsearch_count
115
247
338
4210
5512

Now:

  • Total rows = 5 (odd number)
  • Middle row = row 3 → median = 8

Step 4: Pick the Middle Row (or Rows)

If:

  • Odd rows → Pick the row at position (total_rows + 1) ÷ 2.
  • Even rows → Take average of two middle rows.

SQL Query:

The logic slightly depends on your SQL database system.
Here’s a general method using Common Table Expressions (CTEs):

WITH ordered_searches AS (
  SELECT 
    search_count,
    ROW_NUMBER() OVER (ORDER BY search_count) AS rn
  FROM 
    user_searches
),
counts AS (
  SELECT COUNT(*) AS total_rows FROM user_searches
)

SELECT 
  AVG(search_count) AS median_search_count
FROM 
  ordered_searches, counts
WHERE 
  rn IN (FLOOR((total_rows + 1)/2), CEIL((total_rows + 1)/2));


Explanation of SQL:

  • ROW_NUMBER() OVER (ORDER BY search_count):
    Assigns a row number to each record when ordered by search_count.
  • COUNT(*):
    Counts total number of rows.
  • FLOOR((total_rows + 1)/2) and CEIL((total_rows + 1)/2):
    • These pick the middle rows.
    • If odd rows, both floor and ceil give same number.
    • If even rows, they pick the two middle rows to average.
  • AVG(search_count):
    • If one row selected → it just returns that number.
    • If two rows selected → it calculates the average.

Expected Final Output:

median_search_count
8

Important Points:

  • If the number of rows is odd, the median is one value.
  • If the number of rows is even, you average the two middle values.
  • Always order by search_count before assigning row numbers.
  • If your SQL database has a direct PERCENTILE_CONT() function (PostgreSQL, Redshift), you can use that instead for easier calculation.