Mastering Row Number Handling in SQL Queries

What is ROW_NUMBER() in SQL Queries

The SQL ROW_NUMBER() function is a window function, one of several SQL rank functions.

It assigns row numbers sequentially – no rank skipping! – so even rows with the same values (ties) will get different ranks. (Ties are ranked according to their physical order in the table.)

Syntax

Here’s the ROW_NUMBER() syntax.

ROW_NUMBER() OVER([PARTITION BY column_name] ORDER BY column_name [ASC|DESC])

The explanation is as follows:

ROW_NUMBER(): The function name, and it doesn’t take any arguments in the parentheses.
OVER(): Specifies the window over which ROW_NUMBER() operates; mandatory to create a window function.
PARTITION BY: An optional clause that splits the dataset into partitions and ROW_NUMBER() restarts row counting with each partition. If omitted, there’s only one partition containing all the query rows.
ORDER BY: A clause that defines the column(s) on which the row numbering will be based and the order in which the rows will be numbered in each partition. Generally considered a mandatory clause, however, ROW_NUMBER() can work without it in some database engines, such as PostgreSQL or MySQL. When omitted, the order of row numbers is unpredictable, i.e., it can be different whenever you run the query.

Step-by-Step Guide to Using ROW_NUMBER() in SQL Queries

I’ll start easy. Since I’ll be using PostgreSQL, I can show you an example of using ROW_NUMBER() without ORDER BY. After that, we can add it to another example and then add SQL PARTITION BY.

SQL Query ROW_NUMBER() Without ORDER BY and PARTITION BY

Let’s solve the interview question by Amazon and Bosch.

First 50% of Records From Dataset

Bosch

Amazon

MediumID 9859

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Find the first 50% records of the dataset.

Table: worker

Link to the question: https://platform.stratascratch.com/coding/9859-find-the-first-50-records-of-the-dataset

We will query the table worker.

Table: worker

worker_id	first_name	last_name	salary	joining_date	department
1	Monika	Arora	100000	2014-02-20	HR
2	Niharika	Verma	80000	2014-06-11	Admin
3	Vishal	Singhal	300000	2014-02-20	HR
4	Amitah	Singh	500000	2014-02-20	Admin
5	Vivek	Bhati	500000	2014-06-11	Admin

The task is to find the first 50% of the dataset's records. This is ideal for using ROW_NUMBER() without ORDER BY (and PARTITION BY), as the rows in the table will be numbered according to their physical order in such cases.

First, write a SQL CTE that will number the table rows: select all the columns from the table, write ROW_NUMBER() and the mandatory clause OVER(), with the new column alias where the row numbers will be displayed.

WITH cte AS (
  SELECT *,
         ROW_NUMBER() OVER() AS row_num
  FROM worker
)

The CTE returns this output.

All required columns and the first 5 rows of the solution are shown

worker_id	first_name	last_name	salary	joining_date	department	row_num
1	Monika	Arora	100000	2014-02-20	HR	1
2	Niharika	Verma	80000	2014-06-11	Admin	2
3	Vishal	Singhal	300000	2014-02-20	HR	3
4	Amitah	Singh	500000	2014-02-20	Admin	4
5	Vivek	Bhati	500000	2014-06-11	Admin	5
6	Vipul	Diwan	200000	2014-06-11	Account	6
7	Satish	Kumar	75000	2014-01-20	Account	7
8	Geetika	Chauhan	90000	2014-04-11	Admin	8
9	Agepi	Argon	90000	2015-04-10	Admin	9
10	Moe	Acharya	65000	2015-04-11	HR	10
11	Nayah	Laghari	75000	2014-03-20	Account	11
12	Jai	Patel	85000	2014-03-21	HR	12

Next, write SELECT, which takes all the columns – except row_num – from the CTE. To get the first half of the dataset, filter data in WHERE: in a subquery, count the number of rows in the table worker, divide it by two, and return only data where row_num is less than or equal to the value you get from the subquery.

This is the complete solution.

WITH cte AS (
  SELECT *,
         ROW_NUMBER() OVER() AS row_num
  FROM worker
)

SELECT worker_id,
       first_name,
       last_name,
       salary,
       joining_date,
       department
FROM cte
WHERE row_num <= 
  (SELECT COUNT(*) / 2
   FROM worker);

The output shows the first six rows of the table, which is 50% of the total records, as the worker table has 12 rows.

All required columns and the first 5 rows of the solution are shown

worker_id	first_name	last_name	salary	joining_date	department
1	Monika	Arora	100000	2014-02-20	HR
2	Niharika	Verma	80000	2014-06-11	Admin
3	Vishal	Singhal	300000	2014-02-20	HR
4	Amitah	Singh	500000	2014-02-20	Admin
5	Vivek	Bhati	500000	2014-06-11	Admin

How SQL Query ROW_NUMBER() Interacts with ORDER BY

The ORDER BY clause sorts the data within the entire set of rows or a partition if PARTITION BY is used.

Used with ROW_NUMBER(), ORDER BY determines by which column the rows will be numbered and whether they will be numbered in ascending or descending order.

Here’s a visualization that will help you understand. The example table is named movies.

How Row Number Interacts with Order By in Sql Query

If you wanted to number the rows from the oldest to the newest movie, you would write this query.

SELECT id, 
       title,
       year,
       ROW_NUMBER() OVER(ORDER BY year ASC) AS numbering_asc
FROM movies;

This is the result you would get.

To get the row numbers the other way round (from the newest to the oldest movie), write this query.

SELECT id, 
       title,
       year,
       ROW_NUMBER() OVER(ORDER BY year DESC) AS numbering_asc
FROM movies;

Here’s the output.

Example

Let’s now take a look at the real example. Here’s an interesting task given by Meta.

Fans vs Opposition

Interview Question Date: March 2020

employee_id	age	gender	popularity
0	24	M	6
1	31	F	4
2	29	F	0
3	33	M	7
4	36	F	6

employee_fan_id	position
5	1
8	2
17	3
9	4
3	5

employee_opposition_id	position
2	1
13	2
10	3
12	4
1	5

employee_fan_id	employee_opposition_id
5	2
8	13
17	10
9	12
3	1

How ROW_NUMBER() Interacts with PARTITION BY

As already mentioned, PARTITION BY splits the dataset into partitions based on the specified column(s).

If the data is partitioned, ROW_NUMBER() will number the rows in the first partition, then restart when it reaches the second partition (start again from 1), and so on.

Let’s change a bit the movies table we showed earlier.

How Row Number Interacts with Partition By in Sql Query

By using PARTITION BY and ORDER BY with ROW_NUMBER(), you could rank the movies by year for each director separately.

To do that, partition data by the director and order by year.

SELECT id,
       title,
       director,
       year,
      ROW_NUMBER() OVER (PARTITION BY director ORDER BY year ASC)
FROM movies;

The query above will number the rows from the oldest to the newest movie for each director, as shown below.

To do the opposite – number the rows from the newest to the oldest movie for each director – write the following query.

SELECT id,
       title,
       director,
       year,
       ROW_NUMBER() OVER (PARTITION BY director ORDER BY year DESC)
FROM movies;

Here’s the output.

Let’s now show this in a real-case scenario.

Example

Here’s a question from the City of San Francisco interview. The task is to find the worst businesses in each year, i.e., those that have the most violations during that year. The output should contain the year, business name, and the number of violations.

Worst Businesses

Interview Question Date: May 2018

City of San Francisco

HardID 9739

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Identify the business with the most violations each year, based on records that include a violation ID. For each year, output the year, the name of the business with the most violations, and the corresponding number of violations.

Table: sf_restaurant_health_violations

Link to the question: https://platform.stratascratch.com/coding/9739-worst-businesses

The dataset is sf_restaurant_health_violations.

Table: sf_restaurant_health_violations

business_id	business_name	business_address	business_city	business_state	business_postal_code	business_latitude	business_longitude	business_location	inspection_id	inspection_date	inspection_score	inspection_type	violation_id	violation_description	risk_category
5800	John Chin Elementary School	350 Broadway St	San Francisco	CA	94133	37.8	-122.4	{'longitude': '-122.403154', 'needs_recoding': False, 'latitude': '37.798358', 'human_address': '{"address":"","city":"","state":"","zip":""}'}	5800_20171017	2017-10-17	98	Routine - Unscheduled	5800_20171017_103149	Wiping cloths not clean or properly stored or inadequate sanitizer	Low Risk
64236	Sutter Pub and Restaurant	700 Sutter St	San Francisco	CA	94102	37.79	-122.41	{'longitude': '-122.41188', 'needs_recoding': False, 'latitude': '37.78881', 'human_address': '{"address":"","city":"","state":"","zip":""}'}	64236_20170725	2017-07-25	88	Routine - Unscheduled	64236_20170725_103133	Foods not protected from contamination	Moderate Risk
1991	SRI THAI CUISINE	4621 LINCOLN Way	San Francisco	CA	94122	37.76	-122.51	{'longitude': '-122.507779', 'needs_recoding': False, 'latitude': '37.764073', 'human_address': '{"address":"","city":"","state":"","zip":""}'}	1991_20171129	2017-11-29	86	Routine - Unscheduled	1991_20171129_103139	Improper food storage	Low Risk
3816	Washington Bakery & Restaurant	733 Washington St	San Francisco	CA	94108	37.8	-122.41	{'longitude': '-122.405845', 'needs_recoding': False, 'latitude': '37.795174', 'human_address': '{"address":"","city":"","state":"","zip":""}'}	3816_20160728	2016-07-28	67	Routine - Unscheduled	3816_20160728_103108	Contaminated or adulterated food	High Risk
39119	Brothers Restaurant	4128 GEARY Blvd	San Francisco	CA	94118	37.78	-122.46	{'longitude': '-122.463762', 'needs_recoding': False, 'latitude': '37.781148', 'human_address': '{"address":"","city":"","state":"","zip":""}'}	39119_20160718	2016-07-18	79	Routine - Unscheduled	39119_20160718_103133	Foods not protected from contamination	Moderate Risk

The solution is based on a subquery and a subquery of a subquery. Let’s start building the code from the bottom.

Here’s one subquery. It extracts the year of the inspection and counts the number of violations. Of course, all the rows that don’t represent violations should be excluded from the violations number count, so we exclude the NULLs.

SELECT business_name,
       EXTRACT(year FROM inspection_date :: DATE) AS year,
       COUNT(*) AS n_violations
FROM sf_restaurant_health_violations
WHERE violation_id IS NOT NULL
GROUP BY business_name, year;

Here are the first five rows of the output.

All required columns and the first 5 rows of the solution are shown

business_name	year	number_of_violations
David's Deli & Bistro	2016	1
PEKING WOK RESTAURANT	2015	1
Clay Oven Indian Cuisine	2018	1
Akira Japanese Restaurant	2017	1
ABSINTHE PASTRY	2017	1

Next, we will write SELECT that references the subquery above. In that SELECT, we use ROW_NUMBER() to rank the businesses by the number of violations in each year. The ranking is from the highest to the lowest number of violations.

SELECT business_name,
       year,
       number_of_violations,
       ROW_NUMBER() OVER (PARTITION BY year ORDER BY number_of_violations DESC) AS yearly_position
FROM
  (SELECT business_name,
          EXTRACT(year FROM inspection_date :: DATE) AS year,
          COUNT(*) AS number_of_violations
    FROM sf_restaurant_health_violations
    WHERE violation_id IS NOT NULL
    GROUP BY business_name, year) AS sq;

Here’s the partial output. As you can see, the row numbering resets when it reaches a new partition, i.e., year.

All required columns and the first 5 rows of the solution are shown

business_name	year	number_of_violations	yearly_position
Roxanne Cafe	2015	5	1
LOS PANCHOS	2015	1	2
Maggie Cafe	2015	1	3
SH Dream Inc	2015	1	4
Cabin	2015	1	5
Mi Yucatan	2015	1	6
PEKING WOK RESTAURANT	2015	1	7
LA ALTENA	2015	1	8
Da Cafe	2016	4	1
Modern Thai Inc.	2016	2	2

We now need to turn the above SELECT into a subquery, too, and write a SELECT that references it. That main SELECT statement selects the inspection year, business name, and number of violations.

So, the final solution is this.

SELECT year,
       business_name,
       number_of_violations
FROM
  (SELECT business_name,
          year,
          number_of_violations,
          ROW_NUMBER() OVER (PARTITION BY year ORDER BY number_of_violations DESC) AS yearly_position
   FROM
     (SELECT business_name,
             EXTRACT(year FROM inspection_date :: DATE) AS year,
             COUNT(*) AS number_of_violations
      FROM sf_restaurant_health_violations
      WHERE violation_id IS NOT NULL
      GROUP BY business_name, year) AS sq
    ) AS sq2
ORDER BY year;

This is the query’s output. Just glancing at it is enough to realize this can’t be the solution. We wanted to output the business with the most violations during the year, but we see that, for 2015, Roxanne Cafe had five violations, but some other businesses had one.

All required columns and the first 5 rows of the solution are shown

year	business_name	number_of_violations
2015	SH Dream Inc	1
2015	Maggie Cafe	1
2015	PEKING WOK RESTAURANT	1
2015	LOS PANCHOS	1
2015	LA ALTENA	1

This is obviously not right. What we need to do is filter data in WHERE, so the output shows only the rows ranked as first, i.e., having the most violations in the year.

Now, this is the final solution.

SELECT year,
       business_name,
       number_of_violations
FROM
  (SELECT business_name,
          year,
          number_of_violations,
          ROW_NUMBER() OVER (PARTITION BY year ORDER BY number_of_violations DESC) AS yearly_position
   FROM
     (SELECT business_name,
             EXTRACT(year FROM inspection_date :: DATE) AS year,
             COUNT(*) AS number_of_violations
      FROM sf_restaurant_health_violations
      WHERE violation_id IS NOT NULL
      GROUP BY business_name, year) AS sq
    ) AS sq2
WHERE yearly_position = 1
ORDER BY year;

The output shows the years from 2015 to 2018, the businesses with the most violations in each year, and the number of their violations.

All required columns and the first 5 rows of the solution are shown

year	business_name	violation_count
2015	Roxanne Cafe	5
2016	Da Cafe	4
2017	Peet's Coffee & Tea	2
2018	Lollipot	6

Advanced Techniques and Tips

I’ll show you three advanced SQL query ROW_NUMBER() uses.

1. Data Deduplication

Deduplicating data basically boils down to writing a query with ROW_NUMBER() then filtering only the rows ranked first in WHERE.

Example

Here’s an example of a practical application.

The question by Amazon wants you to identify the product codes of items whose unit prices are greater than the average unit prices of the product codes sold. The calculation should be based on the product code’s initial price (the price when the product first arrives in the store), on the unique product code (as the product codes will be repeated), and the output should exclude products that have been returned, i.e., the quantity value is negative.

The output should contain the product codes and their unit prices.

Stock Codes with Prices Above Average

Interview Question Date: May 2023

Amazon

MediumID 2164

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Based on the provided online transactions, you have been asked to identify the product codes of items whose unit prices are greater than the average unit prices of the product codes sold.

Note: Calculations must be based on each product code's original price. The original price is the initial price set for a product when it first arrives in the store. Since product codes will be repeated in the orders dataset, calculations should only be based on unique product codes.

The manager has also requested that you exclude product codes that have been returned, i.e., have a negative value for quantity since they are only interested in sold products. Your output should contain the product codes and their unit prices.

Table: online_retails

Link to the question: https://platform.stratascratch.com/coding/2164-stock-codes-with-prices-above-average

You’re given the table online_retails.

Table:

user_id	song_id	listen_duration
101	5001	240
101	5002	0
102	5001	300
102	5003	0
101	5001	240

The problem solution is based on two CTEs.

The first CTE utilizes ROW_NUMBER() to number the rows. The idea is to number the rows for each product code separately. Then, in the following CTE, we will deduplicate data by selecting only the first rows for each product, as they represent the data when the product first arrived at the store, so it contains an initial price.

So, in ROW_NUMBER(), we partition data by productcode and number the rows from the oldest to the newest invoice date. In addition, we output only rows with a positive value in the quantity column; by doing so, we exclude returned products from the output.

SELECT productcode,
       unitprice,
       ROW_NUMBER() OVER(PARTITION BY productcode ORDER BY invoicedate ASC) AS row_numbering
FROM online_retails
WHERE quantity > 0;

This is this CTE’s partial output.

All required columns and the first 5 rows of the solution are shown

productcode	unitprice	row_numbering
20996	0.19	1
21106	5.79	1
21126	1.25	1
21155	4.96	1
21169	4.13	1

Now, the second CTE just takes data from the first CTE and outputs only the first row for each product, i.e., when the product came into the store.

SELECT productcode,
       unitprice
FROM products
WHERE row_numbering = 1

Here’s its partial output.

All required columns and the first 5 rows of the solution are shown

productcode	unitprice
20996	0.19
21106	5.79
21126	1.25
21155	4.96
21169	4.13

The final SELECT takes the output of the second CTE and compares the unit price of each product with the total average unit price of all products.

The final solution is this.

WITH products AS
  (SELECT productcode,
          unitprice,
          ROW_NUMBER() OVER(PARTITION BY productcode ORDER BY invoicedate ASC) AS row_numbering
   FROM online_retails
   WHERE quantity > 0),
     
initial_price AS
  (SELECT productcode,
          unitprice
   FROM products
   WHERE row_numbering = 1)
   
SELECT productcode, 
       unitprice
FROM initial_price
WHERE unitprice >
    (SELECT AVG(unitprice)
     FROM initial_price);

2. ROW_NUMBER() With Aggregate Functions

The ROW_NUMBER() window function can also be used with aggregate functions. I’ll show you how to do it in a practical example that uses an aggregate function in ORDER BY of ROW_NUMBER().

Example

This is a question from Google.

Activity Rank

Interview Question Date: July 2021

Google

MediumID 10351

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Find the email activity rank for each user. Email activity rank is defined by the total number of emails sent. The user with the highest number of emails sent will have a rank of 1, and so on. Output the user, total emails, and their activity rank.

• Order records first by the total emails in descending order. • Then, sort users with the same number of emails in alphabetical order by their username. • In your rankings, return a unique value (i.e., a unique rank) even if multiple users have the same number of emails.

Table: google_gmail_emails

Link to the question: https://platform.stratascratch.com/coding/10351-activity-rank

We need to find the email rank for each user, where the user with the most emails sent will have a rank of one, and so on. We should output the user, total number of emails, and users’ activity rank. The users with the same number of emails should be sorted alphabetically.

We will work with the table google_gmail_emails.

Table:

id	from_user	to_user	day
0	6edf0be4b2267df1fa	75d295377a46f83236	10
1	6edf0be4b2267df1fa	32ded68d89443e808	6
2	6edf0be4b2267df1fa	55e60cfcc9dc49c17e	10
3	6edf0be4b2267df1fa	e0e0defbb9ec47f6f7	6
4	6edf0be4b2267df1fa	47be2887786891367e	1

In the query, we select the email sender and use COUNT(*) to calculate the number of emails by the user by counting the number of rows.

We then need to rank the users by the number of emails descendingly. Luckily, we can use an aggregate function with ROW_NUMBER(), so we write COUNT(*) in ORDER BY, too. That way, we perform data aggregation and rank the data according to the data aggregation result.

An additional ranking criterion is the alphabetical order of usernames, which is the question requirement.

Finally, we group the output by the user and sort the output first from the highest to the lowest number of sent emails.

SELECT  from_user, 
        COUNT(*) AS total_emails, 
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, from_user ASC)
FROM google_gmail_emails 
GROUP BY from_user
ORDER BY total_emails DESC;

Here’s the output.

All required columns and the first 5 rows of the solution are shown

from_user	total_emails	row_number
32ded68d89443e808	19	1
ef5fe98c6b9f313075	19	2
5b8754928306a18b68	18	3
55e60cfcc9dc49c17e	16	4
91f59516cb9dee1e88	16	5

3. Result Pagination

SQL query ROW_NUMBER() is used to divide the result set for web, application, or actual book use. Again, the best way to demonstrate this is to show you a practical example.

Example

This is a hard question by Amazon and eBay, so we’ll take it slowly.

Cookbook Recipes

Interview Question Date: February 2022

Ebay

Amazon

HardID 2089

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

You are given the table with titles of recipes from a cookbook and their page numbers. You are asked to represent how the recipes will be distributed in the book. Produce a table consisting of three columns: left_page_number, left_title and right_title. The k-th row (counting from 0), should contain the number and the title of the page with the number $2 \times k$ in the first and second columns respectively, and the title of the page with the number $2 \times k + 1$ in the third column. Each page contains at most 1 recipe. If the page does not contain a recipe, the appropriate cell should remain empty (NULL value). Page 0 (the internal side of the front cover) is guaranteed to be empty.

Table: cookbook_titles

Link to the question: https://platform.stratascratch.com/coding/2089-cookbook-recipes

We need to show how the recipes will be distributed in the book. The output will contain these three columns: left_page_number, left_title and right_title.

The left_page_number column for the Kth row rank (counting from 0) in the output will be calculated like this.

left\_page\_number = 2*Kth\ rank

The left_title column contains the recipe title found on the page shown in the left_page_number column. Here’s the formula.

left\_title = 2*Kth\ rank\ title

The right_title column will show the recipe titles from the right side. Here’s the formula.

right\_title = 2*Kth\ rank +1\ title

Generally, we can consider the left_page_number column as the ‘page group’ column. What I mean by that is that, even though it really is a left page number, each row will contain not only the left page data but also the right page data, i.e., recipe title. Because we are not showing the right page number, the values in the left_page_number will be every other page starting from zero, i.e., 0, 2, 4, and so on.

Additional info is that each page contains only one recipe; if the page doesn’t contain any recipe, the cell should be empty. The page 0, the internal side of the front cover, is guaranteed to be empty.

The question provides you with the table cookbook_titles.

Table: cookbook_titles

page_number	title
1	Scrambled eggs
2	Fondue
3	Sandwich
4	Tomato soup
6	Liver

One concept important to understand before going into code-writing is the integer division remainder.

The remainder in integer division is defined as the leftover part of the dividend after subtracting the largest possible multiple of the divisor.

Or:

Remainder = Dividend - (Divisor*Integer\ Quotient)

For example, the remainder of the page 4 divided by 2 is this.

Remainder = 4 - (2*\frac{4}{2}) = 4-4 = 0

So, every even-numbered page (including 0) will have the remainder that is 0.

Every odd-numbered page divided by 2 gives the remainder of 1. For example:

Remainder = 5 - (2*\frac{5}{2}) = 5 - (2*2) = 1

The SQL operator for getting the division remainder is called modulo (%).

Now, we are ready to start writing the problem solution and construct it using CTEs.

The first CTE utilizes the generate_series function. This is the function that creates a virtual table containing a series of integers. The first argument in the function defines the start of the series, which is 0 in our case. The second argument specifies the end of the series. In our case, this is the highest page number, i.e., the last page from the table cookbook_titles.

We use this CTE to generate page numbers from 0 (internal side of the front cover) to the highest page number in the cookbook_titles table.

WITH series AS (
  SELECT generate_series AS page_number
  FROM generate_series(0,
                         (SELECT MAX(page_number)
                          FROM cookbook_titles))
),

Here’s a partial code output; the complete output goes to 15.

All required columns and the first 5 rows of the solution are shown

page_number
0
1
2
3
4

The second CTE LEFT JOINs the first CTE with the original dataset to get the list of all the values from the series and only the matching page numbers from cookbook_titles.

WITH series AS (
  SELECT generate_series AS page_number
  FROM generate_series(0,
                         (SELECT MAX(page_number)
                          FROM cookbook_titles))
),
     
cookbook_titles_v2 AS (
  SELECT s.page_number,
         c.title
  FROM series s
  LEFT JOIN cookbook_titles c 
  ON s.page_number = c.page_number
)

Here’s how the output looks so far. OK, we now have the pages and recipes one under another. This is all fine for the tabular representation. However, we need to simulate the book layout so the recipes from the same sheet (‘page group’) must be shown in the same row.

All required columns and the first 5 rows of the solution are shown

page_number	title
0
1	Scrambled eggs
2	Fondue
3	Sandwich
4	Tomato soup

The third SELECT statement ties all the CTEs together. It utilizes ROW_NUMBER() to calculate the values in the left_page_number column by referencing the previous CTE.

There are several additional calculations, so let’s explain them.

The row numbering is done by page_number/2. Why this, why not simply page_number? Dividing it by 2 allows us to create ‘page groups’.

First, we have this calculation.

ROW_NUMBER() OVER(ORDER BY page_number/2) AS left_page_number

Here’s an example table showing what this does. The page number is divided by two, which results in every two pages belonging to the same page group. This happens because it is an integer division in SQL, i.e., the division of integers results in an integer with any remainder discarded. For example: 1/2 = 0, not 0.5. The ROW_NUMBER() function then takes the result of those calculations (column page_number/2 below) and ranks the rows according to them.

We know that the left_page_number must start from zero (the internal side of the front cover) in the final output. How do we achieve this?

We simply subtract 1 from the current ROW_NUMBER() code part and get this:

ROW_NUMBER() OVER(ORDER BY page_number/2)-1 AS left_page_number

With that subtraction, the example output looks like this, as shown in the rightmost column.

OK, the left_page_number is now equal to the page_number column. However, it still shows every page number (both left and right page numbers), but we want it to show only left page numbers, i.e., every other page starting from 0.

We do that in the following calculation by multiplying the current left_page_number with 2.

(ROW_NUMBER() OVER(ORDER BY page_number/2)-1)*2 AS left_page_number

Now, the rightmost column finally shows only the left page numbers, i.e., the values shown in green.

So, the rightmost column in the above example output looks like what we want.

OK, we can now go back to writing our last SELECT. For now, we have this.

SELECT (ROW_NUMBER() OVER(ORDER BY page_number/2)-1)*2 AS left_page_number

Next, we use SQL CASE WHEN and the string_agg() function to show the left_title and right_title values in the final output.

CASE WHEN is where you need the knowledge of the modulo operator in SQL we discussed earlier.

In the first CASE WHEN, the left title is each title from the page that returns the remainder 0 when divided by 2, i.e., the page number is 0 or an even number.

The second CASE WHEN displays recipe titles from the odd-numbered pages, i.e., when divided by 2, the remainder is 1.

However, CASE WHEN is row-specific, so it can’t combine the values of multiple rows in a group. (Remember, we consider the left_page column a ‘page group’.) So, we need string_agg(). It will concatenate all the left-page recipes into one string (with values separated by a comma) and all the right-page recipes into a second string. For string_agg() to aggregate values across all rows in a ‘page group’, the output must be grouped by page_number/2.

The complete code is shown below.

WITH series AS (
  SELECT generate_series AS page_number
  FROM generate_series(0,
                         (SELECT MAX(page_number)
                          FROM cookbook_titles))
),
     
cookbook_titles_v2 AS (
  SELECT s.page_number,
         c.title
  FROM series s
  LEFT JOIN cookbook_titles c 
  ON s.page_number = c.page_number
)

SELECT (ROW_NUMBER() OVER(ORDER BY page_number/2)-1)*2 AS left_page_number,
       string_agg(CASE
                      WHEN page_number % 2 = 0 
                      THEN title
                   END, ',') AS left_title,
       string_agg(CASE
                      WHEN page_number % 2 = 1 
                      THEN title
                  END, ',') AS right_title
FROM cookbook_titles_v2
GROUP BY page_number / 2;

Here’s – finally! – the output.

All required columns and the first 5 rows of the solution are shown

left_page_number	left_title	right_title
0		Scrambled eggs
2	Fondue	Sandwich
4	Tomato soup
6	Liver
8

Comparing ROW_NUMBER() With Other Ranking Functions in SQL Queries

There are also other ranking window functions alongside ROW_NUMBER(). You can find their definitions in the table below.

Comparing ROW NUMBER With Other Ranking Functions in SQL

The best way to explain how these functions work is to show them in an example table.

The table is named students.

If we rank students from the highest to the lowest scores, this is how each function would do it.

I marked two sets of rows with the same scores in different colors.

As you have learned so far, ROW_NUMBER() allocates the sequential rank, irrespective of the tied values in some rows. You can see that in the example of the rows with scores of 87 and 76.

RANK(), on the other hand, assigns the same rank to the rows with the same values. However, it skips ranks when it reaches the next non-tie row. The rows where the score is 87 are all ranked 3. As there are 3 rows ranked 3, the next rank for the non-tie row is:

Next\ non-tie\ rank = Current\ tie\ rank + Number\ of\ tie\ ranks = 3 + 3 = 6

DENSE_RANK() works almost the same as RANK(), except it doesn’t skip ranks after tie ranks. The rows ranked as 3 by RANK() get the same rank with DENSE_RANK(). However, the next row is ranked as 4, not 6, so there’s no skipping, and the next rank is assigned.

Now, NTILE(n) works a little differently. It divides rows into the n buckets as evenly as possible and assigns a bucket number to each row. If the number of rows in the table is not even, then the rows with the lower bucket rows get extra row(s).

Our table has 10 rows. It’s divided into three buckets of three rows. However, one row is extra, so it’s added to the first bucket. In the end, the result shows that the first bucket has four rows, and the second and third buckets have three rows.

Conclusion

ROW_NUMBER() in SQL queries is a powerful tool for numbering rows with many practical applications. Mastering ROW_NUMBER() requires understanding what the ORDER BY and PARTITION BY clauses do.

The SQL query interview questions we solved are just a few of many where you can use SQL query ROW_NUMBER(). You should continue using our coding question section and experiment with other ROW_NUMBER() use scenarios. Only that way you can gain proficiency.

Mastering Row Number Handling in SQL Queries

Categories

What is ROW_NUMBER() in SQL Queries

Syntax

Step-by-Step Guide to Using ROW_NUMBER() in SQL Queries

SQL Query ROW_NUMBER() Without ORDER BY and PARTITION BY

First 50% of Records From Dataset

How SQL Query ROW_NUMBER() Interacts with ORDER BY

Example

Fans vs Opposition

How ROW_NUMBER() Interacts with PARTITION BY

Example

Worst Businesses

Advanced Techniques and Tips

1. Data Deduplication

Example

Stock Codes with Prices Above Average

2. ROW_NUMBER() With Aggregate Functions

Example

Activity Rank

3. Result Pagination

Example

Cookbook Recipes

Comparing ROW_NUMBER() With Other Ranking Functions in SQL Queries

Conclusion

Latest Posts:

5 Alternatives to Matplotlib That Make Data Visualization a Breeze

How to Find the Minimum Value in SQL (with Examples)

Biases When Hiring For Data Science Roles

Share

Follow

Categories