Summary: in this tutorial, you will learn how to use the PostgreSQL window function to perform calculation across a set of rows related to the current row.
Introduction to PostgreSQL window function
We will create two tables named products
and product_groups
for the demonstration.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | CREATE TABLE product_groups ( group_id serial PRIMARY KEY, group_name VARCHAR (255) NOT NULL ); CREATE TABLE products ( product_id serial PRIMARY KEY, product_name VARCHAR (255) NOT NULL, price DECIMAL (11, 2), group_id INT NOT NULL, FOREIGN KEY (group_id) REFERENCES product_groups (group_id) ); INSERT INTO product_groups (group_name) VALUES ('Smartphone'), ('Laptop'), ('Tablet'); INSERT INTO products (product_name, group_id,price) VALUES ('Microsoft Lumia', 1, 200), ('HTC One', 1, 400), ('Nexus', 1, 500), ('iPhone', 1, 900), ('HP Elite', 2, 1200), ('Lenovo Thinkpad', 2, 700), ('Sony VAIO', 2, 700), ('Dell Vostro', 2, 800), ('iPad', 3, 700), ('Kindle Fire', 3, 150), ('Samsung Galaxy Tab', 3, 200); |
The easiest way to understand the window functions is to start by reviewing the aggregate functions. An aggregate function aggregates data from a set of rows into a single row.
For example, the following AVG function calculates the average of prices of products in the products
table.
1 2 3 4 | SELECT AVG (price) FROM products; |
1 2 3 4 | avg ---------------------- 636.3636363636363636 (1 row) |
To apply the aggregate function to subsets of rows, you use the GROUP BY clause. The following statement returns the average price for each product group.
1 2 3 4 5 6 7 8 | SELECT group_name, AVG (price) FROM products INNER JOIN product_groups USING (group_id) GROUP BY group_name; |
1 2 3 4 5 6 | group_name | avg ------------+---------------------- Tablet | 600 Smartphone | 500 Laptop | 800 (3 rows) |
As you see the AVG
aggregate function reduces the number of rows returned by the queries in both cases.
Like an aggregate function, a window function operates on a set of rows, but it does not reduce the number of rows returned by the query.
The term window describes the set of rows on which the window function operates. A window function returns a value from the rows in a window.
For example, the following query returns the product name, the price, product group name, along with the average prices of each product group.
1 2 3 4 5 6 7 8 | SELECT product_name, price, group_name, AVG (price) OVER (PARTITION BY group_name) FROM products INNER JOIN product_groups USING (group_id); |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | product_name | price | group_name | avg --------------------+---------+------------+---------------------- HP Elite | 900.00 | Laptop | 800 Lenovo Thinkpad | 1100.00 | Laptop | 800 Sony VAIO | 600.00 | Laptop | 800 Dell Vostro | 600.00 | Laptop | 800 Microsoft Lumia | 300.00 | Smartphone | 500 HTC One | 400.00 | Smartphone | 500 Nexus | 500.00 | Smartphone | 500 iPhone | 800.00 | Smartphone | 500 iPad | 700.00 | Tablet | 600 Kindle Fire | 300.00 | Tablet | 600 Samsung Galaxy Tab | 800.00 | Tablet | 600 (11 rows) |
In this query, the AVG
function works as a window function that operates on a set of rows specified by the OVER
clause. Each set of rows is called a window.
The new syntax for this query is the
clause.AVG(price) OVER (PARTITION BY group_name)
Behind the scenes, PostgreSQL sorts rows by values in the group_name
column, the PARTITION BY
groups the rows into groups, and the AVG
function calculates the average price for each product group.
A window function performs the calculation on the result set after the JOIN, WHERE, GROUP BY and HAVING clause complete, but before the final ORDER BY clause.
PostgreSQL window function syntax
PostgreSQL provides a sophisticated syntax for window function call. The following illustrates the syntax of the window function call.
1 | window_function(arg1, arg2,..) OVER (PARTITION BY expression ORDER BY expression) |
Let’s examine the syntax above in more detail:
- The
window_function(arg1,arg2,...)
is window function. You can use any built-in or user-defined aggregate function as a window function. In addition, you can use the PostgresQL built-in window functions such asrow_number()
,rank()
,dense_rank()
, etc. - To divide rows into groups or partitions, you use
PARTITION BY
clause. Like in the example above, we userating
to divide the rows in thefilm
table into partitions. If you omit thePARTITION BY
clause, the whole result set is a single partition. - To sort rows in within a partition, you use the ORDER BY clause. The window function will process rows in the order specified by the
ORDER BY
clause especially for the window functions that are sensitive to the order such asfirst_value()
,last_value()
,nth_value()
, etc. If you omit theORDER BY
clause, the window function processes rows in an unspecified order.
ROW_NUMBER, RANK, and DENSE_RANK functions
The ROW_NUMBER()
, RANK()
, and DENSE_RANK
()
functions assign integer values to the rows based on their order.
The ROW_NUMBER()
function assigns a running serial number to rows in each partition. See the following query:
1 2 3 4 5 6 7 8 9 10 11 12 | SELECT product_name, group_name, price, ROW_NUMBER () OVER ( PARTITION BY group_name ORDER BY price ) FROM products INNER JOIN product_groups USING (group_id); |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | product_name | group_name | price | row_number --------------------+------------+---------+------------ Dell Vostro | Laptop | 600.00 | 1 Sony VAIO | Laptop | 600.00 | 2 HP Elite | Laptop | 900.00 | 3 Lenovo Thinkpad | Laptop | 1100.00 | 4 Microsoft Lumia | Smartphone | 300.00 | 1 HTC One | Smartphone | 400.00 | 2 Nexus | Smartphone | 500.00 | 3 iPhone | Smartphone | 800.00 | 4 Kindle Fire | Tablet | 300.00 | 1 iPad | Tablet | 700.00 | 2 Samsung Galaxy Tab | Tablet | 800.00 | 3 (11 rows) |
For more information of the ROW_NUMBER()
and its usages, check it out the PostgreSQL ROW_NUMBER tutorial.
The RANK()
function assigns ranking within an ordered partition. If the values of the two rows are the same, the RANK()
function assigns the same rank, with the next ranking(s) skipped.
See the following query:
1 2 3 4 5 6 7 8 9 10 11 12 | SELECT product_name, group_name, price, RANK () OVER ( PARTITION BY group_name ORDER BY price ) FROM products INNER JOIN product_groups USING (group_id); |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | product_name | group_name | price | rank --------------------+------------+---------+------ Dell Vostro | Laptop | 600.00 | 1 Sony VAIO | Laptop | 600.00 | 1 HP Elite | Laptop | 900.00 | 3 Lenovo Thinkpad | Laptop | 1100.00 | 4 Microsoft Lumia | Smartphone | 300.00 | 1 HTC One | Smartphone | 400.00 | 2 Nexus | Smartphone | 500.00 | 3 iPhone | Smartphone | 800.00 | 4 Kindle Fire | Tablet | 300.00 | 1 iPad | Tablet | 700.00 | 2 Samsung Galaxy Tab | Tablet | 800.00 | 3 (11 rows) |
In the laptop product group, both Dell Vostro
and Sony VAIO
products have the same price, therefore, they receive the same rank 1. The next row in the group is HP Elite
that receives the rank 3 because the rank 2 is skipped.
Similar to the RANK()
function, the DENSE_RANK()
function assigns the ranking within an ordered partition, but the ranks are consecutive. In other words, the same ranks are assigned to multiple rows and no ranks are skipped.
1 2 3 4 5 6 7 8 9 10 11 12 | SELECT product_name, group_name, price, DENSE_RANK () OVER ( PARTITION BY group_name ORDER BY price ) FROM products INNER JOIN product_groups USING (group_id); |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | product_name | group_name | price | dense_rank --------------------+------------+---------+------------ Dell Vostro | Laptop | 600.00 | 1 Sony VAIO | Laptop | 600.00 | 1 HP Elite | Laptop | 900.00 | 2 Lenovo Thinkpad | Laptop | 1100.00 | 3 Microsoft Lumia | Smartphone | 300.00 | 1 HTC One | Smartphone | 400.00 | 2 Nexus | Smartphone | 500.00 | 3 iPhone | Smartphone | 800.00 | 4 Kindle Fire | Tablet | 300.00 | 1 iPad | Tablet | 700.00 | 2 Samsung Galaxy Tab | Tablet | 800.00 | 3 (11 rows) |
Within the laptop product group, the rank 1 is assigned twice to Dell Vostro
and Sony VAIO
. The next rank is 2 assigned to HP Elite
.
FIRST_VALUE and LAST_VALUE functions
The FIRST_VALUE()
function returns the first value from the first row of the ordered set, whereas the LAST_VALUE()
function returns the last value from the last row of the result set.
The following statement uses the FIRST_VALUE()
to return the lowest price per product group.
1 2 3 4 5 6 7 8 9 10 11 12 | SELECT product_name, group_name, price, FIRST_VALUE (price) OVER ( PARTITION BY group_name ORDER BY price ) AS lowest_price_per_group FROM products INNER JOIN product_groups USING (group_id); |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | product_name | group_name | price | lowest_price_per_group --------------------+------------+---------+------------------------ Dell Vostro | Laptop | 600.00 | 600.00 Sony VAIO | Laptop | 600.00 | 600.00 HP Elite | Laptop | 900.00 | 600.00 Lenovo Thinkpad | Laptop | 1100.00 | 600.00 Microsoft Lumia | Smartphone | 300.00 | 300.00 HTC One | Smartphone | 400.00 | 300.00 Nexus | Smartphone | 500.00 | 300.00 iPhone | Smartphone | 800.00 | 300.00 Kindle Fire | Tablet | 300.00 | 300.00 iPad | Tablet | 700.00 | 300.00 Samsung Galaxy Tab | Tablet | 800.00 | 300.00 (11 rows) |
The following statement uses the LAST_VALUE()
function to return the highest price per product group.
1 2 3 4 5 6 7 8 9 10 11 12 13 | SELECT product_name, group_name, price, LAST_VALUE (price) OVER ( PARTITION BY group_name ORDER BY price RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS highest_price_per_group FROM products INNER JOIN product_groups USING (group_id); |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | product_name | group_name | price | highest_price_per_group --------------------+------------+---------+------------------------- Dell Vostro | Laptop | 600.00 | 1100.00 Sony VAIO | Laptop | 600.00 | 1100.00 HP Elite | Laptop | 900.00 | 1100.00 Lenovo Thinkpad | Laptop | 1100.00 | 1100.00 Microsoft Lumia | Smartphone | 300.00 | 800.00 HTC One | Smartphone | 400.00 | 800.00 Nexus | Smartphone | 500.00 | 800.00 iPhone | Smartphone | 800.00 | 800.00 Kindle Fire | Tablet | 300.00 | 800.00 iPad | Tablet | 700.00 | 800.00 Samsung Galaxy Tab | Tablet | 800.00 | 800.00 (11 rows) |
Notice that we added the windowing clause RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
because by default the windowing clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
LAG and LEAD functions
The LAG()
function has the ability to access data from the previous row, while the LEAD()
function can access data from the next row.
Both LAG()
and LEAD()
functions have the same syntax as shown below:
1 2 | LAG (expression [,offset] [,default]) LEAD (expression [,offset] [,default]) |
-
expression
– a column or expression to compute the returned value. -
offset
– the number of rows preceding (LAG
)/ following (LEAD
) the current row. The default value ofoffset
is 1. -
default
– the default returned value if the offset is outside the scope of the window. The default of the default isNULL
.
The following statement uses the LAG()
function to return the prices from the previous row and calculates the difference between the price of the current row and the previous row.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | SELECT product_name, group_name, price, LAG (price, 1) OVER ( PARTITION BY group_name ORDER BY price ) AS prev_price, price - LAG (price, 1) OVER ( PARTITION BY group_name ORDER BY price ) AS cur_prev_diff FROM products INNER JOIN product_groups USING (group_id); |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | product_name | group_name | price | prev_price | price_diff --------------------+------------+---------+------------+------------ Dell Vostro | Laptop | 600.00 | | Sony VAIO | Laptop | 600.00 | 600.00 | 0.00 HP Elite | Laptop | 900.00 | 600.00 | 300.00 Lenovo Thinkpad | Laptop | 1100.00 | 900.00 | 200.00 Microsoft Lumia | Smartphone | 300.00 | | HTC One | Smartphone | 400.00 | 300.00 | 100.00 Nexus | Smartphone | 500.00 | 400.00 | 100.00 iPhone | Smartphone | 800.00 | 500.00 | 300.00 Kindle Fire | Tablet | 300.00 | | iPad | Tablet | 700.00 | 300.00 | 400.00 Samsung Galaxy Tab | Tablet | 800.00 | 700.00 | 100.00 (11 rows) |
The following statement uses the LEAD()
function to return the prices from the next row and calculates the difference between the price of the current row and the next row.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | SELECT product_name, group_name, price, LEAD (price, 1) OVER ( PARTITION BY group_name ORDER BY price ) AS next_price, price - LEAD (price, 1) OVER ( PARTITION BY group_name ORDER BY price ) AS cur_next_diff FROM products INNER JOIN product_groups USING (group_id); |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | product_name | group_name | price | next_price | cur_next_diff --------------------+------------+---------+------------+--------------- Dell Vostro | Laptop | 600.00 | 600.00 | 0.00 Sony VAIO | Laptop | 600.00 | 900.00 | -300.00 HP Elite | Laptop | 900.00 | 1100.00 | -200.00 Lenovo Thinkpad | Laptop | 1100.00 | | Microsoft Lumia | Smartphone | 300.00 | 400.00 | -100.00 HTC One | Smartphone | 400.00 | 500.00 | -100.00 Nexus | Smartphone | 500.00 | 800.00 | -300.00 iPhone | Smartphone | 800.00 | | Kindle Fire | Tablet | 300.00 | 700.00 | -400.00 iPad | Tablet | 700.00 | 800.00 | -100.00 Samsung Galaxy Tab | Tablet | 800.00 | | (11 rows) |
In this tutorial, we have introduced you to the PostgreSQL window function and shown you the example of using them for querying data.