
Many SQL tutorials mark the window function as the advanced topic. Numerous job interviews on SQL often involve at least one question on window function. Understanding window function is critical for anyone that writes SQL daily (as well as SQL like languages like Spark Dataframe and Pandas). In this story, let’s think in SQL and demystify the window function with examples and diagrams.
What is Window Function?
Many versions of window function definitions are described by the community or the official DB vendors (MySQL, Postgresql). To be clear, I describe the window function in the following way:
Window Function enables current row to oversee a range of rows, applies function on those rows, and returns result back to the current row.
However, reading the definition will only resonate on how to think and use in SQL. Let’s put the definition aside. We’d need concrete examples and diagrams to explain how the window function works.
Window Function Example Setup
To explain how window function works, we don’t need fancy joins or complex conditional filters. All we need to do is a few rows that allow us to slide up and down 😃
I will use DuckDb for the illustration since DuckDb doesn’t require many dependencies, and you can get a SQL interface immediately.
We can create a dummy table with three fields: key, value, and date. Then export the table into a CSV file, then read the table by DuckDb CLI.
import pandas as pd
data = {"key": [1, 1, 1, 2, 2, 3, 3, 4],
"value": [100, 200, 300, 100, 200, 200, 300, 200],
"date": [1, 2, 3, 1, 2, 2, 1, 1]
}
window_function_demo_df = pd.DataFrame.from_dict(data)
window_function_demo_df.to_csv('~/Downloads/window_function_demo.csv')
Then launch DuckDb CLI then read the table using the magic. read_csv_auto
./duckdb
--once in the DuckDb CLI, create a table first
CREATE TABLE window_function_demo AS SELECT * FROM read_csv_auto ('~/Downloads/window_function_demo.csv');
--select all columns from the newly created table
SELECT * FROM window_function_demo;

The Components of the Window Function

A window function can be broken down into three main components:
- Partition By
- Order By
- Range of the Window Function
Partition By
The “Partition By” clause is for logical grouping. Those keys segregate current rows from the other rows in a logical partition, thus further limiting the window function’s range.
In our example, we have four distinct values under the key. When we apply window function, each value is going to be treated separately. The partition key provides the logical boundary avoid taking values from same window range but different keys. When the cursor reaches the end of a given partition, it will stop instead of jumping into the next partition.

If we have the following window function to sum the field value
partition by field key
SELECT key, value, date, SUM(value) Over (PARTITION BY key) AS total
FROM window_function_demo;
The following result is expected that assign each row with the sum of value
for a given partition. Key 1 gets 100+200+300=600, Key 2 gets 100+200=300, Key 3 gets 200 + 300 = 500, and Key 4 gets 200.

Order By
Some window functions are sensitive to the sorting order; for example, ROW_NUMBER()
and RANK()
. Other aggregation functions are less concerned about the exact ordering; for instance, COUNT()
and SUM()
(if ROW or Range is provided, it will change and will cover in the next section. ).
If the window function relies on the ordering, it’s essential to write the sorting order down explicitly; Otherwise, the returned result isn’t idempotent (every time you rerun the query, you achieve the same result), and it leads to undesired effects.
In our example, if we’d like to return the row number for each partition and order by date in ascending order, for key = ‘3’
we’d need to reorder to get the row number returned correctly.

To put SQL down, we have
SELECT key, value, date, ROW_NUMBER() Over (PARTITION BY key ORDER BY "Date") AS row_number
FROM window_function_demo;

If we look closer to key 3, the date is reordered, and row_number for each row assigned is correct by ascending order.
Range of the Window Function
Partition key provides the logical separation by each key. If we also want to change the range of the window for partition, we can achieve this by specifying PRECEDING
or FOLLOWING
.
PRECEDING
defines the number of rows before the current row to included in the window function. If the supplied number is less than the specified one, it takes the maximum it can get. For example, if we have1 PRECEDING
, it includes one row before. If the current row is the first one without preceding row, the preceding row will be ignored (if we have value from the start to end, the sum will treat those missing value as 0 instead of NULL).
SELECT key, value, date,
SUM(value) Over (PARTITION BY key ORDER BY "date" DESC ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS row_number
FROM window_function_demo;
Let’s take a look closer to key = 1. Step 1. since we don’t have the preceding 1 row, it treat the missing row as 0 and assign the current row to 100 (0+100). Step 2 and 3 are similar, and we now have the preceding 1 row. We can take that value and add it to the current value.


FOLLOWING
is the same idea asPRECEDING
, but look for the rows after the current row to include in the window function.
Types of Window Functions
Generally, window functions can be classified into two types: aggregate types, and position-based type.
Aggregate Types
The core idea of the window function is to enhance the current row from aggregated rows. All the aggregate functions can be used as part of the window function. However, not every DBMS have equality adoption to the aggregate function. You can still accomplish this by getting some aggregate function by creating a aggregated temporary table or view, then joining with each row. But it has the limitation that window range is only applicable to all unbounded PRECEDING
or FOLLOWING
rows
Position-Based Type
Those functions describe the position information of the current row within a given partition (i.e., rank, ntile) or get the value for a given position (lag, lead). Those functions are handy to learn about the relevant position of the current row. Many hard SQL interview questions require you to be savvy on various the position-based type window function.
Frequent Use Cases for Window Function
- Remove duplication. I found many times
distinct
isn’t practical to perform deduplicate since it works only on all selected rows. If you want to remove duplication by a set of keys, we can write a CTE (common table expression) and use theROW_NUMBER
function. We can deduplicatePARTITION BY
with the desired deduplication order, then filter out the row number that isn’t equal to 1.
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY key Order BY "date") rn
FROM my_dataset
)
SELECT *
FROM CTE
WHERE rn = 1
- Get Top N for each group. We are often interested in only the top few rows within a partition. For example, “Give a list of top 3 salespeople for each state”
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY state Order BY revenue DESC) rn
FROM sales
)
SELECT sales_person, state
FROM CTE
WHERE rn <= 3
Final Thoughts
Being Proficiency in SQL window function is a critical step to becoming fluent in SQL. This story is helpful to demystify SQL window function, and provide you a step-by-step guide and sample code on how to think window function clearly.
Thinking in SQL is critical. You can reason why things happen like this and quickly resolve the errors. Additionally, it helps you better structure your code for solving the complex queries in a more organized way.
About Me
I hope my stories are helpful to you.
For data engineering post, you can also subscribe to my new articles or becomes a referred Medium member that also gets full access to stories on Medium.
More Articles

How to Engage with Users By Storytelling: Show Data Analytics in R and Shiny
Using R and Shiny, we can build an app where the end users can interact with the data analysis we have done. I will show you how to engage with users by storytelling - show data analytics in R and Shiny.

How I Found Peace of Mind After Timeboxing
I am tired of continuous rapid context switching and keep being distracted. Interruption kept occurring and continued the next day. Timeboxing helped me find peace of mind in an isolated environment to concentrate on my task.

Streaming Data Is Exciting: What You Need to Know Before Taking the Plunge
Is streaming data necessary for this particular use case? Rather than blindly diving in, it’s essential first to acknowledge the realities of working with streaming data.