Think In SQL – Window Function Demisify

Many SQL tutorials mark the window function as the advanced topic. Numerous job interviews on SQL often involve at least one question on window function. Understanding window function is critical for anyone that writes SQL daily (as well as SQL like languages like Spark Dataframe and Pandas). In this story, let’s think in SQL and demystify the window function with examples and diagrams.

What is Window Function?

Many versions of window function definitions are described by the community or the official DB vendors (MySQL, Postgresql). To be clear, I describe the window function in the following way:

Window Function enables current row to oversee a range of rows, applies function on those rows, and returns result back to the current row.

However, reading the definition will only resonate on how to think and use in SQL. Let’s put the definition aside. We’d need concrete examples and diagrams to explain how the window function works.

Window Function Example Setup

To explain how window function works, we don’t need fancy joins or complex conditional filters. All we need to do is a few rows that allow us to slide up and down 😃

I will use DuckDb for the illustration since DuckDb doesn’t require many dependencies, and you can get a SQL interface immediately.

We can create a dummy table with three fields: key, value, and date. Then export the table into a CSV file, then read the table by DuckDb CLI.

				
					import pandas as pd

data = {"key": [1, 1, 1, 2, 2, 3, 3, 4],
        "value": [100, 200, 300, 100, 200, 200, 300, 200],
        "date": [1, 2, 3, 1, 2, 2, 1, 1]
        }

window_function_demo_df = pd.DataFrame.from_dict(data)
window_function_demo_df.to_csv('~/Downloads/window_function_demo.csv')

Then launch DuckDb CLI then read the table using the magic. read_csv_auto

				
					./duckdb

--once in the DuckDb CLI, create a table first
CREATE TABLE window_function_demo AS SELECT * FROM read_csv_auto ('~/Downloads/window_function_demo.csv');

--select all columns from the newly created table
SELECT * FROM window_function_demo;

The Components of the Window Function

A window function can be broken down into three main components:

Partition By
Order By
Range of the Window Function

Partition By

The “Partition By” clause is for logical grouping. Those keys segregate current rows from the other rows in a logical partition, thus further limiting the window function’s range.

In our example, we have four distinct values under the key. When we apply window function, each value is going to be treated separately. The partition key provides the logical boundary avoid taking values from same window range but different keys. When the cursor reaches the end of a given partition, it will stop instead of jumping into the next partition.

If we have the following window function to sum the field value partition by field key

				
					SELECT key, value, date, SUM(value) Over (PARTITION BY key) AS total
FROM window_function_demo;

The following result is expected that assign each row with the sum of value for a given partition. Key 1 gets 100+200+300=600, Key 2 gets 100+200=300, Key 3 gets 200 + 300 = 500, and Key 4 gets 200.

Order By

Some window functions are sensitive to the sorting order; for example, ROW_NUMBER() and RANK(). Other aggregation functions are less concerned about the exact ordering; for instance, COUNT() and SUM() (if ROW or Range is provided, it will change and will cover in the next section. ).

If the window function relies on the ordering, it’s essential to write the sorting order down explicitly; Otherwise, the returned result isn’t idempotent (every time you rerun the query, you achieve the same result), and it leads to undesired effects.

In our example, if we’d like to return the row number for each partition and order by date in ascending order, for key = ‘3’ we’d need to reorder to get the row number returned correctly.

To put SQL down, we have

				
					SELECT key, value, date, ROW_NUMBER() Over (PARTITION BY key ORDER BY "Date") AS row_number
FROM window_function_demo;

If we look closer to key 3, the date is reordered, and row_number for each row assigned is correct by ascending order.

Range of the Window Function

Partition key provides the logical separation by each key. If we also want to change the range of the window for partition, we can achieve this by specifying PRECEDING or FOLLOWING .

PRECEDING defines the number of rows before the current row to included in the window function. If the supplied number is less than the specified one, it takes the maximum it can get. For example, if we have 1 PRECEDING , it includes one row before. If the current row is the first one without preceding row, the preceding row will be ignored (if we have value from the start to end, the sum will treat those missing value as 0 instead of NULL).

				
					SELECT key, value, date, 
SUM(value) Over (PARTITION BY key ORDER BY "date" DESC ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS row_number
FROM window_function_demo;

Let’s take a look closer to key = 1. Step 1. since we don’t have the preceding 1 row, it treat the missing row as 0 and assign the current row to 100 (0+100). Step 2 and 3 are similar, and we now have the preceding 1 row. We can take that value and add it to the current value.

FOLLOWING is the same idea as PRECEDING, but look for the rows after the current row to include in the window function.

Types of Window Functions

Generally, window functions can be classified into two types: aggregate types, and position-based type.

Aggregate Types

The core idea of the window function is to enhance the current row from aggregated rows. All the aggregate functions can be used as part of the window function. However, not every DBMS have equality adoption to the aggregate function. You can still accomplish this by getting some aggregate function by creating a aggregated temporary table or view, then joining with each row. But it has the limitation that window range is only applicable to all unbounded PRECEDING or FOLLOWING rows

Position-Based Type

Those functions describe the position information of the current row within a given partition (i.e., rank, ntile) or get the value for a given position (lag, lead). Those functions are handy to learn about the relevant position of the current row. Many hard SQL interview questions require you to be savvy on various the position-based type window function.

Frequent Use Cases for Window Function

Remove duplication. I found many times distinct isn’t practical to perform deduplicate since it works only on all selected rows. If you want to remove duplication by a set of keys, we can write a CTE (common table expression) and use theROW_NUMBER function. We can deduplicate PARTITION BY with the desired deduplication order, then filter out the row number that isn’t equal to 1.

				
					WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY key Order BY "date") rn
FROM my_dataset
)

SELECT *
FROM CTE
WHERE rn = 1

Get Top N for each group. We are often interested in only the top few rows within a partition. For example, “Give a list of top 3 salespeople for each state”

				
					WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY state Order BY revenue DESC) rn
FROM sales
)

SELECT sales_person, state
FROM CTE
WHERE rn <= 3

Final Thoughts

Being Proficiency in SQL window function is a critical step to becoming fluent in SQL. This story is helpful to demystify SQL window function, and provide you a step-by-step guide and sample code on how to think window function clearly.

Thinking in SQL is critical. You can reason why things happen like this and quickly resolve the errors. Additionally, it helps you better structure your code for solving the complex queries in a more organized way.

About Me

I hope my stories are helpful to you.

For data engineering post, you can also subscribe to my new articles or becomes a referred Medium member that also gets full access to stories on Medium.

In case of questions/comments, do not hesitate to write in the comments of this story or reach me directly through Linkedin or Twitter.

The Ultimate Apache Spark Guide: Performance Tuning, PySpark Examples, and New 4.0 Features

Chengzhi Zhao June 30, 2025

The ultimate guide to Apache Spark. Learn performance tuning with PySpark examples, fix common issues like data skew, and explore new Spark 4.0 features.

2025 Retrospective: How AI Changed the Way I Engineer

Chengzhi Zhao December 24, 2025

2025 marked the shift from experimenting with AI to relying on it. In this retrospective, I explore how AI killed the ‘tedious task’ but failed the ‘context test’—specifically sharing why OpenAI, Claude, and Gemini all couldn’t fix a complex protoc dependency that still required a human engineer.

Building Better Data Warehouses with Dimensional Modeling: A Guide for Data Engineers

Chengzhi Zhao May 5, 2023

Let’s bring the data community’s attention to the essential- Building Better Data Warehouses with Dimensional Modeling: A Guide for Data Engineers.