
Sometimes writing SQL can be frustrating, especially when encountering NULL values. It’s common to feel frustrated and think, “This isn’t what I want.” I hope this article can help you better understand these tricky NULL in SQL.
If you write any SQL, you must have experience working with NULL. NULL was created by E. F. Codd, the relational database creator, to address the “missing information” case.
Null (or NULL) is a special marker used in Structured Query Language to indicate that a data value does not exist in the database. — Wikipedia
Suppose you come across an online questionnaire that asks you to choose an occupation from a list of options. However, at that moment, you may not have a preferred occupation. In such a situation, you can simply fill in “N/A” as your answer, which will be recorded as NULL in the database to indicate that you did not provide an answer or that your answer was not listed. If you need to query people who did not provide a response, you can use the well-known syntax.
But, ignoring NULL values can have a negative impact on the accuracy of your data analysis. To demonstrate this, you can use http://sqlfiddle.com, a free online SQL editor, which supports MySQL 5.6. Please note that the syntax may vary depending on the database engine used.
Build a Schema To Demonstrate NULL
CREATE TABLE IF NOT EXISTS form (
id int(6) NULL,
occupation varchar(200) NULL
);
INSERT INTO form (id, occupation) VALUES ('1', 'Accountant');
INSERT INTO form (id, occupation) VALUES ('2', 'Software Engineer');
INSERT INTO form (id, occupation) VALUES ('3', NULL);
INSERT INTO form (id, occupation) VALUES ('4', NULL);
INSERT INTO form (id, occupation) VALUES (NULL, NULL);
To simplify everything, in this example, we will have a table called form
with only two columns: id
and occupation
, and occupation
It is a nullable column. We will insert four records into this table, the last two with occupation as null to represent unknown or missing values.
SQL NULL in Group By
Let’s run the following script and observe what happens with null
SELECT occupation, COUNT(1) AS cnt
FROM form
GROUP BY occupation

It is not uncommon to see NULL treated as a distinct category and displayed as (null). SQL can assist in grouping everything into this category, but we are still working to determine their values.
SQL NULL in Order By
SELECT occupation, COUNT(1) AS cnt
FROM form
GROUP BY occupation
ORDER BY occupation

Nothing changed. Since we chose MySQL, are NULLs considered lower than any non-NULL value? But the different implementations of DB can change this behavior. You can refer to Ordering result sets, which have good documentation on the expected behavior. Standard SQL can also use NULLS Last
, which is also implemented in DB like Oracle.
SQL NULL in Filtering
NULL in filtering conditions is error-prone. It is important to use caution when working with NULL values in where/having clause.
Case 1: find out who has an occupation in DB
SELECT id, occupation
FROM form
WHERE occupation IS NOT NULL

This is the real use case for null you’d need to use IS
to filter the nulls in this case. In the next case will, something could go south.
Case 2: find out which IDs are not Software Engineer as an occupation
SELECT id, occupation
FROM form
WHERE occupation <> 'Software Engineer'

In this case, the only accountant is returned as it is not a software engineer. But what about IDs 3 and 4? They are not in the list either as SQL use IS
operator for nulls, so it will be ignored whether in case of =
or <>
. You can try to change to occupation = ‘Software Engineer’
see the result.
Should you want to return the nulls or not is decided by your business logic. It is imperative always to be aware of nulls in your dataset while performing data analysis. If you keep nulls, you can add a new sense to show it.
WHERE occupation <> 'Software Engineer' or occupation is NULL
or convert nulls to non-nulls
WHERE COALESCE(occupation,'') <> 'Software Engineer'
SQL NULL in Concatenation
Concatenation is also a syntax that is likely to be error-prone with nulls. Take a look at this example
SELECT id, CONCAT(occupation, NULL) AS occupation
FROM form

Everything is NULL in the end. SQL cannot determine what you want to do when you concatenate a value with a null, so it’s safe to return null, as SQL has no idea how to handle it. A lot of us might want to think of null as an empty string ''
If you want that behavior, use COALESCE
or similar to handling those scenarios.
SQL NULL in Calculation
1+2+3+4+null
would be invalid in this case since we don’t know what value is invalid. However, if you’d want to aggregate value with null, most DB has different behavior of nulls with calculation changes.
SELECT SUM(id)
FROM form
We got 10 as the final result, so nulls are just ignored. If you apply AVG
, the result is 2.5, which is 10/4, and you can see nulls have been missed again. Whether you want to do SUM(id)/Count(1)
or AVG(id)
would be a different result, so you’d need to ensure your goal is the same as expected.
SQL NULL in Joins
When joining, it’s possible that null values may be the end result. While it’s important to apply business logic after the join, it’s also crucial to consider how nulls will affect downstream cases. Whether or not to keep nulls will depend on your specific use cases. If you need to join keys that include null values, this should be taken into account.
SELECT *
FROM form a
INNER JOIN form b on a.id = b.id

With a similar idea with dinner, you won’t see id 5 in this case, as SQL has no idea what to do with nulls. However, if you do left join, as SQL will keep the left records, it would show up as id 5.
Final Thought
In my opinion, null values can sometimes be seen as a shortcut – a way to avoid dealing with specific data points by using a catch-all term. For example, in Java programming, null values are commonly used, but this can lead to the risk of Null Pointer Exceptions. However, null values are a reality in SQL, and it is important to acknowledge and handle them appropriately when working with datasets.
About Me
I hope my stories are helpful to you.
For data engineering post, you can also subscribe to my new articles or becomes a referred Medium member that also gets full access to stories on Medium.
More Articles
Unlocking the Secrets of Slowly Changing Dimension (SCD): A Comprehensive View of 8 Types
Slowly Changing Dimension (SCD) is critical to dimensional modeling. We will discuss the eight types of SCDs. By the end, you will clearly understand each type and be able to differentiate between SCDs in dimensional modeling.
5 Fantastic Data Pipeline Orchestration Tools For R
Many modern data orchestration projects like Apache Airflow and Luigi are Python-based. Let’s explore the popular data pipeline orchestration options for R.
Streaming Data Is Exciting: What You Need to Know Before Taking the Plunge
Is streaming data necessary for this particular use case? Rather than blindly diving in, it’s essential first to acknowledge the realities of working with streaming data.