Search⌘ K
AI Features

Comparing Missing Values

Understand how to properly compare and handle NULL values in SQL for effective data cleaning. Learn why the equality operator does not work with NULLs and how to use IS, IS DISTINCT FROM, and IS NOT DISTINCT FROM to compare missing values safely. This lesson helps you prepare your data accurately for analysis by mastering missing value comparisons.

Overview

Before we start analyzing data, we usually need to clean it first. Data cleaning is an important part of this process, and handling missing values is a large part of that.

Missing data in SQL

Missing data in SQL is represented by the special value NULL. We already know that some functions, such as aggregate functions, handle NULL values differently. For example:

PostgreSQL
SELECT
COUNT(*) AS count_rows,
COUNT(value) AS count_values
FROM (VALUES
(1),
(null)
) AS t(value);

As you can see, the expression COUNT(value) ignored the NULL value in the column value, and the result was 1.

Comparing NULL values

We can't use the equality comparison operator, =, to check if a certain value is NULL. To check if a value is NULL, we need to use the IS operator instead:

PostgreSQL
SELECT
NULL = NULL AS equal,
NULL IS NULL AS is
;

When we use the equality operator in the expression NULL = NULL, the result is not True, it is NULL. When we use the IS operator, the result of the expression NULL IS NULL is True.

Using the IS DISTINCT FROM operator

Comparing nullable values and columns can be tricky. If we use the equality operator we may get NULL as a result. To safely compare NULL values for equality, SQL offers the IS DISTINCT FROM operator:

PostgreSQL
SELECT
a,
b,
a = b as equal,
a IS DISTINCT FROM b AS is_distinct_from
FROM (VALUES
(1, 1),
(1, 2),
(1, NULL),
(NULL, NULL)
) AS t(a, b);

The query illustrates the differences between the equality operator and IS DISTINCT FROM. When a and b are not NULL, the equality and IS DISTINCT FROM are returning the same result. However, when one or both values are NULL, the equality operator evaluates to NULL, but IS DISTINCT FROM returns the result as if we used IS.

Note: When comparing nullable columns, it's safer to use IS DISTINCT FROM or IS NOT DISTINCT FROM.

The IS DISTINCT FROM operator has an opposite operator IS NOT DISTINCT FROM:

PostgreSQL
SELECT
a,
b,
a IS NOT DISTINCT FROM b as is_not_distinct_from,
a IS DISTINCT FROM b AS is_distinct_from
FROM (VALUES
(1, 1),
(1, 2),
(1, NULL),
(NULL, NULL)
) AS t(a, b);

The IS NOT DISTINCT FROM operator is the opposite of IS DISTINCT FROM, or != for non-nullable values.