The Behavior of NULLs in SQL

NULL Basics
MAX()
a >= ALL()
EXCEPT
NOT IN
EXCEPT NULL
Database Implementation Compliance
Conclusion

Rarely is the full behavior of the NULL value in SQL taught or described in detail, and with good reason: Some of the SQL rules surrounding NULL can be surprising or unintuitive.

Unfortunately, if you have deal with NULL in real databases, the results can be downright frustrating. The SQLite project, for example, uses trial and error to determine how a database behaves in the presence of NULL values.

Fortunately, Date and Darwen's A Guide to the SQL Standard (fourth edition) [1] describes SQL's rules concerning NULL in good detail.

NULL Basics

Intuitively, NULL approximately represents an unknown value.

An arithmetic operation involving a NULL returns NULL. For example, NULL minus NULL yields NULL, not zero. [2]
A boolean comparison between two values involving a NULL returns neither true nor false, but unknown in SQL's three-valued logic. [3] For example, neither NULL equals NULL nor NULL not-equals NULL is true. Testing whether a value is NULL requires an expression such as IS NULL or IS NOT NULL.
An SQL query selects only values whose WHERE expression evaluates to true, and groups whose HAVING clause evaluates to true.
The aggregate COUNT(*) counts all NULL and non-NULL tuples; COUNT(attribute) counts all tuples whose attribute value is not NULL. Other SQL aggregate functions ignore NULL values in their computation. [4]

A Simple Case

Here is what the SQL standard mandates for some operations involving sets and multisets.

For a simple relation R

CREATE TABLE R (a INTEGER);

the following queries attempt to reliably determine the maximum known value of the attribute a in the table R.

For brevity below, let

empty refer to an empty result of no rows, and
NULL (in the context of a table or result) refer to a table with one row holding a NULL value.

MAX()

This is the obvious query:

SELECT MAX(a) FROM R

If R is empty, the query must return NULL, not empty. [5]
If R is a one-row table holding NULL, Date and Darwen don't specifically declare what MAX() should return if its argument consists of exclusively NULL values.
If R is a table holding NULL and non-NULL integers, NULLs are ignored, and MAX() returns the maximum integer.

a >= ALL()

This expression of the maximum seems consistent with mathematical logic, but fails completely in SQL:

SELECT DISTINCT a
FROM R
WHERE a >= ALL (SELECT * FROM R)

If R is empty, the query returns empty. The >= ALL test is vacuously true with an empty subquery [6], but there is no value of a to exploit the test.
If R holds a NULL value, the query returns empty, because the test a >= ALL(...) returns unknown (not false!) for any NULL or maximum non-NULL integer value of a if the subquery includes a NULL value. [7]

EXCEPT

This expression is one derivation of maximum as computed in relational algebra: subtract all the non-maximum values from the table R, leaving the maximal ones:

(SELECT DISTINCT * FROM R)
 EXCEPT
(SELECT R.a
 FROM R, R AS S
 WHERE R.a < S.a)

If R is empty, the query returns empty.
If R holds a NULL value, the query returns NULL, in addition to whatever maximal integer is present (if any). The lower subquery never includes NULL, so NULL is never subtracted from R.

NOT IN

This expression is another writing of maximum as computed in relational algebra: find values not in the non-maximum values of R:

SELECT DISTINCT *
FROM R
WHERE a NOT IN (SELECT R.a
                FROM R, R AS S
                WHERE R.a < S.a)

This writing turns out to be subtly different from the last one.

If R is empty, the query returns empty.
If R holds one integer, and at least one NULL value, the query returns NULL, in addition to whatever maximal integer is present (if any). In this case, the subquery is always empty; the one available integer is compared only to NULL values, so do not participate in the subquery's result. NULL NOT IN (empty) is vacuously true as it is in mathematics [8], so NULL is selected as part of the result.
If R holds more than one integer, the query returns the maximal integer. In this case, the subquery includes at least one value. Now that the subquery is not empty, NULL NOT IN (nonempty result) evaluates to unknown (not false!), and is no longer selected as part of the result. As an aside, NULL NOT IN (nonempty result) returns unknown even if the nonempty result includes NULL. [9]

EXCEPT NULL

Because it is somewhat awkward to have an expression for MAX return two rows whose values do not equal, the following expression adjusts the EXCEPT expression to exclude NULL from the answer:

(SELECT DISTINCT * FROM R)
 EXCEPT
(SELECT R.a
 FROM R, R AS S
 WHERE R.a < S.a OR R.a IS NULL)

If R is empty, the query returns empty.
If R holds NULL, the query returns the maximal integer, or empty if R has no integers. EXCEPT will remove NULL from the result if NULL appears in the bottom subquery, even though NULL is not equal to NULL. [10] Similarly, DISTINCT, UNION, and INTERSECT always returns at most one NULL.

Database Implementation Compliance

PostgreSQL 7.2.2

Where SQL mandates a behavior for a query above, PostgreSQL complies.

MAX() of NULLs only: returns NULL (consistent with ignoring NULLs, then computing the MAX() of an empty remainder).

Oracle 9i

Where SQL mandates a behavior for a query above, Oracle complies.

MAX() of NULLs only: returns NULL (consistent with ignoring NULLs, then computing the MAX() of an empty remainder).

Conclusion

No pair of the queries from the list above are equivalent when faced with NULLs in relational data, despite their conceptual similarity.

The two implementations tested, PostgreSQL and Oracle, seem to comply with NULL behavior for the set and multiset operations tested here, even when such behavior is sometimes subtle or unintuitive.

Consider the above a good reason to define away NULLs from relational schema whenever possible.

Footnotes for this page

[1] C. J. Date and Hugh Darwen A Guide to the SQL Standard. Fourth edition, Addison-Wesley, Reading, Massachusetts, 1997. (ISBN 0-201-96426-0)

[2] page 236.

[3] page 239.

[4] page 236-237.

[5] page 237.

[6] page 176.

[7] page 244-245.

[8] page 176.

[9] page 244.

[10] page 249.

Wang Lam - source Thu May 1 02:31:00 2003 - generated Thu Aug 5 02:57:14 2004