The Behavior of NULLs in SQL


Rarely is the full behavior of the NULL value in SQL taught or described in detail, and with good reason: Some of the SQL rules surrounding NULL can be surprising or unintuitive.

Unfortunately, if you have deal with NULL in real databases, the results can be downright frustrating. The SQLite project, for example, uses trial and error to determine how a database behaves in the presence of NULL values.

Fortunately, Date and Darwen's A Guide to the SQL Standard (fourth edition) [1] describes SQL's rules concerning NULL in good detail.

NULL Basics

Intuitively, NULL approximately represents an unknown value.

A Simple Case

Here is what the SQL standard mandates for some operations involving sets and multisets.

For a simple relation R

CREATE TABLE R (a INTEGER);
the following queries attempt to reliably determine the maximum known value of the attribute a in the table R.

For brevity below, let

MAX()

This is the obvious query:

SELECT MAX(a) FROM R

a >= ALL()

This expression of the maximum seems consistent with mathematical logic, but fails completely in SQL:

SELECT DISTINCT a
FROM R
WHERE a >= ALL (SELECT * FROM R)

EXCEPT

This expression is one derivation of maximum as computed in relational algebra: subtract all the non-maximum values from the table R, leaving the maximal ones:

(SELECT DISTINCT * FROM R)
 EXCEPT
(SELECT R.a
 FROM R, R AS S
 WHERE R.a < S.a)

NOT IN

This expression is another writing of maximum as computed in relational algebra: find values not in the non-maximum values of R:

SELECT DISTINCT *
FROM R
WHERE a NOT IN (SELECT R.a
                FROM R, R AS S
                WHERE R.a < S.a)
This writing turns out to be subtly different from the last one.

EXCEPT NULL

Because it is somewhat awkward to have an expression for MAX return two rows whose values do not equal, the following expression adjusts the EXCEPT expression to exclude NULL from the answer:

(SELECT DISTINCT * FROM R)
 EXCEPT
(SELECT R.a
 FROM R, R AS S
 WHERE R.a < S.a OR R.a IS NULL)

Database Implementation Compliance

PostgreSQL 7.2.2

Where SQL mandates a behavior for a query above, PostgreSQL complies.

Oracle 9i

Where SQL mandates a behavior for a query above, Oracle complies.

Conclusion

No pair of the queries from the list above are equivalent when faced with NULLs in relational data, despite their conceptual similarity.

The two implementations tested, PostgreSQL and Oracle, seem to comply with NULL behavior for the set and multiset operations tested here, even when such behavior is sometimes subtle or unintuitive.

Consider the above a good reason to define away NULLs from relational schema whenever possible.


Footnotes for this page

[1] C. J. Date and Hugh Darwen A Guide to the SQL Standard. Fourth edition, Addison-Wesley, Reading, Massachusetts, 1997. (ISBN 0-201-96426-0)

[2] page 236.

[3] page 239.

[4] page 236-237.

[5] page 237.

[6] page 176.

[7] page 244-245.

[8] page 176.

[9] page 244.

[10] page 249.


Wang Lam - source Thu May 1 02:31:00 2003 - generated Thu Aug 5 02:57:14 2004