C Cruft

C is well-designed. Its keywords are few yet comprehensive, and its syntax permits succinct yet clear code. However, C is far from flawless.

Original sin

Chief among C’s faults is the preprocessor, a second language introduced to rectify weaknesses in the original language. Preprocessor directives are a necessary evil; the best we can do is avoid them as much as possible.

No comment

The loss of the # character to the preprocessor is a minor blow, requiring C to define a new comment syntax.

Unconditionally evil

Avoid #ifdef and its ilk. Conditional compilation is the most egregious preprocessor evil, as it obfuscates code for humans and automated tools alike. Tweaking compilation is a task best left for the build system. For example, code specific to particular architectures should be placed in separate files; the build system should choose which is appropriate.

Bad macros

A close second is misuse of the #define directive. Never use macros to define functions. They can cause mysterious bugs due to side effects from insufficient insulation, and make the code harder to analyze by human or machine. Moreover, they are unnecessary now that C99 supports the inline keyword (in header files, simply declare the function to be static inline).
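
As a minimal sketch of the side-effect problem, consider a function-like macro next to its static inline replacement. The names SQUARE_MACRO, square, and next_value are hypothetical, chosen only for illustration. The macro pastes its argument in twice, so an argument with a side effect runs twice; the inline function evaluates it once.

```c
/* Hypothetical function-like macro: the argument text appears twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* The inline replacement evaluates its argument exactly once. */
static inline int square(int x) { return x * x; }

/* Hypothetical helper with a side effect: counts how often it is called. */
static int calls = 0;
static int next_value(void) { return ++calls; }

/* Returns how many times next_value() ran when passed to the macro. */
static int calls_made_by_macro(void) {
  calls = 0;
  (void)SQUARE_MACRO(next_value());
  return calls;
}

/* Returns how many times next_value() ran when passed to the function. */
static int calls_made_by_function(void) {
  calls = 0;
  (void)square(next_value());
  return calls;
}
```

Here calls_made_by_macro() reports two calls and calls_made_by_function() reports one, even though the two call sites look identical.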

Good macros

There are times when textual substitution makes sense, and #define is appropriate. Kernighan and Pike’s example is:

#define NELEM(a) (sizeof(a)/sizeof(a[0]))

which returns the number of elements in a static array, a macro so useful that there should have been a keyword or operator for this to begin with.

When using a macro, be sure it is truly warranted; in the above example, observe that the quantity is computed at compile time, and no function could take its place.
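
A short sketch of why no function could take its place: the result of NELEM is a constant expression, so it can size another file-scope array. The arrays primes and squares and the function fill_squares are hypothetical names used only for this illustration.

```c
#include <stddef.h>

#define NELEM(a) (sizeof(a) / sizeof(a[0]))

static const int primes[] = {2, 3, 5, 7, 11};

/* NELEM(primes) is known at compile time, so it can size a second array. */
static int squares[NELEM(primes)];

/* Fills squares[] from primes[] and returns the element count. */
static size_t fill_squares(void) {
  for (size_t i = 0; i < NELEM(primes); i++) {
    squares[i] = primes[i] * primes[i];
  }
  return NELEM(squares);
}
```

One caveat: the macro only works on true arrays. Applied to a pointer, such as an array passed as a function parameter, it compiles but yields the wrong count.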

For programming contests, save typing by defining macros for common tasks such as looping in the interval [0..N-1], and reading and writing integers.

Another application is debugging. We can access the source file name and line number via the predefined macros __FILE__ and __LINE__.

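// The loop bound is parenthesized so that expressions can be passed safely.
#define REP(x, n) for (int x = 0; x < (n); x++)
#define REP1(x, n) for (int x = 1; x <= (n); x++)
// Wrapped in do/while so the macro behaves as a single statement.
#define EXPECT(condition) \
  do { \
    if (!(condition)) fprintf(stderr, "%s:%d: FAIL\n", __FILE__, __LINE__); \
  } while (0)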

// Prints the quadratic residues (squares) for a given modulus.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
  enum { MIN = 2, MAX = 1 << 15 };
  if (2 != argc) return printf("Usage: %s NUMBER\n", *argv), 0;
  int m = atoi(argv[1]);

  // This next line is buggy. The && should be ||.
  if (m < MIN && m > MAX) return printf("Modulus out of range\n"), 0;

  // These lines will catch the bug when the argument is out of range.
  EXPECT(m >= MIN);
  EXPECT(m <= MAX);
  REP(i, m) printf(" %d", i * i % m);
  puts("");
  return 0;
}

Constants

Rather than #define integer constants, use enum, as const int is insufficient. Declaring a variable i as const merely tells the compiler to report an error if code attempts to write to i directly. We can break the rules by casting away the const, but the results are undefined. For example, on Linux, gcc places a static const int in a read-only data segment, so attempts to write to it cause segmentation faults.

If our code never touches i, can it ever change? A classic C riddle along the same lines asks if const volatile ever makes sense. Wikipedia has the answer. Unlike an enum constant, a const variable is not a constant expression in C, so the rules prohibit using it where a true compile-time constant is required, such as in a case label.

Even so, for other types, a const variable may be preferable to #define constants.
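
The difference can be sketched as follows. The names BUFFER_SIZE, buffer_size, describe, and fits are hypothetical, and the closing comment describes standard C99 behaviour; some compilers accept the rejected forms as an extension.

```c
#include <stddef.h>

enum { BUFFER_SIZE = 64 };          /* a true integer constant */
static const int buffer_size = 64;  /* a read-only variable, not a constant */

/* An enum constant may size a file-scope array. */
static char buffer[BUFFER_SIZE];

/* It may also appear in a case label. */
static const char *describe(int used) {
  switch (used) {
    case 0: return "empty";
    case BUFFER_SIZE: return "full";
    default: return "partial";
  }
}

/* The const variable is still fine wherever a run-time value will do. */
static int fits(size_t n) {
  return n <= (size_t)buffer_size && n <= sizeof buffer;
}

/* In standard C99, neither of these is valid:
 *   static char other[buffer_size];
 *   case buffer_size: ...
 */
```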

Package not included

Unfortunately C lacks Java-style packages, so C programmers must use include files. We should follow Rob Pike’s advice: include files should not include files. If there are dependencies, they should be mentioned in comments and it is the .c file’s duty to include them. Also, if we must have header guards, they should be the reverse of common practice: check if you can avoid including a file before including it, not while it’s being included. This saves the preprocessor from churning through thousands of lines.

Equals inequality

The choice to make = the assignment operator was short-sighted. Beginners and experts alike confuse it with the equality test, so much so that some veterans write:

if (1 == i) {

instead of:

if (i == 1) {

so that if one = is accidentally omitted, compilation fails.

I would have chosen := as the assignment operator.

Wearing braces

If statements should require braces. I have been bitten by bugs of the form:

if (foo)
  bar();
  baz();

Namely, poor indentation and lack of braces meant that I thought baz() would only be called if the condition were true.

A similar problem arises with simple-minded macros: if bar() expanded to a series of statements, only the first executes conditionally.
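
A minimal sketch of the macro variant of this bug, with hypothetical names throughout. The do/while wrapper in the second macro is a common idiom for making a multi-statement macro behave as one statement; it is shown here as one possible remedy, not the only one.

```c
static int log_count = 0;
static int reset_count = 0;

/* Naive hypothetical macro: expands to two separate statements. */
#define NAIVE_CLEANUP() log_count++; reset_count++

/* Same body wrapped so that it forms a single statement. */
#define SAFE_CLEANUP() do { log_count++; reset_count++; } while (0)

/* Returns how many resets happened; only log_count++ is guarded here. */
static int naive_resets(int condition) {
  reset_count = 0;
  if (condition)
    NAIVE_CLEANUP();
  return reset_count;
}

/* Returns how many resets happened; both statements are guarded here. */
static int safe_resets(int condition) {
  reset_count = 0;
  if (condition)
    SAFE_CLEANUP();
  return reset_count;
}
```

With a false condition, naive_resets still performs the reset while safe_resets does not. Braces around the body of the if would have protected both versions.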

For consistency, we should also insist on braces for loops.

Unprecedented precedence

The binary operators &, |, &&, and || all have fairly weak precedence: all of them bind more loosely than comparisons such as ==. This is expected for the logical variants, but unintuitive for the bitwise variants.
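
As an illustration, here is a hypothetical parity test written twice. Because == binds tighter than &, the first version parses as x & (1 == 0), which is x & 0. Compilers may warn about it, which is rather the point.

```c
/* Intended to test the low bit, but parsed as x & (1 == 0): always 0. */
static int is_even_buggy(int x) { return x & 1 == 0; }

/* Explicit parentheses give the intended meaning. */
static int is_even(int x) { return (x & 1) == 0; }
```

The buggy version returns 0 for every input, while is_even(2) returns 1 and is_even(3) returns 0.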

Failing through fallthrough

Missing break statements in switch statements can cause hard-to-find bugs. The default behaviour should have been to break before the next case, and require the programmer to write a special keyword if fallthrough is desired.

Given the language in its current form, I recommend writing a comment such as // FALLTHROUGH when it is intended.
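
A small sketch of the convention, using a hypothetical scoring function in which each grade deliberately accumulates the points of the grades below it:

```c
/* Hypothetical example: 'A' earns 3 points, 'B' earns 2, 'C' earns 1. */
static int score(char grade) {
  int points = 0;
  switch (grade) {
    case 'A':
      points += 1;
      // FALLTHROUGH
    case 'B':
      points += 1;
      // FALLTHROUGH
    case 'C':
      points += 1;
      break;
    default:
      break;
  }
  return points;
}
```

Without the comments, a reader cannot tell whether the missing break statements are deliberate.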

Undeserved promotion

Implicit conversion is often more trouble than it is worth. While type promotion is convenient when mixing integers and floating-point values, I cannot easily determine the types of the terms in an arithmetic expression. Usually I throw in explicit casts to ensure the code is doing what I want.
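
Two well-known surprises, sketched with hypothetical function names. The cast in the second function assumes long long is wider than unsigned int, which holds on common platforms but is not guaranteed by the standard.

```c
/* Usual arithmetic conversions: -1 is converted to unsigned int and becomes
 * UINT_MAX, so this comparison is false for every u. */
static int minus_one_is_less(unsigned int u) { return -1 < u; }

/* An explicit cast to a wider signed type restores the mathematical meaning.
 * Assumes long long is wider than unsigned int. */
static int minus_one_is_less_cast(unsigned int u) {
  return -1 < (long long)u;
}

/* Integer promotion: both operands become int before the addition, so the
 * sum does not wrap around at 255. */
static int add_bytes(unsigned char a, unsigned char b) { return a + b; }
```

Here minus_one_is_less(1) is 0, minus_one_is_less_cast(1) is 1, and add_bytes(200, 100) is 300 rather than 44.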

Not my type

Once you get the hang of it, it is fun to translate declarations like void *(*fun[])(void (*)(void)) (an array of pointers to functions returning a pointer to void, where each takes one argument that is a pointer to a function that takes no arguments and returns void), but it would have been better to have simpler notation. For starters, prefix should not be mixed with postfix.
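
Until the notation improves, typedef can at least break such a declaration into readable steps. This is a sketch with hypothetical names (callback, handler, mark, run_it); the last two declarations have the same type, one built from typedefs and one written out in full.

```c
/* A pointer to a function that takes no arguments and returns void. */
typedef void (*callback)(void);

/* A pointer to a function that takes a callback and returns void *. */
typedef void *(*handler)(callback);

static int ran = 0;
static void mark(void) { ran = 1; }

/* Invokes the callback and returns a pointer to the flag it set. */
static void *run_it(callback cb) {
  cb();
  return &ran;
}

/* An array of handlers, declared via the typedefs. */
static handler fun[] = {run_it};

/* The same type written out in full. */
static void *(*fun_raw[])(void (*)(void)) = {run_it};
```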

Barely functional

The upwards funarg problem is nontrivial, but the downwards variant can be elegantly implemented in a compiler via trampolining. So why not make nested functions and anonymous functions part of standard C? Even standard C++ is getting lambda expressions!

Also, standard C has a subtle issue with function pointers: it’s up to the implementation to decide what happens when casting function pointers to or from void pointers. In particular, dynamically linked functions should not return function pointers. Luckily, on my computers, casting function pointers to and from void pointers behaves as one would expect, and elicits no warnings from GCC.

Overexposure

Functions are visible to other files by default, though they can be restricted to file scope with the static keyword. It should be the other way around, namely opt-in, not opt-out, to encourage the programmer to minimize what is shown to the outside world.
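
In today's C, the opt-out looks like the following sketch. Both function names are hypothetical.

```c
/* Internal linkage: callable only from within this file. */
static int double_it(int x) { return 2 * x; }

/* External linkage by default: any other file may declare and call this. */
int double_plus_one(int x) { return double_it(x) + 1; }
```

Forgetting static on a helper silently adds it to the program's global set of names, where it can collide with an identically named function in another file.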

Similarly, when linking, symbols have full visibility by default, and compiler-specific directives are needed to hide them. The default should be to hide symbols and have the programmer explicitly designate those that are suitable for public viewing via a keyword.

Lack of namespaces hurts larger projects and libraries. To avoid collisions, one is forced to pick unwieldy names. For example, every function in the SDL library is prefixed with "SDL_". Approximating namespaces by defining static inline functions to abbreviate such names is a chore.


Ben Lynn blynn@cs.stanford.edu