Chapter 4. tr

The humble tr tool is surprisingly handy. It readily disposes of many little tasks:

  • conversion of newlines from one operating system to another
  • substitution ciphers
  • extraction of, say, alphabetic characters from a file
  • changing lowercase to uppercase or vice versa
  • replacing consecutive spaces with a single space

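Let’s look at a simplified tr, which only translates (it can neither delete nor squeeze) and only supports sets of single characters (no ranges, escapes, or classes). If the second set is shorter than the first, its last character is reused for the remaining characters of the first set.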

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Print a printf-style message to stderr, then exit with failure. */
void die(const char *err, ...) {
  va_list params;
  va_start(params, err);
  vfprintf(stderr, err, params);
  va_end(params);
  fputc('\n', stderr);
  exit(1);
}
int main(int argc, char **argv) {
  if (argc < 2) die("tr: missing operand");
  if (argc < 3) die("tr: missing operand after `%s'", argv[1]);
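  if (argc > 3) die("tr: extra operand `%s'", argv[3]);
  /* tab[c] is the byte that replaces input byte c; start with the identity. */
  unsigned char tab[256];
  for(int i=0; i<256; i++) tab[i] = i;
  /* Pair up the sets; if the second is shorter, reuse its last character. */
  char *q = argv[2];
  for(char *p = argv[1]; *p; p++) {
    /* Index via unsigned char: a plain char may be negative. */
    tab[(unsigned char)*p] = *q;
    if (*q && *(q+1)) q++;
  }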
  int c;
  while(EOF != (c = getchar())) {
    if (EOF == putchar(tab[c])) perror("tr"), exit(1);
  }
  if (ferror(stdin)) perror("tr"), exit(1);
  return 0;
}

UTF-8

This time, instead of moving to a Go program that behaves identically, we take advantage of Go’s features to make our program more versatile. Our Go version supports UTF-8, despite resembling the C original.

We use a map instead of an array, because there are many more than 256 Unicode characters. Go thankfully provides a built-in map type; in C, we’d have to supply our own.

package main

import (
  "bufio"
  "flag"
  "fmt"
  "io"
  "os"
)

// die prints a message prefixed with the program name, then exits.
func die(s string, v ...interface{}) {
  fmt.Fprint(os.Stderr, "tu: ")
  fmt.Fprintf(os.Stderr, s, v...)
  fmt.Fprintln(os.Stderr)
  os.Exit(1)
}

func main() {
  flag.Parse()
  if 1 > flag.NArg() { die("missing operand") }
  if 2 > flag.NArg() { die("missing operand after `%s'", flag.Arg(0)) }
  if 2 < flag.NArg() { die("extra operand after `%s'", flag.Arg(1)) }
  // Converting a string to []rune decodes its UTF-8 into code points.
  set1 := []rune(flag.Arg(0))
  set2 := []rune(flag.Arg(1))
  if len(set2) == 0 && len(set1) > 0 { die("second set must not be empty") }
  // Map each rune of set1 to its counterpart; the last rune of set2 repeats.
  tab := make(map[rune]rune)
  j := 0
  for i := 0; i < len(set1); i++ {
    tab[set1[i]] = set2[j]
    if j < len(set2)-1 { j++ }
  }
  in := bufio.NewReader(os.Stdin)
  out := bufio.NewWriter(os.Stdout)
  flush := func() {
    if er := out.Flush(); er != nil { die("flush: %v", er) }
  }
  writeRune := func(r rune) {
    if _, er := out.WriteRune(r); er != nil { die("write: %v", er) }
  }
  for done := false; !done; {
    switch r, _, er := in.ReadRune(); er {
    case io.EOF: done = true
    case nil:
      if s, found := tab[r]; found {
        writeRune(s)
      } else {
        writeRune(r)
      }
      // Flush at each newline so interactive use sees output promptly.
      if '\n' == r { flush() }
    default: die("%s: %v", os.Stdin.Name(), er)
    }
  }
  flush()
}

Then if the binary is named tu:

$ tu 0123456789 〇一二三四五六七八九 <<< 31415
三一四一五
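
The table-building loop is the heart of the program, and it can be tried out in isolation. The following is only a sketch for experimentation, not part of tu itself: the function names buildTable and translate are made up for illustration, and it assumes a current Go toolchain, where rune is the type of a Unicode code point. It shows the digit example again, plus what happens when the second set is shorter than the first.

```go
package main

import (
	"fmt"
	"strings"
)

// buildTable maps each rune of set1 to the rune at the same position in
// set2. If set2 is shorter, its last rune is reused for the remainder.
// An empty set2 yields an empty table. (Hypothetical helper.)
func buildTable(set1, set2 string) map[rune]rune {
	tab := make(map[rune]rune)
	from, to := []rune(set1), []rune(set2)
	if len(to) == 0 {
		return tab
	}
	j := 0
	for _, r := range from {
		tab[r] = to[j]
		if j < len(to)-1 {
			j++
		}
	}
	return tab
}

// translate applies the table to every rune of s and leaves
// unmapped runes unchanged. (Hypothetical helper.)
func translate(tab map[rune]rune, s string) string {
	return strings.Map(func(r rune) rune {
		if t, ok := tab[r]; ok {
			return t
		}
		return r
	}, s)
}

func main() {
	digits := buildTable("0123456789", "〇一二三四五六七八九")
	fmt.Println(translate(digits, "31415")) // prints 三一四一五

	// set2 is shorter than set1, so both b and c map to y.
	fmt.Println(translate(buildTable("abc", "xy"), "aabbcc")) // prints xxyyyy
}
```

Keeping the table separate from the I/O loop also makes it easy to reuse for other tasks from the start of the chapter, such as a substitution cipher or case conversion.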

Full translation

A complete tr utility takes a bit more work. For a classic version, we can get by with manipulating arrays of size 256. For a Unicode-aware version, complications arise with set complements and ranges.
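
To see where the trouble comes from: with only 256 byte values, a range or a complement can simply be expanded into an explicit list, but the complement of a small Unicode set has over a million members. One possible way around this, sketched below under assumptions of our own, is to store a set as a list of inclusive rune ranges and answer membership queries instead of enumerating members. The names runeRange, runeSet, parseSet and contains are hypothetical, and the set syntax is deliberately reduced to single characters and ranges such as a-z (no escapes or classes).

```go
package main

import "fmt"

// runeRange is an inclusive range of code points. (Hypothetical type.)
type runeRange struct{ lo, hi rune }

// runeSet is a list of ranges that may be complemented as a whole.
type runeSet struct {
	ranges     []runeRange
	complement bool
}

// parseSet reads a simplified set syntax: single characters and
// ranges such as a-z. A '-' at the start or end is taken literally.
func parseSet(s string, complement bool) (runeSet, error) {
	rs := []rune(s)
	set := runeSet{complement: complement}
	for i := 0; i < len(rs); i++ {
		lo, hi := rs[i], rs[i]
		if i+2 < len(rs) && rs[i+1] == '-' {
			hi = rs[i+2]
			i += 2 // skip the '-' and the upper bound
		}
		if hi < lo {
			return set, fmt.Errorf("range %c-%c is reversed", lo, hi)
		}
		set.ranges = append(set.ranges, runeRange{lo, hi})
	}
	return set, nil
}

// contains reports membership without enumerating the set, so a
// complement costs nothing extra to represent.
func (s runeSet) contains(r rune) bool {
	in := false
	for _, rg := range s.ranges {
		if rg.lo <= r && r <= rg.hi {
			in = true
			break
		}
	}
	return in != s.complement
}

func main() {
	greek, err := parseSet("α-ω", false)
	if err != nil {
		fmt.Println(err)
		return
	}
	notDigit, err := parseSet("0-9", true)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(greek.contains('λ'), greek.contains('a'))       // prints true false
	fmt.Println(notDigit.contains('7'), notDigit.contains('七')) // prints false true
}
```

A membership test like this is enough for deleting or squeezing. Translating from a complemented set is harder, because pairing its members with the second set requires an ordering over all the characters that are not listed.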