Sound Synthesis

We survey the ingredients of sound synthesis.

Oscillators

  • Sinusoids: Building blocks of periodic signals; think Fourier series. They sound bare by themselves, and rarely exist alone in nature.

  • Square Waves: Odd harmonics only, with amplitudes falling off as \(1/k\). A good starting point for subtractive synthesis.

  • Sawtooth Waves: All harmonics, with amplitudes falling off as \(1/k\). Another good starting point for subtractive synthesis.

  • Triangle Waves: Odd harmonics only, like square waves, but the amplitudes roll off much faster (as \(1/k^2\) rather than \(1/k\)), hence they sound closer to a sine wave.

  • Pulse Waves: These are square waves, except one can vary the time spent at the maximum value per cycle. The fraction of each cycle spent at the maximum is called the duty cycle. For example, a square wave is a pulse wave with duty cycle 0.5. If the duty cycle is \(1/n\), then the spectrum of the pulse wave contains the same harmonics as that of a sawtooth wave, except that every \(n\)th harmonic is missing. Other signals may be modified in the same fashion.
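The waveforms above can be sketched as functions of phase, and their harmonic content checked numerically. This is a minimal illustration, not a production oscillator: the function names, the convention that phase is measured in cycles, and the helper `harmonic_amplitude` are all assumptions made for this example, and the waveforms are naive (not band-limited), so they would alias if sampled at audio rates.

```python
import math

def sine(phase):
    """Sine oscillator; phase is in cycles (1.0 = one full period)."""
    return math.sin(2 * math.pi * phase)

def pulse(phase, duty=0.5):
    """+1 for the first `duty` fraction of each cycle, -1 for the rest."""
    return 1.0 if phase % 1.0 < duty else -1.0

def square(phase):
    """A pulse wave with duty cycle 0.5."""
    return pulse(phase, 0.5)

def sawtooth(phase):
    """Rises linearly from -1 to +1 over each cycle."""
    return 2.0 * (phase % 1.0) - 1.0

def triangle(phase):
    """Rises from -1 to +1 over the first half cycle, then falls back."""
    return 1.0 - 4.0 * abs((phase % 1.0) - 0.5)

def harmonic_amplitude(osc, k, n=4096):
    """Amplitude of the k-th harmonic of one cycle of `osc` (direct DFT sum)."""
    re = sum(osc(i / n) * math.cos(2 * math.pi * k * i / n) for i in range(n))
    im = sum(osc(i / n) * math.sin(2 * math.pi * k * i / n) for i in range(n))
    return 2.0 * math.hypot(re, im) / n
```

For instance, `harmonic_amplitude(square, 2)` should be essentially zero, while `harmonic_amplitude(lambda p: pulse(p, 1 / 3), 3)` should be close to zero because the third harmonic drops out at duty cycle 1/3.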

Modulation

We can vary one parameter of a sound according to another signal.

Amplitude Modulation

Slowly varying the amplitude of a sound is called tremolo. The spectrum of the product of two signals is the convolution of their spectra. In particular, if two sine waves are multiplied, we end up with sine waves whose frequencies are the sum and difference of the input frequencies.
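The sum-and-difference claim follows from \(\sin A \sin B = \frac{1}{2}(\cos(A-B) - \cos(A+B))\), and can be checked numerically. The sketch below multiplies two sine waves and measures the amplitude at a few frequencies by correlation. The sample rate, the two frequencies, and the helper names are arbitrary choices for this example.

```python
import math

def tone_amplitude(signal, freq, sample_rate):
    """Amplitude of the component at `freq` Hz, by correlating with cos and sin."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

def sine_wave(freq, sample_rate, n_samples):
    """n_samples of a unit-amplitude sine wave at `freq` Hz."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n_samples)]

sample_rate = 8000          # assumed sample rate in Hz
f1, f2 = 440.0, 100.0       # assumed input frequencies in Hz

# One second of the product of the two sines (digital "ring modulation").
product = [x * y for x, y in zip(sine_wave(f1, sample_rate, sample_rate),
                                 sine_wave(f2, sample_rate, sample_rate))]
```

Measuring `product` with `tone_amplitude` at `f1 - f2` and `f1 + f2` should give about 0.5 each, while the amplitudes at `f1` and `f2` themselves should be essentially zero: neither input survives in the output.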

In the analogue world, circuits that perform multiplication correctly even when one or both of the signals are negative ("four-quadrant multiplication") are not as trivial as a voltage-controlled amplifier, and are called ring modulators (because one such implementation uses a ring of four diodes). In the digital world, the obvious way to design an amplitude modulator leads to a ring modulator.

[I’m not sure what happens when multiplication goes wrong. Perhaps one or both of the input signals leaks through.]

Filter Modulation

We can feed a sound through a low-pass filter and vary the filter cutoff according to another signal. At very low modulation frequencies (below 1 Hz) we can get interesting sounds that change very slowly. Higher up, we start to get a wah-wah effect. Higher still leads to a "growl" sound like that of brass instruments.
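As a rough illustration, here is a sketch of a low-pass filter whose cutoff is swept by a low-frequency sine wave. The text does not specify a filter design, so the one-pole smoothing filter used here, the function name `modulated_lowpass`, and all parameter names are assumptions for this example. A real synthesizer would more likely use a resonant filter.

```python
import math

def modulated_lowpass(signal, sample_rate, base_cutoff, depth, lfo_freq):
    """One-pole low-pass filter with a sinusoidally modulated cutoff.

    The cutoff at time t is base_cutoff + depth * sin(2 pi lfo_freq t), in Hz.
    """
    out = []
    y = 0.0
    for i, x in enumerate(signal):
        t = i / sample_rate
        cutoff = base_cutoff + depth * math.sin(2 * math.pi * lfo_freq * t)
        cutoff = max(cutoff, 0.0)  # a negative cutoff makes no sense; clamp it
        # Smoothing coefficient of a one-pole filter for this cutoff.
        alpha = 1.0 - math.exp(-2 * math.pi * cutoff / sample_rate)
        y += alpha * (x - y)
        out.append(y)
    return out
```

For example, feeding one second of a 100 Hz sawtooth through `modulated_lowpass(saw, 8000, 800.0, 600.0, 2.0)` sweeps the cutoff between 200 Hz and 1400 Hz twice per second.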

Pulse Width Modulation

Varying the duty cycle on a pulse wave can be useful for synthesizing stringed instruments.
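A minimal sketch of pulse width modulation follows, assuming the duty cycle is swept sinusoidally around a centre value. The function name `pwm_wave` and its parameters are hypothetical, and the pulse wave is naive (not band-limited).

```python
import math

def pwm_wave(freq, lfo_freq, depth, sample_rate, n_samples, center=0.5):
    """Pulse wave at `freq` Hz whose duty cycle is center + depth * sin(2 pi lfo_freq t)."""
    out = []
    phase = 0.0  # position within the current cycle, in [0, 1)
    for i in range(n_samples):
        duty = center + depth * math.sin(2 * math.pi * lfo_freq * i / sample_rate)
        out.append(1.0 if phase < duty else -1.0)
        phase = (phase + freq / sample_rate) % 1.0
    return out
```

With `depth` set to zero this reduces to an ordinary pulse wave with duty cycle `center`. Keeping `center - depth` above 0 and `center + depth` below 1 avoids stretches where the output is stuck at one level.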

Frequency Modulation

Varying the frequency of an input wave slowly is called vibrato. At faster modulation frequencies, we have

\[ \sin(\alpha + a \sin \beta) = J_0(a) \sin \alpha + \sum_{k=1}^\infty J_k(a) ({ \sin(\alpha + k \beta) + (-1)^k \sin (\alpha - k \beta) }) \]

where \(J_k\) is the Bessel function of the first kind of order \(k\). The coefficient \(a\) is called the modulation index. Thus frequency modulation preserves the carrier frequency and introduces an infinite number of sideband frequencies, evenly spaced by the modulating frequency.
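The identity can be checked numerically. The sketch below evaluates \(J_k\) from Bessel's integral \(J_k(x) = \frac{1}{\pi} \int_0^\pi \cos(k \tau - x \sin \tau) \, d\tau\) using the midpoint rule, then compares a truncated sideband sum against the left-hand side. The function names, the number of integration steps, and the truncation at 40 terms are assumptions for this example.

```python
import math

def bessel_j(k, x, steps=400):
    """J_k(x) for integer k >= 0 via Bessel's integral and the midpoint rule."""
    h = math.pi / steps
    total = sum(math.cos(k * (j + 0.5) * h - x * math.sin((j + 0.5) * h))
                for j in range(steps))
    return total * h / math.pi

def fm_series(alpha, beta, a, terms=40):
    """Truncated sideband expansion of sin(alpha + a sin(beta))."""
    total = bessel_j(0, a) * math.sin(alpha)
    for k in range(1, terms + 1):
        upper = math.sin(alpha + k * beta)
        lower = (-1) ** k * math.sin(alpha - k * beta)
        total += bessel_j(k, a) * (upper + lower)
    return total
```

For moderate modulation indices, `fm_series(alpha, beta, a)` should agree with `math.sin(alpha + a * math.sin(beta))` to many decimal places.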

The modulation index \(a\) is not the peak frequency deviation. Just as the frequency of \(\sin(\omega t)\) is given by \(\frac{d}{d t} \omega t = \omega\), we see that the instantaneous frequency of \(\sin(\omega_c t + a \sin \omega_m t)\) is

\[ \omega_c + \omega_m a \cos \omega_m t \] giving a peak deviation of \(\omega_m a\). In software synthesizers, a natural implementation of FM is to compute \[ f_{now} = f_c + b \sin(2 \pi f_m t) \] and use \(f_{now}\) to determine the next oscillator value. With such an implementation, to obtain a desired modulation index \(a\), the peak frequency deviation should be set to \(b = a f_m\).
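One way to realize this in software is a phase accumulator: at each sample we compute the instantaneous frequency and advance the oscillator phase by that frequency divided by the sample rate. The sketch below is an assumption about how such a synthesizer might be written, not a description of any particular one, and the name `fm_oscillator` is hypothetical.

```python
import math

def fm_oscillator(f_c, f_m, index, sample_rate, n_samples):
    """FM by phase accumulation, with peak frequency deviation b = index * f_m."""
    b = index * f_m   # peak frequency deviation in Hz
    out = []
    phase = 0.0       # accumulated phase, in cycles
    for i in range(n_samples):
        t = i / sample_rate
        f_now = f_c + b * math.sin(2 * math.pi * f_m * t)  # instantaneous frequency
        out.append(math.sin(2 * math.pi * phase))
        phase += f_now / sample_rate                       # advance by one sample
    return out
```

Because the sine-shaped frequency deviation is integrated, the resulting phase term is approximately \(a (1 - \cos \omega_m t)\) rather than \(a \sin \omega_m t\). This shifts the phases of the sidebands but leaves their frequencies unchanged.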

For \(k \gt 0\), we have \(J_k(0) = 0\). As \(x\) increases from zero, \(J_k(x)\) stays close to zero for a while: the larger \(k\) is, the longer \(J_k(x)\) stays close to zero. Then \(J_k\) looks like a damped sinusoid: a sine wave whose peaks get closer to zero as \(x\) increases. Empirically, except for small \(k\), we find \(J_k(x)\) reaches its maximum roughly at \(x = k+2\). Another good rule of thumb is that the contributions of the sidebands above \(\omega_c + 1.5 a \omega_m\) are insignificant.
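The behaviour near zero is visible in the power series \(J_k(x) = \sum_{m \ge 0} \frac{(-1)^m}{m! \, (m+k)!} (x/2)^{2m+k}\), whose leading term \((x/2)^k / k!\) vanishes at \(x = 0\) and stays small for longer as \(k\) grows. The sketch below sums this series and scans for the location of the largest value of \(J_k\). The function names, the 80-term truncation, the scan range, and the step size are assumptions, and the series loses precision for large \(x\), so it is only suitable for moderate arguments.

```python
import math

def bessel_j_series(k, x, terms=80):
    """J_k(x) for integer k >= 0 from its power series (moderate x only)."""
    half = x / 2.0
    term = half ** k / math.factorial(k)  # leading term (x/2)^k / k!
    total = term
    for m in range(terms):
        # Ratio of consecutive series terms.
        term *= -(half * half) / ((m + 1) * (m + 1 + k))
        total += term
    return total

def peak_location(k, step=0.05):
    """The x in [0, k + 10] where J_k(x) is largest, found by a coarse scan."""
    n = int((k + 10) / step)
    xs = [i * step for i in range(n + 1)]
    return max(xs, key=lambda x: bessel_j_series(k, x))
```

Comparing `peak_location(k)` with `k + 2` for a range of `k` gives a quick check of the empirical rule for the position of the maximum.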

This gives us a rough idea of what frequency modulation does. Increasing the modulation index introduces sidebands that are further away from the carrier frequency. At the same time, the amplitudes of closer sideband frequencies will oscillate.

Define the harmonicity ratio \(H\) by \(H = f_m / f_c\) where \(f_m, f_c\) are the modulating and carrier frequencies respectively. Then for \(H=1\), the sideband frequencies are simply the harmonics of the carrier frequency (which is the same as the modulating frequency) along with a DC component (from \(f_c - f_m\)). Note that the negative-frequency components will interfere with their corresponding positive ones. More generally, if \(H = 1/N\), then we have a harmonic waveform with the carrier frequency equal to the \(N\)th harmonic, while if \(H = N\) then we have a harmonic waveform with missing harmonics. Even more generally, if \(H\) is rational then the output waveform is harmonic. For irrational \(H\) the spectrum is inharmonic, which is useful for synthesizing "metallic" sounds, such as unpitched musical tones (e.g. bells, gongs, drums).
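To see these cases concretely, we can list the sideband frequencies \(|f_c + k f_m|\) for a given harmonicity ratio, folding negative frequencies back onto positive ones. The sketch uses exact rational arithmetic so that coincident frequencies compare equal. The function name `sideband_frequencies` and the cut-off `max_order` are assumptions for this example.

```python
from fractions import Fraction

def sideband_frequencies(f_c, harmonicity, max_order):
    """Sorted distinct values of |f_c + k f_m| for k = -max_order..max_order.

    Here f_m = harmonicity * f_c. Pass Fractions (or ints) for exact results.
    """
    f_m = harmonicity * f_c
    return sorted({abs(f_c + k * f_m) for k in range(-max_order, max_order + 1)})

# H = 1: every harmonic of the carrier, plus a DC component.
h1 = sideband_frequencies(Fraction(100), Fraction(1), 3)

# H = 2: only odd multiples of the carrier frequency appear.
h2 = sideband_frequencies(Fraction(100), Fraction(2), 3)

# H = 1/2: the fundamental is 50, and the carrier is its 2nd harmonic.
h_half = sideband_frequencies(Fraction(100), Fraction(1, 2), 3)
```

With a carrier of 100, `h1` contains 0, 100, 200, 300 and 400, `h2` contains 100, 300, 500 and 700, and `h_half` contains the multiples of 50 from 0 to 250.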

Chebyshev Polynomials

Consider the following equation, Chebyshev’s differential equation: \[ (1-x^2)y'' - x y' + n^2 y = 0 \] where \(n \in \mathbb{Z}_{\ge 0}\). Chebyshev polynomials of the first kind are solutions to this equation, and for \(|x| \le 1\) they are given by \[ T_n(x) = \cos (n \cos^{-1} x) = x^n - \binom{n}{2} x^{n-2} (1-x^2) + \binom{n}{4} x^{n-4} (1-x^2)^2 - \cdots \] They can be computed recursively using the formula \[ T_{n+1}(x) = 2 x T_n(x) - T_{n-1}(x) \] The first few Chebyshev polynomials are \[ T_0(x) = 1, \quad T_1(x) = x, \quad T_2(x) = 2x^2 - 1, \quad T_3(x) = 4x^3 - 3x \] They are useful in sound synthesis because \[ T_n(\cos \theta) = \cos(n \theta) \] (e.g. feeding a cosine wave through \(T_2\) will produce a cosine wave with double the frequency).
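The recurrence translates directly into code. The sketch below evaluates \(T_n(x)\) iteratively and can be used to confirm that a cosine wave fed through \(T_n\) comes out at \(n\) times the frequency. The function name `chebyshev` is an assumption for this example.

```python
import math

def chebyshev(n, x):
    """Evaluate T_n(x) using the recurrence T_{n+1} = 2 x T_n - T_{n-1}."""
    prev, curr = 1.0, x  # T_0(x) and T_1(x)
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, curr = curr, 2.0 * x * curr - prev
    return curr

def waveshape(n, theta):
    """Feed the cosine wave cos(theta) through T_n."""
    return chebyshev(n, math.cos(theta))
```

For any angle `theta`, `waveshape(n, theta)` should match `math.cos(n * theta)` up to rounding error.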


Ben Lynn blynn@cs.stanford.edu