Digital Audio and Music Technology [No.3 - Introduction to Fourier transform]

In this section, we will discuss a very important concept of digital audio processing: the Fourier transform.

Time domain and frequency domain

Until this section, we have been discussing audio in the time domain. The time domain describes the amplitude – time relationship, where the independent variable is time and the dependent variable is amplitude. A time domain diagram of a C key dominant chord (g1-b1-d1) looks like this:

The horizontal axis is time, in seconds. The vertical axis is the amplitude

Here, to simplify the situation for the sake of easy understanding, all three notes are sine waves. At this point, the waveform is no longer a simple sine wave, but the result of adding three tones together.

PCM format stores data in time domain. However, musicians think more about the pitches (frequencies) – afterall, music sheets are marks of frequencies. In order to do more operations, we need to first turn the time domain into frequency domain.

In 1807, Joseph Fourier made a rather important argument that any periodic function can be represented by a series of sine functions. Any function, as long as it is periodic, is the sum of a series of sinusoidal functions. This is the Fourier series. Of course, later research proved that the periodic function can only be expanded into a Fourier series if it satisfies the Dirichlet Conditions.

Dirichlet Conditions:

f must be absolutely integrable over a period.
f must be of bounded variation in any given bounded interval.
f must have a finite number of discontinuities in any given bounded interval, and the discontinuities cannot be infinite.

The frequency domain tells us the frequency component of the current waveform. The frequency domain diagram for this genus chord looks like this.

Now it’s pretty clear that the waveform contains a 391.995Hz sinewave with a amplitude of 0.4, a 493.883Hz sinewave with a amplitude of 0.4, and a 587.330Hz sinewave with a amplitude of 0.3.

Therefore, the funciton of the waveform above is:

$f(x)=0.3\sin(391.995\cdot2\pi x)+0.3\sin(493.883\cdot2\pi x)+0.4\sin(587.330\cdot2\pi x)$

To determine a sine function $y=A\sin (\omega x+\phi)+k$, we need to know its amplitude$A$, period$\omega$(frequency), phase offset$\phi$, and vertical offset$k$. The centre of the audio waveform is the x-axis, so k is always zero. The frequency domain diagram describes the frequency and amplitude of each sine wave.

Reading this, you may think that frequency domain diagrams are a very unfamiliar thing …… Actually, they are not. The following one is also a kind of frequency domain diagram, except that it is mainly for visual purposes.

Another example for Fourier series. A square wave looks nothing like a sine wave, but in fact, if we add a sine wave and its odd harmonics:

$\sin {x}+\frac {\sin3x}{3}+\frac {\sin5x}{5}+\frac {\sin7x}{7}+\frac {\sin9x}{9}+…$

Also write as:

$f(x)=\sum_{n=1}^{\infty }\frac{\sin((2n-1)x)}{(2n-1)}$

As the number of harmonics increases, the graph will look more and more like a square wave.

Using a gif from wikipedia:

That’s interesting, right?

Another interesting phenomenon: when a periodic function with discontinuous points (such as the square wave above with two jumps per cycle) is Fourier expanded and then synthesized using finite terms (e.g., the image above uses one hundred terms), a bulge appears near the discontinuity. As the number of levels increases, the size of the bulge approaches about 9% of the total jump value. This is known as Gibbs phenomenon.

Fourier transform

Now the question is, how to get the frequency domain distribution of the function from the time domain data of the function? In other words, considering the sample sequence as a discrete function f (x), how can we know the intensity of each frequency inside f(x)?

There is a more visual and intuitive explanation of the Fourier transform in an episode of 3b1b’s video; however, I would recommend deriving the principle of the Fourier transform from a perspective of function orthogonality. Don’t worry, you don’t need much advanced math knowledge, high school math level plus a little calculus knowledge is enough to understand.

Orthogonal functions

We define the dot product of two vectors $\vec {v}=(v_1,v_2)$, $\vec {u}=(u_1,u_2)$ as:

$\vec{v}\cdot \vec{u}=v_1u_1+v_2u_2$.

$\vec{v}\cdot \vec{u}=\left | \vec{v} \right | \left | \vec{u} \right | \cos\left ( \theta \right )$

The latter is the geometric meaning of the dot product. $\theta$ is the angle between two vectors. Let us disregard the geometric meaning and consider only the first expression.

You can see that dot product is actually multiplying each term of the vectors and then adding them together to get a scalar. The above is the case of two-dimensional vectors. But this can be done for any vectors of any dimension.

The dot product of vector $\vec {v}=(v_1,v_2, … ,v_n)$ and $\vec {u}=(u_1,u_2, … ,u_n)$ is:

$\vec{v}\cdot \vec{u}=v_1u_1+v_2u_2+…+v_nu_n$

Two vectors are orthogonal if they have a dot product of 0. Orthogonal vectors are perpendicular in the space. For example, $(0,3)$ and $(5,0)$ are perpendicular because $0\times3+5\times0=0$.

What follows may be a bit weird. We consider functions as vectors, and each point on the graph of a function as an item of the vector. In other words, every continuous function is a vector of infinite dimensions, and discrete functions are vectors of infinite or finite dimensions. For a continuous function $f(x)$ with a definition domain of $[a,b]$, we can define the following vectors:

$\vec{f}=\lim_{\Delta x \to 0} (f(a),f(a+\Delta x),f(a+2\Delta x),…,f(a+n\Delta x),…,f(b))$

When we consider a sample sequence as a discrete function $f(x)$, the $\Delta x$ will be the sampling period. In fact, arrays in computers are also vectors. You see, the dynamic array container in the C++ STL library is called <vector>.

Functions that are treated as vectors can also perform dot product operations, as long as they have the same domains. The operation is the same as for ordinary vectors: multiply the corresponding terms together and add them up. For functions $f(x)$ and $g(x)$ that both have the domain $[a,b]$, their dot product is:

$\lim_{\Delta x \to 0}f(a)g(a)+f(a+\Delta x)g(a+\Delta x)+f(a+2\Delta x)g(a+2\Delta x)+…+f(a+n\Delta x)g(a+n\Delta x)+…+f(b)g(b)$

In fact, it also means to find the integration as follows:

$\int_{a}^{b} f(x)g(x)\mathrm{d}x$

Therefore, if we say that real functions $f(x)$ and $g(x)$ with the same domain $[a,b]$ are orthogonal, then:

$\int_{a}^{b} f(x)g(x)\mathrm{d}x=0$

Orthogonal properties of trigonometric functions

The Fourier transform takes advantage of the orthogonal nature of the trigonometric family of functions.

First, the sine and cosine functions integrate over an integer period equal to 0. On the following image, the blue and green regions are apparently equal in area and therefore cancel each other out.

Therefore, $\int_{a}^{a+n\frac {2\pi}{\omega }} \sin {\omega t} \mathrm {d} t=0$ for all integer n. $\frac {2\pi}{\omega }$ is the period of the sine function.

Are sine functions $\sin m_1\omega t$ and $\sin m_2\omega t$ orthogonal in whole number periods? ($m_1$,$m_2$are positive integers. $m_1\ne m_2$) We already know that the dot product of both functions is:

$\int_{b}^{a} \sin m_1\omega t\sin m_2\omega t\mathrm{d}x$

According to the prosthaphaeresis formulas:

$\sin \alpha \sin \beta=-\frac{1}{2}(-2\sin \alpha \sin \beta)$

$=-\frac{1}{2}\left[\cos\alpha\cos\beta-\sin \alpha \sin \beta-\right(\cos\alpha\cos\beta+\sin \alpha \sin \beta)]$

$=-\frac{1}{2}\left[\cos(\alpha+\beta)-\cos(\alpha-\beta)\right]$

We can get that:

$\int_{b}^{a} \sin m_1\omega t\sin m_2\omega t\mathrm{d}x=\frac{1}{2}\left [ \int_{a}^{a+n\frac{2\pi}{\omega }} \cos{(m_1+m_2)\omega t} \mathrm{d}t +\int_{a}^{a+n\frac{2\pi}{\omega }} \cos{(m_1-m_2)\omega t} \mathrm{d}t \right ]$

Since $m_1$, $m_2$are positive integers, $(m_1+m_2)$ and $(m_1-m_2)$ are also integers. So that both $\int_{a}^{a+n\frac {2\pi}{\omega }} \cos {(m_1+m_2)\omega t} \mathrm {d} t$ and $\int_{a}^{a+n\frac {2\pi}{\omega }} \cos {(m_1-m_2)\omega t} \mathrm {d} t$ are 0.

Therefore, $\sin m_1\omega t$ and $\sin m_2\omega t$ are orthogonal in whole number periods.

BUT WAIT A MINUTE! We are discussing under the condition that $m_1\ne m_2$, in other words, if two DIFFERENT functions are orthogonal. What if $m_1 = m_2$, two functions are the same?

Solving $\frac {1}{2}\left [ \int_{a}^{a+n\frac {2\pi}{\omega }} \cos {(m_1+m_2)\omega t} \mathrm {d} t +\int_{a}^{a+n\frac {2\pi}{\omega }} \cos {(m_1-m_2)\omega t} \mathrm {d} t \right ]$ under the condition that $m_1 = m_2$, we can get:

$\frac{1}{2}\left [ \int_{a}^{a+n\frac{2\pi}{\omega }} \cos{(m_1+m_2)\omega t} \mathrm{d}t +\int_{a}^{a+n\frac{2\pi}{\omega }} \cos{(0)\omega t} \mathrm{d}t \right ]$

Of course, we already know that $\int_{a}^{a+n\frac {2\pi}{\omega }} \cos {(m_1+m_2)\omega t} \mathrm {d} t=0$. But $\cos 0 = 1$, obviously the result of an integral of 1 is not 0. Hey, $\frac {1}{2}\int_{a}^{a+n\frac {2\pi}{\omega }} \cos {(0)\omega t} \mathrm {d} t=n\frac {\pi}{\omega }$!

If we add a amplitude coefficient to the sinewave, make it $A\cos (m_1-m_2)\omega t$, it’s integral will be $An\frac {\pi}\omega$. In other words, is proportional to the amplitude and duration of the wave.

Euler’s formula

Now, we have derived an important property of trigonometric functions (sine functions, to be precise): the different trigonometric have a dot product of 0 (orthogonal) in integer periods, and the same trigonometric functions have a dot product proportional to the amplitude and duration.

For a function $f(x)$ with domain $[a,b]$, (the domain here is also the sampling range), its frequency domain is $F (x)=\frac {1}{b-a}\int_{a}^{b} f (t) \cos {(xt)} \mathrm {d} t$.

Instead of using trigonometric functions, the actual Fourier transform uses $e^{ix}$ from Euler’s formula and takes the real part or the norm of the result.

The imaginary part obtained from the Fourier transform is also useful. The imaginary part contains the imformation of the phase. The Fourier formula for the phase is not so commonly used, so I will not put it here.

Euler’s formula: $e^{ix}=\cos x + i \sin x$

The Continuous Fourier Transform formula is:

$j=-i$, both of them are representing imaginary. For some reasons physicists and mathematicians often use different letters.

Note that the real part of $e^{-j2\pi kx}$ equals $\cos (2\pi kx)$. $k$ is the frequency. In contrast, there is the inverse Fourier transform, i.e., from the frequency domain to the time domain. Here, because of the limitation of space and the similarity of these math, we will not derive the inverse transform. The formula for continuous Fourier inversions transform is: