Tutorial 25: Using the FFT

The French mathematician Joseph Fourier demonstrated that any periodic wave can be expressed as the sum of harmonically related sinusoids, each with its own amplitude and phase. Given a digital representation of a periodic wave, one can employ a formula known as the discrete Fourier transform (DFT) to calculate the frequency, phase, and amplitude of its sinusoidal components. Essentially, the DFT *transforms* a time-domain representation of a sound wave into a frequency- domain spectrum. This spectrum can also be transformed back into a time-domain waveform.

Typically the Fourier transform is used on a small ‘slice’ of time, which ideally is equal to exactly one cycle of the wave being analyzed. To perform this operation on ‘real world’ sounds -- which are almost invariably *not* strictly periodic, and which may be of unknown frequency -- one can perform the DFT on consecutive time slices to get a sense of how the spectrum changes over time.

If the number of digital samples in each time slice is a power of 2, one can use a faster version of the DFT known as the fast Fourier transform (FFT). The formula for the FFT is encapsulated in the fft~ object. The mathematics of the Fourier transform are well beyond the scope of this manual, but this tutorial chapter will demonstrate how to use the fft~ object for signal analysis and resynthesis.

fft~ receives a signal in its inlet. For each slice of time it receives (512 samples long by default) it sends out a signal of the same length listing the amount of energy in each frequency region. The signal that comes out of fft~ is not anything you're likely to want to listen to. Rather, it's a list of relative amplitudes of 512 different frequency bands in the received signal. This ‘list’ happens to be exactly the same length as the samples received in each time slice, so it comes out at the same rate as the signal comes in. The signal coming out of fft~ is a frequency-domain analysis of the samples it received in the previous time slice.

Although the transform comes out of fft~ in the form of a , it is not a time-domain signal. The only object that ‘understands’ this special is the ifft~ object, which performs an *inverse* FFT on the spectrum and transforms it back into a time-domain waveform.

With the capture~ object you can grab some of the output of fft~ and examine the frequency analysis of a signal.

• Click on one of the ezdac~ objects to turn audio on.

When audio is turned on, dspstate~ sends the MSP sampling rate out its middle outlet. We use this number to calculate a frequency that has a period of exactly 512 samples. This is the fundamental frequency of the FFT itself. If we send a wave of that frequency into fft~, each time slice would contain exactly one cycle of the waveform. We will actually use a cosine wave at ten times that frequency as the test tone for our analysis, as shown below.

The upper left corner of the Patcher window shows a very simple use of fft~. The analysis is stored in a capture~ object, and an ifft~ object transforms the analysis back into an audio signal. (Ordinarily you would not transform and inverse-transform an audio signal for no reason like this. The ifft~ is used in this patch simply to demonstrate that the analysis-resynthesis process works.)

• Click on the toggle in the upper left part of the patch to hear the resynthesized sound. Click on the toggle again to close the gate~. Now double-click on the capture~ object in that part of the patch to see the analysis performed by fft~.

In the capture~ text window, the first 512 numbers are all . That is the output of fft~ during the first time slice of its analysis. Remember, the analysis it sends out is always of the previous time slice. When audio was first turned on, there was no previous audio, so the fft~ object's analysis shows no signal.

• Scroll past the first 512 numbers. (The numbers in the capture~ object's text window are grouped in blocks, so if your signal vector size is 256 you will have two groups of numbers that are all .) Look at the second time slice of 512 numbers.

Each of the 512 numbers represents a harmonic of the FFT frequency itself, starting at the 0th harmonic (0 Hz). The analysis shows energy in the eleventh number, which represents the 10th harmonic of the FFT, 10/512 the sampling rate -- precisely our test frequency. (The analysis also shows energy at the 10th number from the end, which represents 502/512 the sampling rate. This frequency exceeds the Nyquist rate and is actually equivalent to -10/512 of the sampling rate.

Note that once we reach the Nyquist rate on the 257th number (256/512 of the sampling rate), all numbers after that are *folded* *back* down from the Nyquist rate. Another way to think of this is that these numbers represent negative frequencies that are now ascending from the (negative) Nyquist rate. Thus, the 258th number is the energy at the Nyquist rate *minus* 1/512 of the sampling rate (which could also be thought of as -255/512 the sampling rate). In our example, we see energy in the 11th frequency region (10/512 the sampling rate) and the 503rd frequency region (-256/512 - -246/512 = -10/512 the sampling rate).

It appears that fft~ has correctly analyzed the signal. There's just one problem...

The FFT assumes that the samples being analyzed comprise one cycle of a periodic wave. In our example, the cosine wave was the 10th harmonic of the FFT's fundamental frequency, so it worked fine. In most cases, though, the 512 samples of the FFT will not be precisely one cycle of the wave. When that happens, the FFT still analyzes the 512 samples as if they were one cycle of a waveform, and reports the spectrum of that wave. Such an analysis will contain many spurious frequencies not actually present in the signal.

• Close the text window of capture~. With the audio still on, set the ‘Test Frequency’ number box to . This also triggers the message in the upper left corner of the patch to empty the capture~ object of its prior contents. Double-click once again on capture~, and scroll ahead in the text window to see its new contents.

The analysis of the 1000 Hz tone does indeed show greater energy at 1000 Hz -- in the 12th and 13th frequency regions if your MSP sampling rate is 44,100 Hz -- but it also shows energy in virtually every other region. That's because the waveform it analyzed is no longer a sinusoid. (An exact number of cycles does not fit precisely into the 512 samples.) All the other energy shown in this FFT is an artifact of the ‘incorrect’ interpretation of those 512 samples as one period of the correct waveform.

To resolve this problem, we can try to ‘taper’ the ends of each time slice by applying an amplitude envelope to it, and use overlapping time slices to compensate for the use of the envelope.

The lower right portion of the tutorial patch takes this approach of using overlapping time slices, and applies a triangular amplitude envelope to each slice *before* analyzing it, and again *after* resynthesizing it. (Other shapes of amplitude envelope are often used for this process, but the triangular window is simple and fairly effective.) In this way, the fft~ object is viewing each time slice through a triangular window which tapers its ends down, thus filtering out many of the false frequencies that would be introduced by discontinuities. This technique is known as *windowing*.

To accomplish this windowing and overlapping of time slices, we must perform two FFTs, one of which is offset 256 samples later than the other. (Note that this part of the patch will only work if your current MSP Signal Vector size is 256 or less, since fft~ can only be offset by a multiple of the vector size.) The offset of an FFT can be given as a (third) typed-in argument to fft~, as is done for the fft~ object on the right. This results in overlapping time slices.

The windowing is achieved by multiplying the signal by a triangular waveform (stored in the buffer~ object) which recurs at the same frequency as the FFT -- once every 512 samples. The window is offset by 1/2 cycle (256 samples) for the second fft~. Notice also that because we will be applying the amplitude envelope twice (once before the fft~ and once again after the ifft~), we take the square root of the envelope values, so we do not have unwanted amplitude modulation resulting from our envelopes (we want the overlapping envelopes to crossfade evenly and always add up to 1).

• Double-click on the buffer~ object to view its contents. Then close the buffer~ window and double-click on the capture~ object that contains the FFT of the windowed signal. Scroll past the first block or two of numbers until you see the FFT analysis of the windowed 1000 Hz tone.

As with the unwindowed FFT, the energy is greatest around 1000 Hz, but here the (spurious) energy in all the other frequency regions is greatly reduced by comparison with the unwindowed version.

In this patch we have used the fft~ object to view and analyze a signal, and to demonstrate the effectiveness of windowing the signal and using overlapping FFTs. However, one could also write a patch that alters the values in the signal coming out of fft~, then sends the altered analysis to ifft~ for resynthesis. This kind of processing using overlapped, windowed time slices is known as a Short Term Fourier Transform (STFT), and is the basis for frequency-domain audio processing. We will be discussing a simpler way to use the STFT in Tutorial 26.

Windowing, in addition to being important for reducing the false frequencies from discontinuities in the input waveform, as we have already seen, is also important to smooth out any discontinuities which occur in the resynthesized time-domain waveform coming from the ifft~. This is why we must window the time-domain signal both on input and output. In this tutorial we would only get such output discontinuities if we modified the signal between the fft~ and ifft~ objects.

Summary

The fast Fourier transform (FFT) is an algorithm for transforming a time-domain digital signal into a frequency-domain representation of the relative amplitude of different frequency regions in the signal. An FFT is computed using a relatively small excerpt of a signal, usually a slice of time 512 or 1024 samples long. To analyze a longer signal, one performs multiple FFTs using consecutive (or overlapping) time slices.

The fft~ object performs an FFT on the signal it receives, and sends out (also in the form of a ) a frequency-domain analysis of the received signal. The only object that understands the output of fft~ is ifft~ which performs an inverse FFT to synthesize a time-domain signal based on the frequency-domain information. One could alter the signal as it goes from fft~ to ifft~, in order to change the spectrum.

The FFT only works perfectly when analyzing exactly one cycle (or exactly an integer number of cycles) of a tone. To reduce the artifacts produced when this is not the case, one can window the signal being analyzed by applying an amplitude envelope to taper the ends of each time slice. The amplitude envelope can be applied by multiplying the signal by using a cycle~ object to read a windowing function from a buffer~ repeatedly at the same rate as the FFT itself (i.e., once per time slice). To eliminate any artifacts that result from modifying the frequency-domain signal, we must also apply the same envelope at the output of the ifft~.