MSP Analysis Tutorial 3: Using the FFT
The French mathematician Joseph Fourier demonstrated that any periodic wave can be expressed as the sum of harmonically related sinusoids, each with its own amplitude and phase. Given a digital representation of a periodic wave, one can employ a formula known as the discrete Fourier transform (DFT) to calculate the frequency, phase, and amplitude of its sinusoidal components. Essentially, the DFT transforms a time-domain representation of a sound wave into a frequency-domain spectrum. This spectrum can also be transformed back into a time-domain waveform.
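As a rough illustration of that idea, the following Python sketch computes a naive DFT of one cycle of a wave built from two harmonics, reads back their amplitudes and phases, and then rebuilds the waveform with the inverse transform. The function names `dft` and `idft`, the 16-sample cycle, and the chosen harmonics are assumptions made for the example; none of this is part of MSP.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: one complex value per harmonic."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse DFT: turn the spectrum back into a time-domain waveform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

# One cycle (16 samples) of a wave made of harmonic 1 (amplitude 1.0, phase 0)
# and harmonic 3 (amplitude 0.5, phase pi/4). These values are illustrative.
N = 16
wave = [
    1.0 * math.cos(2 * math.pi * 1 * t / N)
    + 0.5 * math.cos(2 * math.pi * 3 * t / N + math.pi / 4)
    for t in range(N)
]

spectrum = dft(wave)
for k in (1, 3):
    amplitude = 2 * abs(spectrum[k]) / N   # factor 2: energy is split with bin N-k
    phase = cmath.phase(spectrum[k])
    print(f"harmonic {k}: amplitude {amplitude:.3f}, phase {phase:.3f} rad")

# The inverse transform recovers the original samples (up to rounding error).
rebuilt = [value.real for value in idft(spectrum)]
```

The factor of 2 in the amplitude calculation accounts for the fact that a real sinusoid's energy is shared between a bin and its mirror image in the upper half of the spectrum.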
Typically the Fourier transform is used on a small ‘slice’ of time, which ideally is equal to exactly one cycle of the wave being analyzed. To perform this operation on ‘real world’ sounds -- which are almost invariably not strictly periodic, and which may be of unknown frequency -- one can perform the DFT on consecutive time slices to get a sense of how the spectrum changes over time.
If the number of digital samples in each time slice (or frame) is a power of 2, one can use a faster version of the DFT known as the fast Fourier transform (FFT). The formula for the FFT is encapsulated in the fft~ object. The mathematics of the Fourier transform are well beyond the scope of this manual, but this tutorial chapter will demonstrate how to use the fft~ object for signal analysis and resynthesis.
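To give a feel for why the power-of-2 frame size matters, here is a minimal Python sketch of a recursive radix-2 FFT, checked against a naive DFT. It is only one common way of organizing the computation and is not claimed to be how fft~ is implemented; the names `fft` and `dft` and the test frame are assumptions for the example.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 FFT; the frame length must be a power of 2."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("frame length must be a power of 2")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])   # transform of the even-numbered samples
    odd = fft(samples[1::2])    # transform of the odd-numbered samples
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out

def dft(samples):
    """Naive DFT used only as a reference for comparison."""
    n = len(samples)
    table = [cmath.exp(-2j * math.pi * j / n) for j in range(n)]
    return [
        sum(samples[t] * table[(k * t) % n] for t in range(n))
        for k in range(n)
    ]

# An arbitrary 512-sample frame (illustrative content).
frame = [math.sin(0.3 * t) + 0.25 * math.cos(1.1 * t) for t in range(512)]
fast = fft(frame)
slow = dft(frame)
max_error = max(abs(a - b) for a, b in zip(fast, slow))
print("largest difference between FFT and DFT:", max_error)
```

Each level of recursion halves the frame, which is why the length has to divide evenly by 2 all the way down to a single sample.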
Spectrum of a signal: fft~
fft~ receives a signal in its inlet. For each slice of time it receives (512 samples long by default) it sends out a signal of the same length listing the amount of energy in each frequency region. The signal that comes out of fft~ is not anything you're likely to want to listen to. Rather, it's a list of relative amplitudes of 512 different frequency bands in the received signal. This ‘list’ happens to be exactly the same length as the samples received in each time slice, so it comes out at the same rate as the signal comes in. The signal coming out of fft~ is a frequency-domain analysis of the samples it received in the previous time slice.
Although the transform comes out of fft~ in the form of a signal, it is not a time-domain signal. The only object that ‘understands’ this special signal is the ifft~ object, which performs an inverse FFT on the spectrum and transforms it back into a time-domain waveform.
When audio is turned on, dspstate~ sends the MSP sampling rate out its middle outlet. We use this number to calculate a frequency that has a period of exactly 512 samples. This is the fundamental frequency of the FFT itself. If we send a wave of that frequency into fft~, each time slice would contain exactly one cycle of the waveform. We will actually use a cosine wave at ten times that frequency as the test tone for our analysis, as shown below.
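The arithmetic is simple enough to sketch in a few lines of Python. The sampling rate of 44,100 Hz is an assumption for the example (in the patch the value comes from dspstate~), and the function name `fft_fundamental` is hypothetical.

```python
FRAME_SIZE = 512  # default fft~ frame length described in this tutorial

def fft_fundamental(sampling_rate, frame_size=FRAME_SIZE):
    """Frequency whose period is exactly one FFT frame."""
    return sampling_rate / frame_size

sampling_rate = 44100.0                 # assumed MSP sampling rate
fundamental = fft_fundamental(sampling_rate)
test_tone = 10 * fundamental            # the 10th harmonic of the FFT frequency

print(f"{fundamental:.1f} Hz")   # 86.1 Hz
print(f"{test_tone:.1f} Hz")     # 861.3 Hz
```

At a different sampling rate the test tone changes accordingly, but it always completes exactly ten cycles per 512-sample frame.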
The upper left corner of the Patcher window shows a very simple use of fft~. The analysis is stored in a capture~ object, and an ifft~ object transforms the analysis back into an audio signal. (Ordinarily you would not transform and inverse-transform an audio signal for no reason like this. The ifft~ is used in this patch simply to demonstrate that the analysis-resynthesis process works.)
The plot~ shows a single spike at 861.3 Hz. If you click the message and double-click the capture~ object, you will see some of the numbers fft~ is putting out. Each of the 512 numbers represents a harmonic of the FFT frequency itself, starting at the 0th harmonic (0 Hz). The analysis shows energy in the eleventh number, which represents the 10th harmonic of the FFT, 10/512 of the sampling rate -- precisely our test frequency. The analysis also shows energy at the 10th number from the end (scroll to see), which represents 502/512 of the sampling rate. This frequency exceeds the Nyquist rate and is actually equivalent to -10/512 of the sampling rate. (This component is not visible in the plot~ display, which is limited to half the sampling rate.)
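The same result can be reproduced outside of Max with a short Python sketch. A naive DFT stands in for fft~ here (an assumption; the actual object uses a fast algorithm), and the threshold used to decide which bins "contain energy" is arbitrary.

```python
import cmath
import math

N = 512  # frame size used in the tutorial

def dft(samples):
    """Naive DFT standing in for fft~ (illustrative only)."""
    n = len(samples)
    table = [cmath.exp(-2j * math.pi * j / n) for j in range(n)]
    return [
        sum(samples[t] * table[(k * t) % n] for t in range(n))
        for k in range(n)
    ]

# Cosine at the 10th harmonic of the FFT frequency: exactly 10 cycles per frame.
tone = [math.cos(2 * math.pi * 10 * t / N) for t in range(N)]
spectrum = dft(tone)

# Bins (counted from 0) whose magnitude is clearly above rounding noise.
active_bins = [k for k, value in enumerate(spectrum) if abs(value) > 1e-6]
print(active_bins)  # [10, 502]
```

Bin 10 is the eleventh number in the list, and bin 502 is the tenth number from the end, matching what capture~ displays.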
It appears that fft~ has correctly analyzed the signal. There's just one problem...
Practical problems of the FFT
The FFT assumes that the samples being analyzed comprise one cycle of a periodic wave. In our example, the cosine wave was the 10th harmonic of the FFT's fundamental frequency, so it worked fine. In most cases, though, the 512 samples of the FFT will not be precisely one cycle of the wave. When that happens, the FFT still analyzes the 512 samples as if they were one cycle of a waveform, and reports the spectrum of that wave. Such an analysis will contain many spurious frequencies not actually present in the signal.
The analysis of the 1000 Hz tone does indeed show greater energy at 1000 Hz -- in the 12th and 13th frequency regions if your MSP sampling rate is 44,100 Hz -- but it also shows energy in virtually every other region. That's because the waveform it analyzed is no longer a sinusoid. (An exact number of cycles does not fit precisely into the 512 samples.) All the other energy shown in this FFT is an artifact of the ‘incorrect’ interpretation of those 512 samples as one period of the correct waveform. Also, if you close the capture~ object, hit clear and reopen it, you will see that the numbers have all changed, because the phase relationship between 1000 Hz and 947.5 Hz (the 11th harmonic of the FFT frequency) is constantly changing.
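A small Python sketch can make the contrast concrete. It analyzes one 512-sample frame of an exact harmonic and one frame of a 1000 Hz tone, assuming a 44,100 Hz sampling rate, and counts how many bins contain energy above rounding noise. The naive `dft` helper and the noise threshold are assumptions for the example.

```python
import cmath
import math

N = 512
SAMPLING_RATE = 44100.0  # assumed MSP sampling rate

def dft(samples):
    """Naive DFT standing in for fft~ (illustrative only)."""
    n = len(samples)
    table = [cmath.exp(-2j * math.pi * j / n) for j in range(n)]
    return [
        sum(samples[t] * table[(k * t) % n] for t in range(n))
        for k in range(n)
    ]

def count_active(spectrum, threshold=1e-6):
    """Number of bins whose magnitude is above rounding noise."""
    return sum(1 for value in spectrum if abs(value) > threshold)

exact_tone = [math.cos(2 * math.pi * 10 * t / N) for t in range(N)]
leaky_tone = [math.cos(2 * math.pi * 1000.0 * t / SAMPLING_RATE) for t in range(N)]

exact = dft(exact_tone)
leaky = dft(leaky_tone)

# The two strongest bins below the Nyquist rate, counted from 0.
strongest = sorted(
    sorted(range(N // 2), key=lambda k: abs(leaky[k]), reverse=True)[:2]
)

print("bins with energy, exact harmonic:", count_active(exact))
print("bins with energy, 1000 Hz tone:  ", count_active(leaky))
print("strongest bins for 1000 Hz:      ", strongest)
```

Counting from 0, the strongest bins are 11 and 12, which are the 12th and 13th numbers in the list; 1000 Hz falls between them at about 11.6 times the FFT frequency.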
All of this will be visible in the plot, which has a broad peak with constantly changing skirts.
To resolve this problem, we can try to ‘taper’ the ends of each time slice by applying an amplitude envelope to it, and use overlapping time slices to compensate for the use of the envelope.
The right portion of the tutorial patch takes this approach of using overlapping time slices, and applies a triangular amplitude envelope to each slice before analyzing it, and again after resynthesizing it. (Other shapes of amplitude envelope are often used for this process, but the triangular window is simple and fairly effective.) In this way, the fft~ object is viewing each time slice through a triangular window which tapers its ends down, thus filtering out many of the false frequencies that would be introduced by discontinuities. This technique is known as windowing.
To accomplish this windowing and overlapping of time slices, we must perform two FFTs, one of which is offset 256 samples later than the other. (Note that this part of the patch will only work if your current MSP Signal Vector size is 256 or less, since fft~ can only be offset by a multiple of the vector size.) The offset of an FFT can be given as a (third) typed-in argument to fft~, as is done for the fft~ object on the right. This results in overlapping time slices.
The windowing is achieved by multiplying the signal by a triangular waveform (stored in the buffer~ object—double-click to view its contents) which recurs at the same frequency as the FFT -- once every 512 samples. The window is offset by 1/2 cycle (256 samples) for the second fft~. Notice also that because we will be applying the amplitude envelope twice (once before the fft~ and once again after the ifft~), we take the square root of the envelope values, so we do not have unwanted amplitude modulation resulting from our envelopes (we want the overlapping envelopes to crossfade evenly and always add up to 1).
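The reason for the square root can be checked numerically. The sketch below assumes a particular triangular shape (rising linearly from 0 to 1 over 256 samples and falling back over the next 256); the exact contents of the buffer~ in the patch may differ in detail.

```python
import math

N = 512          # frame size
HOP = N // 2     # the second FFT is offset by 256 samples

# Assumed triangular envelope: 0 at the start of the frame, 1 in the middle.
triangle = [1.0 - abs(2.0 * n / N - 1.0) for n in range(N)]

# The window actually applied is the square root of the triangle,
# because it is applied once before fft~ and once after ifft~.
sqrt_window = [math.sqrt(value) for value in triangle]

# Net gain seen by each sample position when the two overlapping frames are added.
with_sqrt = [sqrt_window[n] ** 2 + sqrt_window[n + HOP] ** 2 for n in range(HOP)]

# For comparison: applying the plain triangle twice.
without_sqrt = [triangle[n] ** 2 + triangle[n + HOP] ** 2 for n in range(HOP)]

print("gain range with square root:   ", min(with_sqrt), max(with_sqrt))
print("gain range without square root:", min(without_sqrt), max(without_sqrt))
```

With the square root, the two overlapping envelopes always add up to 1. Without it, the combined gain dips to 0.5 halfway between the window peaks, which would be heard as amplitude modulation at twice the FFT frequency.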
As with the unwindowed FFT, the energy is greatest around 1000 Hz, but here the (spurious) energy in all the other frequency regions is greatly reduced by comparison with the unwindowed version.
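One rough way to quantify the improvement is to compare how much energy lies far from the peak, relative to the peak itself, with and without the window. The following Python sketch does this for a single frame of a 1000 Hz tone at an assumed 44,100 Hz sampling rate. The `skirt_ratio` measure, the 20-bin guard distance, and the window shape are all choices made for this example rather than anything defined by MSP.

```python
import cmath
import math

N = 512
SAMPLING_RATE = 44100.0  # assumed MSP sampling rate

def dft(samples):
    """Naive DFT standing in for fft~ (illustrative only)."""
    n = len(samples)
    table = [cmath.exp(-2j * math.pi * j / n) for j in range(n)]
    return [
        sum(samples[t] * table[(k * t) % n] for t in range(n))
        for k in range(n)
    ]

def skirt_ratio(frame, guard=20):
    """Sum of magnitudes more than `guard` bins from the peak, divided by the peak."""
    magnitudes = [abs(value) for value in dft(frame)[: N // 2]]
    peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    skirt = sum(m for k, m in enumerate(magnitudes) if abs(k - peak_bin) > guard)
    return skirt / magnitudes[peak_bin]

tone = [math.cos(2 * math.pi * 1000.0 * t / SAMPLING_RATE) for t in range(N)]

# Square root of a triangular envelope (assumed shape, 0 at the frame edges).
window = [math.sqrt(1.0 - abs(2.0 * n / N - 1.0)) for n in range(N)]

plain_ratio = skirt_ratio(tone)
windowed_ratio = skirt_ratio([s * w for s, w in zip(tone, window)])

print("unwindowed skirt / peak:", round(plain_ratio, 4))
print("windowed skirt / peak:  ", round(windowed_ratio, 4))
```

The windowed ratio comes out much smaller, reflecting the reduced spurious energy away from 1000 Hz.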
Displaying spectra with plot~
The plot~ object is designed to show graphs of listed data or audio signals. With the latter, it works a lot like scope~, where the portion of waveform shown is determined by the number of points in the display. In plot~ this is set with a message. plot~ has a lot of options, especially if you want to add gridlines and titles to the display. Luckily, there are some prototypes available for plot~, accessed via the button at the left edge of the display. Prototypes include several spectral displays complete with gridlines and labels.
There is some assembly required before plot~ can show the FFT spectrum. Open the Display_analysis subpatch to see what is required.
- The signals from fft~ can reach very high values (as high as the frame size) and must be scaled down to a maximum of 1.0 for plot~. Hence the divide by 512.
- The content of each bin is represented by a complex number. The magnitude of a complex number is the square root of the sum of the squares of the real and imaginary parts, so each signal from the fft~ is squared before the two are added together. If we leave the square root out, we are left with the squared magnitude (the power), which is appropriate to show on a dB scale.
- plot~ has no triggering mechanism as scope~ and spectroscope~ do, so the signal from fft~ needs to be stabilized before it can be displayed. The vectral~ object is ideal for this, as it aligns incoming samples with indices in the left inlets. The right outlet of fft~ produces a ramp of bin numbers from 0 to the window size, perfect for vectral~. vectral~ can also average the values over several frames, which makes the display smoother.
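The scaling and magnitude steps in the list above can be sketched in Python as follows. The function name `display_values`, the naive `dft` helper, and the small floor used to avoid taking the logarithm of zero are assumptions for the example; the stabilizing and averaging done by vectral~ is not modeled here.

```python
import cmath
import math

N = 512

def dft(samples):
    """Naive DFT standing in for fft~ (illustrative only)."""
    n = len(samples)
    table = [cmath.exp(-2j * math.pi * j / n) for j in range(n)]
    return [
        sum(samples[t] * table[(k * t) % n] for t in range(n))
        for k in range(n)
    ]

def display_values(real_part, imag_part, frame_size=N):
    """Scale by the frame size, square and sum; return magnitudes and dB values."""
    magnitudes, decibels = [], []
    for re, im in zip(real_part, imag_part):
        power = (re / frame_size) ** 2 + (im / frame_size) ** 2
        magnitudes.append(math.sqrt(power))                     # with the square root
        decibels.append(10.0 * math.log10(max(power, 1e-12)))   # without it, as dB
    return magnitudes, decibels

# Full-scale cosine at the 10th harmonic of the FFT frequency.
tone = [math.cos(2 * math.pi * 10 * t / N) for t in range(N)]
spectrum = dft(tone)
magnitudes, decibels = display_values(
    [value.real for value in spectrum],
    [value.imag for value in spectrum],
)

print("magnitude in bin 10:", round(magnitudes[10], 3))
print("level in bin 10 (dB):", round(decibels[10], 2))
```

Because the power is already a squared quantity, the dB conversion uses 10 times the base-10 logarithm rather than 20 times.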
Signal processing using the FFT
In this patch we have used the fft~ object to view and analyze a signal, and to demonstrate the effectiveness of windowing the signal and using overlapping FFTs. However, one could also write a patch that alters the values in the signal coming out of fft~, then sends the altered analysis to ifft~ for resynthesis. This kind of processing using overlapped, windowed time slices is known as a Short-Time Fourier Transform (STFT), and is the basis for frequency-domain audio processing. We will be discussing a simpler way to use the STFT in Tutorial 26.
Windowing, in addition to being important for reducing the false frequencies from discontinuities in the input waveform, as we have already seen, is also important to smooth out any discontinuities which occur in the resynthesized time-domain waveform coming from the ifft~. This is why we must window the time-domain signal both on input and output. In this tutorial we would only get such output discontinuities if we modified the signal between the fft~ and ifft~ objects.
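The whole analysis, modification, and resynthesis chain can be sketched in Python as an overlap-add loop. This is a simplified model, not the patch itself: the frame size is reduced to 64 samples so that the pure-Python transform runs quickly, and the names `stft_process` and `keep_low_bins` are hypothetical.

```python
import cmath
import math

N = 64         # frame size (assumption: kept small for speed; the tutorial uses 512)
HOP = N // 2   # frames overlap by half their length

def dft(values, sign=-1):
    """Naive DFT; sign=+1 gives the unscaled inverse transform."""
    n = len(values)
    table = [cmath.exp(sign * 2j * math.pi * j / n) for j in range(n)]
    return [
        sum(values[t] * table[(k * t) % n] for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse DFT, scaled so that idft(dft(x)) returns x."""
    return [value / len(spectrum) for value in dft(spectrum, sign=+1)]

# Square root of a triangular envelope, applied before and after the transform.
WINDOW = [math.sqrt(1.0 - abs(2.0 * n / N - 1.0)) for n in range(N)]

def stft_process(signal, modify=lambda spectrum: spectrum):
    """Window, transform, modify, inverse-transform, window again, overlap-add."""
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - N + 1, HOP):
        frame = [signal[start + n] * WINDOW[n] for n in range(N)]
        resynthesized = idft(modify(dft(frame)))
        for n in range(N):
            output[start + n] += resynthesized[n].real * WINDOW[n]
    return output

def keep_low_bins(spectrum, cutoff=8):
    """Example modification: zero every bin above `cutoff` (and its mirror image)."""
    return [
        value if (k <= cutoff or k >= N - cutoff) else 0j
        for k, value in enumerate(spectrum)
    ]

signal = [math.sin(0.2 * t) + 0.3 * math.sin(1.7 * t) for t in range(256)]

unchanged = stft_process(signal)                 # no modification
filtered = stft_process(signal, keep_low_bins)   # crude low-pass filtering

# Away from the first and last half-frames, the unmodified path is transparent.
interior = range(HOP, len(signal) - HOP)
max_error = max(abs(unchanged[t] - signal[t]) for t in interior)
print("largest reconstruction error:", max_error)
```

When the spectrum is left alone, the overlapping output windows crossfade back to the original signal. When the spectrum is modified, as in `keep_low_bins`, the output window smooths the discontinuities that would otherwise appear at the frame boundaries.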
Summary
The fast Fourier transform (FFT) is an algorithm for transforming a time-domain digital signal into a frequency-domain representation of the relative amplitude of different frequency regions in the signal. An FFT is computed using a relatively small excerpt of a signal, usually a slice of time 512 or 1024 samples long. To analyze a longer signal, one performs multiple FFTs using consecutive (or overlapping) time slices.
The fft~ object performs an FFT on the signal it receives, and sends out (also in the form of a signal) a frequency-domain analysis of the received signal. The only object that understands the output of fft~ is ifft~, which performs an inverse FFT to synthesize a time-domain signal based on the frequency-domain information. One could alter the signal as it goes from fft~ to ifft~, in order to change the spectrum.
The FFT only works perfectly when analyzing exactly one cycle (or exactly an integer number of cycles) of a tone. To reduce the artifacts produced when this is not the case, one can window the signal being analyzed by applying an amplitude envelope to taper the ends of each time slice. The amplitude envelope can be applied by using a cycle~ object to read a windowing function from a buffer~ repeatedly at the same rate as the FFT itself (i.e., once per time slice), and multiplying the signal by the result. To eliminate any artifacts that result from modifying the frequency-domain signal, we must also apply the same envelope at the output of the ifft~.
|Name|Description|
|buffer~|Store audio samples|
|capture~|Store a signal to view as text|
|fft~|Fast Fourier transform|
|ifft~|Inverse fast Fourier transform|