The French mathematician Joseph Fourier demonstrated that any periodic wave can be expressed as the sum of harmonically related sinusoids, each with its own amplitude and phase. Given a digital representation of a periodic wave, one can employ a formula known as the discrete Fourier transform (DFT) to calculate the frequency, phase, and amplitude of its sinusoidal components. Essentially, the DFT transforms a time-domain representation of a sound wave into a frequency-domain spectrum. This spectrum can also be transformed back into a time-domain waveform.
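For readers who like to see the arithmetic, here is a minimal sketch of the DFT and its inverse in plain Python. It is not how fft~ is implemented; the function names `dft` and `idft`, the 64-sample slice, and the two-harmonic test wave are all assumptions chosen for illustration.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform: one complex value per frequency bin."""
    n = len(x)
    # Precompute the n complex roots of unity shared by every bin.
    w = [cmath.exp(-2j * cmath.pi * i / n) for i in range(n)]
    return [sum(x[t] * w[(k * t) % n] for t in range(n)) for k in range(n)]

def idft(spectrum):
    """Inverse transform: rebuild the time-domain samples from the spectrum."""
    n = len(spectrum)
    w = [cmath.exp(2j * cmath.pi * i / n) for i in range(n)]
    return [sum(spectrum[k] * w[(k * t) % n] for k in range(n)) / n
            for t in range(n)]

# Exactly one cycle of a periodic wave made of two harmonics (assumed test data):
# harmonic 1 with amplitude 1.0 and harmonic 3 with amplitude 0.5.
N = 64
wave = [math.cos(2 * math.pi * t / N) + 0.5 * math.sin(2 * math.pi * 3 * t / N)
        for t in range(N)]

spectrum = dft(wave)
# For harmonics 1 and up, amplitude is the bin magnitude scaled by 2/N
# (the 0 Hz bin would use 1/N instead).
amplitudes = [2 * abs(c) / N for c in spectrum[: N // 2]]
# Transforming back should reproduce the original samples.
rebuilt = [c.real for c in idft(spectrum)]
```

In this sketch `amplitudes[1]` and `amplitudes[3]` recover the two amplitudes we put in, and `rebuilt` matches `wave` to within rounding error.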
Typically the Fourier transform is used on a small ‘slice’ of time, which ideally is equal to exactly one cycle of the wave being analyzed. To perform this operation on ‘real world’ sounds -- which are almost invariably not strictly periodic, and which may be of unknown frequency -- one can perform the DFT on consecutive time slices to get a sense of how the spectrum changes over time.
If the number of digital samples in each time slice is a power of 2, one can use a faster version of the DFT known as the fast Fourier transform (FFT). The formula for the FFT is encapsulated in the fft~ object. The mathematics of the Fourier transform are well beyond the scope of this manual, but this tutorial chapter will demonstrate how to use the fft~ object for signal analysis and resynthesis.
fft~ receives a signal in its inlet. For each slice of time it receives (512 samples long by default) it sends out a signal of the same length listing the amount of energy in each frequency region. The signal that comes out of fft~ is not anything you're likely to want to listen to. Rather, it's a list of relative amplitudes of 512 different frequency bands in the received signal. This ‘list’ happens to be exactly the same length as the samples received in each time slice, so it comes out at the same rate as the signal comes in. The signal coming out of fft~ is a frequency-domain analysis of the samples it received in the previous time slice.
Although the transform comes out of fft~ in the form of a signal, it is not a time-domain signal. The only object that ‘understands’ this special signal is the ifft~ object, which performs an inverse FFT on the spectrum and transforms it back into a time-domain waveform.
The signal coming out of fft~ is spectral information, not a time-domain signal
With the capture~ object you can grab some of the output of fft~ and examine the frequency analysis of a signal.
• Click on one of the ezdac~ objects to turn audio on.
When audio is turned on, dspstate~ sends the MSP sampling rate out its middle outlet. We use this number to calculate a frequency that has a period of exactly 512 samples. This is the fundamental frequency of the FFT itself. If we send a wave of that frequency into fft~, each time slice would contain exactly one cycle of the waveform. We will actually use a cosine wave at ten times that frequency as the test tone for our analysis, as shown below.
The test tone is at 10 times the base frequency of the FFT time slice
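The calculation the patch performs can be sketched in a few lines of Python. The 44,100 Hz sampling rate is an assumption (in the patch the value comes from dspstate~), and `fft_fundamental` is a hypothetical helper name.

```python
def fft_fundamental(sample_rate, fft_size=512):
    """Frequency whose period is exactly one FFT time slice."""
    return sample_rate / fft_size

sample_rate = 44100.0                     # assumed; reported by dspstate~ in the patch
base = fft_fundamental(sample_rate)       # fundamental frequency of the FFT
test_tone = 10 * base                     # the test tone is the 10th harmonic
cycles_per_slice = test_tone * 512 / sample_rate
```

At 44,100 Hz the FFT fundamental is a little over 86 Hz, so the test tone is a little over 861 Hz and fits into each 512-sample slice exactly ten times.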
The upper left corner of the Patcher window shows a very simple use of fft~. The analysis is stored in a capture~ object, and an ifft~ object transforms the analysis back into an audio signal. (Ordinarily you would not transform and inverse-transform an audio signal for no reason like this. The ifft~ is used in this patch simply to demonstrate that the analysis-resynthesis process works.)
• Click on the toggle in the upper left part of the patch to hear the resynthesized sound. Click on the toggle again to close the gate~. Now double-click on the capture~ object in that part of the patch to see the analysis performed by fft~.
In the capture~ text window, the first 512 numbers are all 0.0000. That is the output of fft~ during the first time slice of its analysis. Remember, the analysis it sends out is always of the previous time slice. When audio was first turned on, there was no previous audio, so the fft~ object's analysis shows no signal.
• Scroll past the first 512 numbers. (The numbers in the capture~ object's text window are grouped in blocks, so if your signal vector size is 256 you will have two groups of numbers that are all 0.0000.) Look at the second time slice of 512 numbers.
Each of the 512 numbers represents a harmonic of the FFT frequency itself, starting at the 0th harmonic (0 Hz). The analysis shows energy in the eleventh number, which represents the 10th harmonic of the FFT, 10/512 of the sampling rate -- precisely our test frequency. (The analysis also shows energy at the 10th number from the end, which represents 502/512 of the sampling rate. This frequency exceeds the Nyquist rate and is actually equivalent to -10/512 of the sampling rate.)
Technical detail: An FFT divides the entire available frequency range into as many bands (regions) as there are samples in each time slice. Therefore, each set of 512 numbers coming out of fft~ represents 512 divisions of the frequency range from 0 to the sampling rate. The first number represents the energy at 0 Hz, the second number represents the energy at 1/512 of the sampling rate, the third number represents the energy at 2/512 of the sampling rate, and so on.
Note that once we reach the Nyquist rate on the 257th number (256/512 of the sampling rate), all numbers after that are folded back down from the Nyquist rate. Another way to think of this is that these numbers represent negative frequencies that are now ascending from the (negative) Nyquist rate. Thus, the 258th number is the energy at the Nyquist rate minus 1/512 of the sampling rate (which could also be thought of as -255/512 of the sampling rate). In our example, we see energy in the 11th frequency region (10/512 of the sampling rate) and the 503rd frequency region (-256/512 - (-246/512) = -10/512 of the sampling rate).
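The bin numbering described above can be checked with a short Python sketch. It analyzes one 512-sample slice of a cosine at the 10th harmonic with a naive DFT (an illustration only, not the algorithm inside fft~); `dft_magnitudes` and `signed_harmonic` are hypothetical helper names, and bins are counted from 0, so bin 10 is the eleventh number and bin 502 is the 503rd.

```python
import cmath
import math

N = 512  # samples per time slice

def dft_magnitudes(x):
    """Magnitude of every bin of a naive DFT of x."""
    n = len(x)
    w = [cmath.exp(-2j * cmath.pi * i / n) for i in range(n)]
    return [abs(sum(x[t] * w[(k * t) % n] for t in range(n))) for k in range(n)]

def signed_harmonic(k, n=N):
    """Bins above the Nyquist bin (n/2) correspond to negative frequencies."""
    return k if k <= n // 2 else k - n

# One time slice of the test tone: exactly ten cycles in 512 samples.
tone = [math.cos(2 * math.pi * 10 * t / N) for t in range(N)]
mags = dft_magnitudes(tone)

# Bins (counted from 0) that contain a non-negligible amount of energy.
active = [k for k, m in enumerate(mags) if m > 1e-6]
as_signed = [signed_harmonic(k) for k in active]
```

Only bins 10 and 502 contain energy, and `signed_harmonic` maps them to harmonics 10 and -10, matching the 10/512 and -10/512 of the sampling rate described in the text.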
It appears that fft~ has correctly analyzed the signal. There's just one problem...
The FFT assumes that the samples being analyzed comprise one cycle of a periodic wave. In our example, the cosine wave was the 10th harmonic of the FFT's fundamental frequency, so it worked fine. In most cases, though, the 512 samples of the FFT will not be precisely one cycle of the wave. When that happens, the FFT still analyzes the 512 samples as if they were one cycle of a waveform, and reports the spectrum of that wave. Such an analysis will contain many spurious frequencies not actually present in the signal.
• Close the text window of capture~. With the audio still on, set the ‘Test Frequency’ number box to 1000. This also triggers the clear message in the upper left corner of the patch to empty the capture~ object of its prior contents. Double-click once again on capture~, and scroll ahead in the text window to see its new contents.
The analysis of the 1000 Hz tone does indeed show greater energy at 1000 Hz -- in the 12th and 13th frequency regions if your MSP sampling rate is 44,100 Hz -- but it also shows energy in virtually every other region. That's because a whole number of cycles of the tone does not fit into the 512 samples, so the waveform the FFT actually analyzes (those 512 samples treated as one repeating period) is no longer a pure sinusoid. All the other energy shown in this FFT is an artifact of the ‘incorrect’ interpretation of those 512 samples as exactly one period of a periodic waveform.
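The same effect can be reproduced outside Max with a naive DFT in Python. This is a sketch under assumptions: a 44,100 Hz sampling rate, a cosine starting at phase 0, and bins counted from 0 (so the 12th and 13th regions are bins 11 and 12).

```python
import cmath
import math

N = 512
SAMPLE_RATE = 44100.0  # assumed sampling rate

def dft_magnitudes(x):
    """Magnitude of every bin of a naive DFT of x."""
    n = len(x)
    w = [cmath.exp(-2j * cmath.pi * i / n) for i in range(n)]
    return [abs(sum(x[t] * w[(k * t) % n] for t in range(n))) for k in range(n)]

# 1000 Hz falls between bins: 1000 / (44100 / 512) is about 11.6.
exact_bin = 1000.0 * N / SAMPLE_RATE
tone = [math.cos(2 * math.pi * 1000.0 * t / SAMPLE_RATE) for t in range(N)]
mags = dft_magnitudes(tone)

# The two strongest bins in the lower half of the spectrum.
half = mags[: N // 2 + 1]
strongest = sorted(sorted(range(len(half)), key=lambda k: half[k])[-2:])
# The smallest magnitude found anywhere: no bin is empty any more.
weakest = min(mags)
```

The two strongest bins straddle the true frequency, and every one of the 512 bins now holds a clearly non-zero value, which is the spurious energy described above.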
To resolve this problem, we can try to ‘taper’ the ends of each time slice by applying an amplitude envelope to it, and use overlapping time slices to compensate for the use of the envelope.
The lower right portion of the tutorial patch takes this approach of using overlapping time slices, and applies a triangular amplitude envelope to each slice before analyzing it, and again after resynthesizing it. (Other shapes of amplitude envelope are often used for this process, but the triangular window is simple and fairly effective.) In this way, the fft~ object is viewing each time slice through a triangular window which tapers its ends down, thus filtering out many of the false frequencies that would be introduced by discontinuities. This technique is known as windowing.
Overlapping triangular windows (envelopes) applied to a 100 Hz cosine wave
To accomplish this windowing and overlapping of time slices, we must perform two FFTs, one of which is offset 256 samples later than the other. (Note that this part of the patch will only work if your current MSP Signal Vector size is 256 or less, since fft~ can only be offset by a multiple of the vector size.) The offset of an FFT can be given as a (third) typed-in argument to fft~, as is done for the fft~ object on the right. This results in overlapping time slices.
One FFT is taken 256 samples later than the other
The windowing is achieved by multiplying the signal by a triangular waveform (stored in the buffer~ object) which recurs at the same frequency as the FFT -- once every 512 samples. The window is offset by 1/2 cycle (256 samples) for the second fft~. Notice also that because we will be applying the amplitude envelope twice (once before the fft~ and once again after the ifft~), we take the square root of the envelope values, so we do not have unwanted amplitude modulation resulting from our envelopes (we want the overlapping envelopes to crossfade evenly and always add up to 1).
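The reason for the square root can be verified numerically. The sketch below assumes a triangular envelope that is 0 at the edges of the slice and 1 at its centre (the exact contents of the tutorial's buffer~ are not reproduced here), and checks what the two overlapping envelopes add up to once each has been applied twice.

```python
import math

N = 512        # FFT size used in this tutorial
HOP = N // 2   # the second FFT is offset by 256 samples

# Assumed triangular envelope: 0 at the slice edges, 1 at the centre.
triangle = [1.0 - abs(i - HOP) / HOP for i in range(N)]
# Square root of the envelope, because it is applied before fft~ and after ifft~.
window = [math.sqrt(v) for v in triangle]

def total_gain(envelope, i):
    """Combined gain at sample i of two slices offset by half a cycle,
    with the envelope applied twice (input and output) to each slice."""
    return envelope[i] ** 2 + envelope[(i + HOP) % N] ** 2

with_sqrt = [total_gain(window, i) for i in range(N)]
without_sqrt = [total_gain(triangle, i) for i in range(N)]
```

With the square root, the combined gain is 1 at every sample. Without it, the gain dips to 0.5 a quarter of the way through each slice, which is the unwanted amplitude modulation the text mentions.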
• Double-click on the buffer~ object to view its contents. Then close the buffer~ window and double-click on the capture~ object that contains the FFT of the windowed signal. Scroll past the first block or two of numbers until you see the FFT analysis of the windowed 1000 Hz tone.
As with the unwindowed FFT, the energy is greatest around 1000 Hz, but here the (spurious) energy in all the other frequency regions is greatly reduced by comparison with the unwindowed version.
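A rough way to quantify this comparison is sketched below in Python. It assumes a 44,100 Hz sampling rate and the square-rooted triangular envelope described earlier, and it uses a hypothetical measure, `leakage_ratio`, defined here as the summed magnitude of the bins well away from the peak (bin 50 up to the Nyquist bin) divided by the peak magnitude.

```python
import cmath
import math

N = 512
HALF = N // 2
SAMPLE_RATE = 44100.0  # assumed sampling rate

def magnitudes(x):
    """Naive DFT magnitudes for bins 0 through the Nyquist bin."""
    w = [cmath.exp(-2j * cmath.pi * i / N) for i in range(N)]
    return [abs(sum(x[t] * w[(k * t) % N] for t in range(N)))
            for k in range(HALF + 1)]

def leakage_ratio(mags, first_far_bin=50):
    """Energy far from the peak, relative to the peak itself (assumed measure)."""
    return sum(mags[first_far_bin:]) / max(mags)

tone = [math.cos(2 * math.pi * 1000.0 * t / SAMPLE_RATE) for t in range(N)]
# Square root of a triangular envelope, as applied before the FFT.
window = [math.sqrt(1.0 - abs(i - HALF) / HALF) for i in range(N)]
windowed = [s * w for s, w in zip(tone, window)]

plain_ratio = leakage_ratio(magnitudes(tone))
windowed_ratio = leakage_ratio(magnitudes(windowed))
```

Comparing `windowed_ratio` with `plain_ratio` shows the far-from-peak energy dropping substantially once the slice is tapered; the threshold bin of 50 is arbitrary and only serves to exclude the region around 1000 Hz.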
In this patch we have used the fft~ object to view and analyze a signal, and to demonstrate the effectiveness of windowing the signal and using overlapping FFTs. However, one could also write a patch that alters the values in the signal coming out of fft~, then sends the altered analysis to ifft~ for resynthesis. This kind of processing using overlapped, windowed time slices is known as a Short-Time Fourier Transform (STFT), and is the basis for frequency-domain audio processing. We will be discussing a simpler way to use the STFT in Tutorial 26.
Windowing, in addition to being important for reducing the false frequencies from discontinuities in the input waveform, as we have already seen, is also important to smooth out any discontinuities which occur in the resynthesized time-domain waveform coming from the ifft~. This is why we must window the time-domain signal both on input and output. In this tutorial we would only get such output discontinuities if we modified the signal between the fft~ and ifft~ objects.
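The whole analysis-resynthesis chain can be sketched offline in Python: window each slice, transform it, optionally modify the spectrum, inverse-transform, window again, and add the overlapping results. This is an illustration of the general technique rather than a description of how fft~ and ifft~ work internally; the radix-2 `fft` function, the `process` helper, the assumed 44,100 Hz sampling rate, and the `modify` hook are all hypothetical.

```python
import cmath
import math

N, HOP = 512, 256
# Square root of a triangular envelope, applied on input and on output.
window = [math.sqrt(1.0 - abs(i - HOP) / HOP) for i in range(N)]

def fft(x, sign=-1):
    """Recursive radix-2 FFT (length must be a power of 2).
    sign=-1 is the forward transform; sign=+1 is the unscaled inverse."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], sign), fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out

def process(signal, modify=lambda spectrum: spectrum):
    """Windowed, overlapped analysis and resynthesis (overlap-add)."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - N + 1, HOP):
        frame = [signal[start + i] * window[i] for i in range(N)]
        spectrum = modify(fft(frame))          # alter the analysis here
        resynth = fft(spectrum, +1)            # inverse transform, not yet scaled
        for i in range(N):
            out[start + i] += (resynth[i].real / N) * window[i]
    return out

# A 1000 Hz tone at an assumed 44,100 Hz sampling rate.
signal = [math.cos(2 * math.pi * 1000.0 * t / 44100.0) for t in range(2048)]
result = process(signal)
# Away from the first and last half-slice, every sample is covered by two slices.
error = max(abs(a - b) for a, b in zip(signal[HOP:-HOP], result[HOP:-HOP]))
```

With the spectrum left unmodified, `result` matches `signal` to within rounding error wherever two slices overlap, which confirms that the doubled window crossfades evenly. Passing a different `modify` function is where frequency-domain processing would take place.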
The fast Fourier transform (FFT) is an algorithm for transforming a time-domain digital signal into a frequency-domain representation of the relative amplitude of different frequency regions in the signal. An FFT is computed using a relatively small excerpt of a signal, usually a slice of time 512 or 1024 samples long. To analyze a longer signal, one performs multiple FFTs using consecutive (or overlapping) time slices.
The fft~ object performs an FFT on the signal it receives, and sends out (also in the form of a signal) a frequency-domain analysis of the received signal. The only object that understands the output of fft~ is ifft~, which performs an inverse FFT to synthesize a time-domain signal based on the frequency-domain information. One could alter the signal as it goes from fft~ to ifft~, in order to change the spectrum.
The FFT only works perfectly when analyzing exactly one cycle (or exactly an integer number of cycles) of a tone. To reduce the artifacts produced when this is not the case, one can window the signal being analyzed by applying an amplitude envelope to taper the ends of each time slice. The amplitude envelope can be applied by using a cycle~ object to read a windowing function from a buffer~ repeatedly at the same rate as the FFT itself (i.e., once per time slice), and multiplying the signal by the result. To eliminate any artifacts that result from modifying the frequency-domain signal, we must also apply the same envelope at the output of the ifft~.
See Also
Name | Description
buffer~ | Store audio samples
capture~ | Store a signal to view as text
fft~ | Fast Fourier transform
ifft~ | Inverse fast Fourier transform