FFT
The Fast Fourier Transform (FFT) is a mathematical operation that converts a signal from the time domain to the frequency domain, i.e. it changes the x-axis from time to frequency. This is particularly useful for determining the frequency of a signal or for decomposing a signal that consists of multiple frequencies. In other words: you feed in an audio signal and get back the frequencies and their corresponding strengths. That is very handy if, for example, you want to determine which musical notes are being played.
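To make the time-to-frequency idea concrete, here is a minimal sketch in plain C++. It deliberately uses a naive O(N²) discrete Fourier transform rather than an FFT (an FFT yields the same values, only faster), and the helper names `dftMagnitudes` and `sineWave` are made up for this illustration; they are not part of AudioTools.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive O(N^2) discrete Fourier transform of a real-valued signal.
// Returns the magnitude of the first N/2 frequency bins.
std::vector<float> dftMagnitudes(const std::vector<float>& samples) {
  const double pi = 3.14159265358979323846;
  const std::size_t n = samples.size();
  std::vector<float> magnitudes(n / 2);
  for (std::size_t k = 0; k < n / 2; ++k) {
    double re = 0.0, im = 0.0;
    for (std::size_t t = 0; t < n; ++t) {
      const double angle =
          2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
      re += samples[t] * std::cos(angle);
      im -= samples[t] * std::sin(angle);
    }
    magnitudes[k] = static_cast<float>(std::sqrt(re * re + im * im));
  }
  return magnitudes;
}

// Test signal: a sine wave that completes `cycles` full periods in `n` samples.
std::vector<float> sineWave(std::size_t n, std::size_t cycles) {
  assert(cycles < n / 2);  // stay below the Nyquist bin
  const double pi = 3.14159265358979323846;
  std::vector<float> samples(n);
  for (std::size_t t = 0; t < n; ++t) {
    samples[t] = static_cast<float>(
        std::sin(2.0 * pi * static_cast<double>(cycles * t) / static_cast<double>(n)));
  }
  return samples;
}
```

Feeding `sineWave(64, 5)` into `dftMagnitudes` gives a spectrum whose only significant value sits in bin 5, which is the "frequency and strength" information described above.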
 
The length defines the number of samples that are used to run the FFT. The length must be a power of 2 (e.g. 1024). This parameter impacts
- the calculation speed and frequency of calculation
- the memory which is needed
- the frequency resolution of the result (number of bins)
The length determines the number of frequency bins with the following formula: frequency bins = length / 2. A large length consumes a lot of memory and needs more processing time, so make sure that you keep the length as small as possible: 512 might be a good starting value!
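The trade-off can be put into numbers with a small sketch. The bin count follows the formula above; the bin width (sample rate divided by length) is the standard DFT relationship and is assumed here rather than taken from the library. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// True if n is a power of two, which the FFT length must be.
bool isPowerOfTwo(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Number of frequency bins for a given FFT length: length / 2.
std::size_t binCount(std::size_t length) {
  assert(isPowerOfTwo(length));
  return length / 2;
}

// Width of one bin in Hz (the frequency resolution): sample rate / length.
float binWidthHz(float sampleRate, std::size_t length) {
  assert(isPowerOfTwo(length));
  return sampleRate / static_cast<float>(length);
}
```

At 44100 Hz, a length of 512 gives 256 bins that are each about 86.1 Hz wide, while a length of 4096 gives 2048 bins of about 10.8 Hz: better resolution, paid for with memory and processing time.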
The stride defines the number of samples by which we advance at each step. If we do not define any stride, we just consume the samples sequentially, one block of length samples after the other.
If we define a stride, we advance by stride samples at each step, so that consecutive blocks overlap.
 
In that case the last length-stride samples are reprocessed with each step.
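The stepping can be sketched in a few lines of C++. The helper `windowStarts` is hypothetical, and treating a stride of 0 as "no stride" (advance by a full length) is an assumption made for this example, not something the text above specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Start offsets of every FFT block that fits completely into `total` samples.
// Assumption: stride == 0 means "no stride", i.e. advance by a full length.
std::vector<std::size_t> windowStarts(std::size_t total, std::size_t length,
                                      std::size_t stride) {
  assert(length > 0);
  std::vector<std::size_t> starts;
  const std::size_t step = (stride == 0) ? length : stride;
  for (std::size_t pos = 0; pos + length <= total; pos += step) {
    starts.push_back(pos);
  }
  return starts;
}
```

For 2048 samples and a length of 1024, no stride yields blocks starting at 0 and 1024. A stride of 512 yields blocks starting at 0, 512 and 1024, so each block reprocesses the last 1024 - 512 = 512 samples of the previous one.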
In our framework we can use the AudioRealFFT class as copy destination. You start the processing by calling the begin method, which expects configuration information in which we define the length and stride. We also need to provide the regular audio parameters (channels, sample_rate, bits_per_sample). Finally we can define a callback method which will be called whenever a new result is available.
In order to determine the result
- we can use the result() method which provides the best AudioFFTResult or
- if we want to get the N best values we can call the resultArray() method to which we pass an array of AudioFFTResult.
- The magnitudes() method returns the array of all magnitudes; its length can be determined with the size() method. You can determine the corresponding frequencies by calling frequency(int idx)
- If you are limited on memory you can loop from 0 to size() and call the magnitude(int idx) and frequency(int idx) methods instead.
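Conceptually, finding the best result is a search for the strongest bin. The following is a minimal stand-alone sketch of that idea in plain C++; `Peak` and `strongestBin` are hypothetical names, and the mapping from bin index to frequency (index × sample rate / length) is the standard DFT relationship, assumed here rather than taken from the library source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: bin index, its frequency in Hz and its magnitude.
struct Peak {
  std::size_t bin;
  float frequency;
  float magnitude;
};

// Scans all magnitudes and returns the strongest bin.
// `length` is the FFT length that produced the magnitudes.
Peak strongestBin(const std::vector<float>& magnitudes, float sampleRate,
                  std::size_t length) {
  assert(length > 0);
  Peak best{0, 0.0f, 0.0f};
  for (std::size_t i = 0; i < magnitudes.size(); ++i) {
    if (magnitudes[i] > best.magnitude) {
      const float hz =
          static_cast<float>(i) * sampleRate / static_cast<float>(length);
      best = Peak{i, hz, magnitudes[i]};
    }
  }
  return best;
}
```

The loop only ever looks at one magnitude at a time, which is the same access pattern as the memory-saving variant in the last bullet.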
The AudioFFTResult contains
- the frequency
- the magnitude (of the frequency)
- the musical note that corresponds to the frequency (frequencyAsNote())
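How a frequency maps to a note can be sketched with the usual equal-temperament formula (A4 = 440 Hz, 12 semitones per octave). This is an illustration only: `frequencyToNote` is a hypothetical helper and not how frequencyAsNote() is necessarily implemented.

```cpp
#include <cmath>
#include <string>

// Nearest equal-tempered note name for a frequency in Hz, e.g. "A4".
// Returns an empty string for frequencies that cannot be mapped.
std::string frequencyToNote(float frequency) {
  static const char* names[] = {"C",  "C#", "D",  "D#", "E",  "F",
                                "F#", "G",  "G#", "A",  "A#", "B"};
  if (frequency <= 0.0f) return "";
  // MIDI note number: 69 is A4 (440 Hz), one step per semitone.
  const long midi = std::lround(69.0 + 12.0 * std::log2(frequency / 440.0));
  if (midi < 0) return "";
  const long octave = midi / 12 - 1;
  return std::string(names[midi % 12]) + std::to_string(octave);
}
```

Because neighbouring notes in the low octaves are only a few Hz apart, the bin width that results from the chosen length limits how reliably low notes can be told apart.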
An example can be found on GitHub.
FFT windows reduce the effects of leakage but cannot eliminate it entirely. In effect, they only change the shape of the leakage. In addition, each type of window affects the spectrum in a slightly different way. For further details, I recommend consulting Wikipedia. We support the following window functions, but you can easily add your own subclasses. If you have enough memory, I recommend that you use the buffered implementation:
RealFFT fft;
auto cfg = fft.defaultConfig();
// buffered
cfg.window_function = new BufferedWindow(new Hamming());
// not buffered
cfg.window_function = new Hamming();

FFT can only be executed on one channel. If you provide audio with multiple channels, you need to indicate which channel should be used for the analysis. By default we use the first channel (= 0):

cfg.channel_used = 0;

Alternatively, you can use KissFFT or ESP32-FFT, which you need to install separately. Some additional implementations are based on ARM CMSIS DSP and the Espressif DSP Library, which are part of the corresponding Arduino implementations.
| Ext Library | Include | Class Name | Comment | 
|---|---|---|---|
| n/a | AudioLibs/AudioRealFFT.h | AudioRealFFT | included in AudioTools | 
| KissFFT Library | AudioLibs/AudioKissFFT.h | AudioKissFFT | |
| ESP32-FFT | AudioLibs/AudioESP32FFT.h | AudioESP32FFT | |
| n/a | AudioLibs/AudioEspressifFFT.h | AudioEspressifFFT | included in Arduino (esp-dsp) | 
| n/a | AudioLibs/AudioCmsisFFT.h | AudioCmsisFFT | Included in Arduino RP2040 | 
Or you can easily integrate your own implementation. Just have a look how the above includes have been implemented.
I created some test cases to measure the speed of the FFT. The table below gives the execution time in ms of an FFT with 4096 samples:
| | ESP32 | ESP32-S3 | STM32F411 | RP2040 | STM32H743 |
|---|---|---|---|---|---|
| AudioRealFFT | 3.3 | 2.6 | 12.1 | 71.0 | 1.1 | 
| AudioKissFFT | 5.9 | n/a | 26.9 | 20.5 | 2.8 | 
| AudioESP32FFT | 1.1 | 1.25 | 5.9 | 68.2 | 1.0 | 
| AudioCmsisFFT | n/a | n/a | 8.2 | 86.1 | 1.0 | 
| AudioEspressifFFT | 3.5 | 3.2 | n/a | n/a | n/a | 
Please note that the performance of the different libraries depends on the sample size. AudioRealFFT seems to perform better with bigger values!
The inverse FFT generates audio samples in the time domain from the spectrum information. This functionality is executed automatically when we read the data from the FFT source. The spectrum information can be set in the callback.
This implementation also supports a stride and a window function: the samples are combined using the overlap add method.
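For readers unfamiliar with overlap-add, here is a generic sketch in plain C++. It assumes a periodic Hann window and that each frame is windowed before it is summed into the output; where exactly the library applies the window is not stated here, and `hannWindow` and `overlapAdd` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic Hann window of the given length.
std::vector<float> hannWindow(std::size_t length) {
  const double pi = 3.14159265358979323846;
  std::vector<float> w(length);
  for (std::size_t n = 0; n < length; ++n) {
    w[n] = static_cast<float>(0.5 - 0.5 * std::cos(2.0 * pi * n / length));
  }
  return w;
}

// Overlap-add: frame i is multiplied by the window and summed into the
// output at offset i * stride.
std::vector<float> overlapAdd(const std::vector<std::vector<float>>& frames,
                              const std::vector<float>& window,
                              std::size_t stride) {
  assert(stride > 0);
  if (frames.empty()) return {};
  const std::size_t length = window.size();
  std::vector<float> out((frames.size() - 1) * stride + length, 0.0f);
  for (std::size_t i = 0; i < frames.size(); ++i) {
    assert(frames[i].size() == length);
    for (std::size_t n = 0; n < length; ++n) {
      out[i * stride + n] += frames[i][n] * window[n];
    }
  }
  return out;
}
```

With a stride of half the length, the overlapping halves of two Hann windows add up to exactly 1, so the overlapped region is reconstructed without amplitude ripple. Other window and stride combinations sum to a different constant or not to a constant at all.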
Here is an example that generates a tone across the full spectrum:
#include "AudioTools.h"
#include "AudioTools/AudioLibs/AudioRealFFT.h" // using RealFFT
AudioInfo info(44100, 2, 16);
AudioRealFFT afft; // or AudioKissFFT
//CsvOutput<int16_t> out(Serial);
I2SStream out;
StreamCopy copier(out, afft);
int bin_idx = 0;
// provide fft data
void fftFillData(AudioFFTBase &fft) {
  fft.clearBins();
  FFTBin bin{1.0f,1.0f};
  fft.setBin(bin_idx, bin);
  // restart from first bin
  if (++bin_idx>=fft.size()) bin_idx = 0;
}
void setup() {
  AudioToolsLogger.begin(Serial, AudioToolsLogLevel::Warning);
  // Setup FFT
  auto tcfg = afft.defaultConfig(RX_MODE);
  tcfg.copyFrom(info);
  tcfg.length = 1024;
  tcfg.callback = fftFillData;
  afft.begin(tcfg);
  // setup output
  auto ocfg = out.defaultConfig(TX_MODE);
  ocfg.copyFrom(info);
  out.begin(ocfg);
}
void loop() { copier.copy(); }

We just copy the data from the FFT source to the I2S sink: the fftFillData() callback just sets the bin that represents the frequency. At each call the bin (and with it the frequency) is increased.
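Why a single bin produces a single tone can be seen from the inverse DFT formula. The sketch below evaluates it directly for one non-zero bin in plain C++. `synthesizeBin` is a hypothetical helper, and the 2/length scaling assumes a 1/length normalisation with an implied conjugate mirror bin; actual FFT implementations may scale differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Time-domain samples of an inverse real DFT in which only bin `k` holds the
// complex value (re, im) and all other bins are zero.
std::vector<float> synthesizeBin(std::size_t length, std::size_t k, float re,
                                 float im) {
  assert(k > 0 && k < length / 2);  // formula holds between DC and Nyquist
  const double pi = 3.14159265358979323846;
  std::vector<float> samples(length);
  for (std::size_t n = 0; n < length; ++n) {
    const double angle =
        2.0 * pi * static_cast<double>(k * n) / static_cast<double>(length);
    samples[n] = static_cast<float>(
        (2.0 / length) * (re * std::cos(angle) - im * std::sin(angle)));
  }
  return samples;
}
```

The result is a sinusoid that completes k periods per block, i.e. a tone at k × sample_rate / length. With 44100 Hz and a length of 1024, bin 10 corresponds to roughly 430.7 Hz.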
