After about an 8-hour coding binge (mostly spent learning these new APIs), I had arrived at the product shown above. However, I did notice a few issues. Namely, I was using the Web Audio API's AnalyserNode to perform my Fast Fourier Transform (in short, the thing that takes my signal and breaks it into its constituent frequencies). This was super convenient since I neither had to (a) implement my own nor (b) find a performant implementation. However, the resolution produced by the FFT seems to be too low to accurately estimate my low frequencies. Let me explain why.
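For context, here is roughly what that AnalyserNode setup looks like. This is a minimal sketch of my own, assuming a microphone source and an ES module context, not the exact code from the project:

const audioCtx = new AudioContext();
const analyser = audioCtx.createAnalyser();
analyser.fftSize = 16384; // the size discussed later in this post

// Route microphone input into the analyser.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
audioCtx.createMediaStreamSource(stream).connect(analyser);

// frequencyBinCount is fftSize / 2; each element is a magnitude in dB.
const bins = new Float32Array(analyser.frequencyBinCount);
analyser.getFloatFrequencyData(bins);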
Quick FFT Run-Down
When you apply an FFT to a discretized signal, it produces a frequency-domain approximation of that signal as an array. Typically, each index in this array represents a frequency and each value represents the magnitude of that frequency's contribution. If you had a continuous frequency-domain transform, you could see how much of your signal was made up of frequency 123.34 Hz by using the frequency itself as the index:
const magnitude = fdFFT[123.34];
This looks intuitive; however, we do not have a continuous FFT. Instead, we have a discrete one, with evenly spaced bins whose center frequencies are a function of two things: the sample rate and the size of our FFT. As a result, if you wanted to look at frequency 123.34 Hz, you would have to do something like this:
const stepSize = sampleRate / FFT_SIZE;

// Some approximation is required here. It could be rounding
// to the nearest bin, distance-weighting, etc.
const idx = Math.round(123.34 / stepSize);

const magnitude = fdFFT[idx];
The sample rate is determined by the audio input device and cannot be adjusted by us. The FFT size, however, is within our control. That being said, the Web Audio API limits us to a size no larger than 16384. Practically speaking, what does all of this mean? Well, we’ll consider the system I’ve been developing on:
sample rate = 48000 Hz
FFT size = 16384
(sample rate) / (FFT size) ≈ 2.93 Hz
If you look at the calculation above, you'll notice that the step between adjacent buckets is roughly 2.93 Hz with this configuration. In other words, if you had a 1 Hz signal, it would fall between the 0 Hz and 2.93 Hz buckets and likely be assigned to the closest one (i.e. 0 Hz). As a result, you have now lost the ability to perform frequency analysis of signals at 1 Hz.
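To make that concrete, here is a small sketch of the nearest-bin lookup under this configuration; it is my own illustration rather than code from the project:

const sampleRate = 48000;
const fftSize = 16384;
const stepSize = sampleRate / fftSize; // ≈ 2.93 Hz per bin

// Map a target frequency to the index of its nearest FFT bin.
const nearestBin = (freq) => Math.round(freq / stepSize);

console.log(nearestBin(1)); // 0 -- a 1 Hz signal collapses into the 0 Hz bin
console.log(nearestBin(123.34)); // 42 -- the closest bin sits at ~123.05 Hz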
If you use this pitch-to-frequency reference (which rounds to whole numbers), you will notice that higher notes are more widely spaced in frequency. Consider the difference between C#7 and C7: roughly 125 Hz. Comparing C#2 against C2, however, yields merely 4 Hz! Herein lies the problem. As we get lower, the difference between adjacent notes becomes minuscule, so our FFT resolution must be high enough to detect those differences.
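You can verify those spacings with the standard equal-temperament formula, f = 440 · 2^((n − 69) / 12), where n is the MIDI note number. The snippet below is my own quick check, not part of the original project:

const noteFreq = (n) => 440 * Math.pow(2, (n - 69) / 12);

const C7 = noteFreq(96); // ≈ 2093.00 Hz
const Cs7 = noteFreq(97); // ≈ 2217.46 Hz
const C2 = noteFreq(36); // ≈ 65.41 Hz
const Cs2 = noteFreq(37); // ≈ 69.30 Hz

console.log(Cs7 - C7); // ≈ 124.5 Hz between adjacent notes up high
console.log(Cs2 - C2); // ≈ 3.9 Hz down low -- barely one FFT bin wide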
This notion of "high enough" seems a bit wishy-washy, but for our purposes here, I think a resolution finer than 1 Hz is required. The distance between pitches is typically measured in cents. In summary, the definition is:
$$\text{cents} = 1200 \cdot \log_2\left(\frac{f}{f_0}\right)$$

where $f$ is your measured frequency and $f_0$ is the expected frequency. If you were to plot this function, you would see that a 1 Hz difference can be huge for low frequencies, but pretty minuscule for high frequencies. As an example, let's revisit C7 and C2, each off by 1 Hz.
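Plugging those numbers into the formula above (an illustrative check of my own, using the note frequencies computed earlier):

const cents = (measured, expected) => 1200 * Math.log2(measured / expected);

console.log(cents(2094.0, 2093.0)); // ≈ 0.83 cents off a C7
console.log(cents(66.41, 65.41)); // ≈ 26.3 cents off a C2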
Relatively speaking, a 1 Hz change on a C7 (under one cent) would likely be audibly indistinguishable; however, the same change on a C2 (roughly 26 cents, about a quarter of a semitone) would make the note sound very off pitch. As you can see, to accurately measure differences in low frequencies, our FFT has to be capable of providing us fine-grained frequency data.