
Convolution_Filtering 

View the Code on GitHub: From GitHub!

This project aims to reproduce the audio experience of target playback devices, allowing the listener to perceive the sound as if it were coming from one of these sources (phone, computer, car audio system, etc.).

This project includes two approaches: Convolution_Filter and Convolution_FilterCurve.

1: Convolution_Filter

Workflow:

1: White noise is a random signal with equal intensity at all frequencies. Taking advantage of this property, I recorded a burst of white noise played through the target playback device (say, my phone) with a Shure SM81 microphone, known for its flat frequency response, and used the recording as the impulse response (IR) for convolution.

2: Take a snippet of the IR (the recorded noise) with a window length of N samples, apply a Hann window, and compute its FFT.

3: Divide the long input signal into overlapping frames of length N with an overlap of 1024 samples, and apply a Hann window to each frame.

4: Multiply the FFT of each windowed input frame by the FFT of the IR; pointwise multiplication in the frequency domain is equivalent to convolution in the time domain.

5: Obtain the filtered output signal by performing the inverse FFT (IFFT) on each product spectrum, windowing the result, and reconstructing the signal with the overlap-add method. A minimal code sketch of this pipeline follows below.
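Here is a minimal Python/NumPy sketch of the workflow above. The function and parameter names (convolution_filter, n_fft, hop) are mine, and the defaults only mirror the frame and overlap sizes described above; the actual implementation lives in the GitHub repository.

    import numpy as np

    def convolution_filter(x, ir, n_fft=4096, hop=None):
        # Frame-wise FFT filtering with Hann windows and overlap-add,
        # following steps 2-5 above. n_fft is the window length N; the
        # overlap is 1024 samples, so hop = N - 1024.
        if hop is None:
            hop = n_fft - 1024
        win = np.hanning(n_fft)

        # Step 2: Hann-window the IR snippet and take its FFT
        # (zero-padded to the frame length).
        ir_padded = np.zeros(n_fft)
        ir_padded[:len(ir)] = ir * np.hanning(len(ir))
        IR = np.fft.rfft(ir_padded)

        out = np.zeros(len(x) + n_fft)
        for start in range(0, len(x) - n_fft + 1, hop):
            # Step 3: overlapping Hann-windowed frame of the input.
            frame = x[start:start + n_fft] * win
            # Step 4: multiply spectra (circular convolution in time).
            filtered = np.fft.irfft(np.fft.rfft(frame) * IR)
            # Step 5: window again and overlap-add.
            out[start:start + n_fft] += filtered * win
        return out[:len(x)]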

2: Convolution_FilterCurve

The second approach obtains an array of ratios between the frequency bins of the recorded IR and those of the original impulse (white noise); this ratio array serves as a filter-curve magnitude scaler applied to each frame of the input signal.

Core Formula:

output_signal_frames = IFFT(FFT(input_signal_frames) × filterCurve)

Because the filtered output is level-matched to the input signal at the end, normalizing the IR magnitudes is equally valid here: white noise has a nominally flat magnitude spectrum, so the ratio array IR_Magnitude ÷ WhiteNoise_Magnitude is proportional to the normalized IR magnitudes.

Formula (normalized by the peak value of the magnitude of the IR):

filterCurve[k] = |FFT(IR)[k]| / max(|FFT(IR)|)

Or (optional):

filterCurve[k] = |FFT(IR)[k]| / |FFT(whiteNoise)[k]|

***Please Note: The IR above is the recorded white noise played through the target speakers (e.g., my phone's speaker).

[Figure: the magnitude spectrum of the IR divided, bin by bin, by the magnitude spectrum of the original impulse (white noise). Both spectra span the frequency bins from DC to Nyquist plus the mirrored half.]

Workflow:

1: ~ (as above)

2: ~

3: ~

4: Multiply the FFT of each input frame by the filter curve, an array of per-bin ratios between 0 and 1 obtained by normalizing the IR's magnitude spectrum; see the sketch after this list.

5: ~
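As a companion to the sketch above, here is a minimal version of the filter-curve variant, again in NumPy with hypothetical names (make_filter_curve, convolution_filter_curve); it assumes the peak-normalization formula given earlier.

    import numpy as np

    def make_filter_curve(ir, n_fft=4096):
        # Peak-normalized IR magnitude spectrum: an array of per-bin
        # ratios in [0, 1], per the first formula above.
        ir_padded = np.zeros(n_fft)
        ir_padded[:len(ir)] = ir * np.hanning(len(ir))
        mag = np.abs(np.fft.rfft(ir_padded))
        return mag / mag.max()

    def convolution_filter_curve(x, curve, n_fft=4096, hop=None):
        # Same framing and overlap-add as convolution_filter, but each
        # frame's spectrum is scaled by the real-valued filter curve;
        # the frame's own phase is kept, so no delay or echo is added.
        if hop is None:
            hop = n_fft - 1024
        win = np.hanning(n_fft)
        out = np.zeros(len(x) + n_fft)
        for start in range(0, len(x) - n_fft + 1, hop):
            frame = x[start:start + n_fft] * win
            filtered = np.fft.irfft(np.fft.rfft(frame) * curve)
            out[start:start + n_fft] += filtered * win
        return out[:len(x)]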

3: Filtering effect demonstration

Original White Noise:

IR recorded from my phone's speaker:

Original Input Signal:

Filtered output signal (Convolution_Filter):

Filtered output signal (Convolution_FilterCurve):


4: Overview / Conclusion

The first method, Convolution_Filter, is based on multiplying the FFT of the impulse response with the FFT of the input signal, which is equivalent to convolving the two in the time domain. Notably, the choice of the IR (window) length in samples affects the final rendered output. A long impulse response means a large time span is being convolved with the input signal. In simple terms, time-domain convolution applies a delayed and weighted sum of the input signal samples, so an IR with many non-zero samples creates delayed copies of the input signal, which can sometimes sound like an echo.
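To make the "delayed and weighted sum" picture concrete, for an IR h with M samples the time-domain convolution is

y[n] = h[0]·x[n] + h[1]·x[n−1] + … + h[M−1]·x[n−(M−1)]

Every non-zero tap h[k] contributes a copy of the input delayed by k samples; with a long IR those copies stretch far enough apart in time to be heard as an echo.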

The second method, Convolution_FilterCurve, multiplies the normalized magnitude spectrum of the impulse response (the absolute value of its FFT) with the FFT of the input signal. Unlike the first method, the phase information of the impulse response is discarded here, so the output signal has no delay or echoing effect regardless of the window length chosen.

In conclusion, even though the second method sounds cleaner, without the delay or echo of the first method, the first method reproduces the filtering effect of the target playback device more accurately. The phase of a signal carries its time-dependent characteristics, such as delay, which are crucial for faithfully reproducing the device's audio behavior, although it is true that the human ear is relatively insensitive to phase.

For future improvement, I plan to reconstruct the phase information in the second method using the Hilbert transform, which can derive a minimum-phase response from a magnitude spectrum.
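As a starting point, here is a minimal sketch of one standard route: the cepstral formulation of the Hilbert-transform relationship between log-magnitude and minimum phase. The function name and the even-length, two-sided-spectrum assumption are mine, not from the repository.

    import numpy as np

    def minimum_phase_spectrum(mag):
        # Rebuild a minimum-phase spectrum from a two-sided magnitude
        # spectrum of even length N, via the real cepstrum (equivalent
        # to taking the Hilbert transform of the log-magnitude).
        n = len(mag)
        log_mag = np.log(np.maximum(mag, 1e-12))  # guard against log(0)
        cep = np.fft.ifft(log_mag).real           # real cepstrum
        # Fold: keep DC and Nyquist, double the causal part, zero the rest.
        fold = np.zeros(n)
        fold[0] = cep[0]
        fold[1:n // 2] = 2.0 * cep[1:n // 2]
        fold[n // 2] = cep[n // 2]
        return np.exp(np.fft.fft(fold))  # complex spectrum with |.| == mag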

5: Download the Code From GitHub

click here: From GitHub!
