Investigation of the Efficiency of the Phase-Based Method Based on Gabor Filter Banks for Audio Recovery from Video Recordings

  • Maxim Beskonchin Saint Petersburg Electrotechnical University, 5, building 3, st. Professora Popova, 197022, Saint Petersburg, Russia
  • Alexander Spiridonov Saint Petersburg Electrotechnical University, 5, building 3, st. Professora Popova, 197022, Saint Petersburg, Russia
Keywords: visual microphone, Gabor filters, phase-based analysis, audio recovery from video, sub-pixel motion, multi-scale analysis

Abstract

This paper addresses passive recovery of acoustic signals from video sequences of vibrating objects. Sound waves propagating through a medium exert time-varying pressure on physical objects, inducing microscopic vibrations of their surfaces. The amplitude of these vibrations is typically a fraction of a pixel (10^-3 to 10^-2 pixels), which makes them difficult to detect with classical computer vision methods based on optical flow.
This study extends the "visual microphone" approach proposed in [1]. The theoretical foundation is the shift property of the Fourier transform [3], which links a spatial displacement of an object to a linear phase shift. The general mathematical model is preserved, while the spatial-frequency decomposition stage is modified: the study evaluates the efficiency of a custom implementation of a complex Gabor filter bank in place of the steerable pyramid used in the original work.
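The link between displacement and phase can be illustrated in one dimension. By the shift property, f(x - d) has the spectrum F(w) exp(-i w d), so the response of a narrow-band complex Gabor filter tuned to w0 changes its phase by about -w0 d when the signal moves by d. The sketch below is only an illustration under assumed parameters (a synthetic texture, a single filter, sigma = 8 px, an 8 px wavelength); the function names are hypothetical and it is not the authors' implementation.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <functional>

// Response of a 1-D complex Gabor filter centred at x0 (Gaussian envelope
// times a complex exponential at frequency omega0, truncated at 6 sigma).
std::complex<double> gaborResponse(const std::function<double(double)>& signal,
                                   double x0, double omega0, double sigma) {
    assert(sigma > 0.0 && omega0 > 0.0);
    const int radius = static_cast<int>(std::ceil(6.0 * sigma));
    std::complex<double> acc(0.0, 0.0);
    for (int k = -radius; k <= radius; ++k) {
        const double kk = static_cast<double>(k);
        const double envelope = std::exp(-0.5 * kk * kk / (sigma * sigma));
        acc += signal(x0 + kk) * envelope * std::polar(1.0, -omega0 * kk);
    }
    return acc;
}

// Displacement estimate from the phase difference of two filter responses:
// d = -(phase(current) - phase(reference)) / omega0.
double estimateShift(const std::function<double(double)>& reference,
                     const std::function<double(double)>& current,
                     double x0, double omega0, double sigma) {
    const std::complex<double> r0 = gaborResponse(reference, x0, omega0, sigma);
    const std::complex<double> r1 = gaborResponse(current, x0, omega0, sigma);
    // arg(r1 * conj(r0)) is the phase difference wrapped to (-pi, pi].
    const double dphi = std::arg(r1 * std::conj(r0));
    return -dphi / omega0;
}

// Hypothetical test case: a two-component texture displaced by `delta` pixels.
double demoSubPixelShift(double delta) {
    const double pi = std::acos(-1.0);
    const double omega0 = 2.0 * pi / 8.0;  // assumed filter tuning
    const double sigma = 8.0;              // assumed envelope width
    auto texture = [omega0](double x) {
        return std::cos(omega0 * x + 0.4) + 0.3 * std::cos(2.3 * omega0 * x + 1.0);
    };
    auto moved = [texture, delta](double x) { return texture(x - delta); };
    return estimateShift(texture, moved, 100.0, omega0, sigma);
}
```

Calling `demoSubPixelShift(0.005)` recovers a displacement of five thousandths of a pixel on this noise-free signal. With real footage the estimate is only reliable where the filter response has sufficient amplitude.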
An optimized C++ implementation of the algorithm is presented. It performs convolution via the Fast Fourier Transform (FFT) and processes frames in parallel batches (OpenMP), yielding significantly higher computational efficiency than naive spatial convolution.
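The idea of replacing spatial convolution with a pointwise product of spectra, applied to a batch of frames, can be sketched as follows. This is a simplified 1-D illustration with a hand-written radix-2 FFT and circular boundary handling. The abstract does not state which FFT routine, frame layout, or boundary treatment the authors use, so every function name and parameter here is an assumption. The OpenMP pragma is guarded so the code also builds without OpenMP.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;
using Signal = std::vector<Complex>;

// In-place iterative radix-2 FFT; the length must be a power of two.
void fft(Signal& a, bool inverse) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const Complex wLen =
            std::polar(1.0, (inverse ? 2.0 : -2.0) * pi / static_cast<double>(len));
        for (std::size_t i = 0; i < n; i += len) {
            Complex w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const Complex u = a[i + k];
                const Complex v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wLen;
            }
        }
    }
    if (inverse) for (Complex& x : a) x /= static_cast<double>(n);
}

// Circular convolution of every frame with one kernel in the frequency domain.
std::vector<Signal> filterBatch(const std::vector<Signal>& frames, Signal kernel) {
    fft(kernel, false);  // the kernel spectrum is computed once and reused
    std::vector<Signal> out(frames);
    const long long count = static_cast<long long>(out.size());
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long long f = 0; f < count; ++f) {  // frames are independent
        Signal& s = out[static_cast<std::size_t>(f)];
        assert(s.size() == kernel.size());
        fft(s, false);
        for (std::size_t i = 0; i < s.size(); ++i) s[i] *= kernel[i];
        fft(s, true);
    }
    return out;
}

// Reference O(N^2) circular convolution in the spatial domain.
Signal convolveDirect(const Signal& s, const Signal& kernel) {
    const std::size_t n = s.size();
    Signal out(n, Complex(0.0, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            out[i] += s[(i + n - k) % n] * kernel[k];
    return out;
}

// Largest deviation between both methods on a small synthetic batch.
double maxFftVsDirectError() {
    const std::size_t n = 64;
    Signal kernel(n);
    for (std::size_t k = 0; k < n; ++k) {  // Gabor kernel wrapped around index 0
        const double x = (k < n / 2) ? static_cast<double>(k)
                                     : static_cast<double>(k) - static_cast<double>(n);
        kernel[k] = std::exp(-0.5 * x * x / 9.0) * std::polar(1.0, 0.8 * x);
    }
    std::vector<Signal> frames(4, Signal(n));
    for (std::size_t f = 0; f < frames.size(); ++f)
        for (std::size_t i = 0; i < n; ++i)
            frames[f][i] = Complex(std::sin(0.37 * static_cast<double>(i) + static_cast<double>(f)), 0.0);
    const std::vector<Signal> fast = filterBatch(frames, kernel);
    double worst = 0.0;
    for (std::size_t f = 0; f < frames.size(); ++f) {
        const Signal slow = convolveDirect(frames[f], kernel);
        for (std::size_t i = 0; i < n; ++i)
            worst = std::max(worst, std::abs(fast[f][i] - slow[i]));
    }
    return worst;
}
```

Both paths produce the same result up to rounding. The frequency-domain path costs O(N log N) per frame instead of O(N^2), and because frames do not depend on one another, the batch loop parallelizes without synchronization.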
A comparative analysis of robustness to sensor noise [18] is conducted on experimental data and demonstrates the superiority of the phase-based approach over intensity-based methods. Results obtained with a high-speed camera (2200 fps) confirm that acoustic information can be recovered within the bandwidth limited by the Nyquist frequency [15, 16] and that tonal signals can be captured from lightweight objects at a distance of several meters. Furthermore, the impact of incoherent sensor noise is effectively mitigated through spatial weighting and filtering.
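Two quantities mentioned above lend themselves to a short worked example. At 2200 fps the Nyquist limit is 1100 Hz, so only audio content below that frequency can be recovered. For spatial weighting, the abstract gives no formula, so the sketch below assumes the squared-amplitude weighting described for the original visual microphone [1]: each pixel's phase change is weighted by the squared magnitude of its filter response, so weakly textured (noise-dominated) pixels contribute little. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Highest recoverable audio frequency for a given camera frame rate.
double nyquistHz(double framesPerSecond) {
    assert(framesPerSecond > 0.0);
    return 0.5 * framesPerSecond;
}

// Amplitude-weighted mean phase change between two frames of complex
// filter responses (assumed weighting: squared reference amplitude).
double weightedPhaseShift(const std::vector<Complex>& reference,
                          const std::vector<Complex>& current) {
    assert(reference.size() == current.size());
    double weightedSum = 0.0;
    double weightTotal = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const double weight = std::norm(reference[i]);  // |A|^2
        const double dphi = std::arg(current[i] * std::conj(reference[i]));
        weightedSum += weight * dphi;
        weightTotal += weight;
    }
    return weightTotal > 0.0 ? weightedSum / weightTotal : 0.0;
}

// Hypothetical two-pixel example: a strong response (amplitude 3) with a
// phase change of 0.01 rad and a weak one (amplitude 1) with 0.05 rad.
double demoWeightedShift() {
    const std::vector<Complex> reference = {std::polar(3.0, 0.30), std::polar(1.0, 1.20)};
    const std::vector<Complex> current = {std::polar(3.0, 0.31), std::polar(1.0, 1.25)};
    return weightedPhaseShift(reference, current);
}
```

In the two-pixel example the weights are 9 and 1, so the result is (9 * 0.01 + 1 * 0.05) / 10 = 0.014 rad, much closer to the strong pixel's value than the plain average of 0.03 rad.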

Author Biographies

Maxim Beskonchin, Saint Petersburg Electrotechnical University, 5, building 3, st. Professora Popova, 197022, Saint Petersburg, Russia

Student, mabeskonchin@gmail.com

Alexander Spiridonov, Saint Petersburg Electrotechnical University, 5, building 3, st. Professora Popova, 197022, Saint Petersburg, Russia

Student, spiridon-2005@mail.ru

References

A. Davis, M. Rubinstein, N. Wadhwa, G. J. Mysore, F. Durand, and W. T. Freeman, “The visual microphone: passive recovery of sound from video,” ACM Trans. Graph. (Proc. SIGGRAPH), vol. 33, no. 4, Art. no. 79, pp. 1–10, 2014.

N. Wadhwa, M. Rubinstein, F. Durand, and W. T. Freeman, “Phase-based video motion processing,” ACM Trans. Graph. (Proc. SIGGRAPH), vol. 32, no. 4, Art. no. 80, pp. 80:1–80:10, 2013, doi: 10.1145/2461912.2461966.

A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision. Cambridge, MA: MIT Press, 2024.

E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger, “Shiftable multiscale transforms,” IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 587–607, Mar. 1992.

J. Portilla and E. P. Simoncelli, “A parametric texture model based on joint statistics of complex wavelet coefficients,” International Journal of Computer Vision, 2000.

J. G. Daugman, “Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters,” J. Opt. Soc. Am. A, vol. 2, no. 7, pp. 1160–1169, 1985, doi: 10.1364/JOSAA.2.001160.

W. T. Freeman and E. H. Adelson, “The design and use of steerable filters,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 9, pp. 891–906, 1991.

D. J. Fleet and A. D. Jepson, “Computation of component image velocity from local phase information,” International Journal of Computer Vision, vol. 5, no. 1, pp. 77–104, 1990.

B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in Proc. DARPA Image Understanding Workshop, 1981, pp. 121–130.

P. J. Burt and E. H. Adelson, “The Laplacian pyramid as a compact image code,” IEEE Trans. Commun., vol. COM-31, no. 4, pp. 532–540, Apr. 1983.

A. B. Sergienko, Digital Signal Processing: A Textbook for Universities, 3rd ed. St. Petersburg: BHV-Peterburg, 2011 (in Russian).

R. Gonzalez and R. Woods, Digital Image Processing, 3rd ed., revised and expanded. Moscow: Tekhnosfera, 2019 (in Russian).

V. A. Soifer, Methods of Computer Image Processing. Moscow: Fizmatlit, 2003 (in Russian).

V. V. Starovoitov and A. A. Golub, Digital Image Processing and Computer Graphics. Minsk: UIIP NAS of Belarus, 2020 (in Russian).

C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379–423 and 623–656, Jul./Oct. 1948.

V. A. Kotelnikov, “On the transmission capacity of the ‘ether’ and of wire in electrical communications,” Materials for the First All-Union Congress on the Technical Reconstruction of Communications and the Development of the Low-Current Industry (Radio Section), Moscow: Red Army Communications Directorate, 1933, pp. 1–19 (in Russian).

C. H. Knapp and G. C. Carter, “The generalized correlation method for estimation of time delay,” IEEE Trans. Acoust., Speech, Signal Process., vol. 24, no. 4, pp. 320–327, Aug. 1976, doi: 10.1109/TASSP.1976.1162830.

J. R. Janesick, Scientific Charge-Coupled Devices. Bellingham, WA, USA: SPIE Press, 2001, doi: 10.1117/3.374903.

M. Meingast, C. Geyer, and S. S. Sastry, “Geometric models of rolling-shutter cameras,” arXiv preprint cs/0503076, 2005.

C++ software implementation accompanying the article: https://gitverse.ru/rremca/Gabor-Visual-Microphone

Published
2025-10-01
How to Cite
Beskonchin, M., & Spiridonov, A. (2025). Investigation of the Efficiency of the Phase-Based Method Based on Gabor Filter Banks for Audio Recovery from Video Recordings. Computer Tools in Education, (3). https://doi.org/10.32603/2071-2340-2025-3-2
Section
Algorithmic mathematics and mathematical modelling