Improving Physiological Audio Transmission Using Spectral Balancing and Data Integration

Authors

  • Dr. Rafael Henrique Costa, Department of Philological Studies, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
  • Dr. Mariana Alves Pereira, School of Language and Literature, University of São Paulo, São Paulo, Brazil

Keywords:

Physiological audio transmission, bone-conducted speech, spectral balancing

Abstract

Physiological audio transmission, particularly of body-conducted and non-acoustic speech signals, has emerged as a critical research domain for robust communication systems operating in adverse acoustic environments. Conventional air-conducted speech processing techniques degrade severely under high noise, motivating the exploration of alternative sensing modalities such as bone-conducted microphones, non-audible murmur (NAM) sensors, and multi-sensor fusion frameworks. However, these modalities inherently produce spectrally distorted, band-limited signals with reduced intelligibility, thereby necessitating advanced enhancement strategies.

This study proposes a comprehensive analytical and technical framework for improving physiological audio transmission through the integration of spectral balancing techniques and multi-source data fusion. The research systematically examines how spectral equalization can compensate for frequency attenuation and distortion inherent in body-conducted signals while leveraging data integration strategies to reconstruct high-fidelity speech representations. Building upon prior work in multi-sensor signal processing, adaptive filtering, and statistical voice conversion, the study introduces a unified architecture that combines spectral correction with cross-modal feature alignment.
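One simple way to realize spectral equalization is a fixed per-frequency gain estimated from paired recordings. The sketch below is a minimal illustration, assuming time-aligned, equal-length frames of air-conducted (reference) and bone-conducted speech captured simultaneously. The gain for each DFT bin is the square root of the ratio of the two long-term average power spectra, clamped to limit noise amplification. The function names (`design_equalizer`, `equalize`), the `eps` floor, and the `max_gain` cap are illustrative assumptions, not the framework proposed in this study. The naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, keeping only the real part (inputs here are spectra of real frames)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def average_power(frames):
    """Per-bin power spectrum averaged over equal-length frames."""
    acc = [0.0] * len(frames[0])
    for frame in frames:
        for k, value in enumerate(dft(frame)):
            acc[k] += abs(value) ** 2
    return [p / len(frames) for p in acc]


def design_equalizer(air_frames, bone_frames, eps=1e-12, max_gain=100.0):
    """Hypothetical helper: per-bin magnitude gains mapping the bone-conducted
    long-term spectrum onto the air-conducted one.

    eps keeps bins that are silent in both channels at a gain of about 1;
    max_gain caps amplification where the bone-conducted channel has almost no energy.
    """
    p_air = average_power(air_frames)
    p_bone = average_power(bone_frames)
    return [min(math.sqrt((pa + eps) / (pb + eps)), max_gain)
            for pa, pb in zip(p_air, p_bone)]


def equalize(frame, gains):
    """Hypothetical helper: apply the gains to one bone-conducted frame.

    Filtering in the DFT domain is circular and zero-phase; a deployed system
    would use windowed overlap-add processing instead.
    """
    return idft([g * x for g, x in zip(gains, dft(frame))])
```

Consider a 16-sample bone-conducted frame that contains the same two sinusoids as the air-conducted frame, except that the higher one (DFT bin 5) is attenuated to one tenth. In that case the estimated gain at bins 5 and 11 is 10, and `equalize` recovers the air-conducted frame. In practice the gains would be averaged over many frames and smoothed across frequency. A fixed equalizer also cannot restore components that the bone-conducted path does not carry at all.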

The methodological foundation of this research integrates classical signal processing approaches, such as linear predictive (LP) modeling and comb filtering, with modern machine-learning-based reconstruction techniques. Within the proposed framework, the study evaluates the interplay between signal enhancement and sensor-level integration, emphasizing robustness, scalability, and real-time applicability. Empirical analysis demonstrates that combining spectral balancing with sensor fusion significantly enhances intelligibility, reduces noise artifacts, and improves recognition accuracy in extreme environments.
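Linear predictive modeling splits a frame into an all-pole spectral envelope and a prediction residual. LP-based restoration of bone-conducted speech operates on this decomposition. The sketch below is minimal and assumes the autocorrelation method with the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p. It computes LP coefficients with the Levinson-Durbin recursion, then filters a signal through A(z) to obtain its residual. The function names are hypothetical. The sketch omits windowing, pre-emphasis, and the envelope-modification step itself.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a frame."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Assumes r[0] > 0 (a non-silent frame). Returns (a, err) where a[0] == 1.0
    and err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        # Update the lower-order coefficients using the previous stage's values.
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lp_residual(x, a):
    """Filter x through A(z); e[t] = sum_j a[j] * x[t - j] is the prediction residual."""
    p = len(a) - 1
    return [sum(a[j] * x[t - j] for j in range(p + 1) if t - j >= 0)
            for t in range(len(x))]
```

Take the decaying sequence x[t] = 0.9^t over 200 samples. An order-2 analysis returns coefficients close to [1, -0.9, 0], which is the first-order predictor x̂[t] = 0.9 x[t-1]. The residual of [1, 0.9, 0.81] under A(z) = 1 - 0.9 z^-1 is [1, 0, 0] up to rounding.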

Furthermore, the study critically analyzes limitations associated with sensor noise, alignment errors, and computational complexity. It highlights the trade-offs between signal fidelity and processing overhead while proposing optimization strategies for practical deployment. The findings contribute to the advancement of physiological speech processing by offering a structured approach to overcoming inherent limitations in non-acoustic signal modalities.
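One simple form of alignment error between the air- and body-conducted channels is a fixed delay. The minimal sketch below assumes the mismatch is a single constant integer-sample delay. It estimates that delay as the lag that maximizes the time-domain cross-correlation, then shifts the second channel accordingly. The names `estimate_delay` and `align` are hypothetical. Real recordings may instead require fractional-delay or time-varying alignment.

```python
def estimate_delay(reference, other, max_lag):
    """Integer lag (samples) that maximizes sum_t reference[t] * other[t + lag].

    A positive result means `other` lags behind `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(reference[t] * other[t + lag]
                    for t in range(len(reference)) if 0 <= t + lag < len(other))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(reference, other, max_lag):
    """Shift `other` by the estimated lag so it lines up with `reference` (zero-padded)."""
    lag = estimate_delay(reference, other, max_lag)
    aligned = [other[t + lag] if 0 <= t + lag < len(other) else 0.0
               for t in range(len(reference))]
    return aligned, lag
```

The exhaustive search costs O(N·L) multiply-adds for N samples and L candidate lags. This is a small instance of the fidelity-versus-overhead trade-off noted above. FFT-based cross-correlation reduces the cost for long frames.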

This research holds implications for applications in defense communication systems, assistive technologies, medical diagnostics, and human-computer interaction, where reliable speech transmission is essential under challenging environmental conditions.

References

T. Barnwell, M. Clements, D. Anderson, E. Moore, M. Lee, A. Ertan, et al., "Low bit rate coding of speech in harsh conditions using non-acoustic auxiliary devices", Proc. Special Workshop in Maui: Lectures by Masters in Speech Process., 2004.

C. Demiroglu, D. Anderson, M. Clements and T. Barnwell, "Multi-sensor spectro-temporal comb filtering for speech enhancement", Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. 4, pp. IV-589-IV-592, 2007.

C. Demiroglu, S. Kamath, D. Anderson, M. Clements and T. Barnwell, "Segmentation-based noise suppression for speech coders using auxiliary sensors", Conf. Rec. 38th Asilomar Conf. Signals Syst. Comput., vol. 2, pp. 2320-2323, 2004.

S. Dupont, C. Ris and D. Bachelart, "Combined use of close-talk and throat microphones for improved speech recognition under non-stationary background noise", Proc. COST278 and ISCA Tutorial Res. Workshop (ITRW) Robust. Iss. Conversat. Interact., 2004.

M. Graciarena, H. Franco, K. Sonmez and H. Bratt, "Combining standard and throat microphones for robust speech recognition", IEEE Signal Process. Lett., vol. 10, no. 3, pp. 72-74, Mar. 2003.

P. Heracleous, Y. Nakajima, A. Lee, H. Saruwatari and K. Shikano, "Accurate hidden Markov models for non-audible murmur (NAM) recognition based on iterative supervised adaptation", Proc. IEEE Workshop Autom. Speech Recognition Understanding (ASRU03), pp. 73-76, 2003.

R. Hu, S. D. Kamath and D. V. Anderson, "Speech enhancement using non-acoustic sensors", Proc. INTERSPEECH05, pp. 2305-2308, 2005.

S. Ishimitsu, H. Kitakaze, Y. Tsuchibushi, H. Yanagawa and M. Fukushima, "A noise-robust speech recognition system making use of body-conducted signals", Acoust. Sci. Technol., vol. 25, no. 2, pp. 166-169, 2004.

Q. Jin, S. Jou and T. Schultz, "Whispering speaker identification", Proc. IEEE Int. Conf. Multimedia and Expo, pp. 1027-1030, 2007.

S. Jou, T. Schultz and A. Waibel, "Whispery speech recognition using adapted articulatory features", Proc. ICASSP, pp. 1009-1012, 2005.

K. Kondo, T. Fujita and K. Nakagawa, "On equalization of bone conducted speech for improved speech quality", Proc. IEEE Int. Symp. Signal Process. Inf. Technol., pp. 426-431, 2006.

Z. Liu, A. Subramanya, Z. Zhang, J. Droppo and A. Acero, "Leakage model and teeth clack removal for air- and bone-conductive integrated microphones", Proc. ICASSP, vol. 1, pp. 1093-1096, 2005.

Z. Liu, Z. Zhang, A. Acero, J. Droppo and X. Huang, "Direct filtering for air- and bone-conductive microphones", Proc. IEEE 6th Workshop Multimedia Signal Process., pp. 363-366, 2004.

A. McCree, K. Brady and T. Quatieri, "Multisensor dynamic waveform fusion", Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. 4, pp. IV-577, 2007.

Y. Nakajima, H. Kashioka, K. Shikano and N. Campbell, "Remodeling of the sensor for non-audible murmur (NAM)", Proc. 9th Eur. Conf. Speech Commun. Technol. (Interspeech 05/EUROSPEECH), pp. 3041-3044, 2005.

Y. Nakajima, H. Kashioka, N. Campbell and K. Shikano, "Non-audible murmur (NAM) recognition", IEICE Trans. Inf. Syst., vol. 89, no. 1, pp. 1-8, 2006.

L. Ng, G. Burnett, J. Holzrichter and T. Gable, "Denoising of human speech using combined acoustic and EM sensor signal processing", Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP00), vol. 1, pp. 229-232, 2000.

T. Quatieri, K. Brady, D. Messing, J. Campbell, W. Campbell, M. Brandstein, et al., "Exploiting nonacoustic sensors for speech encoding", IEEE Trans. Audio Speech Lang. Process., vol. 14, no. 2, pp. 533-544, Mar. 2006.

E. Ruzanski, J. Hansen, D. Finan, J. Meyerhoff, W. Norris and T. Wollert, "Improved TEO feature-based automatic stress detection using physiological and acoustic speech sensors", Proc. 9th Eur. Conf. Speech Commun. Technol., 2005.

M. Scanlon, "Acoustic monitoring pad", Proc. IEEE 17th Annu. Conf. Eng. Med. Biol. Soc., vol. 2, pp. 1725-1726, 1995.

A. Shahina and B. Yegnanarayana, "Language identification in noisy environments using throat microphone signals", Proc. Int. Conf. Intell. Sens. Inf. Process., pp. 400-403, 2005.

T. Shimamura and T. Tomikura, "Quality improvement of bone-conducted speech", Proc. Eur. Conf. Circuit Theory Design, vol. 3, pp. III-73, 2005.

T. Shimamura, J. Mamiya and T. Tamiya, "Improving bone-conducted speech quality via neural network", Proc. IEEE Int. Symp. Signal Process. Inf. Technol., pp. 628-632, 2006.

O. Strand, T. Holter, A. Egeberg and S. Stensby, "On the feasibility of ASR in extreme noise using the PARAT earplug communication terminal", Proc. IEEE Workshop Autom. Speech Recognition Understanding (ASRU03), pp. 315-320, 2003.

T. Tamiya and T. Shimamura, "Reconstruction filter design for bone-conducted speech", Proc. ICSLP 04, vol. 2, pp. 1085-1088, 2004.

T. Toda, K. Nakamura, H. Sekimoto and K. Shikano, "Voice conversion for various types of body transmitted speech", Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 3601-3604, 2009.

T. Toda, K. Nakamura, T. Nagai, T. Kaino, Y. Nakajima and K. Shikano, "Technologies for processing body-conducted speech detected with non-audible murmur microphone", Proc. INTERSPEECH09, pp. 632-635, 2009.

T. Toda, M. Nakagiri and K. Shikano, "Statistical voice conversion techniques for body-conducted unvoiced speech enhancement", IEEE Trans. Audio Speech Lang. Process., vol. 20, no. 9, pp. 2505-2517, Sep. 2012.

T. T. Vu, M. Unoki and M. Akagi, "A study on an LP-based model for restoring bone-conducted speech", Proc. 1st Int. Conf. Commun. Electron. (ICCE06), pp. 294-299, 2006.

T. Vu, G. Seide, M. Unoki and M. Akagi, "Method of LP-based blind restoration for improving intelligibility of bone-conducted speech", Proc. Interspeech 07, pp. 966-969, 2007.

J. Yu, L. Zhang and Z. Zhou, "A novel voice collection scheme based on bone-conduction", Proc. IEEE Int. Symp. Commun. Inf. Technol. (ISCIT), vol. 2, pp. 1164-1168, 2005.

Y. Zheng, Z. Liu, Z. Zhang, M. Sinclair, J. Droppo, L. Deng, et al., "Air- and bone-conductive integrated microphones for robust speech detection and enhancement", Proc. IEEE Workshop Autom. Speech Recognit. Understand. (ASRU03), pp. 249-254, 2003.

Published

2026-04-01