Ph.D. Research Proposal Exam: Anton Jeran Ratnarajah

Wednesday, May 17, 2023
1:00 p.m.
AVW 2328
Maria Hoo
301 405 3681
mch@umd.edu

ANNOUNCEMENT: Ph.D. Research Proposal Exam

 

Name: Anton Jeran Ratnarajah

  

Committee:

Professor Dinesh Manocha (Chair)

Professor Carol Espy-Wilson

Professor Ramani Duraiswami

Date/time: Wednesday, May 17, 2023, 1:00 p.m. - 3:00 p.m.

 

Location: AVW 2328

 

Title: Efficient learning-based sound propagation for virtual and real-world audio processing applications

 

Abstract:

Sound propagation is the process by which sound energy emitted by a speaker travels through the air as sound waves in an environment. The way the sound waves propagate from the speaker to the listener is characterized by a transfer function known as the room impulse response (RIR). The RIR depends on the geometry of the acoustic environment and on the materials present in it. Physics-based acoustic simulators have been used for decades to generate high-quality RIRs for a given acoustic environment.
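For concreteness, the role of the RIR can be illustrated with a short sketch: reverberant speech is obtained by convolving a dry (anechoic) source signal with the RIR. This is a minimal illustration of the transfer-function view, not code from the proposed system; the file names are hypothetical placeholders.

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import fftconvolve

    # Hypothetical inputs: a mono dry speech recording and a mono RIR.
    sr_speech, dry = wavfile.read("dry_speech.wav")
    sr_rir, rir = wavfile.read("room_rir.wav")
    assert sr_speech == sr_rir, "speech and RIR must share one sample rate"

    # Reverberant speech = dry speech convolved with the RIR.
    reverberant = fftconvolve(dry.astype(np.float64), rir.astype(np.float64))
    reverberant /= np.max(np.abs(reverberant))  # peak-normalize to avoid clipping
    wavfile.write("reverberant_speech.wav", sr_speech,
                  (reverberant * 32767).astype(np.int16))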

 

These acoustic simulators are not capable of generating high-quality RIRs at interactive rates, and they cannot be directly controlled using non-conventional parameters. There are also significant differences in energy distribution between RIRs generated using physics-based approaches and RIRs measured in real-world environments. Moreover, generating an RIR requires a 3D representation of the underlying scene and complete knowledge of the materials present in the acoustic environment.

 

To address these problems of existing acoustic simulators, we propose three solutions. To generate high-quality RIRs at interactive rates, we propose a learning-based room impulse response generator. Our learning-based approach can be trained to be directly controlled using non-conventional input parameters such as reverberation time. It can generate 10,000 RIRs per second on an NVIDIA GeForce RTX 2080 Ti GPU for a given furnished indoor 3D scene, and it can generate both monaural impulse responses (IRs) and binaural IRs for both reconstructed and synthetic 3D scenes. We have extensively evaluated the benefits of our learning-based RIR generation approach in speech applications. We also performed a perceptual evaluation and observed that audio rendered using our approach is more plausible than audio rendered using prior learning-based RIR generators.
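As an illustration of how a generated binaural IR would be used for rendering, the sketch below convolves a mono source with the left- and right-ear channels of a binaural IR to produce a stereo signal. The generator itself is not shown, and the assumed (num_samples, 2) array shape is an illustrative convention, not the proposal's actual interface.

    import numpy as np
    from scipy.signal import fftconvolve

    def render_binaural(mono_source: np.ndarray, binaural_ir: np.ndarray) -> np.ndarray:
        """Convolve a mono signal with an assumed (num_samples, 2) binaural IR."""
        left = fftconvolve(mono_source, binaural_ir[:, 0])   # left-ear channel
        right = fftconvolve(mono_source, binaural_ir[:, 1])  # right-ear channel
        stereo = np.stack([left, right], axis=1)
        return stereo / np.max(np.abs(stereo))               # peak-normalize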

 

To bridge the gap between synthetic RIRs from physics-based simulators and RIRs measured in real-world environments, we propose the TS-RIRGAN architecture. TS-RIRGAN translates the energy distribution of synthetic RIRs to match that of measured RIRs. We use a set of acoustic parameter values to measure the quality of the translated RIRs, and we demonstrate the benefit of bridging the gap between synthetic and real RIRs through automatic speech recognition (ASR) experiments.
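One standard acoustic parameter that can serve in such comparisons is the reverberation time (T60). The sketch below estimates T60 from an RIR via Schroeder backward integration; this is a generic textbook method, offered only as an example of the kind of parameter involved, not the proposal's exact evaluation code.

    import numpy as np

    def estimate_t60(rir: np.ndarray, sr: int) -> float:
        """Estimate T60 from the -5 dB to -25 dB decay of the energy curve."""
        energy = np.cumsum(rir[::-1] ** 2)[::-1]         # Schroeder backward integral
        edc_db = 10.0 * np.log10(energy / energy[0] + 1e-12)
        t = np.arange(len(rir)) / sr
        mask = (edc_db <= -5.0) & (edc_db >= -25.0)      # region for the linear fit
        slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # decay rate in dB/s
        return -60.0 / slope                             # time to decay by 60 dB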

 

Finally, we propose an alternative approach for estimating RIRs from reverberant speech signals when a 3D representation of the real-world environment is unavailable. Estimating RIRs from reverberant speech captured by home voice-assistant devices makes it possible to augment speech training data so that it matches the test environment, which helps improve the performance of ASR systems in that environment.
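The augmentation idea can be sketched as follows: clean training utterances are convolved with RIRs estimated from the deployment environment, producing reverberant training data that matches the test conditions. The RIR-estimation model itself is not shown, and the function below is a hypothetical illustration rather than the proposed pipeline.

    import numpy as np
    from scipy.signal import fftconvolve

    def augment(clean_utterances, estimated_rirs, seed=0):
        """Pair each clean utterance with a randomly chosen estimated RIR."""
        rng = np.random.default_rng(seed)
        for clean in clean_utterances:
            rir = estimated_rirs[rng.integers(len(estimated_rirs))]
            reverb = fftconvolve(clean, rir)[: len(clean)]  # keep original length
            yield reverb / (np.max(np.abs(reverb)) + 1e-9)  # peak-normalize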


Audience: Faculty 
