16th International Conference on Information and Knowledge Technology
A Deep Learning Framework for Phase-Aware Feature Representation to Improve Sound Source Direction and Distance Estimation
Authors :
Zahra Abolfazli (1), Hamid Reza Abutalebi (2)
1- Yazd University
2- Yazd University
Keywords :
Sound Event Localization and Detection (SELD), phase spectrogram features, Conformer
Abstract :
This paper proposes a novel refinement of the network’s input features to improve distance and direction-of-arrival (DOA) estimation in sound event localization and detection (SELD) systems. Instead of relying on Mel energies, we use phase spectrograms as the input features; these preserve inter-channel time delays and capture crucial wave propagation characteristics. We also introduce architectural changes for increased robustness. Specifically, the Huber loss replaces the mean squared error (MSE), reducing sensitivity to noise, and the multi-head self-attention (MHSA) layers are replaced with Conformer blocks to better model both long-range dependencies and local interactions within the audio data. Our experimental results validate the proposed phase-based feature representation and modified architecture, demonstrating improvements in both DOA and distance estimation.
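To illustrate why phase spectrograms carry inter-channel delay information, and how the Huber loss differs from MSE, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the STFT settings, the microphone configuration, or the Huber threshold, so `n_fft`, `hop`, `delta`, the 16 kHz sample rate, the two-channel test tone, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Short-time Fourier transform of a 1-D signal with a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)  # (n_frames, n_fft // 2 + 1)


def phase_features(audio, n_fft=512, hop=256):
    """Map (n_channels, n_samples) audio to per-channel phase in radians."""
    spec = np.stack([stft(ch, n_fft, hop) for ch in audio])
    return np.angle(spec)  # (n_channels, n_frames, n_bins)


def huber_loss(pred, target, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it (assumed delta)."""
    err = np.abs(pred - target)
    quadratic = 0.5 * err**2
    linear = delta * (err - 0.5 * delta)
    return np.mean(np.where(err <= delta, quadratic, linear))


# Hypothetical two-microphone recording: a 1 kHz tone that reaches the
# second channel 4 samples later than the first.
fs = 16000
tone_hz = 1000
delay_samples = 4
t = np.arange(fs) / fs
audio = np.stack(
    [
        np.sin(2 * np.pi * tone_hz * t),
        np.sin(2 * np.pi * tone_hz * (t - delay_samples / fs)),
    ]
)

phase = phase_features(audio)

# Inter-channel phase difference, wrapped to (-pi, pi].
ipd = np.angle(np.exp(1j * (phase[0] - phase[1])))

# At the tone's frequency bin, the difference should match 2*pi*f*tau.
tone_bin = tone_hz * 512 // fs
expected_ipd = 2 * np.pi * tone_hz * delay_samples / fs

# One large error dominates MSE but grows only linearly under Huber.
pred = np.array([0.0, 3.0])
target = np.array([0.0, 0.0])
loss_huber = huber_loss(pred, target, delta=1.0)
loss_mse = np.mean((pred - target) ** 2)
```

In this toy case the phase difference at the tone's bin recovers the delay between the channels, which is the kind of cue a magnitude-only Mel representation discards. The loss comparison shows the large residual contributing less under Huber than under MSE.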