16th International Conference on Information and Knowledge Technology
Enhancing Persian Speech Emotion Recognition with Contrastive Learning and Multimodal Fusion
Authors:
Mobina Esmaeili (1), Vajiheh Sabeti (2)
1- Alzahra University
2- Alzahra University
Keywords:
Multimodal Emotion Recognition, Representation Learning, Speech-Text Fusion, ShEMO Dataset
Abstract:
Emotion recognition from both speech and text in low-resource languages such as Persian presents significant challenges due to linguistic complexity and the scarcity of labeled datasets. Conventional multimodal fusion methods often struggle to capture nuanced cross-modal interactions and typically neglect inter-class emotional relationships. To address these limitations, this paper introduces a novel contrastive learning framework that employs pre-trained projection networks to enhance multimodal representations through a combination of intra-modal, inter-modal, and semi-contrastive objectives. The refined embeddings are integrated via a lightweight fusion layer for final emotion classification. In addition, an automatic speech recognition (ASR) system is incorporated to enrich textual inputs and improve linguistic diversity. Experiments on the ShEMO corpus demonstrate that the proposed approach achieves an accuracy of 83.04% and an unweighted average recall (UAR) of 88.1%, substantially outperforming traditional fusion-based baselines. These results confirm the effectiveness of the framework in improving cross-modal alignment and representation quality, highlighting its potential for intelligent interactive systems, social media sentiment analysis, and automated affective computing applications.
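The abstract reports both accuracy and unweighted average recall (UAR). UAR is the mean of the per-class recalls, so every emotion class counts equally regardless of how many samples it has. The sketch below shows how the two metrics can diverge on imbalanced data. It is a minimal illustration only: the function names and the toy labels are hypothetical and are not taken from the paper or from the ShEMO corpus.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class has equal weight."""
    total = defaultdict(int)  # number of samples per true class
    hits = defaultdict(int)   # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (not real data): 8 "neutral" and 2 "anger" samples.
y_true = ["neutral"] * 8 + ["anger"] * 2
# The classifier gets 6 of 8 "neutral" right and both "anger" samples right.
y_pred = ["neutral"] * 6 + ["anger"] * 2 + ["anger"] * 2

print(accuracy(y_true, y_pred))                   # 0.8
print(unweighted_average_recall(y_true, y_pred))  # 0.875
```

In this toy case the minority class is recalled perfectly while the majority class is not, so UAR (0.875) exceeds accuracy (0.8). The same kind of class imbalance is one way a UAR figure can end up higher than the accuracy figure.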