15th International Conference on Information and Knowledge Technology
Benchmarking Embedding Models for Persian-Language Semantic Information Retrieval
Authors:
Mahmood Kalantari 1, Mehdi Feghhi 2, Nasser Mozayani 3
1- Iran University of Science and Technology
2- Iran University of Science and Technology
3- Iran University of Science and Technology
Keywords:
Embedding search, Embedding models, Persian embedding, Persian question-answering, Retrieval-Augmented Generation (RAG)
Abstract:
The increasing reliance on semantic retrieval, especially in chatbots powered by large language models, underscores the need for robust evaluation of embedding models. In this study, the performance of embedding models for Persian-language information retrieval was investigated, addressing an area with limited prior research. Four question-answering datasets were used: two publicly available datasets adapted for this study and two custom datasets derived from translations. A systematic evaluation of 17 embedding models was conducted, and the models were ranked based on their accuracy in retrieving relevant content using three similarity measures: dot product, cosine similarity, and L2 distance. The findings emphasize the adaptability of these models to diverse textual data and address the specific challenges posed by the Persian language. This research bridges a critical gap in Persian-language retrieval tasks, providing a comprehensive benchmark for evaluating embedding models in semantic information retrieval scenarios.
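To make the three similarity measures named in the abstract concrete, the following is a minimal sketch of how passages might be ranked for a query and how a retrieval accuracy could be computed. It is not the authors' evaluation code. The function names (`rank`, `top1_accuracy`), the choice of top-1 accuracy as the metric, and the two-dimensional toy vectors are all assumptions made for illustration; real embeddings would come from one of the evaluated models and have hundreds of dimensions.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
ScoreFn = Callable[[Vector, Vector], float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product: sensitive to both direction and vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot product of the length-normalized vectors."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def neg_l2(a: Vector, b: Vector) -> float:
    """Negated Euclidean (L2) distance, so that higher still means closer."""
    return -math.dist(a, b)


def rank(query: Vector, passages: List[Vector], score: ScoreFn) -> List[int]:
    """Return passage indices ordered from best to worst match."""
    return sorted(range(len(passages)),
                  key=lambda i: score(query, passages[i]),
                  reverse=True)


def top1_accuracy(queries: List[Vector], passages: List[Vector],
                  gold: List[int], score: ScoreFn) -> float:
    """Fraction of queries whose best-ranked passage is the gold passage.

    Assumption: top-1 accuracy is used here only as an example metric.
    """
    hits = sum(1 for q, g in zip(queries, gold)
               if rank(q, passages, score)[0] == g)
    return hits / len(queries)


# Hypothetical toy embeddings, not produced by any real model.
# The third passage has a deliberately large norm.
passages = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
queries = [[0.9, 0.1], [0.1, 0.9]]
gold = [0, 1]  # index of the relevant passage for each query

for name, fn in [("dot", dot), ("cosine", cosine), ("l2", neg_l2)]:
    print(name, top1_accuracy(queries, passages, gold, fn))
```

In this toy setup the long third passage wins every dot-product comparison, while cosine similarity and L2 distance retrieve the intended passages. On unit-normalized embeddings the three measures produce the same ranking, so differences between them only appear when a model emits vectors of varying length.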