15th International Conference on Information and Knowledge Technology
Benchmarking Embedding Models for Persian-Language Semantic Information Retrieval
Authors :
Mahmood Kalantari 1, Mehdi Feghhi 2, Nasser Mozayani 3
1- Iran University of Science and Technology
2- Iran University of Science and Technology
3- Iran University of Science and Technology
Keywords :
Embedding search, Embedding models, Persian embedding, Persian question-answering, Retrieval-Augmented Generation (RAG)
Abstract :
The increasing reliance on semantic retrieval, especially in chatbots powered by large language models, underscores the need for robust evaluation of embedding models. This study investigates the performance of embedding models for Persian-language information retrieval, an area with limited prior research. Four question-answering datasets were used: two publicly available datasets adapted for this study and two custom datasets derived from translations. A systematic evaluation of 17 embedding models was conducted, and the models were ranked by their accuracy in retrieving relevant content under three similarity measures: dot product, cosine similarity, and L2 distance. The findings emphasize the adaptability of these models to diverse textual data and address the specific challenges posed by the Persian language. This research bridges a critical gap in Persian-language retrieval tasks, providing a comprehensive benchmark for evaluating embedding models in semantic information retrieval scenarios.
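The abstract does not spell out how retrieval accuracy was computed, so the following is only a minimal sketch of one plausible setup, not the authors' procedure. It assumes each question has exactly one gold passage and scores a model by top-k accuracy under each of the three similarity measures named above. The function names `score` and `top_k_accuracy`, the choice of top-k accuracy, and the toy vectors are all illustrative assumptions; in practice the vectors would come from an embedding model.

```python
import numpy as np

def score(queries, passages, measure):
    """Return a (num_queries, num_passages) score matrix; higher means more similar."""
    if measure == "dot":
        return queries @ passages.T
    if measure == "cosine":
        # Normalize rows to unit length, then take the dot product.
        q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
        p = passages / np.linalg.norm(passages, axis=1, keepdims=True)
        return q @ p.T
    if measure == "l2":
        # Negate the Euclidean distance so that larger is still better.
        diff = queries[:, None, :] - passages[None, :, :]
        return -np.linalg.norm(diff, axis=2)
    raise ValueError(f"unknown measure: {measure}")

def top_k_accuracy(queries, passages, gold, measure, k=1):
    """Fraction of queries whose gold passage index appears in the top-k results."""
    scores = score(queries, passages, measure)
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = (top_k == np.asarray(gold)[:, None]).any(axis=1)
    return float(hits.mean())

# Toy vectors standing in for real model outputs (made-up numbers, assumption).
passages = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
queries = np.array([[0.9, 0.1], [0.2, 0.8]])
gold = [0, 1]  # index of the correct passage for each query

results = {m: top_k_accuracy(queries, passages, gold, m) for m in ("dot", "cosine", "l2")}
print(results)
```

In this toy case the third passage has a large norm, so the unnormalized dot product ranks it first for both queries, while cosine similarity and L2 distance retrieve the gold passages. This is one reason a benchmark might report results per similarity measure rather than assuming they are interchangeable.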