0% Complete
English
صفحه اصلی
/
پانزدهمین کنفرانس بین المللی فناوری اطلاعات و دانش
Benchmarking Embedding Models for Persian-Language Semantic Information Retrieval
نویسندگان :
Mahmood Kalantari
1
Mehdi Feghhi
2
Nasser Mozayani
3
1- دانشگاه علم و صنعت ایران
2- دانشگاه علم و صنعت ایران
3- دانشگاه علم و صنعت ایران
کلمات کلیدی :
Embedding search،Embedding models،Persian embedding،Persian question-answering،Retrieval-Augmented Generation (RAG)
چکیده :
The increasing reliance on semantic-based retrieval, especially in the context of large language model-powered chatbots, underscores the need for robust evaluation of embedding models. In this study, the performance of embedding models for Persian-language information retrieval was investigated, addressing an area with limited prior research. Four question-answering datasets were used—two publicly available datasets adapted for this study and two custom datasets derived from translations. A systematic evaluation of 17 embedding models was conducted, and the models were ranked based on their accuracy in retrieving relevant content using similarity measures such as dot product, cosine similarity, and L2 distance. The findings emphasize the adaptability of these models to diverse textual data and address the specific challenges posed by the Persian language. This research bridges a critical gap in Persian-language retrieval tasks, providing a comprehensive benchmark for evaluating embedding models in semantic information retrieval scenarios.
لیست مقالات
لیست مقالات بایگانی شده
Design and modeling of a waiter robot
Amin Mohammadnejad - Hami Tourajizadeh
A Model-Driven Approach for Automatic Generation of Android Tourism Applications
Sara Adib - Bahman Zamani
شناسایی حسابهای چندکاربره بر اساس ویژگیهای شخصیتی کاربران در پلتفرمهای پخش فیلم
مهسا رضائی - مرجان کائدی
TDO-SA-PINN: A Co-Evolutionary Framework for Physics-Informed Neural Networks
SeyedMohammadReza AhmadEnjavi - Masoud Shafiee
Secure Web-Based Control of ROS 1 Robots Using AES-256-GCM Encryption and LLM Integration
Ali Godarzvand chegini - Mohammad Arabian
یک روش خوشه بندی گره ها برای شبکه های حسگر بیسیم با هدف بهبود متوازن سازی بار مبتنی بر تکنیک تاپسیس
راضیه حسین رضایی - فهیمه یزدان پناه
Design and Simulation of a New Multiplexer with Energy Analysis in Quantum Cellular Automata Technology
- - -
Movable Antenna Design for UAV-Aided Federated Learning via Deep Reinforcement Learning
MOHSEN Ahmadzadeh - Saeid Pakravan - Ghosheh Abed Hodtani
A Hybrid Crow Search and Penguin Optimization Algorithm (CPMM) for Efficient Cloud Workflow Scheduling
Reza Akraminejad - Farhad Kazemipour - Mozhdeh Koreh Davoodi
An Improved Drone Detection Method Using Deep Learning for Augmentation Detection Speed
Mohammad Bahrami - Seyyed Amir Asghari - Mohammadreza Binesh Marvasti - Sajjad Ansaria
بیشتر
ثمین همایش، سامانه مدیریت کنفرانس ها و جشنواره ها - نگارش 43.8.0