0% Complete
English
صفحه اصلی
/
پانزدهمین کنفرانس بین المللی فناوری اطلاعات و دانش
Benchmarking Embedding Models for Persian-Language Semantic Information Retrieval
نویسندگان :
Mahmood Kalantari
1
Mehdi Feghhi
2
Nasser Mozayani
3
1- دانشگاه علم و صنعت ایران
2- دانشگاه علم و صنعت ایران
3- دانشگاه علم و صنعت ایران
کلمات کلیدی :
Embedding search،Embedding models،Persian embedding،Persian question-answering،Retrieval-Augmented Generation (RAG)
چکیده :
The increasing reliance on semantic-based retrieval, especially in the context of large language model-powered chatbots, underscores the need for robust evaluation of embedding models. In this study, the performance of embedding models for Persian-language information retrieval was investigated, addressing an area with limited prior research. Four question-answering datasets were used—two publicly available datasets adapted for this study and two custom datasets derived from translations. A systematic evaluation of 17 embedding models was conducted, and the models were ranked based on their accuracy in retrieving relevant content using similarity measures such as dot product, cosine similarity, and L2 distance. The findings emphasize the adaptability of these models to diverse textual data and address the specific challenges posed by the Persian language. This research bridges a critical gap in Persian-language retrieval tasks, providing a comprehensive benchmark for evaluating embedding models in semantic information retrieval scenarios.
لیست مقالات
لیست مقالات بایگانی شده
ارائه راهکاری جهت مقابله با حملات DoS در شبکه های نرم افزارمحور
ویدا هاشمی - احمد بختیاری شهری - رضا جاویدان
ارائۀ چارچوب هستانشناسی برای شهر هوشمند مبتنی بر سیستمهای سایبر-فیزیکی
علی اصغر قائمی - جعفر حبیبی - سید حسن میریان
Presenting an Edge-based Air Quality Management System for Smart City Scenarios
Tina Samizadeh Nikoui - Ali Balador - Amir Masoud Rahmani - Hooman Tabarsaied
Improving hypergraph attention and hypergraph convolution networks
Mustafa Mohammadi Gharasuie - Mahmood Shabankhah - Ali Kamandi
LuckyAgent2022: A Stop-Learning Multi-Armed Bandit Automated Negotiating Agent
Arash Ebrahimnezhad - Faria Nassiri-Mofakham
Knowledge Graph Based Retrieval-Augmented Generation for Multi-Hop Question Answering Enhancement
Mahdi Amiri Shavaki - Pouria Omrani - Ramin Toosi - Mohammad Ali Akhaee
ISAAF: بهبود چارچوب مجوز خودتطبیق SAAF با استفاده از پیادهسازی مبتنی بر عامل و مفهوم I-Shairing
الهام معین الدینی - دکتر منیره عبدوس - دکتر اسلام ناظمی
Dealing with Black-hole Attacks in Inter-vehicle Networks Using the Packet Delivery Rate Algorithm
Marzieh Sedighi - Mehdi Hamidkhani - Mostafa Sadeghi
LLM-Driven Feature Extraction for Stock Market Prediction: A case study of Tehran Stock Exchange
Siavash Hosseinpour Saffarian - Saman Haratizadeh
Enhancing QSAR Modeling: A Fusion of Sequential Feature Selection and Support Vector Machine
Farzaneh Khajehgili-Mirabadi - Mohammad Reza Keyvanpour
بیشتر
ثمین همایش، سامانه مدیریت کنفرانس ها و جشنواره ها - نگارش 42.5.2