0% Complete
فارسی
Home
/
پانزدهمین کنفرانس بین المللی فناوری اطلاعات و دانش
Benchmarking Embedding Models for Persian-Language Semantic Information Retrieval
Authors :
Mahmood Kalantari
1
Mehdi Feghhi
2
Nasser Mozayani
3
1- دانشگاه علم و صنعت ایران
2- دانشگاه علم و صنعت ایران
3- دانشگاه علم و صنعت ایران
Keywords :
Embedding search،Embedding models،Persian embedding،Persian question-answering،Retrieval-Augmented Generation (RAG)
Abstract :
The increasing reliance on semantic-based retrieval, especially in the context of large language model-powered chatbots, underscores the need for robust evaluation of embedding models. In this study, the performance of embedding models for Persian-language information retrieval was investigated, addressing an area with limited prior research. Four question-answering datasets were used—two publicly available datasets adapted for this study and two custom datasets derived from translations. A systematic evaluation of 17 embedding models was conducted, and the models were ranked based on their accuracy in retrieving relevant content using similarity measures such as dot product, cosine similarity, and L2 distance. The findings emphasize the adaptability of these models to diverse textual data and address the specific challenges posed by the Persian language. This research bridges a critical gap in Persian-language retrieval tasks, providing a comprehensive benchmark for evaluating embedding models in semantic information retrieval scenarios.
Papers List
List of archived papers
An Improved Image Classification Based In Feature Extraction From Convolutional Neural Network: Application To Flower Classification
Faeze Sadati - Dr Behrooz Rezaie
Improving hypergraph attention and hypergraph convolution networks
Mustafa Mohammadi Gharasuie - Mahmood Shabankhah - Ali Kamandi
شناسایی حملات فیشینگ با استفاده از الگوریتم عقاب آتشین و شبکه عصبی کانولوشن
علی کوشاری - مهدی فرتاش
Silicon photonic microring resonators: A Novel optical router based on Negative-First routing algorithm
Negin Bagheri Renani - Elham Yaghoubi
SBST challenges from the perspective of the test techniques
Sepideh Kashefi Gargari - Dr Mohammad Reza Keyvanpour
Vi-Net: A Deep Violent Flow Network for Violence Detection in Video Sequences
Tahereh Zarrat Ehsan - Seyed Mehdi Mohtavipour
Automatic identification and reconstruction of Tuberculosis in microscopic images using convolutional auto-encoder network
Ahmad Reza Nadafi - Farahnaz Mohanna
شناسایی جایگاه مالونیلاسیون در پروتئینها با بهرهگیری از استخراج ویژگی و تکنیکهای پردازش زبان طبیعی
حنانه رجبیون - محمد قاسم زاده - وحید رنجبر بافقی
Generalized Self-Attentive Spatiotemporal GCN with OPTICS Clustering for Recommendation Systems
Saba Zolfaghari - Seyed Mohammad Hossein Hasheminejad
پیشبینی حجم ترافیک شهری با استفاده از دادههای سرویس نشان مورد مطالعاتی: خیابان کمال اصفهان
مهسا لطیفی - جمشید مالکی
more
Samin Hamayesh - Version 41.3.1