International Conference on Information and Knowledge Technology

Home / 16th International Conference on Information and Knowledge Technology

A Framework for Systematic Stability Assessment of Post-hoc Explanations in Text Classification

Authors :

Parman Mohammadalizadeh¹ Parham Mohammadalizadeh² Ayda Mahmoudian³

1- University of Zanjan 2- Independent Researcher 3- Independent Researcher

Keywords :

Explainable AI, Explainability Evaluation, Natural Language Processing

Abstract :

Post-hoc explanation methods are widely adopted for interpreting neural text classifiers, yet there is no standardized way to evaluate their stability under input perturbations. We present a systematic framework for assessing explanation stability through three categories of stress tests: preprocessing variations, semantic paraphrasing, and explainer seed variations. The framework combines quantitative metrics (Jaccard similarity, Spearman correlation, attribution differences) with automated stability card generation for standardized reporting. We evaluate Integrated Gradients, LIME, and SHAP across four model-dataset combinations spanning sentiment analysis and topic classification. Results reveal nuanced stability patterns, including a decoupling of model capacity from explanation reliability and an architecture-dependent vulnerability to different perturbation types. Our open-source implementation supports standard transformer models and explanation libraries, offering practical stability assessment as a reproducible evaluation standard for NLP explainability research.
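To make the three metrics named in the abstract concrete, the following is a minimal sketch of how two attribution vectors for the same tokens (one from the original input, one from a perturbed run) might be compared. The abstract does not give exact definitions, so several choices here are assumptions: Jaccard similarity is taken over the top-k tokens ranked by absolute attribution, Spearman correlation is computed over the raw attribution scores, and "attribution difference" is taken as the mean absolute difference. The function names and the example scores are hypothetical and are not drawn from the authors' implementation.

```python
from math import sqrt


def top_k_jaccard(a, b, k=3):
    """Jaccard similarity of the top-k token indices ranked by |attribution|.

    Assumption: "top-k by absolute value" is one plausible reading of the
    metric; the abstract does not specify it.
    """
    def top(scores):
        order = sorted(range(len(scores)), key=lambda i: -abs(scores[i]))
        return set(order[:k])

    ta, tb = top(a), top(b)
    union = ta | tb
    # Two empty sets are treated as identical.
    return len(ta & tb) / len(union) if union else 1.0


def _ranks(xs):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = _ranks(a), _ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return float("nan")  # undefined when one side has constant ranks
    return cov / sqrt(va * vb)


def mean_abs_diff(a, b):
    """Mean absolute per-token attribution difference (assumed definition)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical attribution scores for four aligned tokens.
original = [0.50, 0.10, -0.30, 0.05]
perturbed = [0.45, 0.20, -0.10, 0.05]

print(top_k_jaccard(original, perturbed, k=2))
print(spearman(original, perturbed))
print(mean_abs_diff(original, perturbed))
```

The sketch assumes both vectors are aligned token by token. That holds for explainer seed variations, but semantic paraphrasing can change the token sequence, in which case some alignment or word-level aggregation step would be needed before these comparisons apply.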