

Morning Poster in Workshop: Artificial Intelligence & Human Computer Interaction

HateXplain2.0: An Explainable Hate Speech Detection Framework Utilizing Subjective Projection from Contextual Knowledge Space to Disjoint Concept Space

Md Fahim · Md Shihab Shahriar · Sabik Irbaz · Syed Ishtiaque Ahmed · Mohammad Ruhul Amin


Abstract:

Fine-tuning large pre-trained language models on specific datasets is a popular approach to Natural Language Processing (NLP) classification tasks. However, it can lead to overfitting and reduced model explainability. In this paper, we propose a framework that projects sentence representations onto task-specific concept spaces to improve explainability. Each concept space corresponds to a class and is learned through a transformer operator optimized during the classification task. The dimensions of the concept spaces are trainable and can be optimized. Our framework shows that each dimension is associated with specific words that represent the corresponding class. To optimize the training of the operators, we introduce intra- and inter-space losses. Experimental results on two datasets demonstrate that our model achieves better accuracy and explainability. On the HateXplain dataset, our model shows at least a 10% improvement across various explainability metrics.
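The abstract describes per-class concept spaces reached via learned projection operators, trained with intra- and inter-space losses. The minimal sketch below illustrates one plausible reading of that setup; the operator shapes, the exact loss forms, and all variable names (`operators`, `intra_space_loss`, `inter_space_loss`) are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, num_classes = 16, 4, 2  # embedding dim, concept-space dim, classes (assumed sizes)

# One projection operator per class (stand-in for the paper's learned transformer operators).
operators = [rng.normal(size=(d, k)) for _ in range(num_classes)]

def project(h, W):
    """Project a sentence representation h onto one class's concept space."""
    return h @ W  # shape (k,)

def intra_space_loss(h, label):
    """Assumed form: encourage a strong projection inside the true class's space."""
    z = project(h, operators[label])
    return -np.log(1e-8 + np.linalg.norm(z))

def inter_space_loss(h, label):
    """Assumed form: penalize similarity between the true-class projection
    and projections into the other (disjoint) concept spaces."""
    zs = [project(h, W) for W in operators]
    loss = 0.0
    for j in range(num_classes):
        if j != label:
            cos = zs[label] @ zs[j] / (
                np.linalg.norm(zs[label]) * np.linalg.norm(zs[j]) + 1e-8)
            loss += cos ** 2  # squared cosine similarity pushes spaces apart
    return loss

h = rng.normal(size=d)  # a mock sentence representation from an encoder
total = intra_space_loss(h, 0) + inter_space_loss(h, 0)
```

In training, `total` would be added to the usual classification loss and minimized jointly, so the operators learn class-specific, mutually disjoint spaces whose dimensions can then be inspected for the words they activate on.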
