Context-Aware Offensive Meme Detection: A Multi-Modal Zero-Shot Approach with Caption-Enhanced Classification
| Author | Abdullakutty, Faseela |
| Author | Al-Maadeed, Somaya |
| Author | Naseem, Usman |
| Available date | 2025-12-03T05:08:02Z |
| Publication Date | 2024 |
| Publication Name | IEEE International Conference on Data Mining Workshops, ICDMW |
| Resource | Scopus |
| Identifier | http://dx.doi.org/10.1109/ICDMW65004.2024.00025 |
| Citation | F. Abdullakutty, S. Al-Maadeed and U. Naseem, "Context-Aware Offensive Meme Detection: A Multi-Modal Zero-Shot Approach with Caption-Enhanced Classification," 2024 IEEE International Conference on Data Mining Workshops (ICDMW), Abu Dhabi, United Arab Emirates, 2024, pp. 137-145, doi: 10.1109/ICDMW65004.2024.00025. |
| ISBN | 979-833153063-1 |
| ISSN | 2375-9232 |
| Abstract | Detecting offensive content in memes is a pressing issue, particularly as harmful and toxic materials proliferate on social media platforms. Conventional approaches to offensive meme detection typically focus on analyzing either the visual or textual components in isolation, often missing the nuanced context that arises from the interaction between different modalities. This paper presents an advanced multi-modal zero-shot classification method for offensive meme detection, utilizing large language models (LLMs) alongside image captions generated by the BLIP model. These captions provide crucial contextual information, improving the detection of offensive content, especially in cases where the meme's text or image alone may be insufficient to convey the full meaning. By integrating these captions into the classification prompt, the proposed method offers a more detailed and accurate analysis of meme content. Additionally, the use of Chain-of-Thought (CoT) prompting enhances the reasoning capabilities of the LLMs, enabling a deeper understanding of the relationship between text, images, and captions. Experimental evaluations on the GOAT-Bench and Memotion 2 datasets demonstrate that this approach consistently surpasses traditional methods that omit image captions, highlighting its efficacy in improving the precision and robustness of offensive meme classification across multiple modalities. |
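The abstract describes integrating a generated image caption into the classification prompt together with Chain-of-Thought (CoT) instructions. A minimal sketch of that prompt-assembly step is given below; the template wording, function name, and example meme are illustrative assumptions, not the authors' exact prompt, and the BLIP captioner and LLM calls are left as placeholders.

```python
def build_cot_prompt(meme_text: str, caption: str) -> str:
    """Assemble a caption-enhanced Chain-of-Thought prompt for zero-shot
    offensive-meme classification. The template below is illustrative,
    not the authors' exact wording. In the full pipeline, `caption`
    would come from a BLIP captioning model and the prompt would be
    sent to an LLM for classification."""
    return (
        "You are classifying a meme as Offensive or Non-offensive.\n"
        f"Meme text: {meme_text}\n"
        f"Image caption (generated by a captioning model): {caption}\n"
        "Think step by step: (1) describe what the image caption implies, "
        "(2) relate it to the meme text, "
        "(3) decide whether the combination targets or demeans a person "
        "or group.\n"
        "Answer with a single label: Offensive or Non-offensive."
    )

# Example usage with a hypothetical meme:
prompt = build_cot_prompt(
    meme_text="When Monday arrives...",
    caption="a cat lying face down on a desk",
)
print(prompt)
```

The caption supplies visual context that the meme's overlaid text alone may not convey, which is the paper's central claim for why caption-enhanced prompts outperform text-only or image-only baselines.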
| Language | en |
| Publisher | IEEE |
| Subject | LLMs; Meme analysis; Multi-modality |
| Type | Conference |
| Pagination | 137-145 |
This item appears in the following Collection(s): Computer Science & Engineering [2520 items]