Context-Aware Offensive Meme Detection: A Multi-Modal Zero-Shot Approach with Caption-Enhanced Classification
Date
2024

Abstract
Detecting offensive content in memes is a pressing issue, particularly as harmful and toxic materials proliferate on social media platforms. Conventional approaches to offensive meme detection typically focus on analyzing either the visual or textual components in isolation, often missing the nuanced context that arises from the interaction between different modalities. This paper presents an advanced multi-modal zero-shot classification method for offensive meme detection, utilizing large language models (LLMs) alongside image captions generated by the BLIP model. These captions provide crucial contextual information, improving the detection of offensive content, especially in cases where the meme's text or image alone may be insufficient to convey the full meaning. By integrating these captions into the classification prompt, the proposed method offers a more detailed and accurate analysis of meme content. Additionally, the use of Chain-of-Thought (CoT) prompting enhances the reasoning capabilities of the LLMs, enabling a deeper understanding of the relationship between text, images, and captions. Experimental evaluations on the GOAT-Bench and Memotion 2 datasets demonstrate that this approach consistently surpasses traditional methods that omit image captions, highlighting its efficacy in improving the precision and robustness of offensive meme classification across multiple modalities.
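To make the described pipeline concrete, the following is a minimal illustrative sketch (not the authors' code) of the caption-enhanced zero-shot step: a BLIP-generated caption is combined with the meme's overlaid text into a Chain-of-Thought classification prompt, and a final label is parsed from the LLM's reasoning chain. All function names and the prompt wording are hypothetical assumptions; the actual BLIP and LLM calls are left as stubs.

```python
def build_cot_prompt(meme_text: str, blip_caption: str) -> str:
    """Compose a zero-shot Chain-of-Thought prompt that fuses the meme's
    text with a BLIP image caption (hypothetical wording)."""
    return (
        "You are a content-moderation assistant.\n"
        f'Meme text: "{meme_text}"\n'
        f'Image caption (BLIP): "{blip_caption}"\n'
        "Reason step by step about how the text and the image interact, "
        "then end your answer with exactly one label: "
        "OFFENSIVE or NOT_OFFENSIVE."
    )

def parse_label(llm_response: str) -> str:
    """Extract the final label emitted after the reasoning chain."""
    last = llm_response.strip().split()[-1].strip(".")
    return last if last in {"OFFENSIVE", "NOT_OFFENSIVE"} else "UNKNOWN"
```

In practice the prompt would be sent to an LLM (and the caption produced by a BLIP captioning model); here only the prompt construction and label parsing are shown, since those are the parts the abstract's method hinges on.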
Collections
- Computer Science & Engineering [2518 items]

