Cracking the Code of Juxtaposition:
Can AI Models Understand the Humorous Contradictions







Abstract


Recent advancements in large multimodal language models have demonstrated remarkable proficiency across a wide range of tasks. Yet, these models still struggle with understanding the nuances of human humor through juxtaposition, particularly when it involves nonlinear narratives that underpin many jokes and humor cues. This paper investigates this challenge by focusing on comics with contradictory narratives, where each comic consists of two panels that create a humorous contradiction. We introduce the YESBUT benchmark, which comprises tasks of varying difficulty aimed at assessing AI’s capabilities in recognizing and interpreting these comics, ranging from literal content comprehension to deep narrative reasoning. Through extensive experimentation and analysis of recent commercial and open-source large (vision) language models, we assess their capability to comprehend the complex interplay of the narrative humor inherent in these comics. Our results show that even state-of-the-art models still lag behind human performance on this task. Our findings offer insights into the current limitations and potential improvements for AI in understanding human creative expressions.






YESBUT Dataset Overview


Explanation of the Dataset.
Our benchmark consists of YESBUT comics featuring contradictory narratives. Specifically, each sample includes:
(1) a two-panel comic that forms a narrative with an inherent contradiction;
(2) a literal description of the comic narrative;
(3) an explanation of the contradiction within the narrative;
(4) the deep philosophy or underlying message the comic aims to convey;
(5) the title of the comic.
Based on these components, we construct various tasks for comic understanding.
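
For concreteness, the sketch below shows what a single annotated sample might look like as a Python dictionary. The field names and placeholder values are illustrative assumptions, not the dataset's actual schema.

# A hypothetical YESBUT sample record; field names are assumptions for illustration.
sample = {
    "comic_image": "https://example.com/comics/0001.jpg",   # link to the two-panel comic
    "literal_description": "<what literally happens in the two panels>",
    "contradiction": "<why the second panel contradicts the first>",
    "underlying_philosophy": "<the deeper message the comic conveys>",
    "title": "<a short title for the comic>",
}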






Data Construction Overview


Framework of Data Construction. For each comic, we annotate the corresponding literal description, contradiction explanation, underlying philosophy, and comic title. We primarily rely on human annotators to obtain gold-standard annotations. Our annotation process consists of two stages: a progressive human-AI collaborative annotation stage and a quality-check and cross-verification stage, as illustrated in the figure below.


Figure: Overview of the data construction framework.






Tasks


Do Large Models Understand Humor in Juxtaposition?
We aim to evaluate the capabilities of recent large (vision) language models in understanding humor through contradictions. This is challenging because it requires both social reasoning about human events and nonlinear logical reasoning about the narratives, going beyond a literal understanding of the comic. We design a series of tasks that require different levels of narrative understanding and reasoning abilities to evaluate the models’ performance in reading comics.
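
As a rough illustration of how these levels can be probed, the sketch below lists prompt templates of increasing depth, from literal comprehension to deep narrative reasoning. The wording is a hedged example based on the annotation components described above; the exact task formulations used in the paper may differ.

# Illustrative prompt templates for different levels of comic understanding.
# These are hedged examples, not the exact prompts used in the benchmark.
PROMPT_TEMPLATES = {
    "literal_comprehension": "Describe what literally happens in each of the two panels.",
    "contradiction_reasoning": "Explain the contradiction between the two panels that makes this comic humorous.",
    "underlying_philosophy": "What deeper message or philosophy does this comic convey?",
    "title_generation": "Propose a short, fitting title for this comic.",
}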











Potential Applications of the Dataset


Evaluation: As a benchmark, this dataset can be used to evaluate the reasoning, comic understanding, and humor understanding abilities of a vision language model (VLM). In our paper, we use it to evaluate the humor understanding ability of recent VLMs.
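
To make this concrete, here is a minimal sketch of how one might score a VLM on the benchmark. It assumes the benchmark is stored as a JSON list of samples with hypothetical answer_options and gold_answer fields, and that query_vlm is a user-supplied wrapper around whichever vision-language model is being tested; this is not our exact evaluation code.

import json

def evaluate(benchmark_path, query_vlm):
    # Load the annotated samples (assumed here to be a JSON list).
    with open(benchmark_path) as f:
        samples = json.load(f)
    correct = 0
    for sample in samples:
        # Ask the model to pick the option that best explains the humorous contradiction.
        prediction = query_vlm(
            image_url=sample["comic_image"],                     # hypothetical field
            prompt="Which option best explains the humorous contradiction? "
                   + " ".join(sample["answer_options"]),         # hypothetical field
        )
        correct += int(prediction.strip() == sample["gold_answer"])  # hypothetical field
    return correct / len(samples)  # accuracy over the benchmark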





Generative task: In the future, we intend to explore more deeply how AI can creatively engage with this content. This includes generating pivotal turning points from one perspective and creating counterpoints to given scenarios, such as generating a "But" image from a given "Yes" image. A simple example of this idea is sketched below.
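
The following is a hedged sketch of the "Yes"-to-"But" idea. Here caption_image and generate_image stand in for whichever captioning and image-generation models one might plug in; they are hypothetical placeholders, not part of any released code.

def generate_but_panel(yes_image_path, caption_image, generate_image):
    # Describe the first ("Yes") panel, then ask a generator to imagine a
    # second ("But") panel that humorously contradicts it.
    yes_description = caption_image(yes_image_path)
    but_prompt = (
        "Here is the first panel of a two-panel comic: " + yes_description
        + " Create a second panel that humorously undercuts or contradicts it."
    )
    return generate_image(but_prompt)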







VLM image understanding: We will explore in more depth how VLMs understand these images and how to improve their ability to interpret such humorous images. We can address hallucinations in the samples by improving the model’s reasoning ability and its understanding of the deep semantics of the images.












Ethics Statement


Copyright and License. All data samples are collected from publicly available content on social media platforms. We ensure copyright compliance by providing the original links to the comics rather than redistributing the images. Additionally, we commit to open-sourcing our annotated benchmark, with corresponding links to each comic image. We diligently review the samples and filter out potentially offensive or harmful content.

Citation



If you find our work helpful, please consider citing us:

@article{2024cracking,
    title={Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions},
    author={Zhe Hu and Tuo Liang and Jing Li and Yiren Lu and Yunlai Zhou and Yiran Qiao and Jing Ma and Yu Yin},
    journal={arXiv preprint arXiv:2405.19088},
    year={2024}
}


The comics on this website were created by the artist Gudim.