3M: MULTI-STYLE IMAGE CAPTION GENERATION USING MULTI-MODALITY FEATURES UNDER MULTI-UPDOWN MODEL


In this paper, we build a multi-style generative model for stylish image captioning that uses multi-modality image features: ResNeXt features and text features generated by DenseCap. We propose the 3M model, a Multi-UPDOWN caption model that encodes multi-modality features and decodes them into captions. We demonstrate the effectiveness of our model at generating human-like captions by examining its performance on two datasets, the PERSONALITY-CAPTIONS dataset and the FlickrStyle10K dataset. We compare against a variety of state-of-the-art baselines on automatic NLP metrics such as BLEU, ROUGE-L, CIDEr, and SPICE (code will be available at https://github.com/cici-ai-club/3M). A qualitative study has also been conducted to verify that our 3M model can generate different stylized captions.
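To make the high-level pipeline more concrete, here is a minimal, hypothetical sketch of fusing a global visual feature (e.g., from ResNeXt) with encoded region descriptions (e.g., from DenseCap) before decoding a caption. All module names, dimensions, and the simple fusion-plus-GRU decoding used here are illustrative assumptions for exposition only; they are not the authors' 3M / Multi-UPDOWN implementation.

```python
# Illustrative sketch (not the 3M authors' code): fuse a global visual feature
# with pooled text-description features, then decode caption tokens with a GRU.
import torch
import torch.nn as nn

class MultiModalityCaptioner(nn.Module):
    def __init__(self, vocab_size, visual_dim=2048, text_dim=300, hidden_dim=512):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)  # project ResNeXt-style feature
        self.text_proj = nn.Linear(text_dim, hidden_dim)      # project DenseCap-style sentence embeddings
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, visual_feat, text_feats, captions):
        # visual_feat: (B, visual_dim); text_feats: (B, N, text_dim); captions: (B, T) token ids
        v = self.visual_proj(visual_feat)            # (B, H)
        t = self.text_proj(text_feats).mean(dim=1)   # (B, H), mean-pool the region descriptions
        h0 = torch.tanh(v + t).unsqueeze(0)          # fused initial decoder state, (1, B, H)
        emb = self.embed(captions)                   # (B, T, H)
        out, _ = self.decoder(emb, h0)               # (B, T, H)
        return self.out(out)                         # (B, T, vocab_size) logits

# Toy usage with random tensors
model = MultiModalityCaptioner(vocab_size=1000)
logits = model(torch.randn(2, 2048), torch.randn(2, 5, 300), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 1000])
```

In the paper's actual model the two modalities are handled by a Multi-UPDOWN encoder-decoder rather than the simple additive fusion shown above; the sketch only illustrates the general idea of conditioning caption generation on both feature types.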
