
Semantic Alignment for Hierarchical Image Captioning

Authors: Sidi Lu, Zhiyong Fang, Peiyao Sheng

Abstract

Inspired by recent progress in hierarchical reinforcement learning and adversarial text generation, we introduce a hierarchical, adversarial, attention-based model that generates natural language descriptions of images. During caption generation, the model automatically learns to align its attention over the image with subgoal vectors. We describe how to train, use, and interpret the model, and report its performance on Flickr8k. We also visualize the subgoal vectors and the attention over images during generation.


Demo

Code

We provide source code on GitHub, including:

1. Train/Test code.

2. Visualization tool for attention mechanism.

Sample Usage

Our model supports the COCO, Flickr8k, and Flickr30k datasets. For simplicity, we only cover Flickr8k here.

1. Create the folder ./code/dataset

2. Download the processed Flickr8k image captioning dataset from here (key: sh4u)

3. Unzip the downloaded file into ./code/dataset/

4. Download the ResNet-50 model file from here (key: h712) and place it in ./code/saved_model/

5. Run ./code/main.py with python3
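
The steps above can be sketched as a small Python helper that creates the expected folders, checks that something has been placed in them, and then launches main.py. This is only an illustrative sketch, not part of the repository: the function names `prepare_layout`, `missing_items`, and `run_main` are hypothetical, the file names inside the dataset archive and the model download are not listed on this page (so the check only tests that each folder is non-empty), and running from the repository root is an assumption.

```python
import subprocess
import sys
from pathlib import Path


def prepare_layout(repo_root):
    """Create ./code/dataset and ./code/saved_model if they do not exist."""
    root = Path(repo_root)
    dataset_dir = root / "code" / "dataset"
    model_dir = root / "code" / "saved_model"
    for folder in (dataset_dir, model_dir):
        folder.mkdir(parents=True, exist_ok=True)
    return dataset_dir, model_dir


def missing_items(repo_root):
    """Return a list of problems; an empty list means the layout looks ready."""
    root = Path(repo_root)
    problems = []
    if not (root / "code" / "main.py").is_file():
        problems.append("code/main.py not found")
    for name in ("dataset", "saved_model"):
        folder = root / "code" / name
        # The exact file names are not specified on this page, so we only
        # check that each folder exists and contains at least one entry.
        if not folder.is_dir() or not any(folder.iterdir()):
            problems.append(f"code/{name} is missing or empty")
    return problems


def run_main(repo_root):
    """Run ./code/main.py with the current Python 3 interpreter.

    Assumption: the script is launched from the repository root.
    """
    return subprocess.run(
        [sys.executable, str(Path("code") / "main.py")],
        cwd=Path(repo_root),
        check=True,
    )
```

A typical session would call `prepare_layout(".")`, download and unpack the two files by hand as described in steps 2 to 4, and then call `run_main(".")` once `missing_items(".")` returns an empty list.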

Paper

Our paper is available here

Bibtex

@misc{Lu2018SemanticAlignment,
  title={Semantic Alignment for Hierarchical Image Captioning},
  author={Lu, Sidi and Fang, Zhiyong and Sheng, Peiyao},
  year={2018},
  howpublished={\url{https://github.com/zhiyong1997/Semantic-Alignment-for-Hierarchical-Image-Captioning}}
}