Speaker-Follower Models for Vision-and-Language Navigation

Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell

Abstract: Navigation guided by natural language instructions presents a challenging reasoning problem for instruction followers. Natural language instructions typically identify

Learning to Segment Every Thing

Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick

Abstract: Existing methods for object instance segmentation require all training instances to be labeled with segmentation masks. This requirement makes it expensive to annotate new categories and has restricted instance

Learning to Reason: End-to-End Module Networks for Visual Question Answering

Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko

Abstract: Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer "is there an equal number

Modeling Relationships in Referential Expressions with Compositional Modular Networks

Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, Kate Saenko

Abstract: People often refer to entities in an image in terms of their relationships with other entities. For example, "the black cat sitting under the table" refers to both a

Segmentation from Natural Language Expressions

Ronghang Hu, Marcus Rohrbach, Trevor Darrell

Abstract: We approach the novel problem of segmenting an image based on a natural language expression. This is different from traditional semantic segmentation over a predefined set of semantic classes, as e.g., the phrase

Natural Language Object Retrieval

Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, Trevor Darrell

Abstract: We address the task of natural language object retrieval, to localize a target object within a given image based on a natural language query of the object.