Category: Project Pages

Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach
Abstract: Many visual scenes contain text that carries crucial information, and it is thus essential to understand text in images for downstream reasoning tasks. For example, a deep water label on a …

Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA

Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko
Abstract: Solving grounded language tasks often requires reasoning about relationships between objects in the context of a given task. For example, to answer the question "What color is the mug on the …

Language-Conditioned Graph Networks for Relational Reasoning

Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko
Abstract: In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning …

Explainable Neural Computation via Stack Neural Module Networks

Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell
Abstract: Navigation guided by natural language instructions presents a challenging reasoning problem for instruction followers. Natural language instructions typically identify …

Speaker-Follower Models for Vision-and-Language Navigation

Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick
Abstract: Existing methods for object instance segmentation require all training instances to be labeled with segmentation masks. This requirement makes it expensive to annotate new categories and has restricted instance …

Learning to Segment Every Thing

Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko
Abstract: Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer "is there an equal number …

Learning to Reason: End-to-End Module Networks for Visual Question Answering

Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, Kate Saenko
Abstract: People often refer to entities in an image in terms of their relationships with other entities. For example, "the black cat sitting under the table" refers to both a …

Modeling Relationships in Referential Expressions with Compositional Modular Networks

Ronghang Hu, Marcus Rohrbach, Trevor Darrell
Abstract: We approach the novel problem of segmenting an image based on a natural language expression. This is different from traditional semantic segmentation over a predefined set of semantic classes, as e.g., the phrase …

Segmentation from Natural Language Expressions

Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, Trevor Darrell
Abstract: We address the task of natural language object retrieval, to localize a target object within a given image based on a natural language query of the object. …

Natural Language Object Retrieval