Maryam Zare's PhD Thesis Defense
Maryam successfully defended her thesis on 14th June 2021. A copy of the thesis is available online at: https://etda.libraries.psu.edu/catalog/22618muz50
Title: AN AGENT LEARNING DIALOGUE POLICIES FOR SENSING PERCEIVING AND LEARNING THROUGH MULTI-MODAL COMMUNICATION
Abstract: Language communication is an important part of human life and a natural and intuitive way of learning new things. It is easy to imagine intelligent agents that can learn through communication to, for example, help us in rescue scenarios, surgery, or even agriculture. As natural as learning through language is to humans, developing such agents poses numerous challenges. Language is ambiguous, and humans convey their intentions in different ways using different words. Tasks have different learning goals, and some are more complex than others. Additionally, humans differ in their communicative skills, particularly in how much information they share or know. Thus, the agent must be able to learn from a wide range of humans and to adapt to different knowledge goals.
This work proposes SPACe, a novel dialogue policy that supports Sensing, Perceiving, and Acquiring knowledge through Communication. SPACe communicates using natural language, which is translated to an unambiguous meaning representation language (MRL). The MRL supports formulating novel, context-dependent questions (e.g., "wh-" questions). SPACe is a single adaptive policy for learning different tasks from humans who differ in informativeness. Policies are modeled as a Partially Observable Markov Decision Process (POMDP) and are trained using reinforcement learning. Adaptation to humans and to different learning goals arises from three components: a rich state representation that goes beyond dialogue state tracking, allowing the agent to constantly sense the joint information behavior of itself and its partner and adjust accordingly; a novel reward function defined to encourage efficient questioning across all tasks and humans; and a general-purpose, extensible MRL. As the cost of training POMDP policies with humans is too high to be practical, SPACe is trained using a simulator. Experiments with human subjects show that the policies transfer well to online dialogues with humans.
We use games as a testbed and store the knowledge in a game tree. Games are similar to real-world tasks: families of related games vary in complexity, as do related real-world tasks, and they present problem-solving tasks in which the state changes unpredictably due to the actions of others. Game trees are a well-studied abstraction for representing game knowledge, reasoning over that knowledge, and acting on it during play. We have tested our agent on several board games, but the methodology applies to a wide range of other games. The agent’s learning ability is tested in a single dialogue and across a sequence of two dialogues. The latter is particularly important for learning goals that are too complex to master in one dialogue. Tests of the agent’s ability to learn games not seen in training show the generality of its communication abilities. Human subjects found the agent easy to communicate with and provided positive feedback, remarking favorably on its ability to learn across dialogues, "to pull in old information as if it has a memory".