MIT researchers have developed a “semantic parser” that learns through observation to more closely mimic a child’s language-acquisition process, which could greatly extend computing’s capabilities.
In computing, learning language is the task of syntactic and semantic parsers. These systems are trained on sentences annotated by humans that describe the structure and meaning behind words. Parsers are becoming increasingly important for web searches, natural-language database querying, and voice-recognition systems such as Alexa and Siri. Soon, they may also be used for home robotics.
But gathering the annotation data can be time-consuming and difficult for less common languages. Additionally, humans don’t always agree on the annotations, and the annotations themselves may not accurately reflect how people naturally speak.
In a paper being presented at this week’s Empirical Methods in Natural Language Processing conference, MIT researchers describe a parser that learns through observation to more closely mimic a child’s language-acquisition process, which could greatly extend the parser’s capabilities. To learn the structure of language, the parser observes captioned videos, with no other information, and associates the words with recorded objects and actions. Given a new sentence, the parser can then use what it’s learned about the structure of the language to accurately predict a sentence’s meaning, without the video.
This “weakly supervised” approach — meaning it requires limited training data — mimics how children can observe the world around them and learn language, without anyone providing direct context. The approach could expand the types of data and reduce the effort needed for training parsers, according to the researchers. A few directly annotated sentences, for instance, could be combined with many captioned videos, which are easier to come by, to improve performance.
In the future, the parser could be used to improve natural interaction between humans and personal robots. A robot equipped with the parser, for instance, could constantly observe its environment to reinforce its understanding of spoken commands, including when the spoken sentences aren’t fully grammatical or clear. “People talk to each other in partial sentences, run-on thoughts, and jumbled language. You want a robot in your home that will adapt to their particular way of speaking … and still figure out what they mean,” says co-author Andrei Barbu, a researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Center for Brains, Minds, and Machines (CBMM) within MIT’s McGovern Institute.
Read more about it here