Machine reading comprehension (MRC) is an AI's ability to understand specific knowledge embedded in different kinds of sources - a necessary skill for many real-world scenarios. In search applications, for example, it helps an AI give a precise answer rather than the URL of a webpage that contains it. In the future, MRC could even help a doctor find information among thousands of documents, cutting down a time-consuming task and potentially improving the healthcare sector.
But current machine reading systems are usually built on supervised training data, which means they are trained using not only the articles they are supposed to understand, but also manually labeled questions about those articles with the corresponding answers. Such an approach does not scale, though, because the labeling process must be repeated for each and every domain of knowledge. For example, in the case of an AI built to help doctors, it would be necessary to create one MRC system for each disease, and each of them would have to be constantly updated to keep up with the ever-increasing number of articles being published in the literature.
Enter SynNet, Microsoft's new "two-stage synthesis network" model for training MRC systems. SynNet first learns to identify key knowledge points, or semantic concepts, in one domain, based on the supervised data available for it; these serve as potential answers. Then, it learns to form its own natural language questions around these potential answers within the context of a given article, as can be seen in the example below.
But the most interesting aspect of SynNet is that, once trained, it can be used in a new domain to generate pseudo questions and answers for a given article. This lets it create the supervised training data required to train domain-specific MRC systems, which removes the humanly infeasible task of manually labeling questions and makes SynNet a sort of AI teacher.
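To make the two stages more concrete, here is a minimal Python sketch of the idea. It is not Microsoft's implementation: in SynNet both stages are learned neural models, while here they are replaced by hypothetical stand-ins (a regular expression that picks out names and numbers as candidate answers, and a fill-in-the-blank template instead of a generated natural language question). All function and class names are made up for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class SyntheticExample:
    context: str   # the article the question is about
    question: str  # pseudo question generated for the answer
    answer: str    # candidate answer span taken from the article


def propose_answers(sentence: str) -> list[str]:
    """Stage 1 stand-in: pick candidate answer spans.

    Assumption: a regex matching numbers and runs of capitalized words
    replaces SynNet's learned answer-selection model.
    """
    pattern = r"\d+(?:[.,]\d+)*|[A-Z][a-z]+(?: [A-Z][a-z]+)*"
    return re.findall(pattern, sentence)


def generate_question(sentence: str, answer: str) -> str:
    """Stage 2 stand-in: build a question around a chosen answer.

    Assumption: a cloze (fill-in-the-blank) template replaces SynNet's
    learned question generator.
    """
    return sentence.replace(answer, "___", 1)


def synthesize(article: str) -> list[SyntheticExample]:
    """Turn an unlabeled article into pseudo (question, answer) pairs."""
    examples = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", article.strip()):
        for answer in propose_answers(sentence):
            question = generate_question(sentence, answer)
            examples.append(SyntheticExample(article, question, answer))
    return examples


if __name__ == "__main__":
    text = "Marie Curie won the Nobel Prize in 1903. She was born in Warsaw."
    for example in synthesize(text):
        print(f"Q: {example.question}  A: {example.answer}")
```

The pairs produced this way play the role of the manually labeled data described earlier: they could be fed to an ordinary supervised MRC training routine for the new domain, with no human annotator involved.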
And as can be seen from the chart below, SynNet, when trained on Wikipedia articles (SQuAD), performs almost as well on news articles (the NewsQA domain) as a system fully trained on that domain.
Even though SynNet is still in its infancy, full reading comprehension is a necessary skill for AIs to achieve the ultimate goal of general intelligence. Of course, not everyone is glad about the direction AI is currently taking, with Elon Musk warning US governors that they need to regulate AI "before it's too late".
Source and images: Microsoft Research Blog via MSPU