In a recent paper, researchers test leading language models (GPT-2, GPT-3, and GPT-Neo) on programming questions drawn from coding interviews. The results aren't groundbreaking but show potential.
Researchers at Google have developed BLEURT, an automatic metric that gauges the performance of natural language generation models and delivers state-of-the-art performance on two academic benchmarks.
Researchers at MIT have created a framework, TextFooler, that cut the prediction accuracy of certain NLP models from 90% to under 20% simply by replacing certain words with synonyms.
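To make the idea of a synonym-substitution attack concrete, here is a minimal toy sketch. It is not TextFooler's actual algorithm: the keyword-counting classifier, the word lists, and the synonym table are all hypothetical stand-ins, and a real attack would target a trained model and check that the swapped words preserve meaning.

```python
# Toy illustration only: the classifier, word lists, and synonym table
# below are hypothetical and are not taken from TextFooler.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "terrible", "awful"}

# Hypothetical synonym table. A real attack would draw candidates from
# word embeddings or a thesaurus rather than a hand-written dictionary.
SYNONYMS = {
    "great": ["superb", "terrific"],
    "good": ["fine", "decent"],
    "excellent": ["outstanding"],
}


def classify(text: str) -> str:
    """Label text by counting known positive and negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative"


def synonym_attack(text: str) -> str:
    """Swap words for synonyms, left to right, until the label changes."""
    original_label = classify(text)
    words = text.split()
    for i, word in enumerate(words):
        candidates = SYNONYMS.get(word.lower())
        if not candidates:
            continue
        # Use the first listed synonym; earlier swaps are kept.
        words[i] = candidates[0]
        if classify(" ".join(words)) != original_label:
            break  # Stop as soon as the prediction changes.
    return " ".join(words)


original = "the movie was great and the acting was good"
adversarial = synonym_attack(original)
print(classify(original), "->", classify(adversarial))
print(adversarial)
```

In this toy setup a human reader would still judge the rewritten sentence as positive, yet the classifier's prediction changes because it only recognizes the exact words it was built around. That brittleness is the kind of weakness a synonym-based attack exploits.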
By integrating modules for different media, Nvidia Jarvis will help build the context needed to accurately predict and generate responses in conversational AI applications.