One of the biggest problems for Wikipedia, according to Denny Vrandečić, founder of Wikidata, is making quality articles available across the sheer number of languages supported by the online encyclopedia. In a new paper, Vrandečić floats an idea that would allow contributors to create content using abstract notation, which could then be translated into different natural languages, balancing content more evenly no matter which language you speak.
In the paper, Vrandečić suggests a project called Abstract Wikipedia, which anyone in the world could use to enter information as abstract notation, while a tool called Wikilambda would host a collection of functions that could turn the notation into natural language text. The result would be that all versions of Wikipedia, no matter the language, would be closer to the scale of the English Wikipedia in terms of content.
The 22-page paper goes into a fair bit of detail about the proposal. The basic idea is that each entity in the sentence you want to write is represented by its Wikidata identifier.
As an example, the Wikidata identifier for San Francisco is Q62, Northern California is Q1066807, California is Q99, Los Angeles is Q65, and San Diego and San Jose are Q16552 and Q16553 respectively. A pseudo-example to help you understand what's happening could look like this:
“Q62 is the cultural, commercial, and financial center of Q1066807. It is the fourth-most populous city in Q99 after Q65, Q16552, and Q16553.”
The actual notation would be more complex than this, but the example gives you a better grasp of the concept. The software would use renderers alongside the Wikidata information to generate content in different languages.
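To make the renderer idea concrete, here is a minimal sketch of how language-specific renderers might turn Wikidata identifiers into text. This is not the paper's actual notation: the label tables, templates, and the `render` function are illustrative assumptions.

```python
# Toy sketch of per-language rendering (illustrative, not the paper's design).
# Hypothetical label tables mapping Wikidata item IDs to names per language.
LABELS = {
    "en": {"Q62": "San Francisco", "Q99": "California"},
    "de": {"Q62": "San Francisco", "Q99": "Kalifornien"},
}

# Hypothetical sentence templates acting as the "renderers" for each language.
TEMPLATES = {
    "en": "{city} is a city in {state}.",
    "de": "{city} ist eine Stadt in {state}.",
}

def render(lang: str, city_id: str, state_id: str) -> str:
    """Render one abstract statement into natural-language text."""
    labels = LABELS[lang]
    return TEMPLATES[lang].format(city=labels[city_id], state=labels[state_id])

print(render("en", "Q62", "Q99"))  # San Francisco is a city in California.
print(render("de", "Q62", "Q99"))  # San Francisco ist eine Stadt in Kalifornien.
```

The key point the sketch illustrates is that the abstract content (the Q-identifiers and the relationship between them) is written once, while the language-specific work lives entirely in the renderers.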
In his concluding remarks, Vrandečić acknowledged that reaching the goal would require solving challenging problems along the way, but argued that Abstract Wikipedia and Wikilambda offer a workable path to it. He said that the described project, though challenging, wouldn't require a major breakthrough in current knowledge of natural language generation or lexical knowledge representation. While the concept is nascent, realising it would see a major boost to the available content across the different languages of Wikipedia.