Microsoft's Copilot generative AI assistant is now part of a number of the company's software apps. That includes its Excel spreadsheet app, where users can type in text prompts to get help with certain tasks.
However, a group of researchers at Microsoft has been working on a new AI large language model that was developed specifically for spreadsheet programs like Excel, Google Sheets, and others. Those Microsoft team members recently published their research paper on this new model, which has the fairly unimaginative name SpreadsheetLLM, on the arXiv.org site (via VentureBeat).
In the paper, the researchers note that spreadsheets come with a wide variety of layouts and formatting options. The researchers claim this can cause problems for standard LLMs, both because of their token limits and because they struggle to understand spreadsheet-specific features like cell addresses and formats.
The team says that SpreadsheetLLM was designed to try to overcome these challenges. In addition, the team developed what it calls SheetCompressor, which, as the name suggests, compresses spreadsheets so that they can be used more effectively by SpreadsheetLLM.
The paper states:
It comprises three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation. It significantly improves performance in spreadsheet table detection task, outperforming the vanilla approach by 25.6% in GPT4’s in-context learning setting.
In their experiments, the Microsoft researchers reported much better results with larger spreadsheets while at the same time cutting token usage by as much as 96 percent.
There's no word on when or even if Microsoft plans to make SpreadsheetLLM available to the general public. The paper does note that the model still has some limitations, including handling spreadsheets that use background colors and borders, because that formatting could take up too many tokens. Also, SheetCompressor currently cannot compress cells that contain natural language. The paper states:
For example, categorizing terms like "China," "America," and "France" under a unified label such as "Country" could not only increase the compression ratio but also deepen the semantic understanding of the data by LLMs.
It will be interesting to see if Microsoft can turn this research into an actual product.