A Microsoft AI engineer claims he found security guardrail issues in OpenAI's DALL-E 3

Last week, explicit images of singer Taylor Swift flooded the X (formerly Twitter) social network. As a result, X temporarily blocked searches for Swift on its platform. A report claims Microsoft's Designer AI image creator, which uses OpenAI's DALL-E 3 model, was used to make the Swift deepfake images. Microsoft has officially said it has found no evidence to support this claim, but it has since updated Designer's safety guardrails.

Now a current Microsoft AI engineer named Shane Jones has sent a letter to Washington State's Attorney General Bob Ferguson, along with US senators and representatives, claiming that he discovered a flaw in DALL-E 3 that bypassed its security systems. He further claims Microsoft tried to downplay the flaw.

In his letter, as posted by GeekWire, Jones claims he found the guardrail flaws in DALL-E 3 in early December. He did not go into detail on the specific issues. He claimed that the flaws were so severe that DALL-E 3 "posed a public safety risk" and should be shut down while OpenAI tried to fix them.

Jones claims he sent his concerns to Microsoft in early December, but was then asked to send what he found to OpenAI. He says he did not receive a response, and later posted an open letter on LinkedIn to OpenAI's board of directors, asking them to shut down DALL-E 3. He claims Microsoft's legal team contacted him and asked him to take down that letter, which he says he did. Since then, Jones claims he has not heard from Microsoft or OpenAI on this issue.

Microsoft has sent a statement about Jones's claims to GeekWire. The company says it "confirmed that the techniques he shared did not bypass our safety filters in any of our AI-powered image generation solutions." It added that it is "connecting with this colleague to address any remaining concerns he may have."

In a separate statement, an OpenAI spokesperson says that "the technique he shared does not bypass our safety systems." The spokesperson added:

We’ve also implemented additional safeguards for our products, ChatGPT and the DALL-E API – including declining requests that ask for a public figure by name. We identify and refuse messages that violate our policies and filter all generated images before they are shown to the user. We use external expert red teaming to test for misuse and strengthen our safeguards.

Jones's letter says he wants the US government to create a new way for people to report and track any AI-related issues. He says it should be set up so that people at companies that develop AI products can use this reporting system without fear of repercussions from those businesses.
