OpenAI announces a major update to AI image generation in ChatGPT

OpenAI’s long-awaited upgrade to AI image generation is here. Instead of relying on a separate image-generation model such as DALL·E, the new image generator is built directly into GPT-4o.

Many AI image-generation models on the market can create surreal, breathtaking scenes. However, they often struggle with images that involve text, logos, and other elements common in everyday life.

OpenAI claims that GPT-4o’s image generation addresses these shortcomings: it can render text accurately and follow prompts precisely by drawing on the model’s knowledge base and the chat context. Users can also modify uploaded images or use them as visual inspiration for new ones.

GPT-4o image generation is now rolling out to ChatGPT Plus, Pro, Team, and Free users. Because it is becoming the default image generator in ChatGPT, users don’t need to select anything before entering a prompt. They can also customize images by specifying an aspect ratio, exact colors using hex codes, or a transparent background. OpenAI plans to bring the feature to ChatGPT Enterprise and Edu users in the coming weeks.

The new model is also available for creating images in Sora, while users who prefer DALL·E can still access it through a dedicated DALL·E GPT. Developers will be able to generate images with GPT-4o via the API, with access rolling out over the next few weeks.

The model also has some limitations. Because it creates more detailed images, generation can take up to one minute. OpenAI also lists the following known limitations at launch, which it plans to fix in the coming weeks and months:

It can occasionally crop longer images, like posters, too tightly, especially near the bottom.

It can also make up information, especially when prompts provide little context.

When generating images that rely on its knowledge base, it may struggle to accurately render more than 10 to 20 distinct concepts at once, as in a full periodic table.

The model sometimes struggles to render text in non-Latin languages; characters can be inaccurate or hallucinated, especially as complexity increases.

Requests to edit specific portions of a generated image, such as fixing typos, are not always effective; they may alter other parts of the image in ways that were not requested or introduce new errors.

The model is known to struggle when asked to render detailed information at a very small size.

All images generated with the new model include C2PA metadata, and OpenAI has an internal tool that can verify whether an image was created by the model.

Despite its current limitations, GPT-4o image generation promises more precise and customizable image creation. As OpenAI continues to refine the model, further improvements in performance and reliability are likely.