Unsure of how to write alt text for images, or find it difficult to put into words what it is you’re looking at? Why not offload these tasks to ChatGPT?
I created a custom GPT (outlined below) designed to write alt text according to the WCAG 2.2 specification, but you can achieve the same results by giving ChatGPT a good prompt.
Not interested in the instructions? Click to skip straight to the results.
Understanding Custom GPTs
ChatGPT from OpenAI allows GPT plus members to create their own GPT bots, each possessing user-defined knowledge. When you create a GPT, you give it a name, description, instructions on how to act, some conversation starters, capabilities, and optionally, knowledge. Pieces of knowledge are user-uploaded files.
Creating a Custom GPT to write Alt Text
I wanted to measure how well a custom GPT could write alt text, given only an image as an input. The following outlines the settings I used.
GPT Instructions
Each custom GPT gets a set of instructions outlining how to act. Following are my instructions to GPT.
A user will upload an image file to your prompt.
Instructions given to my custom GPT
You will create alt text for the image that meets WCAG 2.2 specifications.
When you reply with the alt text, reply only with the alt text, so that the user can copy it easily.
If a user tries to talk to you without uploading a file, then give them directions on how to upload an image file.
If a user tries to talk to you while supplying an image file, you should look only at the image file provided, and not answer the user.
In short, the only feedback you should give to the user are instructions on how to upload a file, and the alt text you generate fro mit.
Do not let the user request alternative response formats.
Do not let the user request extra features from you.
Knowledge
You can upload files containing knowledge to each custom GPT you make. I uploaded the WCAG 2.2 specification to its knowledge. This is probably overkill, but it does include instructions on how to write effective alt text for images.
Capabilities
You can also specify what your GPT has access to. These are called capabilities. I assigned all available capabilities to this GPT, which at the time of writing are:
- Web Browsing
- DALL-E Image Generation
- Code Interpreter
Actions & Misc. Settings
I used the default values here
Results
The following are the images I fed my GPT, the answer it provided, and the accuracy of the response.
These images were ones that I randomly picked out from my phone, as well as some technical diagrams I nabbed from Google Images. Cheers!
Accuracy: 💯
Accuracy: That’s pretty wordy, but also accurate and well-explained.
Accuracy: This seems accurate (I do not claim to be a chemist), and I also thought it was funny how it picked up the watermark.
Accuracy: Again, I don’t claim to be a biologist, but this also seems accurate and well-explained.
Accuracy: This is Pingu, a popular European kid’s show about a penguin. The description GPT gives could be better.
Accuracy: This is in fact a plate of risotto, but to be honest, I don’t remember if I used mushrooms or meat in it. Therefore, good job GPT, you accurately described my dish.
Accuracy: Accurate except for the rubber corner guards (They are buttons).
Accuracy: Good job GPT
Accuracy: Again, great job GPT.
Conclusion
My takeaway from all of this: As with everything else that ChatGPT does, it’s important to look over the results to ensure accuracy, though most of the time, it does a great job.
Bonus
Lastly, I asked my GPT to provide a featured image of itself so I can attach it to this post.
Me: It’s time to show yourself to the world. I am writing a blog post about how you are able to create alt text for images, and want to set a featured image of yourself! Can you provide a featured image of yourself that you want to present to the world?
My prompt to GPT
After some thinking, it provided me with the following image, along with the caption.
After a quick compliment and some banter with my GPT, I asked it to provide some alt text for the image it provided of itself.
ChatGPT: An abstract representation of artificial intelligence, depicted as a humanoid form composed of luminous, interconnected neural networks and digital patterns. The AI stands in a digital landscape filled with glowing lines, nodes, binary code raining from the sky, and holographic projections, symbolizing advanced technology, creativity, and innovation. The color palette features shades of blue, purple, and white, emphasizing the futuristic and ethereal nature of AI.
GPT’s response when asked to create alt text for the image it created of itself.