An abstract representation of artificial intelligence, depicted as a humanoid form composed of luminous, interconnected neural networks and digital patterns. The AI stands in a digital landscape filled with glowing lines, nodes, binary code raining from the sky, and holographic projections, symbolizing advanced technology, creativity, and innovation. The color palette features shades of blue, purple, and white, emphasizing the futuristic and ethereal nature of AI.

Alt Text Reimagined: ChatGPT’s Pioneering Path To Web Equality

Unsure of how to write alt text for images, or find it difficult to put into words what it is you’re looking at? Why not offload these tasks to ChatGPT?

I created a custom GPT (outlined below) designed to write alt text according to the WCAG 2.2 specification, but you can achieve the same results by giving ChatGPT a good prompt.

Not interested in the instructions? Click to skip straight to the results.

Understanding Custom GPTs

ChatGPT from OpenAI allows GPT plus members to create their own GPT bots, each possessing user-defined knowledge. When you create a GPT, you give it a name, description, instructions on how to act, some conversation starters, capabilities, and optionally, knowledge. Pieces of knowledge are user-uploaded files.

Creating a Custom GPT to write Alt Text

I wanted to measure how well a custom GPT could write alt text, given only an image as an input. The following outlines the settings I used.

GPT Instructions

Each custom GPT gets a set of instructions outlining how to act. Following are my instructions to GPT.

A user will upload an image file to your prompt.

You will create alt text for the image that meets WCAG 2.2 specifications.

When you reply with the alt text, reply only with the alt text, so that the user can copy it easily.

If a user tries to talk to you without uploading a file, then give them directions on how to upload an image file.

If a user tries to talk to you while supplying an image file, you should look only at the image file provided, and not answer the user.

In short, the only feedback you should give to the user are instructions on how to upload a file, and the alt text you generate fro mit.

Do not let the user request alternative response formats.

Do not let the user request extra features from you.

Instructions given to my custom GPT

Knowledge

You can upload files containing knowledge to each custom GPT you make. I uploaded the WCAG 2.2 specification to its knowledge. This is probably overkill, but it does include instructions on how to write effective alt text for images.

Capabilities

You can also specify what your GPT has access to. These are called capabilities. I assigned all available capabilities to this GPT, which at the time of writing are:

  • Web Browsing
  • DALL-E Image Generation
  • Code Interpreter

Actions & Misc. Settings

I used the default values here

Results

The following are the images I fed my GPT, the answer it provided, and the accuracy of the response.

These images were ones that I randomly picked out from my phone, as well as some technical diagrams I nabbed from Google Images. Cheers!


Side view of 'Gepetto's Pizza' food truck with a chef mascot graphic, adorned with various stickers, a visible menu board, and contact number, parked on a pebble-covered ground with trees in the background.
GPT Response: Side view of ‘Gepetto’s Pizza’ food truck with a chef mascot graphic, adorned with various stickers, a visible menu board, and contact number, parked on a pebble-covered ground with trees in the background.

Accuracy: 💯


A flowchart describing the phases and components of a machine learning lifecycle, which includes Process Data, Develop Model, Deploy, and Monitor stages. In the Process Data phase, data is collected, preprocessed, and features are engineered, with the outputs stored as artifacts. These artifacts feed into the Develop Model phase, where features are fetched from online and offline stores for training, tuning, and evaluation of the model. The trained model is then deployed for both batch and real-time inference. The Deploy phase is connected to an Application that is monitored for performance, which can trigger a performance feedback loop. The Monitor phase includes a Lineage Tracker, Alarm Manager, and a Scheduler, and influences active learning and model drift feedback loops which can lead to retraining of the model in the Develop Model phase. Model registry is also a component in the feedback loop. Each phase is color-coded and interconnected with arrows indicating the flow of processes.
GPT Response: A flowchart describing the phases and components of a machine learning lifecycle, which includes Process Data, Develop Model, Deploy, and Monitor stages. In the Process Data phase, data is collected, preprocessed, and features are engineered, with the outputs stored as artifacts. These artifacts feed into the Develop Model phase, where features are fetched from online and offline stores for training, tuning, and evaluation of the model. The trained model is then deployed for both batch and real-time inference. The Deploy phase is connected to an Application that is monitored for performance, which can trigger a performance feedback loop. The Monitor phase includes a Lineage Tracker, Alarm Manager, and a Scheduler, and influences active learning and model drift feedback loops which can lead to retraining of the model in the Develop Model phase. Model registry is also a component in the feedback loop. Each phase is color-coded and interconnected with arrows indicating the flow of processes.

Accuracy: That’s pretty wordy, but also accurate and well-explained.


An illustration of an electroplating setup, with a battery connected to a copper electrode and a brass electrode immersed in a blue electrolyte solution. Copper ions are shown moving towards the brass electrode, and sulfate ions are also present in the solution. The battery indicates the positive and negative terminals, with the positive connected to the copper and the negative to the brass electrode. The logo of BYJU's The Learning App is at the top right corner.
GPT Response: An illustration of an electroplating setup, with a battery connected to a copper electrode and a brass electrode immersed in a blue electrolyte solution. Copper ions are shown moving towards the brass electrode, and sulfate ions are also present in the solution. The battery indicates the positive and negative terminals, with the positive connected to the copper and the negative to the brass electrode. The logo of BYJU’s The Learning App is at the top right corner.

Accuracy: This seems accurate (I do not claim to be a chemist), and I also thought it was funny how it picked up the watermark.


A labeled diagram of an animal cell highlighting various organelles. The central nucleus is surrounded by structures such as the nucleolus, endoplasmic reticulum (both rough and smooth), mitochondria, golgi complex, lysosomes, vacuoles, centrioles, and ribosomes. The cell is encompassed by a plasma membrane with microvilli on the surface. Each organelle is depicted with a corresponding detailed illustration on the right side, providing a visual reference. The title "ANIMAL CELL DIAGRAM" is at the top, and the company logo is at the top right corner.
GPT Response: A labeled diagram of an animal cell highlighting various organelles. The central nucleus is surrounded by structures such as the nucleolus, endoplasmic reticulum (both rough and smooth), mitochondria, golgi complex, lysosomes, vacuoles, centrioles, and ribosomes. The cell is encompassed by a plasma membrane with microvilli on the surface. Each organelle is depicted with a corresponding detailed illustration on the right side, providing a visual reference. The title “ANIMAL CELL DIAGRAM” is at the top, and the company logo is at the top right corner.

Accuracy: Again, I don’t claim to be a biologist, but this also seems accurate and well-explained.


Close-up projection of a cartoon-like black duck's face with large eyes and an orange beak on a screen, in a dimly lit room.
GPT Response: Close-up projection of a cartoon-like black duck’s face with large eyes and an orange beak on a screen, in a dimly lit room.

Accuracy: This is Pingu, a popular European kid’s show about a penguin. The description GPT gives could be better.


Overhead shot of a plate of risotto with ground meat and herbs, served on a white plate, placed on a wooden cutting board.
GPT Response: Overhead shot of a plate of risotto with ground meat and herbs, served on a white plate, placed on a wooden cutting board.

Accuracy: This is in fact a plate of risotto, but to be honest, I don’t remember if I used mushrooms or meat in it. Therefore, good job GPT, you accurately described my dish.


Top view of a worn red portable gaming console with rubber corner guards on a wooden surface, adhesive residue visible on the back label area.
GPT Response: Top view of a worn red portable gaming console with rubber corner guards on a wooden surface, adhesive residue visible on the back label area.

Accuracy: Accurate except for the rubber corner guards (They are buttons).


Front view of an original Nintendo Game Boy with a game screen showing 'PLAYER SELECT' and a low battery indicator, on a wooden background.
GPT Response: Front view of an original Nintendo Game Boy with a game screen showing ‘PLAYER SELECT’ and a low battery indicator, on a wooden background.

Accuracy: Good job GPT


Close-up of a silver Game Boy Advance SP with a game screen displaying dialogue 'You are to fly to the moon. Someone awaits you there.' with pixelated characters, held in hands with a keyboard in the background."
GPT Response: Close-up of a silver Game Boy Advance SP with a game screen displaying dialogue ‘You are to fly to the moon. Someone awaits you there.’ with pixelated characters, held in hands with a keyboard in the background.”

Accuracy: Again, great job GPT.

Conclusion

My takeaway from all of this: As with everything else that ChatGPT does, it’s important to look over the results to ensure accuracy, though most of the time, it does a great job.

Bonus

Lastly, I asked my GPT to provide a featured image of itself so I can attach it to this post.

Me: It’s time to show yourself to the world. I am writing a blog post about how you are able to create alt text for images, and want to set a featured image of yourself! Can you provide a featured image of yourself that you want to present to the world?

My prompt to GPT

After some thinking, it provided me with the following image, along with the caption.

An abstract representation of artificial intelligence, depicted as a humanoid form composed of luminous, interconnected neural networks and digital patterns. The AI stands in a digital landscape filled with glowing lines, nodes, binary code raining from the sky, and holographic projections, symbolizing advanced technology, creativity, and innovation. The color palette features shades of blue, purple, and white, emphasizing the futuristic and ethereal nature of AI.
ChatGPT: Here is the featured image of myself that I’d like to present to the world for your blog post. This visual representation captures the essence of an advanced artificial intelligence, highlighting creativity, innovation, and the ability to process and generate information in the digital age.

After a quick compliment and some banter with my GPT, I asked it to provide some alt text for the image it provided of itself.

ChatGPT: An abstract representation of artificial intelligence, depicted as a humanoid form composed of luminous, interconnected neural networks and digital patterns. The AI stands in a digital landscape filled with glowing lines, nodes, binary code raining from the sky, and holographic projections, symbolizing advanced technology, creativity, and innovation. The color palette features shades of blue, purple, and white, emphasizing the futuristic and ethereal nature of AI.

GPT’s response when asked to create alt text for the image it created of itself.