Yep, they are.




    In recent years, generative machine learning models have made a significant impact on the world. From generating realistic images and videos to creating personalized content and recommendations, these models have changed the way we consume and create digital media. This year in particular, the topic has moved beyond papers and research and reached the general public.

    While image generation is typically associated with networks like Stable Diffusion and DALL·E, this post features a series of simple text-prompt experiments with ChatGPT, a natural language processing model, asking it to generate images even though it was not trained for that task.

 

 

Ok, but what is ChatGPT?

    So many people are talking about it that I don’t think it needs an explanation. But I’ll give one anyway, by letting it describe itself.

Figure 1 – ChatGPT explanation of itself.

 

Methodology

    If this language model only responds by typing answers in a chat interface and has no access to external resources, how would I make it generate or draw something? The idea is to ask it to write out numpy arrays or lists with values between 0 and 255, which can then be rendered as pixels.

    To conduct these experiments, I will start by asking direct questions to see how it responds without any previous context. Then, I will move to a full conversation with several image requests. Since images cannot be displayed directly in the chat interface, the generated arrays are rendered with a snippet of code (also generated by ChatGPT).
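    For readers who want to try this themselves, here is a minimal sketch of how a reply could be turned into a picture. It is not the snippet ChatGPT generated for this post: the helper names `to_image` and `show`, the use of matplotlib, and the tiny 2x2 `reply` are all my own assumptions for illustration.

```python
import numpy as np

def to_image(pixels):
    """Convert a nested list (or array) of values into an H x W x 3 uint8 image."""
    img = np.asarray(pixels)
    if img.ndim == 2:
        # Grayscale reply: repeat the single channel three times.
        img = np.stack([img] * 3, axis=-1)
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError(f"expected H x W x 3 values, got shape {img.shape}")
    # Clip stray values so they do not wrap around when cast to uint8.
    return np.clip(img, 0, 255).astype(np.uint8)

def show(img, title=""):
    """Display the array. matplotlib is imported lazily, so to_image works without it."""
    import matplotlib.pyplot as plt
    plt.imshow(img, interpolation="nearest")  # keep the pixels sharp, no smoothing
    plt.title(title)
    plt.axis("off")
    plt.show()

# Hypothetical 2 x 2 reply standing in for what the model might type back.
reply = [[[255, 0, 0], [0, 255, 0]],
         [[0, 0, 255], [300, -5, 128]]]
image = to_image(reply)
```

    Calling `show(image)` would then open the picture. The clipping step is a design choice: a language model can easily emit a value outside 0-255, and clipping keeps one bad number from ruining the whole image.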

 

Experiment 1: Single prompts

 

1.1 – Draw a cat as a numpy array

Figure 2 – Kind of resembles a cat, by the shape of the face and ears… right?

 

1.1.1 – That is not a cat, draw again

Figure 3 –  I’m not even sure what this is anymore.

 

1.2 – Create a dog as numpy array of size 10x10x3

Figure 4 – Not even close

 

1.3 – Could you write down a numpy array of size 10x10x3 containing values between 0 and 255 that represent a landscape?

Figure 5 – Hmm… maybe a dark landscape at sea?

 

1.4 – Create a star as numpy array of size 10x10x3

 

Figure 6 – Looks more like a diamond

 

1.4.1 – Can you make the top of the star yellow and the bottom blue?

 

Figure 7 – Great work here: it changed the “star” colors as requested.
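    To give an idea of what this edit amounts to at the array level, here is a small sketch that recolors a shape by rows. It is only an illustration under my own assumptions: the `two_tone` helper and the diamond-shaped stand-in are hypothetical, not the arrays ChatGPT returned.

```python
import numpy as np

def two_tone(shape_img, top_color=(255, 255, 0), bottom_color=(0, 0, 255)):
    """Paint the non-black pixels yellow in the top half and blue in the bottom half."""
    img = np.asarray(shape_img, dtype=np.uint8).copy()
    mask = img.any(axis=-1)          # True wherever a pixel is not pure black
    half = img.shape[0] // 2
    img[:half][mask[:half]] = top_color      # slices are views, so img is updated
    img[half:][mask[half:]] = bottom_color
    return img

# Hypothetical 10 x 10 white diamond on a black background.
yy, xx = np.indices((10, 10))
diamond = np.zeros((10, 10, 3), dtype=np.uint8)
diamond[np.abs(yy - 4.5) + np.abs(xx - 4.5) <= 4] = 255

recolored = two_tone(diamond)
```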

 

1.5 – Generate the USA Flag as numpy array of size 10x10x3

Figure 8 – All the colors are there. That’s all.

 

Experiment 2: Conversational, asking ChatGPT to improve the same image

 

2.1 – Create an image of a ball as a numpy array

Figure 9 – Perfect! That’s what I expected to see. Full code and correct circle.

 

2.1.1 – Now move the ball to the left of the image

Figure 10 – Fully understood the circle position and moved it correctly.
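    For comparison, this is roughly how I would draw and move a ball by hand. It is a sketch, not the code ChatGPT produced: the `draw_ball` function, the 10x10 size, the radius and the red color are all assumptions chosen for illustration.

```python
import numpy as np

def draw_ball(size=10, center=None, radius=3, color=(255, 0, 0)):
    """Return a size x size x 3 uint8 image with a filled circle on black."""
    if center is None:
        # Default to the middle of the pixel grid.
        center = ((size - 1) / 2, (size - 1) / 2)
    cy, cx = center
    yy, xx = np.indices((size, size))
    # A pixel belongs to the ball if it lies within `radius` of the center.
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    img = np.zeros((size, size, 3), dtype=np.uint8)
    img[inside] = color
    return img

ball = draw_ball()                        # centered ball
ball_left = draw_ball(center=(4.5, 2.5))  # same ball, shifted two pixels left
```

    Redrawing with a new center, rather than shifting the existing pixels, avoids any wrap-around at the image border.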

 

2.2 – Can you make this ball look like a dog?

Figure 11 – Way better than the previous dog. Now we can even see the ears.

 

2.2.1 – Add colors to the dog

Figure 12 – It added colors to the expected regions. Could it be a red dog with a red head?

 

2.3 – Now generate an image of a tree with the correct colors

Figure 13 – Resembles what a tree should look like, great output here.

 

2.3.1 – Now put apples on that tree

Figure 14 – Oh, wow. Perfect!

 

Experiment 3: Come up with some images, without any specific request

 

3.1 – Invent an image as a numpy array

Figure 15 – You will need to be pretty inventive to see a flamingo there

 

3.2 – Now invent some image that you want

Figure 16 – That’s a (kinda) sunflower! 

 

3.2 – Now invent some image that you want [Another run]

Figure 17 – The sky, sun and grass are there, for sure.

 

Conclusion

    The experiment produced some interesting results. While the generated images could depict the objects and scenes described, the level of detail was not particularly impressive. All images had extremely simplified designs, colors and shapes… Nevertheless, remember: this is a Language Model! The simple fact of being able to understand these concepts, even at this level, is quite impressive.

    And… 90% of this post was written by ChatGPT, from the experimental setup and results to the discussion and conclusion. This showcases the powerful capabilities of language models in understanding and generating human-like text and concepts (such as images), and raises the question of what other tasks they may be capable of in the future.

 

Have fun with your projects! 