Lesson: Generating Visual Data - Artificial Intelligence - Third Year of Secondary School
Part 1
1. Basics of Artificial Intelligence
2. Artificial Intelligence Algorithms
3. Natural Language Processing (NLP)
Part 2
4. Image Recognition
5. Optimization & Decision-making Algorithms
Lesson 3: Generating Visual Data

Link to digital lesson: www.ien.edu.sa

Using AI to Generate Images

While the computer vision algorithms described in the previous two lessons of this unit focused on understanding different aspects of a given image, the field of image generation covered in this lesson focuses on creating new images. Image generation has a long history, dating back to the 1950s and 1960s, when researchers first began experimenting with mathematical equations to create images. Today, the field has grown to encompass a wide range of techniques.

One of the earliest and most well-known techniques for image generation is the use of fractals. A fractal is a geometric shape or pattern that is self-similar, meaning that it looks the same at different zoom scales. The most famous fractal is the Mandelbrot set, which can be seen in figure 4.25. (A short Python sketch that draws this fractal is given at the end of this introduction.)

Figure 4.25: Mandelbrot fractal

In the late 20th century, researchers began to explore more advanced techniques for image generation, such as neural networks. One of the most popular techniques for image generation with neural networks is text-to-image synthesis. This technique involves training a neural network to generate images from textual descriptions. The neural network is trained on a dataset of images and their associated text descriptions. The network learns to associate certain words or phrases with specific features of an image, such as the shape or color of an object. Once trained, the network can be used to generate new images from text descriptions. This technique has been used to generate a wide range of images, from simple objects to complex scenes.

Another technique for image generation is image-to-image synthesis. This technique involves training a neural network on a dataset of images to learn to recognize the unique features of an image, in order to generate new images that are similar to the existing one, but with variations.

Recently, researchers have been exploring text-guided image-to-image synthesis, which combines the strengths of the text-to-image and image-to-image synthesis methods by allowing the user to guide the synthesis process using text prompts. This technique has been used to generate high-quality images that are consistent with a given text prompt while also being visually similar to an initial image.

Finally, another state-of-the-art technique is text-guided image inpainting, which focuses on filling in missing or corrupted parts of an image based on a given text description. The text description provides information about what the missing or corrupted parts of the image should look like, and the goal of the inpainting algorithm is to use this information to generate a realistic and coherent image.

This lesson provides practical examples of text-to-image, text-guided image-to-image, and text-guided image-inpainting generation.
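Fractal images like the one in figure 4.25 can be produced with only a few lines of code. The sketch below is a minimal illustration, not part of the official lesson material: it assumes only the NumPy and Matplotlib libraries (preinstalled on Google Colab), and the resolution and iteration count are arbitrary choices. It repeatedly applies the formula z = z*z + c to a grid of complex numbers and colors each point by how quickly it escapes.

import numpy as np
import matplotlib.pyplot as plt

# Create a grid of complex numbers c = x + yi covering the region of interest.
x = np.linspace(-2.0, 1.0, 600)
y = np.linspace(-1.5, 1.5, 600)
c = x[np.newaxis, :] + 1j * y[:, np.newaxis]

z = np.zeros_like(c)
escape = np.zeros(c.shape, dtype=int)  # iteration at which each point "escapes"

# Repeatedly apply z -> z*z + c; points that stay bounded belong to the set.
for i in range(100):
    mask = np.abs(z) <= 2
    z[mask] = z[mask] ** 2 + c[mask]
    escape[mask] = i

plt.imshow(escape, cmap="twilight", extent=(-2, 1, -1.5, 1.5))
plt.title("Mandelbrot set")
plt.show()

Zooming into any border region of the resulting image (by narrowing the ranges passed to np.linspace) reveals the self-similar structure described above.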
Image Generation and Computational Resources

Image generation is a computationally intensive task, as it involves the use of complex algorithms that require large amounts of processing power. These algorithms typically involve the processing of large amounts of data, such as 3D models, textures, and lighting information, which further contributes to the computational demands of the task.

One of the key technologies used to accelerate image generation is the Graphics Processing Unit (GPU). A GPU is a specialized type of processor that is designed to handle the large number of mathematical operations required for rendering images and video. Unlike a traditional Central Processing Unit (CPU), which is designed to handle a wide range of tasks, a GPU is optimized for the types of mathematical operations required for image rendering and other graphics-related tasks. This makes GPUs much more efficient at handling large amounts of data and performing complex calculations, which is why they are often used in image generation and other computationally intensive tasks.

This lesson demonstrates how you can use the popular Google Colab platform to get access to a powerful GPU-based infrastructure at no cost, using only a standard Google account. Google Colab is a free cloud-based platform that allows users to write and execute code, run experiments, and train models in a Jupyter Notebook environment.

To access Google Colab:
> Go to https://colab.research.google.com. 1
> Sign in with your Google account. 2
> Click on Edit > Notebook settings. 3
> Choose GPU 4 and click Save. 5

Figure 4.26: Accessing Google Colab
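Once the notebook is connected to a GPU runtime, it is a good idea to confirm that the GPU is actually visible to your code before running the heavier examples later in this lesson. A minimal check is sketched below; it assumes the PyTorch library, which comes preinstalled on Colab.

# Quick sanity check that the notebook is running on a GPU runtime.
import torch

if torch.cuda.is_available():
    print("GPU available:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected - check Edit > Notebook settings.")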
To use a Python Notebook:
> Click on File > New notebook. 1
> Click Files 2 and drag and drop the images you will be using in this lesson into the adjacent area that unfolds. 3
> You can now type your Python code inside the code cell 4 and run it by clicking the button beside the code cell. 5

Figure 4.27: Using a Python Notebook

The Google Colab environment works similarly to Jupyter Notebook. Below is the classic "Hello World" example:

print("hello world")

hello world

The image generation algorithms described in this chapter are designed to be creative, and are thus not deterministic. This means that they are not guaranteed to generate the exact same image for the same input. The generated images included in this chapter are thus just examples of the possible images that can be generated by the code.

Diffusion Models and Generative Adversarial Networks

In recent years, the field of image generation has seen significant progress, with the development of various techniques and models that can generate realistic and high-quality images from different sources of information. Two of the most popular and widely used techniques for image generation are Generative Adversarial Networks (GANs) and Stable Diffusion. In this section, you will be introduced to the main concepts and techniques behind GANs and Stable Diffusion, along with an overview of their applications in image generation. Furthermore, their similarities and differences will be discussed, as well as the pros and cons of each approach.
Generating Images with Generative Adversarial Networks (GANs)

GANs are a class of generative models that consist of two main components: a generator and a discriminator. The generator generates fake images, while the discriminator tries to distinguish the generated images from real images. The two components are trained in an adversarial way, where the generator tries to "trick" the discriminator, and the discriminator tries to become better at detecting fake images. (A toy code sketch of this adversarial training loop is given after table 4.4.) One of the main advantages of GANs is that they can generate high-quality and realistic images that are difficult to distinguish from real images. However, GANs also have some limitations, such as non-convergence, which means that the generator and discriminator networks do not improve over time, and mode collapse, which means that the generator often repeats the same or similar outputs, regardless of the input noise or data. The generator and the discriminator in GANs are typically implemented using Convolutional Neural Networks (CNNs) or a similar architecture.

Figure 4.28: GAN architecture (random noise feeds the generator, which produces fake images; real and fake images feed the discriminator, which produces predicted labels and a loss)

Generating Images with Stable Diffusion

Stable Diffusion is a deep learning model for text-to-image generation. The method consists of two main components: a text encoder and a visual decoder. The text encoder and visual decoder are trained together on a dataset of paired text and image data, where each text input is associated with one or more corresponding images.

The text encoder is a neural network that takes in text input (such as a sentence or a paragraph) and maps it to an embedding: a numeric vector with a fixed number of values. This embedding representation captures the meaning of the input text. A similar approach is used by the Word2Vec and SBERT models that were covered in unit 3, which generate embeddings for individual words and sentences, respectively.

The text embedding created by the encoder is then passed through the visual decoder to generate an image. The visual decoder is also a type of neural network and is typically implemented using a CNN or a similar architecture. The generated image is compared with the corresponding real image from the dataset, and the difference between them is used to compute the loss. The loss is then used to update the parameters of the text encoder and visual decoder to minimize the difference between the generated images and the real images.

Table 4.4: Stable Diffusion training process
1. Pass the text input through the text encoder to get the text embedding.
2. Pass the text embedding through the visual decoder to generate an image.
3. Compute the loss (difference) between the generated image and the corresponding real image.
4. Use the loss to update the parameters of the text encoder and visual decoder. At a high level, this includes rewarding the neurons that helped reduce the loss and "punishing" the neurons that contributed to its increase.
5. Repeat the above steps for multiple text-image pairs in the dataset.
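Training a GAN that produces realistic photos requires large datasets and significant GPU time, but the adversarial loop described in the GAN section above can be illustrated with a self-contained toy example. The sketch below is only an illustration of the mechanism, not a practical image GAN: it assumes the PyTorch library (preinstalled on Colab), uses two tiny 4x4 "images" as the real dataset, and uses small fully connected networks instead of the CNNs mentioned above. The network is named generator_net to avoid confusion with the generator pipeline variable used later in this lesson.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Two "real" 4x4 grayscale images, flattened into vectors of length 16.
real_images = torch.stack([torch.eye(4).flatten(), torch.ones(4, 4).flatten()])

# Generator: random noise (length 8) -> fake image (length 16).
generator_net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16), nn.Sigmoid())
# Discriminator: image (length 16) -> probability that the image is real.
discriminator = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator_net.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # Train the discriminator: real images should be labeled 1, fake images 0.
    noise = torch.randn(2, 8)
    fake_images = generator_net(noise).detach()  # detach: do not update the generator here
    d_loss = bce(discriminator(real_images), torch.ones(2, 1)) + \
             bce(discriminator(fake_images), torch.zeros(2, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Train the generator: try to make the discriminator label its fakes as real (1).
    noise = torch.randn(2, 8)
    g_loss = bce(discriminator(generator_net(noise)), torch.ones(2, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

print("Final discriminator loss:", round(d_loss.item(), 3))
print("Final generator loss:", round(g_loss.item(), 3))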
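The training loop summarized in table 4.4 can be sketched in a similarly simplified way. The toy example below assumes PyTorch, replaces the text encoder with a small embedding layer over a three-word vocabulary, replaces the visual decoder with a small fully connected network, and uses random 8x8 "images" as stand-ins for real training photos. Real Stable Diffusion uses a large transformer text encoder and a diffusion-based denoising decoder, but the high-level steps of the loop are the same.

import torch
import torch.nn as nn

torch.manual_seed(0)

vocabulary = {"circle": 0, "square": 1, "cross": 2}

# A toy dataset of (text, image) pairs; random tensors stand in for real photos.
texts = torch.tensor([0, 1, 2])
real_images = torch.rand(3, 64)  # three flattened 8x8 "images"

text_encoder = nn.Embedding(len(vocabulary), 16)  # text -> embedding
visual_decoder = nn.Sequential(                   # embedding -> image
    nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 64), nn.Sigmoid())

optimizer = torch.optim.Adam(
    list(text_encoder.parameters()) + list(visual_decoder.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(500):
    # 1. Pass the text input through the text encoder to get the text embedding.
    embeddings = text_encoder(texts)
    # 2. Pass the text embedding through the visual decoder to generate an image.
    generated = visual_decoder(embeddings)
    # 3. Compute the loss between the generated and the corresponding real images.
    loss = loss_fn(generated, real_images)
    # 4. Use the loss to update the parameters of the encoder and decoder.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # 5. Repeat for the text-image pairs in the dataset (here, on every epoch).

print("Final training loss:", round(loss.item(), 4))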
Both GANs and Stable Diffusion models have delivered impressive results in the field of image generation. The remainder of this lesson focuses on providing practical Python examples for the diffusion-based approach, which is currently considered the state of the art. As described before, image generation is a computationally intensive task. It is therefore strongly encouraged that you run all Python examples on the Google Colab platform or a different GPU-powered infrastructure that you may have access to.

This chapter utilizes the "diffusers" library, which is currently considered the leading open-source library for diffusion-based models. The following code installs the library, as well as some additional required libraries:

%%capture
!pip install diffusers
!pip install transformers
!pip install accelerate

import matplotlib.pyplot as plt
from PIL import Image # used to represent images

Text-to-Image Generation

This section demonstrates how the diffusers library can be used to generate images based on a text prompt provided by the user. The examples in this section utilize "stable-diffusion-v1-4", a popular pretrained model for text-to-image generation.

# a tool used to generate images using stable diffusion
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# moves the generator model to the GPU (CUDA) for faster processing
generator.to("cuda")

image = generator("A photo of a white lion in the jungle.").images[0]
plt.imshow(image);

Figure 4.29: Generated image of a white lion in the jungle

The model responds to the prompt "A photo of a white lion in the jungle" with an impressive and very realistic image, as shown in figure 4.29. Experimenting with creative prompts is the best way to gain experience and understand the capabilities and limitations of this approach.

INFORMATION
CUDA (Compute Unified Device Architecture) is a parallel computing platform that enables the use of GPUs.
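As noted earlier in this lesson, the generation process is not deterministic, so running the same prompt twice will usually produce different images. If a repeatable result is needed, the diffusers pipelines accept an optional generator argument that holds a seeded random number generator, and the returned PIL image can be saved directly to the Colab file area. The short sketch below reuses the generator pipeline created above; the seed value and the filename are arbitrary choices.

import torch

# A seeded random number generator makes the pipeline call repeatable.
# Note: the keyword argument "generator" below refers to this random number
# generator, not to the pipeline variable that happens to share the same name.
rng = torch.Generator("cuda").manual_seed(42)  # 42 is an arbitrary seed

image = generator("A photo of a white lion in the jungle.", generator=rng).images[0]

# The result is a regular PIL image, so it can be saved to the Colab file area.
image.save("white_lion.png")
plt.imshow(image);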
The following prompt adds an additional dimension to the generation process, by asking for a white lion painted in the specific style of Pablo Picasso, one of the most famous artists of the twentieth century.

image = generator("A painting of a white lion in the style of Picasso.").images[0]
plt.imshow(image);

Figure 4.30: Generated image of a lion in Picasso style

Again, the results are impressive and demonstrate the creativity of the stable diffusion process. The produced image is indeed that of a white lion. However, contrary to the previous prompt, the new prompt leads to painting-like rather than photo-like images. In addition, the painting's style is indeed remarkably similar to that used by Pablo Picasso.

Image-to-Image Generation with Text Guidance

The next example uses the diffusers library to generate an image based on two inputs: an existing image, which serves as the basis for the new image that will be generated, and a text prompt that describes what the produced image should look like. While the text-to-image task demonstrated in the previous section was only guided by a text prompt, this new task has to ensure that the new image is both similar to the original and an accurate visual representation of the description given in the text prompt.

# pipeline used for image-to-image generation with stable diffusion
from diffusers import StableDiffusionImg2ImgPipeline

# loads a pretrained generator model
generator = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# moves the generator model to the GPU (CUDA) for faster processing
generator.to("cuda")

init_image = Image.open("landscape.jpg")
# resizes the image to prepare it as input for the model
init_image.thumbnail((768, 768))
plt.imshow(init_image);
The example in figure 4.31 uses the pretrained model "stable-diffusion-v1-5", which is appropriate for image-to-image generation with text guidance.

# a detailed prompt describing the desired visual
# for the produced image
prompt = "A realistic mountain landscape with a large castle."

image = generator(prompt=prompt, image=init_image, strength=0.75).images[0]
plt.imshow(image);

Figure 4.31: Original landscape image
Figure 4.32: Generated landscape image with strength=0.75

The model indeed generates an image that is both faithful to the text prompt and visually similar to the original image. The "strength" parameter is used to control the visual difference between the original and new images. The parameter takes values between 0 and 1, with higher values allowing the model to be more flexible and less constrained by the original image. For example, the following code uses the exact same prompt with strength=1.

# generate a new image based on the prompt and the
# initial image using the generator model
image = generator(prompt=prompt, image=init_image, strength=1).images[0]
plt.imshow(image);

Figure 4.33: Generated landscape image with strength=1

The resulting image in figure 4.33 verifies that increasing the value of the strength parameter leads to a visual that fits even better with the guidance offered by the text prompt, but is significantly less similar to the input image. Another characteristic example is shown below. Its output is shown in figure 4.34.

init_image = Image.open("cat_1.jpg")
init_image.thumbnail((768, 768))
plt.imshow(init_image);

Figure 4.34: Original cat image
The following code will now be used to convert this to a photo of a tiger:

prompt = "A photo of a tiger"
image = generator(prompt=prompt, image=init_image, strength=0.5).images[0]
plt.imshow(image);

Figure 4.35: Generated tiger image with strength=0.5

The first attempt is bound by the value of the strength parameter, leading to a picture that appears to be a mix between a tiger and the cat from the original photo, as shown in figure 4.35. The new picture indicates that the algorithm did not have enough "strength" to properly convert the face of the cat to that of a tiger. The background remains highly similar to that of the original image. Next, the strength parameter is increased to allow the model to move further away from the original image and closer to the text prompt:

image = generator(prompt=prompt, image=init_image, strength=0.75).images[0]
plt.imshow(image);

Figure 4.36: Generated tiger image with strength=0.75

Indeed, the new image displayed is a tiger. However, notice how the surroundings, posture and angles of the animal remain very similar to the original. This demonstrates that the model is still aware of the original image and tries to maintain elements that did not have to be changed to get closer to the text prompt.
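To build intuition for how the strength parameter trades off similarity to the original image against adherence to the text prompt, the same conversion can be generated at several strength values and displayed side by side. The sketch below reuses the generator, init_image, and prompt variables defined above; the list of strength values is an arbitrary choice.

# Generate the tiger conversion at several strength values and compare them.
strengths = [0.3, 0.5, 0.7, 0.9]  # example values between 0 and 1

fig, axes = plt.subplots(1, len(strengths), figsize=(16, 4))
for ax, s in zip(axes, strengths):
    result = generator(prompt=prompt, image=init_image, strength=s).images[0]
    ax.imshow(result)
    ax.set_title(f"strength={s}")
    ax.axis("off")
plt.show()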
Text-Guided Image-Inpainting

The next example focuses on using stable diffusion to replace specific parts of a given image with a new visual described by a text prompt. The "stable-diffusion-inpainting" pretrained model is used for this purpose. The following code loads the image of a cat on a bench, as well as a "mask" that isolates the specific parts of the image that are covered by the cat.

# tool used for text-guided image inpainting
from diffusers import StableDiffusionInpaintPipeline

init_image = Image.open("cat_on_bench.png").resize((512, 512))
plt.imshow(init_image);

mask_image = Image.open("cat_mask.jpg").resize((512, 512))
plt.imshow(mask_image);

Figure 4.37: Original cat image
Figure 4.38: Cat image mask

The mask is a simple black and white image that has the exact same dimensions as the original. The parts that will be replaced in the new image are highlighted in white, while every other part of the mask is black. Next, the pretrained model is loaded and a prompt is created to replace the cat in the original picture with an astronaut, as you can see in figure 4.39.

generator = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")

# moves the generator model to the GPU (CUDA) for faster processing
generator = generator.to("cuda")

prompt = "A photo of an astronaut"
image = generator(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
plt.imshow(image);
The new image successfully replaces the cat from the original image with a very realistic visual of an astronaut. In addition, this visual blends smoothly with the background elements and lighting of the image.

Figure 4.39: Generated astronaut image

In fact, even a simpler, less accurate mask is sufficient to produce a realistic replacement. Consider the following input image and mask:

init_image = Image.open("desk.jpg").resize((512, 512))
plt.imshow(init_image);

mask_image = Image.open("desk_mask.jpg").resize((512, 512))
plt.imshow(mask_image);

Figure 4.40: Original desk image
Figure 4.41: Desk image mask

In this example, the mask covers the laptop in the middle of the image. The following prompt and code are then used to replace the laptop with a photo of a book:

prompt = "A photo of a book"
image = generator(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
plt.imshow(image);

Figure 4.42: Generated desk image with book

Despite the fact that the prompt asked for the introduction of an object (a book) that was significantly different from the one being replaced (a laptop), the model did a good job of blending shapes and colors to create an accurate visual. With the continued advancement of machine learning and computer graphics technologies, it is likely that even more impressive and realistic images will be generated in the future.
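The masks used in the examples above were prepared in advance and uploaded to the Colab file area. If you would like to experiment with inpainting on your own photos, a simple rectangular mask can also be created programmatically with the PIL library. The sketch below is one possible way to do this; the image size and rectangle coordinates are arbitrary examples that should be adapted to your own image.

# Create a black mask with a white rectangle marking the region to be replaced.
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt

mask = Image.new("RGB", (512, 512), "black")        # all-black mask, same size as the input image
draw = ImageDraw.Draw(mask)
draw.rectangle([150, 200, 380, 450], fill="white")  # example region to be inpainted
mask.save("my_mask.png")
plt.imshow(mask);

The saved mask can then be passed to the inpainting pipeline as the mask_image argument, exactly as in the examples above.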
Exercises

1 Give a brief description of text-guided image inpainting.

2 Describe the training process for Stable Diffusion models.
3 Describe the generator and discriminator components in Generative Adversarial Networks.

4 Use the DiffusionPipeline tool from the diffusers library to create a photo of your favorite animal eating your favorite food. Use the Google Colab platform for this task.

5 Use the StableDiffusionImg2ImgPipeline tool from the diffusers library to transform the animal in the photo from the previous exercise to a different animal of your choice. Use the Google Colab platform for this task.