Lesson: Unsupervised Learning for Image Analysis - Artificial Intelligence - Third Year of Secondary School

Lesson 2 Unsupervised Learning for Image Analysis

Link to digital lesson: www.ien.edu.sa

Understanding Image Content

In the context of computer vision, unsupervised learning has been used for a variety of tasks, such as image segmentation, video segmentation, and anomaly detection. Another key application of unsupervised learning is image search, which involves searching a large database of images to find those that are similar to a given query image.

The first step towards building a search engine for image data is defining a similarity function that can evaluate the similarity between two images based on their visual properties, such as their border, texture, or shape. Once the user submits a new image as a query, the search engine goes over all the images in the available database, finds those with the highest similarity score, and returns them to the user.

An alternative approach is to use the similarity function to separate the images into clusters, so that each cluster consists of images that are visually similar to each other. Each cluster is then represented by a centroid: an image that sits at the center of the cluster and has the smallest overall distance (i.e. difference) from the other cluster members. Once the user submits a new image as a query, the search engine goes over all the clusters and selects the one whose centroid is the most similar to the query image. The members of the selected cluster are then returned to the user. Figure 4.17 shows an example of this approach.

Anomaly Detection
Anomaly detection is a process used to identify abnormal or unexpected patterns, events, or data points within a dataset. Its aim is to uncover unusual cases that stand out from the norm and may warrant further investigation.

Image Segmentation
Image segmentation is a process of dividing an image into multiple segments or regions that share common visual properties. Its aim is to partition an image into meaningful and coherent parts that can be used for further analysis.

Figure 4.16: Autonomous vehicle vision with image segmentation (each detected vehicle is annotated with its type, such as car or van, and its direction, such as passing)

Ministry of Education, 2024-1446
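The centroid-based search described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the images are replaced by hypothetical three-value feature vectors, cosine similarity is chosen as the similarity function, and the helper names (cosine_similarity, find_centroid, search) are invented for this example rather than taken from the lesson.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_centroid(members: np.ndarray) -> int:
    """Index of the member with the smallest total distance to the other members."""
    # pairwise Euclidean distances between all members of the cluster
    diffs = members[:, None, :] - members[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return int(np.argmin(distances.sum(axis=1)))

def search(query: np.ndarray, clusters: dict) -> str:
    """Name of the cluster whose centroid is the most similar to the query."""
    scores = {}
    for name, members in clusters.items():
        centroid = members[find_centroid(members)]
        scores[name] = cosine_similarity(query, centroid)
    return max(scores, key=scores.get)

# hypothetical feature vectors standing in for real images
clusters = {
    "cluster 1": np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.0, 0.0, 0.1]]),
    "cluster 2": np.array([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.0, 0.0]]),
}
query = np.array([0.05, 0.95, 0.05])

best = search(query, clusters)
print(best)  # the members of this cluster would be returned to the user
```

Comparing the query only with one centroid per cluster, instead of with every image in the database, is what makes this approach attractive for large collections.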


Figure 4.17: Clusters of image recognition analysis (a query image compared with the centroids of cluster 1, cluster 2, and cluster 3)

In the example shown in figure 4.17, the query image has a similarity of 40%, 90%, and 50% with the centroids of clusters 1, 2, and 3, respectively. Similarity is assumed to be a percentage between 0% and 100%. Cluster 2 has the highest score, as it includes cats of the same species and color as the query image. The scores of clusters 1 and 3 are close to each other (40% and 50%), as the two clusters are similar to the query in different ways. Cluster 1 includes cats with a significantly different color pattern. On the other hand, even though cluster 3 represents a different species of animal (tiger), its color pattern is similar to that of the query image.

The process of clustering visual data is similar to that of clustering numeric or textual data. However, the unique nature of visual data requires specialized methods for evaluating visual similarity. Even though early methods relied on hand-crafted features, recent advances in deep learning have led to the development of powerful models that can automatically learn sophisticated features from unlabeled visual data.

This lesson uses an image-clustering task to demonstrate how using more sophisticated features can lead to significantly better results. Specifically, the lesson will cover three different approaches:

• Flattening and clustering the original data, without any feature engineering.
• Transforming the data using the HOG feature descriptor (introduced in the previous lesson) and then clustering the transformed data.
• Using a neural network model to cluster the original data without any feature engineering.

The LHI-Animal-Faces dataset that was used in the previous lesson will also be used to evaluate the various image clustering techniques. This dataset was originally designed for classification tasks and therefore includes the true label (the actual animal type) for each image. In this lesson, these labels will only be used for validation and will not be used to cluster the images. An effective clustering approach should be able to group images with the same label in the same cluster and separate images with different labels into different clusters.


Loading and Preprocessing Images

The following code imports the libraries that will be used to load and preprocess the images:

    %%capture
    import matplotlib.pyplot as plt
    from os import listdir
    !pip install scikit-image
    from skimage.io import imread
    from skimage.transform import resize
    from skimage import img_as_ubyte

    # a palette of 10 colors that will be used to visualize the clusters
    color_palette = ['blue', 'green', 'red', 'yellow', 'gray', 'purple', 'orange', 'pink', 'black', 'brown']

The following function reads the images of the LHI-Animal-Faces dataset from their input_folder and resizes each of them to the same width and height dimensions. It extends the resize_images() function from the previous lesson by allowing the user to specify a list of animal classes that should be considered. It also uses a single line of Python code to read, resize, and store each image:

    def resize_images_v2(input_folder: str, width: int, height: int, labels_to_keep: list):
        labels = []  # a list with the label for each image
        resized_images = []  # a list of resized images in np array format
        filenames = []  # a list of the original image file names
        for subfolder in listdir(input_folder):
            print(subfolder)
            path = input_folder + '/' + subfolder
            for file in listdir(path):
                label = subfolder[:-4]  # uses the subfolder name without the "Head" suffix
                if label not in labels_to_keep:
                    continue
                labels.append(label)  # appends the label
                # loads, resizes, preprocesses, and stores the image
                resized_images.append(img_as_ubyte(resize(imread(path + '/' + file), (width, height))))
                filenames.append(file)
        return resized_images, labels, filenames
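The label-filtering step inside resize_images_v2() can be tried on its own, without loading any images. The following is a small sketch that uses a sample of the dataset's subfolder names and a hypothetical, shorter labels_to_keep list chosen only for illustration.

```python
# a sample of the subfolder names found in the dataset folder
subfolders = ['BearHead', 'CatHead', 'CowHead', 'Natural', 'WolfHead']
labels_to_keep = ['Bear', 'Cat', 'Wolf']  # hypothetical shorter selection

kept = []
for subfolder in subfolders:
    label = subfolder[:-4]  # drops the last four characters ("Head")
    if label in labels_to_keep:
        kept.append(label)

print(kept)
print('Natural'[:-4])  # this folder has no "Head" suffix, so slicing yields "Nat"
```

Because slicing removes the last four characters regardless of what they are, a folder such as "Natural" becomes "Nat", matches none of the requested labels, and is skipped.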


Unstructured data is diverse, and processing it can require a lot of time and computational resources. This is especially true when the data is processed via complex deep learning techniques, as will be done later in this lesson. Therefore, in order to reduce computational time, the resize_images_v2() function is applied to a subset of images from 10 animal classes:

    resized_images, labels, filenames = resize_images_v2(
        "Animal Face/Image",
        width=224,
        height=224,
        # these 10 are the labels that are going to be used
        labels_to_keep=['Lion', 'Chicken', 'Duck', 'Rabbit', 'Deer', 'Cat', 'Wolf', 'Bear', 'Pigeon', 'Eagle']
    )

    BearHead
    CatHead
    ChickenHead
    CowHead
    DeerHead
    DuckHead
    EagleHead
    ElephantHead
    LionHead
    MonkeyHead
    Natural
    PandaHead
    PigeonHead
    RabbitHead
    SheepHead
    TigerHead
    WolfHead

You can easily change the "labels_to_keep" parameter to focus on particular classes. You will also notice that the width and height of the images are now set to 224 × 224, rather than the 100 × 100 shape that was used in the previous lesson. This is done because one of the deep-learning clustering methods that is presented in this lesson requires the images to have these dimensions. The 224 × 224 shape is therefore adopted in order to ensure that all methods are given access to the same input.

As also mentioned in the previous lesson, the original lists (resized_images, labels, filenames) include the images from each class packed together. For instance, all the "Lion" images appear together in the 'resized_images' list. This can be misleading for many algorithms, especially in the computer vision domain. While this can be addressed by randomly shuffling each of the three lists, it is important to ensure that the same random order is used for all three of them. Otherwise, it is impossible to find the correct label or filename for a specific image. In the previous lesson, shuffling was taken care of by the train_test_split() function. However, given that this function is not applicable to clustering tasks, the following code is used for shuffling:

    import random

    # connects the three lists together, so that they are shuffled in the same order
    connected = list(zip(resized_images, labels, filenames))
    random.shuffle(connected)
    # disconnects the three lists
    resized_images, labels, filenames = zip(*connected)
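To check that the zip-and-shuffle pattern keeps every image paired with its own label and file name, it can be run on a few placeholder values. In this sketch the strings are hypothetical stand-ins for real image arrays, and the fixed seed is an optional addition that only makes the shuffle repeatable.

```python
import random

# placeholder values standing in for the real image arrays, labels, and file names
images = ["img_lion_1", "img_lion_2", "img_cat_1", "img_cat_2"]
labels = ["Lion", "Lion", "Cat", "Cat"]
filenames = ["lion1.jpg", "lion2.jpg", "cat1.jpg", "cat2.jpg"]

random.seed(42)  # optional: makes the shuffle reproducible
connected = list(zip(images, labels, filenames))
random.shuffle(connected)
images_s, labels_s, filenames_s = zip(*connected)

# the same (image, label, file name) triples exist before and after shuffling
pairs_before = set(zip(images, labels, filenames))
pairs_after = set(zip(images_s, labels_s, filenames_s))
print(pairs_before == pairs_after)
```

Note that zip(*connected) returns tuples rather than lists, which is fine here because the data is converted to numpy arrays afterwards.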


The next step is to convert the 'resized_images' and 'labels' lists to numpy arrays. Similarly to the previous lesson, the standard (X, y) variable names are used to represent data and labels:

    import numpy as np  # used for numeric computations

    X = np.array(resized_images)
    y = np.array(labels)
    X.shape

    (1085, 224, 224, 3)

The shape of the data verifies that it includes 1,085 images, each with dimensions of 224 × 224 and 3 RGB channels.

Clustering without Feature Engineering

The first clustering attempt will focus on simply flattening the images to convert each of them to a one-dimensional vector with 224 × 224 × 3 = 150,528 values. Similar to the classification algorithms that were explored in the previous lesson, most clustering algorithms also require this type of vectorized format.

    X_flat = np.array([img.flatten() for img in X])
    X_flat[0].shape

    (150528,)

    X_flat[0]  # prints the first flat image

    array([107, 146, 102, ..., 91, 86, 108], dtype=uint8)

Each numeric value in this flat format is an RGB value between 0 and 255. As also seen in the previous lesson, standard scaling and normalization can sometimes improve the results of some machine learning algorithms. The following code can be used to normalize the values and bring them between 0 and 1:

    X_norm = X_flat / 255.
    X_norm[0]

    array([0.41960784, 0.57254902, 0.4, ..., 0.35686275, 0.3372549, 0.42352941])
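The flattening and normalization steps can be verified quickly on a tiny input. The sketch below assumes two hypothetical 4 × 4 RGB images filled with random values; reshape is used here as an equivalent alternative to flattening each image in a loop.

```python
import numpy as np

rng = np.random.default_rng(0)
# two hypothetical 4 x 4 RGB images with uint8 pixel values
X_small = rng.integers(0, 256, size=(2, 4, 4, 3), dtype=np.uint8)

# one row per image; equivalent to calling flatten() on each image
X_small_flat = X_small.reshape(len(X_small), -1)
# dividing by 255 maps the values to the range between 0 and 1
X_small_norm = X_small_flat / 255.0

print(X_small_flat.shape)  # (2, 48), because 4 * 4 * 3 = 48
print(X_small_norm.min(), X_small_norm.max())
```

The same arithmetic explains the lesson's numbers: 224 × 224 × 3 = 150,528 values per image.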


The data can now be visualized using the familiar TSNEVisualizer tool from the yellowbrick library. This tool was also used in unit 3 lesson 2 to visualize the clusters in text data.

    %%capture
    !pip install yellowbrick
    from yellowbrick.text import TSNEVisualizer

    tsne = TSNEVisualizer(colors=color_palette)  # initializes the tool
    tsne.fit(X_norm, y)  # uses TSNE to reduce the data to 2 dimensions
    tsne.show();

Figure 4.18: Cluster visualization (TSNE projection of the 1,085 images; legend: Bear, Cat, Chicken, Deer, Duck, Eagle, Lion, Pigeon, Rabbit, Wolf)

This preliminary visualization is not promising. The various animal classes seem to be scrambled together, with no clear separation and no obvious clusters. This indicates that simply flattening the original image data is unlikely to lead to high-quality results.

Next, the same agglomerative clustering algorithm that was used in unit 3 lesson 2 is also used to cluster the data in X_norm. The following code imports the set of required tools and visualizes the dendrogram of the dataset:


    from sklearn.cluster import AgglomerativeClustering  # used for agglomerative clustering
    import scipy.cluster.hierarchy as hierarchy

    hierarchy.set_link_color_palette(color_palette)  # sets the color palette
    plt.figure()
    # iteratively merges points and clusters until all points belong to a single cluster
    linkage_flat = hierarchy.linkage(X_norm, method='ward')
    hierarchy.dendrogram(linkage_flat)
    plt.show()

'ward' is a linkage method used in hierarchical agglomerative clustering.

Figure 4.19: Dendrogram categorizing the data into two clusters

The dendrogram reveals two large clusters that can be further broken down into smaller ones. The following code uses the AgglomerativeClustering tool to create 10 clusters, which is the actual number of clusters in the data:

    AC = AgglomerativeClustering(linkage='ward', n_clusters=10)
    AC.fit(X_norm)  # applies the tool to the data
    pred = AC.labels_  # gets the cluster labels
    pred

    array([9, 6, 3, ..., 4, 4, 3], dtype=int64)

Finally, the homogeneity, completeness, and adjusted Rand metrics (all introduced in unit 3 lesson 2) are used to evaluate the quality of the produced clusters:


    from sklearn.metrics import homogeneity_score, adjusted_rand_score, completeness_score

    print('\nHomogeneity score:', homogeneity_score(y, pred))
    print('\nAdjusted Rand score:', adjusted_rand_score(y, pred))
    print('\nCompleteness score:', completeness_score(y, pred))

    Homogeneity score: 0.09868725008128477
    Adjusted Rand score: 0.038254515908926826
    Completeness score: 0.101897123096584

As described in detail in unit 3 lesson 2, the homogeneity and completeness scores take values between 0 and 1. The first is maximized when all the points of each cluster have the same ground truth label. The second one is maximized when all the data points with the same ground truth label also belong to the same cluster. Finally, the adjusted Rand score takes values between -0.5 and 1.0 and is maximized when all the data points with the same label are in the same cluster and all points with different labels are in different clusters.

As expected following the visualization of the data, the algorithm fails to find high-quality clusters that match the actual animal classes. The values for all three metrics are very low. This demonstrates that, even though simply flattening the data was sufficient to get reasonable results for image classification, image clustering is a significantly harder problem.

Clustering with Feature Selection

The previous lesson demonstrated how the HOG transformation can be used to convert image data into a more informative format that led to significantly higher performance for image classification. Next, the same transformation is applied to test whether it can also improve the results of image clustering tasks.

    from skimage.color import rgb2gray
    from skimage.feature import hog

    # converts the list of resized images to an array of grayscale images
    X_gray = np.array([rgb2gray(img) for img in resized_images])
    # computes the HOG features for each grayscale image in the array
    X_hog = np.array([hog(img) for img in X_gray])
    X_hog.shape

    (1085, 54756)

The shape of the transformed data reveals that each image is now represented as a vector of 54,756 numeric values. The following code uses the TSNEVisualizer tool to visualize this new format:

    tsne = TSNEVisualizer(colors=color_palette)
    tsne.fit(X_hog, y)
    tsne.show();
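As a side note on the HOG vectors computed above, the length of 54,756 can be derived by hand. The sketch below assumes the default settings of skimage's hog() function (9 orientations, 8 × 8 pixels per cell, 3 × 3 cells per block, with blocks sliding one cell at a time); the helper hog_feature_length is a hypothetical function written only for this calculation.

```python
def hog_feature_length(height, width, orientations=9,
                       pixels_per_cell=(8, 8), cells_per_block=(3, 3)):
    """Number of HOG values for an image, assuming skimage's default parameters."""
    # number of whole cells that fit along each image dimension
    cells_y = height // pixels_per_cell[0]
    cells_x = width // pixels_per_cell[1]
    # blocks slide over the grid of cells one cell at a time
    blocks_y = cells_y - cells_per_block[0] + 1
    blocks_x = cells_x - cells_per_block[1] + 1
    # each block contributes one histogram per cell it contains
    values_per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks_y * blocks_x * values_per_block

print(hog_feature_length(224, 224))  # 54756
```

For a 224 × 224 image this gives 28 × 28 cells, 26 × 26 blocks, and 81 values per block, so 676 × 81 = 54,756 values, which is roughly a third of the 150,528 values of the flattened format.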


Figure 4.20: Cluster visualization (TSNE projection of the 1,085 images; legend: Bear, Cat, Chicken, Deer, Duck, Eagle, Lion, Pigeon, Rabbit, Wolf)

The visualization is much more promising than the one produced for the non-transformed data. Even though some impurities exist, the figure reveals clear and generally well-separated clusters. The dendrogram of this more promising dataset can now be computed:

    plt.figure()
    linkage_2 = hierarchy.linkage(X_hog, method='ward')
    hierarchy.dendrogram(linkage_2)
    plt.show()

Figure 4.21: Dendrogram of the various animal face categories with HOG


The dendrogram suggests 5 clusters, exactly half of the correct number of 10. The following code adopts this suggestion, applies the AgglomerativeClustering tool, and reports the results for the three metrics:

    AC = AgglomerativeClustering(linkage='ward', n_clusters=5)
    AC.fit(X_hog)
    pred = AC.labels_
    print('\nHomogeneity score:', homogeneity_score(y, pred))
    print('\nAdjusted Rand score:', adjusted_rand_score(y, pred))
    print('\nCompleteness score:', completeness_score(y, pred))

    Homogeneity score: 0.4046340612330986
    Adjusted Rand score: 0.29990205334627734
    Completeness score: 0.6306921317302154

The results reveal that, even though the number of clusters that was used was significantly lower than the correct one, the results are far superior to those delivered when using the correct number on the non-transformed data. This demonstrates the effectiveness of the HOG transformation and validates that it can lead to impressive performance improvements for both supervised and unsupervised learning tasks in computer vision. To complete the analysis, the following code re-clusters the transformed data with the correct number of clusters:

    AC = AgglomerativeClustering(linkage='ward', n_clusters=10)
    AC.fit(X_hog)
    pred = AC.labels_
    print('\nHomogeneity score:', homogeneity_score(y, pred))
    print('\nAdjusted Rand score:', adjusted_rand_score(y, pred))
    print('\nCompleteness score:', completeness_score(y, pred))

    Homogeneity score: 0.5720932612704411
    Adjusted Rand score: 0.41243540297103065
    Completeness score: 0.617016965322667

As expected, the scores have increased overall, although completeness decreases slightly (from 0.63 to 0.62). Both homogeneity and completeness are now above 0.55, indicating that the algorithm does a better job both of placing animals from the same class in the same cluster and of creating "pure" clusters that mostly consist of the same animal class.
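To make the three metrics more concrete, they can be computed on a toy example with known structure. The labels and cluster assignments below are hypothetical values invented for illustration. They show, for instance, why using too few clusters (as with 5 instead of 10) tends to keep completeness high while lowering homogeneity.

```python
from sklearn.metrics import homogeneity_score, completeness_score, adjusted_rand_score

true_labels = ['Cat', 'Cat', 'Lion', 'Lion', 'Wolf', 'Wolf']

# case 1: clusters match the labels exactly (the cluster ids themselves do not matter)
perfect = [2, 2, 0, 0, 1, 1]
# case 2: Lion and Wolf are merged into one cluster: complete but not homogeneous
merged = [0, 0, 1, 1, 1, 1]
# case 3: every image sits in its own cluster: homogeneous but not complete
split = [0, 1, 2, 3, 4, 5]

for name, pred_toy in [('perfect', perfect), ('merged', merged), ('split', split)]:
    print(name,
          'homogeneity:', round(homogeneity_score(true_labels, pred_toy), 3),
          'completeness:', round(completeness_score(true_labels, pred_toy), 3),
          'adjusted Rand:', round(adjusted_rand_score(true_labels, pred_toy), 3))
```

Only the first case scores 1.0 on all three metrics, which is why the lesson reports them together rather than relying on a single number.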


Clustering Using Neural Networks

The use of deep learning models (deep neural networks with multiple layers) has revolutionized the field of image clustering by providing powerful and highly accurate algorithms that can automatically group similar images together without the need for feature engineering. Many traditional image clustering methods rely on feature extractors to extract meaningful information from an image and use this information to group similar images together. This process can be time-consuming and requires domain expertise to design effective feature extractors. In addition, as seen in the previous lesson, even though feature descriptors such as the HOG transformation can indeed improve results, they are far from perfect and there is certainly room for improvement.

Deep learning, on the other hand, has the ability to learn feature representations from the raw data automatically. This allows deep learning methods to learn highly discriminative features that capture the underlying patterns in the data, resulting in more accurate and robust clustering. To achieve this, several different layers are used in a neural network, including:

• Dense layers
• Pooling layers
• Dropout layers

Dense layer
A layer in neural networks where the signals are passed from the nodes in the previous layer of the network to the nodes in the current layer with a specific weight, and an activation function is applied to the signals sent to the dense layer to generate the final output results.

Pooling layer
A layer in neural networks used to reduce the spatial dimensions of the input data.

Dropout layer
A regularization technique used to prevent overspecialization of a model to a dataset in neural networks by randomly dropping out nodes in the layer during each training iteration.

In the neural network of unit 3 lesson 1, a 300-neuron hidden layer of the Word2Vec model was used to represent each word. In that case, the Word2Vec model was pre-trained on a very large dataset with millions of stories from Google News. Pre-trained neural network models are also popular in the field of computer vision. A characteristic example is the VGG16 model, which is commonly used for image recognition tasks. VGG16 follows a deep CNN-based architecture with 16 layers. VGG16 is a supervised model that was trained on a large dataset of labeled images, called ImageNet. Its training dataset consists of more than a million images and 1,000 different labels. This significantly improves the model's ability to understand the different parts of an image.

As figure 4.22 shows, VGG16 uses a final dense layer with 4,096 neurons to represent each image, before feeding it to the output layer. This section demonstrates how VGG16 can be adapted for image clustering, even though it was originally designed for image classification:

1. Load the pre-trained VGG16 model.
2. Remove the output layer of the model. This leaves the final dense layer as the new output layer.
3. Use the truncated model to map each of the images in the Animal Faces dataset to a numeric vector with 4,096 values.
4. Use AgglomerativeClustering to cluster the produced vectors.
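The three layer types defined above can be illustrated with plain numpy. This is a simplified sketch and not how Keras implements these layers: the functions dense, max_pool_2x2, and dropout are hypothetical helpers, the weights are random, ReLU is assumed as the activation function, and max pooling is assumed as the pooling operation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, weights, bias):
    """Dense layer: weighted sum of all inputs followed by a ReLU activation."""
    return np.maximum(0.0, x @ weights + bias)

def max_pool_2x2(feature_map):
    """Pooling layer: keeps the maximum of each non-overlapping 2 x 2 block."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def dropout(x, rate, rng):
    """Dropout layer (training mode): zeroes a random share of the values
    and rescales the remaining ones by 1 / (1 - rate)."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(feature_map)   # 4 x 4 values reduced to 2 x 2
flat = pooled.flatten()              # 4 values
weights = rng.normal(size=(4, 3))    # hypothetical weights: 4 inputs to 3 neurons
bias = np.zeros(3)
activations = dense(flat, weights, bias)
dropped = dropout(activations, rate=0.5, rng=rng)

print(pooled)
print(activations.shape, dropped.shape)
```

In VGG16 the same idea is applied at a much larger scale: pooling layers repeatedly shrink the feature maps, and the final dense layers turn them into the 4,096 values used to represent each image.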


Figure 4.22: VGG16 architecture (input, Conv 1-1, Conv 1-2, Pooling, Conv 2-1, Conv 2-2, Pooling, Conv 3-1, Conv 3-2, Conv 3-3, Pooling, Conv 4-1, Conv 4-2, Conv 4-3, Pooling, Conv 5-1, Conv 5-2, Conv 5-3, Pooling, Dense, Dense, Dense, output)

The TensorFlow and Keras libraries that were introduced in the previous lesson can be used to access and truncate the VGG16 model. The first step is to import the required tools, load the model, and remove its output layer:

    from keras.applications.vgg16 import VGG16  # used to access the pre-trained VGG16 model
    from keras.models import Model

    model = VGG16()  # loads the pretrained VGG16 model
    # removes the output layer, so that the last dense layer becomes the output
    model = Model(inputs=model.inputs, outputs=model.layers[-2].output)

The following code applies the basic preprocessing required by VGG16, which reorders the color channels from RGB to BGR and centers each channel around zero using the channel means of the ImageNet dataset:

    from keras.applications.vgg16 import preprocess_input

    X_prep = preprocess_input(X)
    X_prep.shape

    (1085, 224, 224, 3)

Note that the shape of the data remains the same: 1,085 images, each with dimensions of 224 × 224 and 3 RGB channels. Next, the truncated model can be used to map each image to a vector of 4,096 numbers:

    X_VGG16 = model.predict(X_prep, use_multiprocessing=True)
    X_VGG16.shape

    34/34 [==============================] - 57s 2s/step
    (1085, 4096)

The use_multiprocessing=True parameter is set to speed up the process by computing the vectors for multiple images in parallel. This parameter is only accepted by older Keras versions; if it raises an error, it can simply be omitted. Before proceeding with the clustering step, the following code is used to visualize the vectorized data:

    tsne = TSNEVisualizer(colors=color_palette)
    tsne.fit(X_VGG16, labels)
    tsne.show();
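As a rough illustration of the preprocessing step applied before the VGG16 vectors were computed, the following numpy sketch mimics what the VGG16 version of preprocess_input is understood to do. The function vgg16_style_preprocess is a hypothetical re-implementation written for this example, and the per-channel means are the values commonly documented for ImageNet; they should be treated as an assumption and checked against the installed Keras version.

```python
import numpy as np

# per-channel ImageNet means in BGR order (assumed values, see the note above)
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype="float32")

def vgg16_style_preprocess(images_rgb):
    """Reorders RGB to BGR and subtracts the per-channel means (no rescaling)."""
    images = images_rgb.astype("float32")
    images = images[..., ::-1]  # reverses the last axis: RGB becomes BGR
    return images - IMAGENET_BGR_MEANS

# a single hypothetical image made of one pure red pixel
red_pixel = np.array([[[[255, 0, 0]]]], dtype=np.uint8)
prepared = vgg16_style_preprocess(red_pixel)

print(prepared.shape)  # the shape is unchanged: (1, 1, 1, 3)
print(prepared[0, 0, 0])
```

The output values are centered around zero rather than squeezed between 0 and 1, which is why negative numbers appear in the preprocessed data.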


Figure 4.23: Clusters visualization (TSNE projection of the 1,085 images; legend: Bear, Cat, Chicken, Deer, Duck, Eagle, Lion, Pigeon, Rabbit, Wolf)

The results are impressive. The new visualization reveals clearly separated, near-perfect clusters. The separation is also significantly better than that in the HOG-transformed data.

    linkage_3 = hierarchy.linkage(X_VGG16, method='ward')
    plt.figure()
    hierarchy.dendrogram(linkage_3)
    plt.show()

Figure 4.24: Dendrogram of the various animal face categories with VGG16

The dendrogram suggests 4 clusters. In this case, the practitioner can easily ignore this suggestion and instead follow the visualization above, which clearly indicates the existence of 10 clusters.
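One plausible way to read a "suggested" number of clusters from a dendrogram is to count the branches that scipy draws in separate colors. By default, scipy colors the links below 70% of the largest merge distance, and the same threshold can be applied with fcluster. The sketch below is an assumption about how such suggestions arise, not the lesson's own method, and it uses synthetic two-dimensional points: four tight groups arranged as two distant pairs.

```python
import numpy as np
import scipy.cluster.hierarchy as hierarchy

rng = np.random.default_rng(1)
# four hypothetical groups of 15 points each, arranged as two distant pairs
centers = np.array([[0.0, 0.0], [6.0, 0.0], [100.0, 0.0], [106.0, 0.0]])
X_toy = np.vstack([c + rng.normal(scale=0.5, size=(15, 2)) for c in centers])

Z = hierarchy.linkage(X_toy, method='ward')

# scipy's default coloring threshold for dendrograms
threshold = 0.7 * Z[:, 2].max()
colored = hierarchy.fcluster(Z, t=threshold, criterion='distance')
n_suggested = len(np.unique(colored))

# asking for four clusters explicitly recovers the four groups
four = hierarchy.fcluster(Z, t=4, criterion='maxclust')

print('clusters suggested by the default threshold:', n_suggested)
print('clusters when four are requested:', len(np.unique(four)))
```

Here the default threshold highlights only the two distant pairs, even though four groups exist. This mirrors the situation in the lesson, where the dendrogram suggests fewer clusters than the 10 that are actually present.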


The following code uses AgglomerativeClustering and reports the metric scores for both 4 and 10 clusters:

    AC = AgglomerativeClustering(linkage='ward', n_clusters=4)
    AC.fit(X_VGG16)
    pred = AC.labels_
    print('\nHomogeneity score:', homogeneity_score(y, pred))
    print('\nAdjusted Rand score:', adjusted_rand_score(y, pred))
    print('\nCompleteness score:', completeness_score(y, pred))

    Homogeneity score: 0.504687456015823
    Adjusted Rand score: 0.37265351562538257
    Completeness score: 0.9193141240200559

    AC = AgglomerativeClustering(linkage='ward', n_clusters=10)
    AC.fit(X_VGG16)
    pred = AC.labels_
    print('\nHomogeneity score:', homogeneity_score(y, pred))
    print('\nAdjusted Rand score:', adjusted_rand_score(y, pred))
    print('\nCompleteness score:', completeness_score(y, pred))

    Homogeneity score: 0.8403973102506642
    Adjusted Rand score: 0.766734821176714
    Completeness score: 0.8509145102288217

The results validate the evidence provided by the visualization. The transformations produced by VGG16 lead to vastly superior results for both 4 and 10 clusters. In fact, high scores were reported for all three metrics when using 10 clusters (about 0.84, 0.77, and 0.85), verifying that the produced clusters are closely aligned with the animal classes in the dataset.

VGG16 is one of the earliest highly capable pre-trained CNN models for computer vision applications. However, many other pre-trained CNN models have since been published and have surpassed the performance of the VGG16 model.


Exercises

1 Mention an advantage that unsupervised vision techniques have over supervised techniques.

2 You are given a numpy array, X_flat, that includes flattened images. Each row in the array represents a different flattened image as a sequence of integers between 0 and 255. Complete the following code so that it uses AgglomerativeClustering to group the images from X_flat into 5 different clusters.

    from ______ import AgglomerativeClustering  # used for agglomerative clustering

    AC = AgglomerativeClustering(linkage='ward', ______ = ______)
    X_norm = ______  # normalizes the data
    AC.fit(X_norm)  # applies the tool to the data
    pred = AC.______  # gets the cluster labels

3 List some advantages of using deep learning over other traditional image clustering methods.


4 You are given a numpy array, X_flat, that includes flattened images. Each row in the array represents a different flattened image as a sequence of integers between 0 and 255. Complete the following code so that it uses the ward method to create and visualize the dendrogram of the images in this array.

    import scipy.cluster.hierarchy as hierarchy  # visualizes and supports hierarchical clustering tasks
    import ______ as plt

    X_norm = ______  # normalizes the data
    plt.figure()  # creates a new empty figure
    linkage_flat = hierarchy.linkage(______, method='______')
    hierarchy.______(linkage_flat)
    plt.show()  # shows the figure

5 Describe how clustering with neural networks is applied in image analysis.
