Lesson Supervised Learning - Artificial Intelligence - Third Year of Secondary School

Lesson 1 Supervised Learning

Natural Language Processing (NLP)

Learning Objectives

Tools

Using Supervised Learning to Understand Text

Machine Learning

Machine learning can be broadly categorized into three main types:

Table 3.1: Advantages and disadvantages of Machine Learning types

Supervised Learning

Regression

Classification

# load the training and testing data.

Data Preparation and Pre-Processing

Sklearn Library

CountVectorizer

# expand the sparse data into a dense matrix format, where each column represents a different word.

As expected, the sparse format requires far less memory,
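The difference between the two formats can be illustrated without any library. The sketch below uses a small hypothetical count matrix and a simple dictionary-based sparse representation; it is an illustration of the idea, not the storage scheme used by any particular library.

```python
# Hypothetical document-term counts: 3 documents, vocabulary of 8 words.
dense = [
    [0, 0, 2, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 3, 0, 0, 0],
]

# Sparse (coordinate) form: keep only the non-zero cells as (row, column) -> count.
sparse = {
    (r, c): value
    for r, row in enumerate(dense)
    for c, value in enumerate(row)
    if value != 0
}

dense_cells = sum(len(row) for row in dense)  # every cell is stored, zeros included
sparse_cells = len(sparse)                    # only non-zero cells are stored

print(dense_cells, sparse_cells)
```

Because a single review uses only a tiny fraction of the full vocabulary, most cells of a real document-term matrix are zero, and the gap between the two counts grows with the vocabulary size.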

Build a Prediction Pipeline

The pipeline correctly predicts a positive and negative label
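A prediction pipeline of this kind might look like the sketch below, which chains a CountVectorizer with a Naive Bayes classifier. The choice of MultinomialNB, the toy training texts, and the 1/0 label encoding are assumptions made for illustration and are not taken from the lesson's code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training set; 1 = positive, 0 = negative.
train_texts = [
    "a wonderful and moving film",
    "great acting and a great story",
    "a dull and boring film",
    "terrible acting and a boring story",
]
train_labels = [1, 1, 0, 0]

# The pipeline vectorizes raw text first, then passes the counts to the classifier.
pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
pipeline.fit(train_texts, train_labels)

# New, unseen texts go through the same vectorization before prediction.
predictions = pipeline.predict(["a wonderful story", "boring and terrible"])
print(predictions)
```

Wrapping both steps in one object means the vocabulary learned from the training data is reused automatically at prediction time.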

The confusion matrix contains the counts of actual vs.
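To make the counting concrete, here is a small sketch that builds a 2x2 confusion matrix by hand. The label lists are hypothetical, and the layout (rows for actual labels, columns for predicted labels) is the convention assumed here.

```python
from collections import Counter

# Hypothetical labels: 1 = positive, 0 = negative.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

# Count how often each (actual, predicted) pair occurs.
counts = Counter(zip(actual, predicted))

# Rows are actual labels, columns are predicted labels, both ordered [0, 1].
matrix = [[counts[(a, p)] for p in (0, 1)] for a in (0, 1)]

# Correct predictions sit on the diagonal.
accuracy = (matrix[0][0] + matrix[1][1]) / len(actual)

print(matrix, accuracy)
```

The off-diagonal cells show the two kinds of error separately: negatives predicted as positive, and positives predicted as negative.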

Explaining Black-Box Predictors

LIME (Local Interpretable Model-Agnostic Explanations)

LIME is a method for explaining the predictions made by black-box

As expected, the predictor delivers a very confident negative prediction for this easy example.

A negative coefficient increases the probability of the negative class,

# get the correct labels of this example.

Improving Text Vectorization

Detecting Phrases

\w matches all alphanumeric characters (a-z, A-Z, 0-9) and the underscore character.
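A quick way to see what `\w` does is to tokenize a sentence with it. The example sentence below is hypothetical; note that in Python 3 `\w` also matches non-English letters unless the `re.ASCII` flag is passed.

```python
import re

text = "A must-see film, 10/10! Don't miss_it."

# \w+ matches runs of letters, digits, and underscores;
# hyphens, slashes, apostrophes, spaces, and punctuation act as separators.
tokens = re.findall(r"\w+", text.lower())

print(tokens)
```

The underscore keeps "miss_it" together as one token, while the hyphen and the apostrophe split "must-see" and "don't" into separate pieces.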

When applied to the two tokenized sentence examples shown above, this phrase model produces the following results:

# an example of an annotated document from the imdb training data

Using TF-IDF for Text Vectorization
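The idea behind TF-IDF can be sketched in a few lines of plain Python. The version below assumes the basic textbook form (term frequency multiplied by the logarithm of the inverse document frequency); library implementations often use smoothed variants, so their numbers will differ. The documents and helper names are hypothetical.

```python
import math

# Hypothetical tokenized documents.
docs = [
    ["great", "movie", "great", "cast"],
    ["boring", "movie"],
    ["great", "soundtrack"],
]

def tf(term, doc):
    """Term frequency: the share of the document's tokens that are `term`."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Inverse document frequency: log(number of documents / documents containing `term`).

    Assumes `term` occurs in at least one document.
    """
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df)

def tf_idf(term, doc, docs):
    """Weight of `term` in `doc`, relative to the whole collection."""
    return tf(term, doc) * idf(term, docs)

score_cast = tf_idf("cast", docs[0], docs)    # appears in one document only
score_movie = tf_idf("movie", docs[0], docs)  # appears in two of three documents

print(score_cast, score_movie)
```

Both words occur once in the first document, but "cast" receives the higher weight because it is rarer across the collection.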

This new vectorizer can now be input to the same Naive Bayes Classifier to build a new predictive pipeline and apply it to the IMDb testing data:

The new pipeline confidently predicts the correct positive label for this review. The following code uses the LIME explainer to explain the logic behind this prediction:

Read the sentences and tick True or False.

Explain the reason the dense matrix format requires more space in the memory than the sparse format.

Analyze how the two mathematical factors in TF-IDF are used to assess the importance of a word in a document.

You are given a NumPy array X_train_text that includes one document in each row.

Complete the following code so that it builds LimeTextExplainer for the prediction