2: Python Libraries for Data Analysis

Python Libraries for Data Analysis

NumPy Library

Table 3.3: NumPy library methods

Method

Array

Figure 3.10: Creating a list in Jupyter Notebook

Figure 3.11: Use of NumPy library
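As a minimal sketch of what the two figures above describe, the following converts a Python list to a NumPy array. The values and variable names are hypothetical and are not taken from the book's figures.

```python
import numpy as np

# Hypothetical sample values, used only for illustration.
grades = [85, 92, 78, 90]

# Convert the Python list to a NumPy array.
grades_array = np.array(grades)

# Arithmetic is applied to every element of the array at once.
bonus_grades = grades_array + 5

print(grades_array)
print(bonus_grades)
print(grades_array.mean())  # 86.25
```

Adding 5 to a plain list would raise an error, whereas the array applies the operation element by element.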

Pandas Library

Table 3.4: Differences between Pandas and NumPy libraries

Series Object

Attributes of a Series Object

Table 3.5: Attributes of a Series Object

Attribute

In computing, NaN stands for Not a Number.

Let's see some of these attributes of the Series object.
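The sketch below reads a few commonly used attributes of a Series. The data is hypothetical, and the attributes shown may not match Table 3.5 exactly.

```python
import numpy as np
import pandas as pd

# Hypothetical data; np.nan marks a missing value (NaN).
students = pd.Series(
    [1200, 950, np.nan, 1430],
    index=["Riyadh", "Makkah", "Tabuk", "Jazan"],
    name="Number of Students",
)

print(students.values)   # the data as a NumPy array
print(students.index)    # the labels of the Series
print(students.dtype)    # float64, because NaN is stored as a float
print(students.size)     # 4
print(students.shape)    # (4,)
print(students.hasnans)  # True, because one value is NaN
```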

DataFrame Object

OS Library

Figure 3.14: OS library
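One possible use of the os library before loading a file is to check the working directory and see which Excel files it contains. This is a sketch under that assumption and may differ from what Figure 3.14 shows.

```python
import os

# Folder in which the notebook or script is currently running.
current_dir = os.getcwd()
print(current_dir)

# Names of all entries in that folder, filtered to Excel workbooks.
entries = os.listdir(current_dir)
excel_files = [name for name in entries if name.endswith(".xlsx")]
print(excel_files)
```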

This is your Excel file.

The dataset that you will use in this lesson is provided by the Ministry of Education through the Saudi Open Data Platform.

Now, you are going to transform this Excel file to a DataFrame in order to manipulate its data.

Table 3.6: Attributes of a DataFrame Object

Attributes of a DataFrame Object

If the Excel file has multiple sheets, you can read a specific sheet.
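Reading an Excel file, including one specific sheet, can be sketched with pd.read_excel and its sheet_name parameter. The file name "students.xlsx", the sheet name "Sheet1", and the helper load_sheet are hypothetical. Reading .xlsx files also assumes that an Excel engine such as openpyxl is installed.

```python
from pathlib import Path

import pandas as pd


def load_sheet(path, sheet_name=0):
    """Read one sheet of an Excel file into a DataFrame.

    sheet_name accepts a sheet label (str) or a zero-based position (int).
    """
    return pd.read_excel(path, sheet_name=sheet_name)


# Hypothetical file and sheet names; replace them with your own.
excel_path = Path("students.xlsx")

if excel_path.exists():
    df = load_sheet(excel_path, sheet_name="Sheet1")
    print(df.head())
else:
    print(f"{excel_path} was not found in the working directory.")
```

With the default sheet_name=0, the first sheet of the workbook is read.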

Table 3.7: Pandas dtype mapping

Figure 3.17: Use of the attributes of a DataFrame Object
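The following sketch reads some common DataFrame attributes on a small in-memory DataFrame. The data is hypothetical, and the attributes may not match Table 3.6 exactly.

```python
import pandas as pd

# Hypothetical data standing in for the lesson's dataset.
df = pd.DataFrame({
    "Region": ["Riyadh", "Makkah", "Tabuk"],
    "Number of Students": [1200, 950, 400],
})

print(df.shape)             # (3, 2): three rows, two columns
print(df.columns.tolist())  # the column names
print(df.dtypes)            # the data type of each column
print(df.index)             # the row labels
print(df.size)              # 6: total number of cells
```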

Indexing

Table 3.8: Indexing methods

Using Indexing in a Series Object

Figure 3.18: Using Indexing in a Series Object
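Indexing a Series by label and by position can be sketched as follows. The data is hypothetical and is not the content of Figure 3.18.

```python
import pandas as pd

# Hypothetical Series with region names as labels.
students = pd.Series(
    [1200, 950, 400, 610],
    index=["Riyadh", "Makkah", "Tabuk", "Jazan"],
)

print(students["Makkah"])  # access by label: 950
print(students.iloc[0])    # access by position: 1200

# A label slice includes both end labels.
by_label = students.loc["Makkah":"Tabuk"]

# A position slice excludes the end position.
by_position = students.iloc[1:3]

print(by_label)
print(by_position)
```

Both slices select the same two elements here, even though the rules for the end point differ.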

Using Indexing in a DataFrame Object

Figure 3.19: Using Indexing in a DataFrame Object
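A short sketch of basic DataFrame indexing with square brackets is given here. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data standing in for the lesson's dataset.
df = pd.DataFrame({
    "Region": ["Riyadh", "Makkah", "Tabuk"],
    "Level": ["Primary", "Secondary", "Primary"],
    "Number of Students": [1200, 950, 400],
})

# A single column name returns a Series.
region_column = df["Region"]

# A list of column names returns a DataFrame.
two_columns = df[["Region", "Number of Students"]]

# A slice selects rows by position.
first_two_rows = df[0:2]

print(region_column)
print(two_columns)
print(first_two_rows)
```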

Let's see some examples with the Series object.

Data filtering

Filtering Data or Subset Selection

Table 3.9: Boolean operators in Jupyter

Boolean Indexing

In this example, you will print the rows of the DataFrame that have a specific value in a specific column.
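Selecting the rows that hold a specific value in a column can be sketched with Boolean indexing. The data, the chosen value "Riyadh", and the threshold 600 are hypothetical.

```python
import pandas as pd

# Hypothetical data standing in for the lesson's dataset.
df = pd.DataFrame({
    "Region": ["Riyadh", "Makkah", "Riyadh", "Tabuk"],
    "Number of Students": [500, 300, 700, 400],
})

# The comparison produces one True or False value per row.
mask = df["Region"] == "Riyadh"

# Only the rows where the mask is True are kept.
riyadh_rows = df[mask]
print(riyadh_rows)

# Conditions are combined with & (and) or | (or), each in parentheses.
large_riyadh = df[(df["Region"] == "Riyadh") & (df["Number of Students"] > 600)]
print(large_riyadh)
```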

In the following example, you will use the loc indexer (written with square brackets, as loc[]) to print the first five rows of two specific columns.
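A minimal sketch of this selection, assuming the DataFrame has the default integer row labels. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data with the default row labels 0, 1, 2, ...
df = pd.DataFrame({
    "Region": ["Riyadh", "Makkah", "Madinah", "Qassim", "Asir", "Tabuk"],
    "Level": ["Primary", "Primary", "Secondary", "Primary", "Secondary", "Primary"],
    "Number of Students": [1200, 950, 610, 480, 530, 400],
})

# loc selects by label, and a label slice includes the end label,
# so 0:4 returns the five rows labelled 0 to 4.
first_five = df.loc[0:4, ["Region", "Number of Students"]]
print(first_five)
```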

Indexing with Loc and Iloc Methods

Now, you will use the iloc indexer (written with square brackets, as iloc[]) to select all the elements from the first row of the DataFrame.
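Selecting the first row by position can be sketched as follows, using hypothetical data.

```python
import pandas as pd

# Hypothetical data standing in for the lesson's dataset.
df = pd.DataFrame({
    "Region": ["Riyadh", "Makkah", "Tabuk"],
    "Number of Students": [1200, 950, 400],
})

# iloc selects by position; position 0 is the first row.
# The result is a Series whose labels are the column names.
first_row = df.iloc[0]
print(first_row)
```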

In this example, you will create a new DataFrame named studentsReg. This DataFrame will have two columns, one column for Region and another for Number of Students.
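One way to build such a DataFrame is to select the two columns from an existing one. The source DataFrame df and its values are hypothetical, and the book may create studentsReg differently.

```python
import pandas as pd

# Hypothetical source data with one extra column.
df = pd.DataFrame({
    "Region": ["Riyadh", "Makkah", "Tabuk"],
    "Level": ["Primary", "Secondary", "Primary"],
    "Number of Students": [1200, 950, 400],
})

# Keep only the two columns of interest; copy() makes the new
# DataFrame independent of the original.
studentsReg = df[["Region", "Number of Students"]].copy()
print(studentsReg)
```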

In this example, you will use a for loop to print the first 10 rows of the first column of the studentsReg DataFrame.
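The loop can be sketched as follows. The contents of studentsReg here are hypothetical and only need to have at least 10 rows.

```python
import pandas as pd

# Hypothetical studentsReg DataFrame with 12 rows.
studentsReg = pd.DataFrame({
    "Region": ["Riyadh", "Makkah", "Madinah", "Qassim", "Eastern Province",
               "Asir", "Tabuk", "Hail", "Northern Borders", "Jazan",
               "Najran", "Al Bahah"],
    "Number of Students": [1200, 950, 610, 480, 870, 530,
                           400, 350, 210, 440, 300, 190],
})

# iloc[i, 0] reads row position i of the first column (position 0).
for i in range(10):
    print(studentsReg.iloc[i, 0])
```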

In the next examples, you will print specific elements of the DataFrame.

Table 3.10: Aggregate functions

In this example, you group the students according to their region and you calculate the sum of the students in each region.
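Grouping by region and summing can be sketched with groupby and sum. The values are hypothetical.

```python
import pandas as pd

# Hypothetical data: several rows per region.
df = pd.DataFrame({
    "Region": ["Riyadh", "Makkah", "Riyadh", "Makkah", "Tabuk"],
    "Number of Students": [500, 300, 700, 650, 400],
})

# Group the rows by region, then add up the students in each group.
students_per_region = df.groupby("Region")["Number of Students"].sum()
print(students_per_region)
```

The result is a Series indexed by region, so students_per_region["Riyadh"] gives the total for that region.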

Groupby Method

Grouping and Aggregating

Aggregate function

In this example, you create a new DataFrame with columns for Region, Number of Students, and Number of Teachers. Then you group by Region and calculate the sum of the students and the sum of the teachers in each region.
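A sketch of this two-column aggregation follows. The data and the variable names staff and totals are hypothetical.

```python
import pandas as pd

# Hypothetical source data.
df = pd.DataFrame({
    "Region": ["Riyadh", "Makkah", "Riyadh", "Tabuk"],
    "Level": ["Primary", "Primary", "Secondary", "Primary"],
    "Number of Students": [500, 300, 700, 400],
    "Number of Teachers": [40, 25, 55, 30],
})

# Keep only the grouping column and the two numeric columns.
staff = df[["Region", "Number of Students", "Number of Teachers"]]

# sum() is applied to every remaining column within each region.
totals = staff.groupby("Region").sum()
print(totals)
```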

In this example, you group the students according to two criteria, their region and their level, and you calculate the sum of the students for each combination of region and level.
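Grouping by two columns can be sketched by passing a list to groupby. The column name "Level" and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data with two grouping columns.
df = pd.DataFrame({
    "Region": ["Riyadh", "Riyadh", "Riyadh", "Makkah", "Makkah"],
    "Level": ["Primary", "Primary", "Secondary", "Primary", "Secondary"],
    "Number of Students": [500, 200, 700, 300, 650],
})

# One group is formed for every (Region, Level) pair that occurs.
by_region_level = df.groupby(["Region", "Level"])["Number of Students"].sum()
print(by_region_level)
```

The result has a two-level index, so one total is read with a pair of labels such as by_region_level.loc[("Riyadh", "Primary")].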

Figure 3.31: Use of the df.duplicated() method

Duplicated Data

Figure 3.30: Data cleaning process

Table 3.11: Data cleaning methods

Data cleaning

After deleting the duplicates, run the duplicate check on the dataset again to confirm that the duplicates have been removed.
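Finding, deleting, and re-checking duplicates can be sketched with duplicated() and drop_duplicates(). The data is hypothetical.

```python
import pandas as pd

# Hypothetical data in which the third row repeats the first.
df = pd.DataFrame({
    "Region": ["Riyadh", "Makkah", "Riyadh"],
    "Number of Students": [500, 300, 500],
})

# True marks a row that repeats an earlier row.
print(df.duplicated())

# Remove the repeated rows and keep the first occurrence of each.
cleaned = df.drop_duplicates()

# Check again: the count of duplicated rows should now be zero.
remaining_duplicates = cleaned.duplicated().sum()
print(remaining_duplicates)
```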

Empty Cells

In this example you count the empty cells per column.

You can see the number of empty cells in each column.
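Counting the empty cells per column can be sketched with isnull() followed by sum(). The data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in both columns.
df = pd.DataFrame({
    "Region": ["Riyadh", None, "Tabuk"],
    "Number of Students": [500, np.nan, np.nan],
})

# isnull() marks each empty cell as True; sum() counts them per column.
empty_per_column = df.isnull().sum()
print(empty_per_column)
```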

Figure 3.35: Check for negative numbers

Wrong Data
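A check for negative numbers in a column that should only hold counts can be sketched as follows. The data is hypothetical, and how the wrong rows are then handled is left open.

```python
import pandas as pd

# Hypothetical data in which one count is wrongly negative.
df = pd.DataFrame({
    "Region": ["Riyadh", "Makkah", "Tabuk"],
    "Number of Students": [500, -20, 400],
})

# Select the rows whose student count is below zero.
negative_rows = df[df["Number of Students"] < 0]
print(negative_rows)
```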

Figure 3.34: Delete the rows containing missing values
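Deleting the rows that contain missing values can be sketched with dropna(). The data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing student count.
df = pd.DataFrame({
    "Region": ["Riyadh", "Makkah", "Tabuk"],
    "Number of Students": [500, np.nan, 400],
})

# dropna() returns a new DataFrame without the rows that have
# at least one missing value; df itself is left unchanged.
cleaned = df.dropna()
print(cleaned)
```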

Explain the importance of data cleaning before starting data analysis.

Describe the difference between data indexing and data filtering.

1 What is the difference between Series and DataFrame objects?

Use the dataset you imported in the previous exercise and:

Open a new Jupyter Notebook, import the Excel file with the name "tourist-indicators.xlsx".

Import the random library and use the random.randrange() function to print a random number between 1 and 100.

Open the sheet "I3" from the file "tourist-indicators.xlsx" and read it to a new DataFrame. Then:
