Python Libraries for Data Analysis
NumPy Library
Table 3.3: NumPy library methods
Method
Array
Figure 3.10: Creating a list in Jupyter Notebook
Figure 3.11: Use of NumPy library
Pandas Library
Table 3.4: Differences between Pandas and NumPy libraries
Series Object
Attributes of a Series Object
Table 3.5: Attributes of a Series Object
Attribute
In computing, NaN stands for Not a Number.
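The short sketch below shows how a missing value appears as NaN in pandas. The Series name and its values are invented for illustration and do not come from the lesson's dataset.

```python
import numpy as np
import pandas as pd

# Illustrative values only; None stands for a missing measurement.
scores = pd.Series([90.0, None, 75.5])

# pandas stores the missing entry as NaN, so isna() flags it.
missing_mask = scores.isna()
print(missing_mask.tolist())

# NaN is not equal to anything, including itself,
# so test for it with isna() rather than with ==.
nan_equals_itself = (np.nan == np.nan)
print(nan_equals_itself)
```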
Let's see some of these attributes of the Series object.
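As a minimal sketch, the code below prints several commonly used attributes of a Series. The attributes chosen here are standard pandas ones and may not match Table 3.5 exactly; the Series name, labels, and values are made up for illustration.

```python
import pandas as pd

# Hypothetical Series: three student counts labeled A, B and C.
students = pd.Series([120, 95, 143], index=["A", "B", "C"], name="students")

print(students.values)   # the data as a NumPy array
print(students.index)    # the labels of the elements
print(students.dtype)    # the data type of the elements
print(students.size)     # the number of elements
print(students.shape)    # the dimensions as a tuple
print(students.hasnans)  # True if any element is NaN
```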
DataFrame Object
OS Library
Figure 3.14: OS library
This is your Excel file.
The dataset that you will use in this lesson is provided by the Ministry of Education through the Saudi Open Data Platform.
Now, you are going to load this Excel file into a DataFrame so that you can manipulate its data.
Table 3.6: Attributes of a DataFrame Object
Attributes of a DataFrame Object
If the Excel file has multiple sheets, you can read a specific sheet.
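One way to read a specific sheet is sketched below with the sheet_name parameter of pd.read_excel(). The file name "students.xlsx", the sheet name "Sheet1", and the helper read_sheet are hypothetical placeholders, not names from the lesson. Reading .xlsx files also assumes that an Excel engine such as openpyxl is installed.

```python
import os
import pandas as pd

def read_sheet(path, sheet_name=0):
    """Read one sheet of an Excel workbook into a DataFrame.

    sheet_name can be a sheet label (str) or a zero-based position (int).
    """
    return pd.read_excel(path, sheet_name=sheet_name)

# Hypothetical file and sheet names; replace them with your own.
excel_path = "students.xlsx"
if os.path.exists(excel_path):
    df = read_sheet(excel_path, sheet_name="Sheet1")
    print(df.head())
```

If you omit sheet_name, pandas reads the first sheet of the workbook.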
Table 3.7: Pandas dtype mapping
Figure 3.17: Use of the attributes of a DataFrame Object
Indexing
Table 3.8: Indexing methods
Using Indexing in a Series Object
Figure 3.18: Using Indexing in a Series Object
Using Indexing in a DataFrame Object
Figure 3.19: Using Indexing in a DataFrame Object
Let's see some examples with the Series object.
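The following sketch shows a few ways to index a Series, by label and by position. The Series and its labels are invented for illustration.

```python
import pandas as pd

# Hypothetical Series with string labels.
students = pd.Series([120, 95, 143, 80], index=["A", "B", "C", "D"])

first_by_label = students["A"]        # select by label
first_by_position = students.iloc[0]  # select by position

# A label slice includes both end labels.
label_slice = students.loc["B":"C"]

# A position slice excludes the end position.
position_slice = students.iloc[1:3]

print(first_by_label, first_by_position)
print(label_slice)
print(position_slice)
```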
Data Filtering
Filtering Data or Subset Selection
Table 3.9: Boolean operators in Jupyter
Boolean Indexing
In this example, you will print the rows of the DataFrame that have a specific value in a specific column.
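A minimal sketch of Boolean indexing follows. The column names and values are assumptions made for illustration and do not come from the lesson's dataset.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Region": ["North", "South", "North", "East"],
    "Number of Students": [500, 420, 310, 275],
})

# The comparison produces a Boolean Series with one True/False per row.
is_north = df["Region"] == "North"

# Passing the Boolean Series to the DataFrame keeps only the True rows.
north_rows = df[is_north]
print(north_rows)

# Conditions can be combined with & (and) or | (or);
# each condition needs its own parentheses.
large_north = df[(df["Region"] == "North") & (df["Number of Students"] > 400)]
print(large_north)
```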
In the following example, you will use the loc indexer to print the first five rows of two specific columns.
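The selection can be sketched as follows, assuming a DataFrame with the default integer index. The column names and values are invented for illustration.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Region": ["North", "South", "East", "West", "North", "South", "East"],
    "Level": ["Primary", "Primary", "Primary", "Primary",
              "Secondary", "Secondary", "Secondary"],
    "Number of Students": [500, 420, 310, 275, 260, 240, 190],
})

# loc selects by label and includes the end label,
# so 0:4 returns the rows labeled 0, 1, 2, 3 and 4.
first_five = df.loc[0:4, ["Region", "Number of Students"]]
print(first_five)
```

Note that loc is used with square brackets, not parentheses.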
Indexing with loc and iloc
Now, you will use the iloc indexer to select all the elements from the first row of the DataFrame.
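A minimal sketch of selecting the first row by position is shown below. The DataFrame is made up for illustration.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Region": ["North", "South", "East"],
    "Level": ["Primary", "Primary", "Secondary"],
    "Number of Students": [500, 420, 310],
})

# iloc selects by position; position 0 is the first row.
# A single position returns the row as a Series.
first_row = df.iloc[0]
print(first_row)

# A list of positions returns a DataFrame, here with one row.
first_row_frame = df.iloc[[0]]
print(first_row_frame)
```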
In this example, you will create a new DataFrame named studentsReg. This DataFrame will have two columns, one column for Region and another for Number of Students.
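One possible way to build studentsReg is sketched below. The source DataFrame df, its column names, and its values are assumptions made for illustration.

```python
import pandas as pd

# Made-up source data for illustration.
df = pd.DataFrame({
    "Region": ["North", "South", "East"],
    "Level": ["Primary", "Primary", "Secondary"],
    "Number of Students": [500, 420, 310],
    "Number of Teachers": [40, 35, 28],
})

# Select the two columns with a list of names.
# copy() makes studentsReg independent of df.
studentsReg = df[["Region", "Number of Students"]].copy()
print(studentsReg)
```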
In this example, you will use a for loop to print the first 10 rows of the first column of the studentsReg DataFrame.
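The loop can be sketched as follows. The contents of studentsReg are generated placeholder values, and the list printed is an extra name added here only to keep track of what was printed.

```python
import pandas as pd

# Placeholder data: 12 rows, so that the first 10 can be printed.
studentsReg = pd.DataFrame({
    "Region": [f"Region {i}" for i in range(1, 13)],
    "Number of Students": list(range(100, 1300, 100)),
})

printed = []
for i in range(10):
    # iloc[i, 0] is the element in row position i of the first column.
    value = studentsReg.iloc[i, 0]
    print(value)
    printed.append(value)
```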
In these examples, you will print specific elements of the DataFrame.
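A few ways to reach a single element are sketched below. The DataFrame is made up for illustration, and the at accessor is an additional pandas option that the lesson may not cover.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Region": ["North", "South", "East"],
    "Number of Students": [500, 420, 310],
})

# Row label 2, column label "Region".
cell_by_label = df.loc[2, "Region"]

# Row position 2, column position 0 (the same cell).
cell_by_position = df.iloc[2, 0]

# at reads one cell by label.
cell_with_at = df.at[1, "Number of Students"]

print(cell_by_label, cell_by_position, cell_with_at)
```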
Table 3.10: Aggregate functions
In this example, you will group the students by region and calculate the total number of students in each region.
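A minimal sketch of this grouping follows. The column names and values are assumptions made for illustration.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "East"],
    "Number of Students": [500, 420, 260, 240, 310],
})

# Group the rows by Region, then sum the student counts in each group.
students_per_region = df.groupby("Region")["Number of Students"].sum()
print(students_per_region)
```

The result is a Series whose index holds the region names.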
Groupby Method
Grouping and Aggregating
Aggregate Function
In this example, you will create a new DataFrame with columns for Region, Number of students and Number of teachers. Then you will group by Region and calculate the sum of the students and the sum of the teachers in each region.
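The steps can be sketched as follows. The names df, staff, and totals, as well as the column names and values, are assumptions made for illustration.

```python
import pandas as pd

# Made-up source data for illustration.
df = pd.DataFrame({
    "Region": ["North", "South", "North", "South"],
    "Level": ["Primary", "Primary", "Secondary", "Secondary"],
    "Number of Students": [500, 420, 260, 240],
    "Number of Teachers": [40, 35, 22, 20],
})

# Keep only the three columns needed for the summary.
staff = df[["Region", "Number of Students", "Number of Teachers"]]

# sum() is applied to every remaining column within each region.
totals = staff.groupby("Region").sum()
print(totals)
```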
In this example, you will group the students according to two criteria, their region and their level, and calculate the sum of the students for each combination of region and level.
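Grouping by two criteria can be sketched by passing a list of column names to groupby(). The column names and values are assumptions made for illustration.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Region": ["North", "North", "North", "South", "South"],
    "Level": ["Primary", "Primary", "Secondary", "Primary", "Secondary"],
    "Number of Students": [500, 100, 260, 420, 240],
})

# Each distinct (Region, Level) pair becomes one group.
by_region_level = df.groupby(["Region", "Level"])["Number of Students"].sum()
print(by_region_level)
```

The result has a two-level index, so one total is read with a pair of labels such as ("North", "Primary").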
Figure 3.31: Use of the df.duplicated() method
Duplicated Data
Figure 3.30: Data cleaning process
Table 3.11: Data cleaning methods
Data Cleaning
After deleting the duplicates, you have to refresh your dataset to check that the duplicates have been removed.
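A minimal sketch of finding, deleting, and re-checking duplicates is shown below. The DataFrame is made up for illustration and contains one repeated row.

```python
import pandas as pd

# Made-up data: the third row repeats the first one.
df = pd.DataFrame({
    "Region": ["North", "South", "North"],
    "Number of Students": [500, 420, 500],
})

# duplicated() marks every row that repeats an earlier row.
dup_mask = df.duplicated()
print(dup_mask)

# drop_duplicates() returns a new DataFrame without the repeated rows.
deduped = df.drop_duplicates()

# Check again: the count of duplicated rows should now be zero.
remaining = deduped.duplicated().sum()
print(remaining)
```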
Empty Cells
In this example, you will count the empty cells per column.
You can see the number of empty cells in each column.
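The count can be sketched as follows, together with one way to delete the rows that contain missing values. The DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data with one empty cell in each column.
df = pd.DataFrame({
    "Region": ["North", None, "East"],
    "Number of Students": [500, 420, np.nan],
})

# isnull() marks empty cells as True; sum() counts them per column.
empty_per_column = df.isnull().sum()
print(empty_per_column)

# dropna() returns a new DataFrame that keeps only the complete rows.
complete_rows = df.dropna()
print(complete_rows)
```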
Figure 3.35: Check for negative numbers
Wrong Data
Figure 3.34: Delete the rows containing missing values
Explain the importance of data cleaning before starting data analysis.
Describe the difference between data indexing and data filtering.
What is the difference between Series and DataFrame objects?
Use the dataset you imported in the previous exercise and:
Open a new Jupyter Notebook and import the Excel file named "tourist-indicators.xlsx".
Import the random library and use the random.randrange() function to print a random number between 1 and 100.
Open the sheet "I3" from the file "tourist-indicators.xlsx" and read it to a new DataFrame. Then: