Lesson: Rule-based Decision Making - Artificial Intelligence - Third Year of Secondary School
Part 1
1. Basics of Artificial Intelligence
2. Artificial Intelligence Algorithms
3. Natural Language Processing (NLP)
Part 2
4. Image Recognition
5. Optimization & Decision-making Algorithms
Lesson 3: Rule-based Decision Making

Link to digital lesson: www.ien.edu.sa

Rule-Based Systems

Rule-based AI systems focus on using a set of predefined rules to make decisions and solve problems. Expert systems are the most well-known example of rule-based AI. They were one of the first forms of Artificial Intelligence ever developed and were particularly popular in the 1980s and 1990s. They were often used to automate tasks that would normally require human expertise, such as diagnosing medical conditions or troubleshooting technical problems. Nowadays, rule-based systems are no longer considered state-of-the-art and are often outperformed by more modern AI approaches. However, they remain popular in many application domains because they combine reasonable performance with an intuitive and interpretable decision-making process.

Expert Systems

An expert system is a type of AI that mimics the decision-making ability of a human expert. It uses a knowledge base of rules and facts, together with an inference engine, to provide advice or solve problems in a specific domain of knowledge.

Knowledge Base

One of the key components of any rule-based AI system is the knowledge base, which is a collection of facts and rules that the system uses to make decisions. These facts and rules are typically entered into the system by human experts, who are responsible for identifying the most important information and defining the rules that the system should follow. To make a decision or solve a problem, the expert system begins by examining the facts and rules in its knowledge base and applying them to the situation at hand. If the system is unable to find a match between the situation and the facts and rules in its knowledge base, it may ask the user for additional information or refer the problem to a human expert for further assistance.

Some of the main advantages and disadvantages of rule-based systems are shown in Table 2.5:
Table 2.5: Main advantages and disadvantages of rule-based systems

Advantages:
• They can make decisions and solve problems more quickly and accurately than humans, especially when it comes to tasks that require a large amount of knowledge or data.
• They are able to operate consistently, without the biases or errors that can sometimes influence human decision-making.

Disadvantages:
• They are only as good as the knowledge and rules that have been entered into their knowledge base, and they may not be able to handle situations that are outside of their area of expertise.
• They are not able to learn or adapt in the same way that humans can, and this makes them less applicable to dynamic scenarios where both the input data and the logic can change significantly with time.

Ministry of Education, 2024-1446
In this lesson, you will be introduced to rule-based systems in the context of one of their key applications: medical diagnosis. The system will provide a medical diagnosis based on the patient's symptoms, as seen in Figure 2.8. Beginning with a simple rule-based diagnostic system, you will then discover some more intelligent systems and how each iteration leads to improved results.

[Figure 2.8: Medical diagnosis by Rule-based AI System. A knowledge base maps each disease to its symptoms; the system compares each patient's symptoms against it to produce a diagnosis.]

Iteration 1

In this first iteration, you will build a simple rule-based system that can diagnose three possible diseases: kidney stones, appendicitis, and food poisoning. The input to your system will be a simple knowledge base that maps each disease to a list of possible symptoms. This is provided in the format of a JSON file, which you load and display below.

    import json  # a library used to save and load JSON files

    # the file with the symptom mapping
    symptom_mapping_file = 'symptom_mapping_v1.json'

    # open the mapping JSON file and load it into a dictionary
    with open(symptom_mapping_file) as f:
        mapping = json.load(f)

    # print the JSON file
    print(json.dumps(mapping, indent=2))

    {
      "diseases": {
        "food poisoning": [
          "vomiting",
          "abdominal pain",
          "diarrhea",
          "fever"
        ],
        "kidney stones": [
          "lower back pain",
          "vomiting",
          "fever"
        ],
        "appendicitis": [
          "abdominal pain",
          "vomiting",
          "fever"
        ]
      }
    }
This first rule-based system will follow a simple rule: if the patient has at least 3 of the possible symptoms of a disease, then the disease should be added as a possible diagnosis. Below you can find the Python function that uses this rule to make a diagnosis, given the above knowledge base and the patient's symptoms.

    def diagnose_v1(patient_symptoms: list):
        diagnosis = []  # the list of possible diseases
        if "vomiting" in patient_symptoms:
            if "abdominal pain" in patient_symptoms:
                if "diarrhea" in patient_symptoms:
                    # 1:vomiting, 2:abdominal pain, 3:diarrhea
                    diagnosis.append('food poisoning')
                elif 'fever' in patient_symptoms:
                    # 1:vomiting, 2:abdominal pain, 3:fever
                    diagnosis.append('food poisoning')
                    diagnosis.append('appendicitis')
            elif "lower back pain" in patient_symptoms and 'fever' in patient_symptoms:
                # 1:vomiting, 2:lower back pain, 3:fever
                diagnosis.append('kidney stones')
        elif "abdominal pain" in patient_symptoms and \
                "diarrhea" in patient_symptoms and \
                "fever" in patient_symptoms:
            # 1:abdominal pain, 2:diarrhea, 3:fever
            diagnosis.append('food poisoning')
        return diagnosis

In this case, the knowledge base is hard-coded inside the function in the form of IF statements. These statements use the symptoms that the three diseases have in common to arrive at a diagnosis as quickly as possible. For instance, the "vomiting" symptom is shared by all three diseases. Therefore, if the first IF statement is True, then one of the three required symptoms has already been accounted for, whichever disease is being considered. The function then checks for "abdominal pain", which is associated with two of the diseases, and continues in the same manner through the remaining symptom combinations.
You can then test this function with three different patients:

    # Patient 1
    my_symptoms = ['abdominal pain', 'fever', 'vomiting']
    diagnosis = diagnose_v1(my_symptoms)
    print('Most likely diagnosis:', diagnosis)

    # Patient 2
    my_symptoms = ['vomiting', 'lower back pain', 'fever']
    diagnosis = diagnose_v1(my_symptoms)
    print('Most likely diagnosis:', diagnosis)

    # Patient 3
    my_symptoms = ['fever', 'cough', 'vomiting']
    diagnosis = diagnose_v1(my_symptoms)
    print('Most likely diagnosis:', diagnosis)

    Most likely diagnosis: ['food poisoning', 'appendicitis']
    Most likely diagnosis: ['kidney stones']
    Most likely diagnosis: []

[Figure 2.9: Representation of the first iteration. Patient 1 (abdominal pain, fever, vomiting): food poisoning or appendicitis. Patient 2 (vomiting, lower back pain, fever): kidney stones. Patient 3 (fever, cough, vomiting): no diagnosis.]

For Patient 1, both food poisoning and appendicitis are included in the diagnosis because the patient's three symptoms are associated with both diseases. Patient 2 is diagnosed with kidney stones, which is the only disease that matches all three symptoms. Finally, a diagnosis cannot be made for Patient 3, because none of the three diseases matches all three of the patient's symptoms ("cough" is not listed for any of them).

The benefits of this first rule-based version are that it is intuitive and explainable. It is also guaranteed to use its knowledge base and rules consistently to provide a diagnosis, without bias or deviation. However, this version also has significant disadvantages. First, the "at least 3 symptoms" rule is an oversimplified representation of how a human expert would actually make a medical diagnosis. Second, the knowledge base for this version is hard-coded in the function.
Even though it was easy to create simple IF statements for such a small knowledge base, this task would become increasingly complex and time-consuming for cases with many more diseases and symptoms.
Iteration 2

In this second iteration, you will enhance the flexibility and applicability of your rule-based system by making it read the knowledge base directly from a JSON file. This eliminates the need to manually engineer symptom-specific IF statements inside the function. This is a significant improvement that makes your system applicable to larger knowledge bases with arbitrary numbers of diseases and symptoms. An example of such a knowledge base can be found below.

    symptom_mapping_file = 'symptom_mapping_v2.json'

    with open(symptom_mapping_file) as f:
        mapping = json.load(f)

    print(json.dumps(mapping, indent=2))

    {
      "diseases": {
        "covid19": [
          "fever",
          "headache",
          "tiredness",
          "sore throat",
          "cough"
        ],
        "common cold": [
          "stuffy nose",
          "runny nose",
          "sneezing",
          "sore throat",
          "cough"
        ],
        "flu": [
          "fever",
          "headache",
          "tiredness",
          "stuffy nose",
          "sneezing",
          "sore throat",
          "cough",
          "runny nose"
        ],
        "allergies": [
          "headache",
          "tiredness",
          "stuffy nose",
          "sneezing",
          "cough",
          "runny nose"
        ]
      }
    }

This new knowledge base is only slightly larger than the previous one. However, it is clear that trying to manually create IF statements in this case would be significantly harder. For instance, the previous knowledge base had one disease with four symptoms and two diseases with three symptoms. Given the "at least 3 symptoms" rule that you applied in version 1, this led to 4 + 1 + 1 = 6 possible symptom triplets to consider. In the new knowledge base above, the four diseases have 5, 5, 8, and 6 symptoms. This leads to 10 + 10 + 56 + 20 = 96 possible triplets! In a case where you would have to deal with hundreds or even thousands of diseases, it would be impossible to create a system like the one in the first version. In addition, there is no valid medical reason for being limited to symptom triplets.
Therefore, you will also make the diagnosis logic more versatile by counting the number of matching symptoms for each disease and allowing the user to specify the number of matching symptoms that a disease must have to be included in the diagnosis.

[Figure 2.10: The second iteration has no hard-coded IF statements. Iteration 1 relies on if/else statements; iteration 2 relies on for loops.]
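The triplet counts quoted above can be checked with a few lines of Python. This is only a sketch: the dictionaries `kb_v1_sizes` and `kb_v2_sizes` and the helper `count_triplets` are names made up for this illustration, and they store just the number of symptoms listed for each disease in the two knowledge bases.

```python
from math import comb  # comb(n, k): number of ways to choose k items out of n

# number of symptoms per disease, as listed in the two knowledge bases
# (hypothetical helper dictionaries, not part of the lesson's files)
kb_v1_sizes = {"food poisoning": 4, "kidney stones": 3, "appendicitis": 3}
kb_v2_sizes = {"covid19": 5, "common cold": 5, "flu": 8, "allergies": 6}

def count_triplets(symptom_counts: dict) -> int:
    """Sum, over all diseases, the number of distinct 3-symptom combinations."""
    return sum(comb(n, 3) for n in symptom_counts.values())

print(count_triplets(kb_v1_sizes))  # 4 + 1 + 1 = 6
print(count_triplets(kb_v2_sizes))  # 10 + 10 + 56 + 20 = 96
```

The count grows quickly with the number of symptoms per disease (8 symptoms already give 56 triplets), which is why hand-written IF statements do not scale.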
    def diagnose_v2(patient_symptoms: list,
                    symptom_mapping_file: str,
                    matching_symptoms_lower_bound: int):
        diagnosis = []
        with open(symptom_mapping_file) as f:
            mapping = json.load(f)

        # access the disease information
        disease_info = mapping['diseases']

        # for every disease
        for disease in disease_info:
            counter = 0
            disease_symptoms = disease_info[disease]

            # for each patient symptom
            for symptom in patient_symptoms:
                # if this symptom is included in the known symptoms for the disease
                if symptom in disease_symptoms:
                    counter += 1

            # keep the disease if enough symptoms matched
            if counter >= matching_symptoms_lower_bound:
                diagnosis.append(disease)

        return diagnosis

This version has no hard-coded IF statements. After loading the symptom mapping from the JSON file, it considers every possible disease via the first FOR loop. The second, nested FOR loop compares each of the patient's symptoms with the known symptoms for the disease and increases a counter every time it finds a match. A disease is added to the diagnosis only if its counter reaches the lower bound specified by the user.
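To see the counting step in isolation, here is a minimal sketch that applies the same idea to the knowledge base of the first iteration, held in memory instead of being read from a file. The names `knowledge_base`, `count_matches`, and `diagnose_in_memory` are assumptions introduced for this illustration. Note one deliberate difference from the loop above: a set intersection counts each symptom once, even if the patient's list repeats it.

```python
# knowledge base of iteration 1, kept in memory for illustration only
knowledge_base = {
    "food poisoning": ["vomiting", "abdominal pain", "diarrhea", "fever"],
    "kidney stones": ["lower back pain", "vomiting", "fever"],
    "appendicitis": ["abdominal pain", "vomiting", "fever"],
}

def count_matches(patient_symptoms: list, knowledge_base: dict) -> dict:
    """Return, for each disease, how many of its symptoms the patient has."""
    patient = set(patient_symptoms)  # duplicates are counted once
    return {disease: len(patient & set(symptoms))
            for disease, symptoms in knowledge_base.items()}

def diagnose_in_memory(patient_symptoms: list, knowledge_base: dict,
                       lower_bound: int) -> list:
    """Keep every disease whose match count reaches the lower bound."""
    counts = count_matches(patient_symptoms, knowledge_base)
    return [disease for disease, count in counts.items() if count >= lower_bound]

patient = ["abdominal pain", "fever", "vomiting"]
print(count_matches(patient, knowledge_base))
# {'food poisoning': 3, 'kidney stones': 2, 'appendicitis': 3}
print(diagnose_in_memory(patient, knowledge_base, 3))
# ['food poisoning', 'appendicitis']
```

With a lower bound of 3, this reproduces the diagnosis that the hard-coded first iteration gave for the same three symptoms.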
    # Patient 1
    my_symptoms = ["stuffy nose", "runny nose", "sneezing", "sore throat"]
    diagnosis = diagnose_v2(my_symptoms, 'symptom_mapping_v2.json', 3)
    print('Most likely diagnosis:', diagnosis)

    # Patient 2
    my_symptoms = ["stuffy nose", "runny nose", "sneezing", "sore throat"]
    diagnosis = diagnose_v2(my_symptoms, 'symptom_mapping_v2.json', 4)
    print('Most likely diagnosis:', diagnosis)

    # Patient 3
    my_symptoms = ['fever', 'cough', 'vomiting']
    diagnosis = diagnose_v2(my_symptoms, 'symptom_mapping_v2.json', 3)
    print('Most likely diagnosis:', diagnosis)

    Most likely diagnosis: ['common cold', 'flu', 'allergies']
    Most likely diagnosis: ['common cold']
    Most likely diagnosis: []

[Figure 2.11: Representation of the second iteration. Patient 1 (stuffy nose, runny nose, sneezing, sore throat): common cold or flu or allergies. Patient 2 (same symptoms): common cold. Patient 3 (fever, cough, vomiting): no diagnosis.]

Observe that this second iteration is a generalized version of the first iteration. It is much more widely applicable, as it can be used as-is with any other knowledge base of the same format, even if it includes thousands of diseases with an arbitrary number of symptoms. It also allows the user to make the diagnosis more or less strict by tuning the matching_symptoms_lower_bound parameter. This can be observed for Patients 1 and 2: even though they have the same symptoms, tuning this parameter leads to a different diagnosis. Despite these improvements, this version is still far from perfect and is still not an accurate representation of an actual medical diagnosis.
Iteration 3

In this third iteration, you will increase the intelligence of your rule-based system by giving it access to a more detailed type of knowledge base. This new type takes into account the medical fact that, for each disease, certain symptoms are more common than others.

    symptom_mapping_file = 'symptom_mapping_v3.json'

    with open(symptom_mapping_file) as f:
        mapping = json.load(f)

    print(json.dumps(mapping, indent=2))

    {
      "diseases": {
        "covid19": {
          "very common": [
            "fever",
            "tiredness",
            "cough"
          ],
          "less common": [
            "headache",
            "sore throat"
          ]
        },
        "common cold": {
          "very common": [
            "stuffy nose",
            "runny nose",
            "sneezing",
            "sore throat"
          ],
          "less common": [
            "cough"
          ]
        },
        "flu": {
          "very common": [
            "fever",
            "headache",
            "tiredness",
            "sore throat",
            "cough"
          ],
          "less common": [
            "stuffy nose",
            "sneezing",
            "runny nose"
          ]
        },
        "allergies": {
          "very common": [
            "stuffy nose",
            "sneezing",
            "runny nose"
          ],
          "less common": [
            "headache",
            "tiredness",
            "cough"
          ]
        }
      }
    }
The threshold-based logic on the number of symptoms will be abandoned and replaced with a scoring function that assigns custom weights to very common and less common symptoms. The user will also be given the flexibility to specify whatever weights they think are appropriate. The disease or diseases with the highest weighted sum will then be included in the diagnosis.

    from collections import defaultdict

    def diagnose_v3(patient_symptoms: list,
                    symptom_mapping_file: str,
                    very_common_weight: float = 1,
                    less_common_weight: float = 0.5):
        with open(symptom_mapping_file) as f:
            mapping = json.load(f)

        disease_info = mapping['diseases']

        # holds a symptom-based score for each potential disease
        disease_scores = defaultdict(int)

        for disease in disease_info:
            # get the very common symptoms of the disease
            very_common_symptoms = disease_info[disease]['very common']
            # get the less common symptoms for this disease
            less_common_symptoms = disease_info[disease]['less common']

            for symptom in patient_symptoms:
                if symptom in very_common_symptoms:
                    disease_scores[disease] += very_common_weight
                elif symptom in less_common_symptoms:
                    disease_scores[disease] += less_common_weight

        # find the max score for all candidate diseases
        # (default=0 covers the case where no symptom matched any disease)
        max_score = max(disease_scores.values(), default=0)

        if max_score == 0:
            return []
        else:
            # get all diseases that have the max score
            diagnosis = [disease for disease in disease_scores
                         if disease_scores[disease] == max_score]
            return diagnosis, max_score
For each possible disease included in the knowledge base, this new function identifies the very common and less common symptoms exhibited by the patient. It then increases the disease's score according to the respective weights. In the end, the diseases with the largest score are included in the diagnosis. You can now test this new implementation with a few examples:

    # Patient 1
    my_symptoms = ["headache", "tiredness", "cough"]
    diagnosis = diagnose_v3(my_symptoms, 'symptom_mapping_v3.json')
    print('Most likely diagnosis:', diagnosis)

    # Patient 2
    my_symptoms = ["stuffy nose", "runny nose", "sneezing", "sore throat"]
    diagnosis = diagnose_v3(my_symptoms, 'symptom_mapping_v3.json')
    print('Most likely diagnosis:', diagnosis)

    # Patient 3
    my_symptoms = ["stuffy nose", "runny nose", "sneezing", "sore throat"]
    diagnosis = diagnose_v3(my_symptoms, 'symptom_mapping_v3.json', 1, 1)
    print('Most likely diagnosis:', diagnosis)

    Most likely diagnosis: (['flu'], 3)
    Most likely diagnosis: (['common cold'], 4)
    Most likely diagnosis: (['common cold', 'flu'], 4)

[Figure 2.12: Representation of the third iteration. Patient 1 (headache, tiredness, cough): flu. Patient 2 (stuffy nose, runny nose, sneezing, sore throat): common cold. Patient 3 (same symptoms as Patient 2, equal weights): common cold or flu.]

You may observe that, even though the 3 symptoms for Patient 1 ("headache", "tiredness", "cough") are shared by the flu, covid19, and allergies, only the flu is included in the diagnosis. This is because all three symptoms are listed as 'very common' for the flu in the knowledge base, leading to a maximum score of 3. Similarly, while Patients 2 and 3 have the same symptoms, the different weights submitted for very common and less common symptoms lead to different diagnoses. Specifically, using an equal weight for the two symptom types leads to the addition of the flu to the diagnosis.
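The scores behind these diagnoses can be made visible with a small sketch that keeps the third knowledge base in memory and returns the score of every disease rather than only the winners. The names `knowledge_base_v3` and `score_diseases` are assumptions made for this illustration, and the dictionary is copied from the listing printed earlier in this iteration.

```python
# third-iteration knowledge base, kept in memory for illustration only
knowledge_base_v3 = {
    "covid19": {"very common": ["fever", "tiredness", "cough"],
                "less common": ["headache", "sore throat"]},
    "common cold": {"very common": ["stuffy nose", "runny nose", "sneezing", "sore throat"],
                    "less common": ["cough"]},
    "flu": {"very common": ["fever", "headache", "tiredness", "sore throat", "cough"],
            "less common": ["stuffy nose", "sneezing", "runny nose"]},
    "allergies": {"very common": ["stuffy nose", "sneezing", "runny nose"],
                  "less common": ["headache", "tiredness", "cough"]},
}

def score_diseases(patient_symptoms, knowledge_base,
                   very_common_weight=1.0, less_common_weight=0.5):
    """Return the weighted symptom score of every disease."""
    scores = {}
    for disease, groups in knowledge_base.items():
        score = 0.0
        for symptom in patient_symptoms:
            if symptom in groups["very common"]:
                score += very_common_weight
            elif symptom in groups["less common"]:
                score += less_common_weight
        scores[disease] = score
    return scores

patient_1 = ["headache", "tiredness", "cough"]
print(score_diseases(patient_1, knowledge_base_v3))
# {'covid19': 2.5, 'common cold': 0.5, 'flu': 3.0, 'allergies': 1.5}

patient_2 = ["stuffy nose", "runny nose", "sneezing", "sore throat"]
print(score_diseases(patient_2, knowledge_base_v3))          # default weights
print(score_diseases(patient_2, knowledge_base_v3, 1, 1))    # equal weights
```

For Patient 1, covid19 comes second with 2.5 because "headache" is only a less common symptom for it. With equal weights, the flu catches up with the common cold at 4, which is why both appear for Patient 3.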
Iteration 4

The rule-based system could be further improved by increasing the sophistication of the knowledge base and by experimenting with different scoring functions. Even though this would indeed lead to improvement, it would still require a considerable amount of time and manual effort. Thankfully, there is a way to automatically build a rule-based system that is intelligent enough to construct its own knowledge base and scoring function: by using machine learning.

Rule-based machine learning applies a learning algorithm to automatically identify useful rules, rather than a human needing to apply prior domain knowledge to manually construct rules and curate a rule set.

Instead of a hand-crafted knowledge base and a scoring function, a machine learning algorithm expects only one input: a historical dataset of patient cases. Each case consists of a patient's symptoms and a medical diagnosis made by a human expert. Given such a training dataset, the algorithm can then automatically learn how to predict the most likely diagnosis for a new patient. By learning directly from data, problems associated with the acquisition and validity of background knowledge are avoided.

    import pandas as pd  # import pandas to load and process spreadsheet-type data

    medical_dataset = pd.read_csv('medical_data.csv')  # load a medical dataset
    medical_dataset

          fever  cough  tiredness  headache  stuffy nose  runny nose  sneezing  sore throat    diagnosis
    0         1      1          1         0            0           0         0            0      covid19
    1         0      1          1         1            0           0         0            0      covid19
    2         1      1          1         0            0           0         0            0      covid19
    3         1      1          1         0            0           0         0            0      covid19
    4         1      1          1         0            0           0         0            0      covid19
    ...
    1995      0      1          0         0            1           0         1            1  common cold
    1996      0      0          0         1            1           1         1            0  common cold
    1997      0      0          1         0            1           0         0            1  common cold
    1998      0      0          0         0            1           0         0            1  common cold
    1999      0      1          0         0            0           0         1            1  common cold

The dataset consists of 2,000 patient cases. Each case has 8 possible symptoms: fever, cough, tiredness, headache, stuffy nose, runny nose, sneezing, and sore throat. Each of these is encoded in a separate binary column.
A binary digit 1 means that the patient had the symptom, while a binary digit 0 means that the patient did not have it.
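To connect this encoding with the symptom lists used in the earlier iterations, the sketch below converts a list of symptom names into one binary row with the same column order as the dataset. The constant `SYMPTOM_COLUMNS` and the function `encode_symptoms` are hypothetical names introduced here; they are not part of the lesson's code.

```python
# the eight symptom columns, in the order they appear in the dataset
SYMPTOM_COLUMNS = ["fever", "cough", "tiredness", "headache",
                   "stuffy nose", "runny nose", "sneezing", "sore throat"]

def encode_symptoms(patient_symptoms: list) -> list:
    """Turn a list of symptom names into a row of 0/1 values."""
    unknown = set(patient_symptoms) - set(SYMPTOM_COLUMNS)
    if unknown:
        # the dataset has no column for these symptoms
        raise ValueError(f"Unknown symptoms: {sorted(unknown)}")
    return [1 if column in patient_symptoms else 0 for column in SYMPTOM_COLUMNS]

# same values as row 0 of the dataset shown above
print(encode_symptoms(["fever", "cough", "tiredness"]))
# [1, 1, 1, 0, 0, 0, 0, 0]
```

Raising an error for an unknown symptom is a design choice of this sketch: a symptom without a column cannot be represented in this format, so it is safer to report it than to drop it silently.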
The final column includes the diagnosis made by the human expert. There are four possible diagnoses: covid19, flu, allergies, and common cold. You can easily validate this with Python code:

    set(medical_dataset['diagnosis'])

Even though there are dozens of possible machine learning algorithms that can be used with such a dataset, you will use one that follows the logic-based approach: a decision tree. Specifically, you will use the DecisionTreeClassifier class from the popular sklearn Python library.

    from sklearn.tree import DecisionTreeClassifier

    def diagnose_v4(train_dataset: pd.DataFrame):
        # create a Decision Tree Classifier
        model = DecisionTreeClassifier(random_state=1)

        # drop the diagnosis column to get only the symptoms
        train_patient_symptoms = train_dataset.drop(columns=['diagnosis'])

        # get the diagnosis column, to be used as the classification target
        train_diagnoses = train_dataset['diagnosis']

        # build a decision tree
        model.fit(train_patient_symptoms, train_diagnoses)

        # return the trained model
        return model

The Python implementation of this fourth version is considerably shorter and simpler than the previous ones. It simply takes the training dataset, uses it to build a decision tree model based on the relations between symptoms and diagnoses, and then returns the trained model. In order to properly test this version, begin by splitting your dataset into two separate training and testing sets.

    from sklearn.model_selection import train_test_split

    # use the function to split the data, get 30% for testing and 70% for training
    train_data, test_data = train_test_split(medical_dataset, test_size=0.3, random_state=1)

    # print the shapes (rows x columns) of the two datasets
    print(train_data.shape)
    print(test_data.shape)

    (1400, 9)
    (600, 9)
You now have 1,400 data points that will be used for training the model and 600 that will be used to test it. Begin by training and visualizing the decision tree model.

    from sklearn.tree import plot_tree
    import matplotlib.pyplot as plt

    my_tree = diagnose_v4(train_data)  # train a model
    print(my_tree.classes_)  # print the possible target labels (diagnoses)

    plt.figure(figsize=(12, 6))  # size of the visualization, in inches

    # plot the tree (the column names are passed as a plain list)
    plot_tree(my_tree,
              max_depth=2,
              fontsize=10,
              feature_names=list(medical_dataset.columns[:-1]))

    ['allergies' 'common cold' 'covid19' 'flu']

    fever <= 0.5 (gini = 0.75, samples = 1400, value = [354, 345, 358, 343])
      True:  sore throat <= 0.5 (gini = 0.606, samples = 791, value = [354, 340, 71, 26])
        True:  runny nose <= 0.5 (gini = 0.497, samples = 526, value = [354, 101, 58, 13])
        False: tiredness <= 0.5 (gini = 0.182, samples = 265, value = [0, 239, 13, 13])
      False: sore throat <= 0.5 (gini = 0.507, samples = 609, value = [0, 5, 287, 317])
        True:  sneezing <= 0.5 (gini = 0.387, samples = 317, value = [0, 2, 235, 80])
        False: headache <= 0.5 (gini = 0.309, samples = 292, value = [0, 3, 52, 237])

Figure 2.13: Decision tree model for the medical_data dataset, shown to a depth of two levels
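Because a decision tree is a learned rule set, it can also be printed as nested IF-style rules in plain text. The sketch below uses sklearn's export_text function on a tiny made-up dataset; the `toy` DataFrame and its six rows are assumptions for illustration only and are not the lesson's medical_data.csv.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# a tiny, made-up dataset: two binary symptoms and a diagnosis
toy = pd.DataFrame({
    "fever":     [1, 1, 0, 0, 1, 0],
    "sneezing":  [0, 0, 1, 1, 0, 1],
    "diagnosis": ["flu", "flu", "allergies", "allergies", "flu", "allergies"],
})

symptoms = toy.drop(columns=["diagnosis"])  # input features
diagnoses = toy["diagnosis"]                # classification target

model = DecisionTreeClassifier(random_state=1)
model.fit(symptoms, diagnoses)

# export the learned tree as indented text rules
rules = export_text(model, feature_names=list(symptoms.columns))
print(rules)
```

Each indented line of the output is one test on a symptom, and each leaf line names the predicted diagnosis. The same call could be tried on the lesson's my_tree model, for example with a max_depth argument to keep the output short.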
The plot_tree() function is used to visualize a decision tree. Due to lack of space, only the first two levels (plus the root) are visualized. This number can be easily tuned via the max_depth parameter.

Each node in the tree represents a subset of the patients. For example, the root node represents the full population of the 1,400 patients in the training set. Out of these, 354, 345, 358, and 343 patients were diagnosed with allergies, the common cold, covid19, and the flu, respectively.

[Root node: fever <= 0.5, gini = 0.75, samples = 1400, value = [354, 345, 358, 343]]

The tree is built in a top-down fashion via binary splits. The first split is based on whether or not the patient has a fever. Given that all symptom features are binary, a <= 0.5 check is True if the patient did not have the symptom. Those that did not have a fever (left path) are further split based on whether or not they had a sore throat. Those that did not are then split based on whether or not they had a runny nose. The node at this point includes 526 cases. Out of those, 354, 101, 58, and 13 were diagnosed with allergies, the common cold, covid19, and the flu, respectively.

The splitting continues until the algorithm determines that the cases have been separated into sufficiently pure nodes. A perfectly pure node is one that only includes cases with the same diagnosis. The "gini" values marked on each node represent scores of the gini index, a popular formula used to evaluate the purity of a given node.

The gini index measures a node's impurity, namely the likelihood of the node's contents being classified in the wrong class. The lower the gini index, the more certain the algorithm can be about the classification.
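The lesson does not spell out the formula, so the following sketch assumes the standard definition of the gini index: one minus the sum of the squared class proportions in the node. The function name `gini_index` is chosen for this illustration. Applied to the class counts printed on the tree nodes, it reproduces the gini values shown in the figure.

```python
def gini_index(class_counts: list) -> float:
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    total = sum(class_counts)
    return 1 - sum((count / total) ** 2 for count in class_counts)

# root node: allergies, common cold, covid19, flu
print(round(gini_index([354, 345, 358, 343]), 2))  # 0.75

# node reached with no fever, no sore throat
print(round(gini_index([354, 101, 58, 13]), 3))    # 0.497

# a perfectly pure node has an impurity of 0
print(gini_index([265, 0, 0, 0]))                  # 0.0
```

The root is close to the maximum impurity for four classes (0.75) because the four diagnoses are almost equally frequent there, while the splits further down produce nodes dominated by one or two diagnoses.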
You will now use this decision tree to predict the most likely diagnosis for the patients in the testing set. The testing set is used to evaluate the performance of the model. The exact evaluation method depends on whether the task is one of regression or classification. In classification problems, like the one presented here, computing a model's accuracy and confusion matrix is a common evaluation method.

• Accuracy is the proportion of correct predictions made by the classifier. A high accuracy (closer to 100%) means that the classifier is making mostly correct predictions.
• A Confusion Matrix is a table that compares the true (actual) labels in a dataset with the predictions made by the classifier. The table includes one row for each true label and one column for each predicted label. Each entry in the matrix represents the number of instances that have the corresponding true and predicted labels.

    # functions used to evaluate a classifier
    from sklearn.metrics import accuracy_score, confusion_matrix

    # drop the diagnosis column to get only the symptoms
    test_patient_symptoms = test_data.drop(columns=['diagnosis'])

    # get the diagnosis column, to be used as the classification target
    test_diagnoses = test_data['diagnosis']

    # guess the most likely diagnoses
    pred = my_tree.predict(test_patient_symptoms)

    # print the achieved accuracy score
    accuracy_score(test_diagnoses, pred)

    0.8166666666666667

You will observe that the decision tree achieves an accuracy of approximately 81.7%. This means that, out of all 600 test cases, the tree correctly diagnoses 490 of them. You can also print the model's confusion matrix to get a better view of the number of misclassified examples.

    confusion_matrix(test_diagnoses, pred)

    array([[143,   3,   0,   0],
           [ 48,  98,   5,   4],
           [  2,   1, 127,  12],
           [  1,   3,  31, 122]])
                          Predicted    Predicted      Predicted    Predicted
                          allergies    common cold    covid19      flu
    Actual allergies         143            3             0           0
    Actual common cold        48           98             5           4
    Actual covid19             2            1           127          12
    Actual flu                 1            3            31         122

Figure 2.14: Confusion matrix of predicted and actual cases

The numbers outside of the diagonal represent the model's mistakes. For instance, given that the order of the four possible diagnoses is ['allergies', 'common cold', 'covid19', 'flu'], the matrix shows that there were 48 cases of the common cold that were misclassified as allergies and 31 cases of the flu that were misclassified as covid19.

Even though the model is not perfect, the fact that it can achieve such a high accuracy by learning its own rule set, without the need for a manually constructed knowledge base, is impressive. Another encouraging factor is that this accuracy is achieved without trying to tune the various performance parameters of the DecisionTreeClassifier. It is thus very likely that the model can be improved even further. Another obvious way to improve is to go beyond the limitations of the rule-based model and experiment with different types of machine learning algorithms. You will explore some of these methods in the following unit.
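The link between the confusion matrix and the accuracy score can be verified by hand: the correct predictions are the entries on the diagonal. The sketch below copies the matrix from Figure 2.14 into a plain Python list; the variable names and the per-diagnosis breakdown (the share of each actual diagnosis that was predicted correctly, often called recall) are additions for illustration and are not part of the lesson's code.

```python
labels = ["allergies", "common cold", "covid19", "flu"]

# rows: actual diagnosis, columns: predicted diagnosis (from Figure 2.14)
matrix = [
    [143,  3,   0,   0],
    [ 48, 98,   5,   4],
    [  2,  1, 127,  12],
    [  1,  3,  31, 122],
]

# correct predictions sit on the diagonal
correct = sum(matrix[i][i] for i in range(len(labels)))
total = sum(sum(row) for row in matrix)
accuracy = correct / total
print(correct, total, round(accuracy, 4))  # 490 600 0.8167

# share of each actual diagnosis that the tree predicted correctly
per_class = {label: matrix[i][i] / sum(matrix[i])
             for i, label in enumerate(labels)}
for label, share in per_class.items():
    print(f"{label}: {share:.3f}")
```

The breakdown makes the weakest spot easy to see: the common cold row has the lowest share of correct predictions, mostly because of the 48 cases predicted as allergies.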
Exercises

1. What are some advantages and disadvantages of rule-based systems?

2. What is an advantage and a disadvantage of the first iteration?

3. Add a patient to your code in the first iteration of the rule-based system with the symptoms ["vomiting", "abdominal pain", "diarrhea", "fever", "lower back pain"]. What is the diagnosis for this patient? Present your observations below.
4. In the second iteration, how many diseases does each patient's diagnosis contain if you change the parameter matching_symptoms_lower_bound to 2, 3, and 4? Modify your code and present your observations.

5. In the third iteration, change both weights to 1 for Patients 1 and 2, just like the third patient's. Modify your code and present your observations.

6. Describe briefly how each iteration is enhanced from the previous one (first to second, second to third, third to fourth).