Describing Data with Statistics (Part Two) - Business Decision Making - Second Year of Secondary School

Lesson 5, Chapter 9 (www.ien.edu.sa)
Ministry of Education, 2024-1446
Using Data to Support the Decision Making Process

Describing Data with Statistics (Part Two)

5-1 Variability

In the previous lesson, we learned how to simplify a large set of data by calculating its center value. Although this description is very useful, a measure of center gives only partial information about a data set. It is also important to describe how much the observations differ from one another. When individual data elements differ from each other, we say that there is variability in the data set.

DEFINITION
Variability: The extent to which individual data elements in the data set are different from one another.

The three samples displayed in Figure 9-5 all have mean = median = 45. There is a lot of variability in the first sample compared with the third sample. The second sample shows less variability than the first and more variability than the third. Most of the variability in the second sample is due to the two extreme values being so far from the center.

FIGURE 9-5: Three samples with the same center and different amounts of variability
Sample 1: 20, 40, 50, 30, 60, 70
Sample 2: 47, 43, 44, 46, 20, 70
Sample 3: 44, 43, 40, 50, 47, 46
[Dot plots not reproduced. Horizontal axis from 20 to 70; legend: mean, median.]

In this example, the mean and median alone do not fully describe the data. They are the same for all three samples (45), yet we can see that the data sets are different. It would be useful to also have a measure that describes how much the data set varies from its central point.
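The claim that all three samples share the same center can be checked with a short sketch. It uses Python's standard `statistics` module; the variable names `samples` and `centers` are illustrative and not part of the lesson.

```python
from statistics import mean, median

# The three samples from Figure 9-5.
samples = {
    "Sample 1": [20, 40, 50, 30, 60, 70],
    "Sample 2": [47, 43, 44, 46, 20, 70],
    "Sample 3": [44, 43, 40, 50, 47, 46],
}

# Compute the two measures of center for each sample.
centers = {name: (mean(values), median(values)) for name, values in samples.items()}

for name, (sample_mean, sample_median) in centers.items():
    print(f"{name}: mean = {sample_mean}, median = {sample_median}")
```

Every sample reports a mean and a median of 45, so a measure of center alone cannot tell these data sets apart.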


The simplest numerical measure of variability is the range.

DEFINITION
Range: The range of a data set is defined as
Range = Largest Value - Smallest Value

In general, more variability results in a larger range. However, variability is a characteristic of the entire data set, and each observation contributes to it. The first two samples in Figure 9-5 both have a range of 70 - 20 = 50, but there is less variability in the second sample. Because it is calculated using only the largest and smallest values in the data set, the range is not usually the best measure of variability.

QUICK TIP
If the range of a data set is relatively small, then using the range to describe the data can be useful. When the range is large, other techniques are preferred.

The most widely used measures of variability describe the extent to which the sample observations deviate from the sample mean $\bar{x}$. Subtracting $\bar{x}$ from each observation gives a set of deviations from the mean.

DEFINITION
Deviations from the mean: The n deviations from the sample mean are the differences
$(x_1 - \bar{x}), (x_2 - \bar{x}), \ldots, (x_n - \bar{x})$

Notice that a deviation is positive if the corresponding x value is greater than $\bar{x}$ and negative if the x value is less than $\bar{x}$. To prevent negative and positive deviations from cancelling each other out, we can square them before adding them together into a combined score. Then deviations with opposite signs but the same magnitude, such as +2 and -2, make identical contributions to a measure of variability.

The squared deviations are
$(x_1 - \bar{x})^2, (x_2 - \bar{x})^2, \ldots, (x_n - \bar{x})^2$
and their sum is
$(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2 = \sum (x - \bar{x})^2$

Dividing this sum by the sample size n gives the average squared deviation. Although this seems to be a reasonable measure of variability, we instead use a divisor slightly smaller than n, namely n - 1.
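The following sketch works through the range and the deviations from the mean for the first two samples of Figure 9-5. The helper names `value_range` and `deviations_from_mean` are illustrative, not taken from the lesson.

```python
def value_range(data):
    """Range = largest value - smallest value."""
    return max(data) - min(data)


def deviations_from_mean(data):
    """Return the list of deviations (x - mean) for each observation x."""
    x_bar = sum(data) / len(data)
    return [x - x_bar for x in data]


sample_1 = [20, 40, 50, 30, 60, 70]
sample_2 = [47, 43, 44, 46, 20, 70]

# Both samples have the same range even though they vary differently.
range_1 = value_range(sample_1)
range_2 = value_range(sample_2)

devs_1 = deviations_from_mean(sample_1)
devs_2 = deviations_from_mean(sample_2)

# Squaring stops positive and negative deviations from cancelling.
sum_sq_1 = sum(d ** 2 for d in devs_1)
sum_sq_2 = sum(d ** 2 for d in devs_2)

print("Ranges:", range_1, range_2)
print("Sum of deviations, sample 1:", sum(devs_1))
print("Sums of squared deviations:", sum_sq_1, sum_sq_2)
```

The raw deviations of sample 1 add up to zero, which is why they are squared first. The sums of squared deviations (1750 for sample 1 and 1260 for sample 2) differ even though both ranges are 50, so the squared deviations capture a difference in variability that the range misses.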


DEFINITIONS
Sample variance: The sample variance, denoted by $s^2$, is the sum of squared deviations from the mean divided by n - 1. That is:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample standard deviation: The sample standard deviation is the positive square root of the sample variance and is denoted by $s$.

The calculation of $s^2$ can be a bit tedious, especially if the sample size is large. Fortunately, many calculators and computer software packages can compute the variance and standard deviation. The standard deviation can be informally interpreted as the size of a "typical" or "representative" deviation from the mean (see Figure 9-6).

QUICK TIP
Variance values tend to be very large with some data sets because the variance is measured in squared units. The standard deviation is more commonly used, as it shows the "typical" deviation from the mean in the same units as the data.

FIGURE 9-6: Standard deviation from the mean
[Figure not reproduced. Labels: SD = 5, SD = 10, Mean; horizontal axis from 10 to 100.]

As with $\bar{x}$, the value of $s$ can be greatly affected by the presence of even a single unusually small or large observation. The interquartile range is a measure of variability that is resistant to the effects of outliers. It is based on quantities called quartiles. The lower quartile separates the smallest 25% of the data set from the greatest 75%, and the upper quartile separates the greatest 25% from the smallest 75%. The middle quartile is the median, and it separates the lower 50% from the upper 50%. Figure 9-7 illustrates the locations of these quartiles for a smoothed histogram.
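As a rough illustration of how these measures behave, the sketch below computes the sample variance, the standard deviation, and the interquartile range for the three samples of Figure 9-5. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, with the middle value left out of both halves when n is odd. The helper names `quartiles` and `iqr` are illustrative.

```python
from statistics import median, stdev, variance


def quartiles(data):
    """Lower and upper quartiles as medians of the lower and upper halves.

    Assumption: for odd n the middle value is excluded from both halves.
    Requires at least two observations.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[len(ordered) - half:]
    return median(lower_half), median(upper_half)


def iqr(data):
    """Interquartile range = upper quartile - lower quartile."""
    lower, upper = quartiles(data)
    return upper - lower


samples = {
    "Sample 1": [20, 40, 50, 30, 60, 70],
    "Sample 2": [47, 43, 44, 46, 20, 70],
    "Sample 3": [44, 43, 40, 50, 47, 46],
}

# variance() and stdev() both use the n - 1 divisor (sample statistics).
results = {
    name: (variance(values), stdev(values), iqr(values))
    for name, values in samples.items()
}

for name, (s2, s, spread) in results.items():
    print(f"{name}: s^2 = {s2}, s = {s:.2f}, IQR = {spread}")
```

Samples 2 and 3 end up with the same interquartile range (4), while the standard deviation of sample 2 is several times larger because of its two extreme values, 20 and 70. This is the sense in which the interquartile range resists outliers.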


DEFINITIONS
Interquartile range (IQR): A measure of variability that is not as sensitive to the presence of outliers as the standard deviation. The IQR is calculated as:
IQR = upper quartile - lower quartile
Lower quartile: Median of the lower half of the sample.
Upper quartile: Median of the upper half of the sample.
(If n is odd, the median of the entire sample is excluded from both halves when calculating quartiles.)

FIGURE 9-7: The quartiles for a smoothed histogram
[Figure not reproduced. A smoothed histogram divided into four regions of 25% each by the lower quartile, the median, and the upper quartile.]

5-2 Correlation

The relationship between two or more variables is called correlation. Correlation can be positive, where the variables move in the same direction, or negative, where the relationship is inverse: the variables move in opposite directions.

Table 9-2 shows the attendance rates and test scores of a group of students. Note that, in general, the more days a student attends school, the higher their test score.

TABLE 9-2: A group of students' attendance rates and test scores

Student     Attendance (%)   Test scores (%)
Abdullah    95               85
Maryam      74               67
Muhammad    83               81
Ali         87               65
Saad        98               91
Layla       91               94
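The lesson describes correlation qualitatively. One common way to put a number on it, which this lesson does not introduce, is the Pearson correlation coefficient r: values near +1 suggest a strong positive correlation and values near -1 a strong negative one. The sketch below applies it to the data in Table 9-2. The function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists.

    This measure is an assumption added for illustration; the lesson
    itself only describes correlation as positive or negative.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of paired deviations from the two means.
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Table 9-2, in row order: Abdullah, Maryam, Muhammad, Ali, Saad, Layla.
attendance = [95, 74, 83, 87, 98, 91]
test_scores = [85, 67, 81, 65, 91, 94]

r = pearson_r(attendance, test_scores)
print(f"r = {r:.2f}")
```

For this table r comes out at roughly 0.72, which agrees with the positive relationship described in the text: higher attendance tends to go with higher test scores, although not for every student.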


DEFINITIONS
Positive correlation: A relationship between two variables that move in the same direction, like a person's height and weight.
Negative correlation: Also called an "inverse relationship"; two related variables move in opposite directions. For example, a higher number of school absences will likely result in lower grades.

YOU TRY IT
Understanding Variability: Head Sizes

Materials needed: Each team will need a measuring tape.

For this activity, you will work in teams of 6 to 10 people.

1. Designate a team leader for your team.
2. The team leader should measure and record the head size (measured as the circumference at the widest part of the forehead) of each of the other members of his or her team.
3. Record the head sizes for the individuals on your team as measured by the team leader.
4. Next, each individual on the team should measure the head size of the team leader. Do not share your measurement with the other team members until all team members have measured the team leader's head size.
5. After all team members have measured the team leader's head size, record the different team leader head size measurements obtained by the individuals on your team.
6. Using the data from Step 3, construct a dot plot of the team leader's measurements of team head sizes. Then, using the same scale, construct a separate dot plot of the different measurements of the team leader's head size (from Step 5).

Now use the available information to answer the following questions:

7. Do you think the team leader's head size changed in between measurements? Ask the other members of the team to share the measurements they took. Are they all the same? If not, can you explain why they might be different?
8. Which data set was more variable: the head size measurements of the different individuals on your team, or the different measurements of the team leader's head size? Explain the basis for your choice.


9. Consider the following scheme (you don't actually have to carry this out): Suppose that a group of 10 people measured head sizes by first assigning each person in the group a number between 1 and 10. Then person 1 measured person 2's head size, person 2 measured person 3's head size, and so on, with person 10 finally measuring person 1's head size. Do you think that the resulting head size measurements would be more variable, less variable, or show about the same amount of variability as a set of 10 measurements resulting from a single individual measuring the head size of all 10 people in the group? Explain.

REVIEW QUESTIONS

1. The following data are costs (in SAR) per kilogram for nine different brands of dates:
   12.90  16.20  13.70  14.10  17.00  18.20  14.70  15.20  14.90
   a. Calculate the variance and standard deviation for this data set. (Hint: Use a spreadsheet.)
   b. If a very expensive brand of luxury dates with a cost per kilogram of SAR 35.00 were added to the data set, how would the values of the mean and standard deviation change?

2. The prices (in SAR) of the eight smartphones that were rated highest in 2022 were:
   1,730  2,150  2,130  2,100  1,480  2,300  2,250  3,520
   a. Calculate the values of the variance and the standard deviation.
   b. The standard deviation is quite large. What does that tell you about the prices of these highly rated smartphones?

3. Look closely at Table 9-2. The data show a positive correlation between the two variables, but which student's data doesn't fit with the correlation?
   a. Muhammad
   b. Layla
   c. Ali
   d. Saad
