Describing Data with Statistics Part One - Business Decision Making - Second Year Secondary

Lesson 4, Chapter 9
www.ien.edu.sa

Describing Data with Statistics (Part One)

4-1 The Importance of Statistics

Humans can sometimes be irrational when making decisions. This isn't meant as a criticism, but rather as an observation about the way our minds tend to process incoming data and information. For example, studies have shown that in some cases people overestimate the extent to which physical exercise can compensate for food consumption. When this happens, people increase food intake more than is justified by the exercise performed.

A group of researchers began to wonder whether even thinking about exercise would lead to increased food consumption (reported in "Just Thinking About Exercise Makes Me Serve More Food: Physical Activity and Calorie Compensation"). They carried out an experiment in which people were offered snacks as a reward for participating. People read a short essay and then answered a few questions. Some participants read an essay that was unrelated to exercise (labeled "Calories control" in Figure 9-3), some read an essay that described listening to music while taking a casual 30-minute walk (labeled "Calories fun"), and some read an essay that described strenuous exercise (labeled "Calories exercise"). Participants were then given two plastic bags and invited to help themselves to two types of snacks, sweet and savory. After the participants served themselves, the bags were weighed so that the researchers could determine the number of calories in the snacks that each participant took.

The numbers of calories for each group were used to construct the comparative dot plots shown in Figure 9-3. From the dot plots, it is clear that the number of calories differs from person to person and also tends to be quite a bit higher for those who read about exercise than for those in the control group. To further compare the three distributions, it is helpful to summarize them numerically.
FIGURE 9-3: Dot plot of calories. Three comparative dot plots (Calories control, Calories fun, Calories exercise) on a common horizontal axis, Calories, running from 300 to 900.

Ministry of Education, 2024-1446. Business Decision Making: Using Data to Support the Decision Making Process.

4: Describing Data with Statistics (Part One)


QUICK TIP
Measures of center are useful because, for most things we measure, the data tend to group around this center point. This central measure tells us a lot about the data with a single figure.

4-2 Numerical Summary Measures

Next, we will look at how to calculate numerical summary measures that describe both the center and the extent of variability in a data set.

When describing numerical data, it is common to report a value that is representative of the observations in the data set. Such a number describes roughly where the data are located or "centered" along the number line, and is called a measure of center. The two most widely used measures of center are the mean and the median.

DEFINITION
Measure of center: A summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.

The mean of a numerical data set is just the familiar arithmetic average: the sum of the observations divided by the number of observations. At this point, it is helpful to introduce notation for the variable on which observations are made, for the number of observations in the data set, and for the individual observations:

$x$ = the variable observed
$n$ = the number of observations in the data set (the sample size)
$x_1$ = the first observation in the data set
$x_2$ = the second observation in the data set
...
$x_n$ = the $n$th (last) observation in the data set

For example, we might have a sample consisting of $n = 4$ observations on $x$ = time it takes to complete an online hotel reservation (in minutes):

$x_1 = 5.9$, $x_2 = 7.3$, $x_3 = 6.6$, $x_4 = 5.7$

Notice that the value of the subscript on $x$ has no relationship to the magnitude of the observation. In this example, $x_1$ is just the first observation in the data set and not necessarily the smallest observation, and $x_n$ is the last observation but not necessarily the largest.

The sum of $x_1, x_2, \ldots, x_n$ can be denoted by $x_1 + x_2 + \cdots + x_n$, but this can also be written using summation notation. The Greek letter $\Sigma$ denotes summation. In particular, $\sum x$ denotes the sum of all the $x$ values in the data set under consideration.
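The summation notation can be made concrete with the four hotel reservation times from the example. The short Python sketch below is only an illustration; the names `times`, `n`, and `total` are assumptions introduced here and do not come from the text.

```python
import math

# The n = 4 reservation times (in minutes) from the example.
times = [5.9, 7.3, 6.6, 5.7]

n = len(times)            # number of observations (the sample size)
total = math.fsum(times)  # sum of all the x values, written "sigma x" in the text

print(f"n = {n}, sum of x = {total:.1f}")
```

Here `math.fsum` is used instead of the built-in `sum` only to limit floating-point rounding when adding decimals; either would serve for a data set this small.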


DEFINITION
Sample mean: The sample mean of a sample consisting of numerical observations $x_1, x_2, \ldots, x_n$ is denoted by $\bar{x}$, and its formula is given by

$$\bar{x} = \frac{\text{sum of all observations in the sample}}{\text{number of observations in the sample}} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

The median strip of a highway divides the highway in half, and the median of a numerical data set does the same thing for a data set. Once the data values have been listed in order from smallest to largest, the median is the middle value in the list, and it divides the list into two equal parts. The process for determining the median differs slightly depending on whether the sample size $n$ is even or odd. When $n$ is an odd number (say, 5), the sample median is the single middle value. But when $n$ is even (say, 6), there are two middle values in the ordered list, and we average these two middle values to obtain the sample median.

DEFINITION
Sample median: The sample median is obtained by first ordering the $n$ observations from smallest to largest (with any repeated values included, so that every sample observation appears in the ordered list). Then

sample median = the single middle value, if $n$ is odd
sample median = the average of the middle two values, if $n$ is even

When would you use the median versus the mean to describe a set of data? Consider the following example: A company hosts product information on its Web site for its customers to access. A report showing the number of times that each product information page was accessed during the past week is shown below.

The sample size for the Web site access data was $n = 40$, an even number. Arranging the data in order from smallest to largest produces the following ordered list:

0 0 0 0 0 0 3 4 4 4
5 5 7 7 8 8 8 12 12 13
13 13 14 14 16 18 19 19 20 20
21 22 23 26 36 36 37 42 84 331

The two middle values (the 20th and 21st values in the ordered list) are both 13, so the sample median is (13 + 13) / 2 = 13.
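The two definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names `sample_mean` and `sample_median` and the list name `accesses` are assumptions chosen here, and the data are the 40 values from the ordered list above.

```python
def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def sample_median(values):
    """Middle value of the ordered list; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # n is odd: there is a single middle value
        return ordered[middle]
    # n is even: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


# The 40 Web site access counts from the ordered list.
accesses = [
    0, 0, 0, 0, 0, 0, 3, 4, 4, 4,
    5, 5, 7, 7, 8, 8, 8, 12, 12, 13,
    13, 13, 14, 14, 16, 18, 19, 19, 20, 20,
    21, 22, 23, 26, 36, 36, 37, 42, 84, 331,
]

print("n =", len(accesses))
print("mean =", round(sample_mean(accesses), 2))
print("median =", sample_median(accesses))
```

Sorting inside `sample_median` means the function can also be given the observations in the order they were recorded, not only an already ordered list.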


QUICK TIP
Because outliers can affect your results, you should treat them thoughtfully. Sometimes you can eliminate the outliers from the data set before you work with it. In other cases, it makes sense to use the median rather than the mean.

The mean of this data set = (sum of the values) / 40 = 924 / 40 = 23.10. Which of these two results, the mean or the median, appears to be a more typical value for the data set? In this example, the median is a better description of the data than the mean. This is due to an "outlier", a piece of data that is much larger or smaller than the other values in the data set. The sample mean can be sensitive to even a single value that lies far above or below the rest of the data. The value of the mean is pulled toward such outlying values.

DEFINITION
Outlier: A piece of data in a data set that is much larger or smaller than the other values in the data set, or an experience that is extremely good or bad.

The mode of a data set is the value that appears most often. It is another measure of central tendency, but it is less descriptive of the data than either the mean or the median. In some analyses, it is useful to identify whether a particular value appears more frequently than others.

DEFINITION
Mode: The value in a set of data that occurs most frequently. The mode does not have to be unique: a data set can have more than one mode.

TABLE 9-1: Mean, median, and mode
Mean: The average value, equal to the ratio of the sum of the values in a data set to the total number of values. Mean = Sum of observations / Number of observations.
Median: The central value of a given set of values when they are arranged in order.
Mode: The most frequently repeated value in a given set of values.

For example, if we have the set of values 2, 2, 3, 4, 5, then:
Mean = (2 + 2 + 3 + 4 + 5) / 5 = 3.2
Median = 3
Mode = 2

Example: Number of Visits to a Class Web site

Forty students were enrolled in science classes at a local school in Jeddah.
The instructor made course materials, grades, and class notes available to students on a class Web site, and the Web server kept track of how often each student accessed any of the Web pages. One month after the course began, the instructor produced a report that indicated how many times each student had accessed a Web page on the class site. The 40 observations were:

20 37 4 20 0 84 14 36 5 331
19 0 0 22 3 13 14 36 4 0
18 8 0 26 4 0 5 23 19 7
12 8 13 16 21 7 13 12 8 42

The sample mean for this data set is $\bar{x} = 23.10$. A dot plot of the data is shown in Figure 9-4. Many would argue that 23.10 is not a very representative value for this sample, because 23.10 is larger than most of the observations in the data set. Notice that only 7 of the 40 observations, or 17.5%, are larger than 23.10. The two outlying values of 84 and 331 (no, that was not a typo!) have a substantial impact on the value of $\bar{x}$.

FIGURE 9-4: A dot plot of the data in the example. Horizontal axis: Number of accesses, from 0 to 300.

YOU TRY IT
With a friend or family member, count how many times you can throw a ball to each other in one minute. If you don't have a ball, choose another suitable household item. Repeat this exercise a total of ten times, recording your results in the table below. Use the data you have collected to work out the mean, median, and mode (where applicable), showing how you have worked these out.

Go: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Number of catches in one minute: (one entry per go)
Mean:
Median:
Mode:
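The effect of the two outlying values in the class Web site example can be checked with a few lines of Python, and the same calls could be used to check the mean, median, and mode you work out by hand in the exercise. This is a minimal sketch using Python's standard `statistics` module. The list name `visits` and the decision to set aside exactly the values 84 and 331 are assumptions made for illustration; the data are the 40 access counts as they appear in the ordered list given earlier in the lesson.

```python
import statistics

# The 40 access counts, written in sorted order.
visits = [
    0, 0, 0, 0, 0, 0, 3, 4, 4, 4,
    5, 5, 7, 7, 8, 8, 8, 12, 12, 13,
    13, 13, 14, 14, 16, 18, 19, 19, 20, 20,
    21, 22, 23, 26, 36, 36, 37, 42, 84, 331,
]

mean_all = statistics.mean(visits)
median_all = statistics.median(visits)
modes_all = statistics.multimode(visits)  # a list, because the mode need not be unique

# Count the observations that are larger than the mean.
above_mean = sum(1 for v in visits if v > mean_all)

# Illustrative assumption: set aside the two outlying values, 84 and 331.
trimmed = [v for v in visits if v not in (84, 331)]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print(f"all 40 values: mean = {mean_all:.2f}, median = {median_all}, mode(s) = {modes_all}")
print(f"observations above the mean: {above_mean} of {len(visits)}")
print(f"without 84 and 331: mean = {mean_trimmed:.2f}, median = {median_trimmed}")
```

Setting aside the two largest values moves the mean far more than it moves the median, which is the sensitivity to outliers described in the text. Note that `statistics.multimode` requires Python 3.8 or later.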


REVIEW QUESTIONS

1. The following are the prices (in Riyals) of the six truck tires rated most highly by a consumer review in 2018:

159.00 199.00 157.00 127.65 123.99 126.00

a. Calculate the values of the mean and median.
b. Why are these values so different?
c. Which of the two, mean or median, appears to be better as a description of a typical value for this data set?

2. A recent medical study reported the sodium content (mg) per serving for each of 11 different brands of peanut butter:

120 50 140 120 150 150 150 65 170 250 110

a. Display these data using a dot plot. Comment on any unusual features of the plot.
b. Calculate the mean and median sodium content for the peanut butters in this sample.
c. The values of the mean and the median for this data set are similar. What aspect of the distribution of sodium content, as pictured in the dot plot from Part (a), provides an explanation for why the values of the mean and median are similar?
