Monday’s Lesson: Finding Median in R the Common Core Way


Most data scientists regret that they didn’t pick up R earlier. This top programming language offers data manipulation, graphics, simulations, and countless application packages. And it’s free! The goal of our Computing with R for Mathematical Modeling (CodeR4MATH) project is to integrate R programming and computational thinking into high school math.

This sample activity demonstrates how programming in R can help strengthen students’ math skills. Download R and RStudio, an integrated development environment for R, or create an account on STATS4STEM.org and use the web-based RStudio.

“R Logo” by The R Foundation, licensed under CC-BY-SA 4.0

We’ll use R to explore the concept of median, a measure of central tendency of a set of values. There is a built-in function median(), but it’s a black box for students new to statistics. Instead, we’re going to find the median the Common Core way by emphasizing algorithmic thinking. First, write down the steps to find the median of a given dataset. Next, ask a partner to follow your instructions on the two datasets below. If your partner gets stuck, modify your instructions.

DATASET 1: Kilowatt-hours of electricity used by a family in the past several months:

630, 580, 580, 600, 550, 630, 590, 590, 610

DATASET 2: Bowling scores for a group of friends:

110, 62, 80, 132, 126, 194, 95, 78

With so few data points, it’s easy to find the median by hand, but what about datasets with a large number of values? Here’s a dataset of yogurt prices:

2.09, 1.13, 1.69, 1.00, 2.00, 1.79, 2.09, 1.00, 1.00, 0.60, 1.00, 1.11, 1.79, 1.79, 1.79, 3.19, 1.69, 1.79, 1.99, 5.79, 3.69, 2.79, 2.79, 2.29, 0.59, 1.79, 1.99, 7.69, 1.19, 1.49, 4.49, 4.49, 4.09, 0.89, 0.89, 0.59, 1.99, 2.09, 1.79, 2.09, 2.09, 2.09, 3.99, 0.50, 1.00, 0.79, 1.00, 1.00, 1.59, 0.69, 0.69, 0.69, 0.69

R functions help you automate the steps.

Step 1. Use the c() function to combine all these values and store them in a vector (a sequence of data elements of the same type) called yogurt_price. Paste and run the following code in your R console:

yogurt_price = c(2.09, 1.13, 1.69, 1.00, 2.00, 1.79, 2.09, 1.00, 1.00, 0.60, 1.00, 1.11, 1.79, 1.79, 1.79, 3.19, 1.69, 1.79, 1.99, 5.79, 3.69, 2.79, 2.79, 2.29, 0.59, 1.79, 1.99, 7.69, 1.19, 1.49, 4.49, 4.49, 4.09, 0.89, 0.89, 0.59, 1.99, 2.09, 1.79, 2.09, 2.09, 2.09, 3.99, 0.50, 1.00, 0.79, 1.00, 1.00, 1.59, 0.69, 0.69, 0.69, 0.69)

Step 2. Use the = assignment operator to assign the dataset to a new vector x, so you can manipulate this copy without changing the original one. Type the following code in your R console:

x = yogurt_price

Step 3. Use the sort() function to sort the dataset, and then use the = assignment operator to overwrite vector x with the sorted data.

x = sort(x)

Step 4. Use the length() function to count the total number of values in the dataset and store it in a variable n.

n = length(x)

There are 53 yogurts. With an odd number of data points, the index of the median is (n+1)/2.

Step 5. Calculate the index i using arithmetic operators in R.

i = (n + 1) / 2

Step 6. Use the [ ] operator to select the median based on the index identified above.

x[i]

The median of the yogurt_price dataset is 1.79.
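The six steps above can be collected into one short script. This is a minimal sketch: the variable names by_hand and built_in are introduced here for illustration, and the last lines compare the step-by-step result with R’s built-in median() function.

```r
# Step 1: the 53 yogurt prices from the dataset above
yogurt_price = c(2.09, 1.13, 1.69, 1.00, 2.00, 1.79, 2.09, 1.00, 1.00, 0.60,
                 1.00, 1.11, 1.79, 1.79, 1.79, 3.19, 1.69, 1.79, 1.99, 5.79,
                 3.69, 2.79, 2.79, 2.29, 0.59, 1.79, 1.99, 7.69, 1.19, 1.49,
                 4.49, 4.49, 4.09, 0.89, 0.89, 0.59, 1.99, 2.09, 1.79, 2.09,
                 2.09, 2.09, 3.99, 0.50, 1.00, 0.79, 1.00, 1.00, 1.59, 0.69,
                 0.69, 0.69, 0.69)

x = yogurt_price   # Step 2: work on a copy
x = sort(x)        # Step 3: sort in ascending order
n = length(x)      # Step 4: count the values (53)
i = (n + 1) / 2    # Step 5: index of the middle value (27)
by_hand = x[i]     # Step 6: the value at the middle index

# Cross-check against the built-in function (illustrative names)
built_in = median(yogurt_price)
print(by_hand)
print(built_in)
```

Both print statements should show the same value, which lets students confirm that their algorithm matches the black-box function.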

Now we are going to add a few products to the yogurt price dataset. Their prices are:

2.79, 1.99, 2.79, 1.99, 1.99, 1.91, 4.49, 4.49, 4.49, 5.79, 5.79.

Let’s use the c() function to combine the original yogurt_price vector with the new data and store them in a new vector called yogurt_price_updated. Paste and run the following code in your R console:

yogurt_price_updated = c(yogurt_price, 2.79, 1.99, 2.79, 1.99, 1.99, 1.91, 4.49, 4.49, 4.49, 5.79, 5.79)

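Try to find the median using R functions and operators. The updated dataset has 64 values, an even number, so there is no single middle value. The median is the average of the two values in the middle.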

Step 1. Use the = operator to assign the yogurt_price_updated dataset to a new vector x.

Step 2. Use sort() to sort x in ascending order and overwrite x with the sorted vector.

Step 3. Use length() to count the total number of values and store the count in a variable n.

Step 4. Calculate the indices of the two values in the middle: 1) use the / operator to divide n by 2, then use the = operator to store the result in a variable i1; 2) use the / and + operators to divide n by 2 and add 1, then use the = operator to store the result in a variable i2.

Step 5. Use the [ ] operator to select the two values in the middle by their indices i1 and i2. Then take the average of the two values using the arithmetic operators +, (), and /.

The R code for finding the median of the yogurt_price_updated dataset is as follows:

x = yogurt_price_updated
x = sort(x)
n = length(x)
i1 = n/2
i2 = n/2 + 1
(x[i1] + x[i2]) / 2

The median of the yogurt_price_updated dataset is 1.85.

With R, students are encouraged to think computationally. The CodeR4MATH project is researching students’ computational thinking and mathematical modeling competencies.
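As a possible extension, the odd and even cases can be combined into a single function that checks the count with the %% (remainder) operator. The sketch below is one way to do it, not part of the original activity: find_median, electricity, and bowling are hypothetical names chosen for illustration, and the function assumes a non-empty numeric vector. It is tried on DATASET 1 and DATASET 2 from the start of the lesson.

```r
# find_median is a hypothetical helper name, not a built-in R function.
# It assumes a non-empty numeric vector.
find_median = function(values) {
  x = sort(values)   # sort a copy in ascending order
  n = length(x)      # count the values
  if (n %% 2 == 1) {
    # Odd count: one middle value at index (n + 1) / 2
    x[(n + 1) / 2]
  } else {
    # Even count: average the two middle values
    (x[n / 2] + x[n / 2 + 1]) / 2
  }
}

electricity = c(630, 580, 580, 600, 550, 630, 590, 590, 610)  # DATASET 1
bowling = c(110, 62, 80, 132, 126, 194, 95, 78)               # DATASET 2

print(find_median(electricity))
print(find_median(bowling))

# Compare with the built-in median(); each line prints TRUE on a match
print(isTRUE(all.equal(find_median(electricity), median(electricity))))
print(isTRUE(all.equal(find_median(bowling), median(bowling))))
```

Students can compare the printed values with the answers they found by hand, then try the function on the yogurt price vectors.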

Jie Chao (jchao@concord.org) is a learning scientist.

This material is based upon work supported by the National Science Foundation under grant DRL-1742083. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.