# Monday’s Lesson: Finding Median in R the Common Core Way

Most data scientists regret that they didn’t pick up R earlier. This top programming language offers data manipulation, graphics, simulations, and countless application packages. And it’s free! The goal of our Computing with R for Mathematical Modeling (CodeR4MATH) project is to integrate R programming and computational thinking into high school math.

This sample activity demonstrates how programming in R can help strengthen students’ math skills. Download R and RStudio, an integrated development environment for R, or create an account on STATS4STEM.org and use the web-based RStudio.

We’ll use R to explore the concept of the median, a measure of central tendency of a set of values. There is a built-in function `median()`, but it’s a black box for students new to statistics. Instead, we’re going to find the median the Common Core way by emphasizing algorithmic thinking. First, write down the steps to find the median of a given dataset. Then ask a partner to follow your instructions on the following two datasets. If your partner gets stuck, modify your instructions.

DATASET 1: Kilowatt-hours of electricity used by a family in the past several months:

`630, 580, 580, 600, 550, 630, 590, 590, 610`

DATASET 2: Bowling scores for a group of friends:

`110, 62, 80, 132, 126, 194, 95, 78`
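Once you and your partner have an answer for each dataset, one way to check it is R's built-in `median()`. The sketch below does only that; the vector names `kwh` and `bowling` are made up for this illustration and are not part of the activity.

```r
# Vector names are hypothetical; the values are Datasets 1 and 2 above
kwh = c(630, 580, 580, 600, 550, 630, 590, 590, 610)
bowling = c(110, 62, 80, 132, 126, 194, 95, 78)

# Built-in median(), used here only to check the by-hand answers
median(kwh)      # 590
median(bowling)  # 102.5
```

Dataset 1 has 9 values, so its median is the single middle value after sorting. Dataset 2 has 8 values, so its median is the average of the two middle values.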

With so few data points, it’s easy to find the median by hand, but what about datasets with a large number of values? Here’s a dataset of yogurt prices:

`2.09, 1.13, 1.69, 1.00, 2.00, 1.79, 2.09, 1.00, 1.00, 0.60, 1.00, 1.11, 1.79, 1.79, 1.79, 3.19, 1.69, 1.79, 1.99, 5.79, 3.69, 2.79, 2.79, 2.29, 0.59, 1.79, 1.99, 7.69, 1.19, 1.49, 4.49, 4.49, 4.09, 0.89, 0.89, 0.59, 1.99, 2.09, 1.79, 2.09, 2.09, 2.09, 3.99, 0.50, 1.00, 0.79, 1.00, 1.00, 1.59, 0.69, 0.69, 0.69, 0.69`

Step 1. Use the `c()` function to combine all these values and store them in a vector (a sequence of data elements of the same type) called `yogurt_price`. Paste and run the following code in your R console:

`yogurt_price = c(2.09, 1.13, 1.69, 1.00, 2.00, 1.79, 2.09, 1.00, 1.00, 0.60, 1.00, 1.11, 1.79, 1.79, 1.79, 3.19, 1.69, 1.79, 1.99, 5.79, 3.69, 2.79, 2.79, 2.29, 0.59, 1.79, 1.99, 7.69, 1.19, 1.49, 4.49, 4.49, 4.09, 0.89, 0.89, 0.59, 1.99, 2.09, 1.79, 2.09, 2.09, 2.09, 3.99, 0.50, 1.00, 0.79, 1.00, 1.00, 1.59, 0.69, 0.69, 0.69, 0.69)`

Step 2. Use the `=` assignment operator to assign the dataset to a new vector `x`, so you can manipulate this copy without changing the original one. Type the following code in your R console: `x = yogurt_price`

Step 3. Use the `sort()` function to sort the dataset, and then use the `=` assignment operator to overwrite vector `x` with the sorted data. `x = sort(x)`

Step 4. Use the `length()` function to count the total number of values in the dataset and store it in a variable `n`. `n = length(x)`

There are 53 yogurts. With an odd number of data points, the median is the single middle value of the sorted data, and its index is (n+1)/2, which is 27 here.

Step 5. Calculate the index `i` using arithmetic operators in R. `i = (n + 1) / 2`

Step 6. Use the `[ ]` operator to select the median based on the index identified above. `x[i]`

The median of the `yogurt_price` dataset is 1.79.
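To see what each step produces, the same six steps can be run on the much shorter Dataset 1, where every intermediate value fits in a comment. This is only a sketch; the vector name `kwh` is made up for this illustration.

```r
kwh = c(630, 580, 580, 600, 550, 630, 590, 590, 610)  # Step 1 (hypothetical name)
x = kwh           # Step 2: work on a copy, leaving kwh unchanged
x = sort(x)       # Step 3: 550 580 580 590 590 600 610 630 630
n = length(x)     # Step 4: 9 values, an odd count
i = (n + 1) / 2   # Step 5: index 5
x[i]              # Step 6: 590
```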

Now we are going to add a few products to the yogurt price dataset. Their prices are:

`2.79, 1.99, 2.79, 1.99, 1.99, 1.91, 4.49, 4.49, 4.49, 5.79, 5.79`

Let’s use the `c()` function to combine the original `yogurt_price` vector with the new data and store them in a new vector called `yogurt_price_updated`. Paste and run the following code in your R console:

`yogurt_price_updated = c(yogurt_price, 2.79, 1.99, 2.79, 1.99, 1.99, 1.91, 4.49, 4.49, 4.49, 5.79, 5.79)`
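As a small illustration of how `c()` appends values to an existing vector, here is a sketch with short made-up vectors (`prices` and `prices_updated` are hypothetical names and values, not the yogurt data). It also uses the `%%` remainder operator to check whether the new count is odd or even, which decides how the median is found.

```r
# Hypothetical short vectors, used only to show how c() appends values
prices = c(1.79, 0.99, 2.49)
prices_updated = c(prices, 3.19, 1.29)

length(prices)               # 3: the original vector is unchanged
length(prices_updated)       # 5
length(prices_updated) %% 2  # 1 means an odd count, 0 means an even count
```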

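Try to find the median using R functions and operators. The updated dataset has 64 values (53 + 11), an even number, so the median is the average of the two middle values.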

Step 1. Use the `=` operator to assign the `yogurt_price_updated` dataset to a new vector `x`.

Step 2. Use `sort()` to sort `x` in ascending order and overwrite `x` with the sorted vector.

Step 3. Use `length()` to count the total number of values and store the count in a variable `n`.

Step 4. Calculate the indices of the two values in the middle: 1) use the `/` operator to divide `n` by 2, and then use the `=` operator to store the result in a variable `i1`; 2) use the `/` operator to divide `n` by 2, add 1 with the `+` operator, and then use the `=` operator to store the result in a variable `i2`.

Step 5. Use the `[ ]` operator to select the two values in the middle by their indices `i1` and `i2`. Then take the average of the two values using the arithmetic operators `+` and `/`, with parentheses `()` to group the sum.
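Before running these steps on the full yogurt data, they can be tried on the much smaller Dataset 2, which also has an even number of values. The sketch below assumes a made-up vector name, `bowling`; the comments show the value each step should produce.

```r
bowling = c(110, 62, 80, 132, 126, 194, 95, 78)  # hypothetical vector name
x = bowling          # Step 1: work on a copy
x = sort(x)          # Step 2: 62 78 80 95 110 126 132 194
n = length(x)        # Step 3: 8 values, an even count
i1 = n / 2           # Step 4: index 4
i2 = n / 2 + 1       #         index 5
(x[i1] + x[i2]) / 2  # Step 5: (95 + 110) / 2 = 102.5
```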

The R code for finding the median of the `yogurt_price_updated` dataset is as follows:

```r
x = yogurt_price_updated
x = sort(x)
n = length(x)
i1 = n/2
i2 = n/2 + 1
(x[i1] + x[i2]) / 2
```

With R, students are encouraged to think computationally. The CodeR4MATH project is researching students’ computational thinking and mathematical modeling competencies.

The median of the `yogurt_price_updated` dataset is 1.85.
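The odd and even procedures differ only in their last steps, so they can be folded into one function that checks whether `n` is odd. The sketch below is one possible way to do that and is not part of the original activity: the function name `find_median` and the two short test vectors are made up, it keeps the `=` assignment style used above, and it assumes a numeric vector with at least one value and no missing values.

```r
# Hypothetical helper: find the median by sorting and indexing
find_median = function(values) {
  x = sort(values)   # sort a copy of the input
  n = length(x)      # count the values
  if (n %% 2 == 1) {
    # Odd count: the single middle value
    x[(n + 1) / 2]
  } else {
    # Even count: the average of the two middle values
    (x[n / 2] + x[n / 2 + 1]) / 2
  }
}

find_median(c(3, 1, 2))     # 2
find_median(c(4, 1, 3, 2))  # 2.5
```

Calling `find_median(yogurt_price)` and `find_median(yogurt_price_updated)` after defining those vectors should reproduce the two medians found step by step.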

Jie Chao (jchao@concord.org) is a learning scientist.

This material is based upon work supported by the National Science Foundation under grant DRL-1742083. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.