Under the Hood: What Do We Do When There’s Too Much Data to Look At?

Many Concord Consortium curricular activities are delivered online, which means we can log student actions. This presents an opportunity and a problem. The opportunity is obvious: by analyzing student actions as they try to achieve a goal, we can infer their state of knowledge and use that information in contextualized real-time help or in summary reports. The problem?
Logged event data tends to be rather voluminous.

Our Measuring Collaboration in Complex Computerized Performance Assessments project with ETS engaged postsecondary students from over 40 campuses and generated over a gigabyte—1.2 million rows—of log data. That’s too much for a human but too little for machine learning algorithms, like those made popular by the Watson program that excels at Jeopardy. To bridge the gap we are developing software that enables researchers to sort and filter data before “drilling down” to interpret the actions of a particular student or team of students.

In this project’s activities, teams of three students work on separate but linked computers to solve a problem on a shared virtual electrical circuit, with four levels of increasing difficulty. (See “What Happens When Students Try to Work Collaboratively?” in the Spring 2018 @Concord.) A total of 139 teams attempted at least one level, with varying success. Each team generated a log file that recorded the actions of all the team members, including messages (students communicated only through a chat window), measurements, calculations, and alterations of the circuit itself.

The goal for each student on a team was to change the resistance value of their resistor to yield a specified goal voltage value known only to that student. Since each resistor was part of the team's shared virtual circuit, a change to any one resistor affected the voltage across all of them. Thus the team members' goals could be achieved only if they collaborated: for example, they could share their goal voltages with one another through the chat window. Armed with all three goal voltages, any student could calculate the three resistance values that would put the circuit into the desired goal state.
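To see why the calculation is possible, consider a minimal sketch. Assuming the three resistors sit in a simple series loop with the supply (a simplification; the activity's actual circuit may include other components), voltage divides in proportion to resistance, so any set of resistances proportional to the goal voltages reaches the goal state. The function name and the choice of a free total resistance below are illustrative, not the project's code:

```javascript
// Sketch: compute goal resistances from shared goal voltages,
// assuming a simple series circuit in which voltage divides in
// proportion to resistance. Any resistances proportional to the
// goal voltages then produce the goal state; totalResistance is
// a free scaling choice.
function goalResistances(goalVoltages, totalResistance) {
  var sum = goalVoltages.reduce(function (a, v) { return a + v; }, 0);
  return goalVoltages.map(function (v) {
    return totalResistance * (v / sum);
  });
}

// Example: goal voltages of 3 V, 4.5 V, and 1.5 V, with a chosen
// total resistance of 600 ohms
var r = goalResistances([3, 4.5, 1.5], 600);
// r ≈ [200, 300, 100]
```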

So, how well did students collaborate? Did teams communicate their goal voltages? If so, did they calculate and communicate everyone's goal resistance values? How did their actions differ across levels, and how did those actions correlate with a team's success?

We created a web-based interface in JavaScript that enables us to examine the log data and answer questions of this kind (Figure 1). The interface is, in effect, a simple but powerful filter of the original JSON log files. A researcher can select a level of difficulty (Levels A-D), focus only on the teams that attempted that level, and then filter those teams by success or failure before analyzing their actions.

At Level C, for example, two teams succeeded without ever sharing any of their goal voltages in chat. Since these teams could not have calculated their goal resistances, one wonders how they succeeded.

function findFilteredLevels() {
  // Returns an array of all the levels remaining after complete filtering
  var filteredLevels = [];
  for (var i = 0; i < attemptedLevels.length; i++) {
    var myLevel = attemptedLevels[i];
    if (RChatFilter(myLevel) &&
        RCalcFilter(myLevel) &&
        VChatFilter(myLevel) &&
        outcomeFilter(myLevel) &&
        levelFilter(myLevel)) {
      filteredLevels.push(myLevel);
    }
  }
  return filteredLevels;
}
Figure 1: Example of a function that computes the set of levels that correspond to the checked boxes.

Thanks to easy filtering, we can do a bit more sleuthing into the data. One of these teams chatted just four times, all off topic. They did, however, collectively perform 36 resistor changes! Examining those changes clearly shows that the team members simply pursued their own goals, continually making slight adjustments to their own resistance value so as to keep their voltage measurement near their goal. This process gradually converged on the desired state and the team eventually succeeded.

We’re currently programming the software to search for sequences of actions that will help us detect such strategies with some confidence. By combining software with evidence from our own eyes, we hope to illuminate the complex interactions that underlie successful (and unsuccessful) collaboration.
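One plausible way to flag the greedy strategy described above is to look for long runs of small, same-student resistor adjustments in the event stream. The sketch below is illustrative only; the event shape (`{user, type, value}`) and the function name are assumptions, not the project's actual log schema or code:

```javascript
// Sketch: find the longest run of consecutive small resistor
// adjustments by one student. The event format {user, type, value}
// is hypothetical, not the project's actual log schema.
function longestAdjustmentRun(events, user, maxStep) {
  var longest = 0, current = 0, lastValue = null;
  events.forEach(function (e) {
    if (e.user === user && e.type === "resistance-change") {
      if (lastValue !== null && Math.abs(e.value - lastValue) <= maxStep) {
        current += 1;          // another small tweak: extend the run
      } else {
        current = 1;           // large jump (or first change): restart
      }
      lastValue = e.value;
      longest = Math.max(longest, current);
    }
  });
  return longest;
}

// Example: student "a" makes three small tweaks, then a big jump
var events = [
  { user: "a", type: "resistance-change", value: 100 },
  { user: "a", type: "resistance-change", value: 105 },
  { user: "b", type: "chat", value: null },
  { user: "a", type: "resistance-change", value: 110 },
  { user: "a", type: "resistance-change", value: 200 }
];
longestAdjustmentRun(events, "a", 10); // returns 3
```

A long run relative to a team's total activity would be one signal, among others, that a student was converging on a goal by trial and error rather than by calculation.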

Paul Horwitz (phorwitz@concord.org) is a senior scientist.

This material is based upon work supported by the National Science Foundation under grant DRL-1842035. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.