Data Science Games

The age of data science is upon us! And what better way to learn the basics of this new, essential field than through games? A data science game generates data—lots of data—as you play. And the only way to win is to figure out a good way to visualize the data so you can see what’s going on, improve your strategy, and level up.

Project partners include software and curriculum developers from the Concord Consortium, a game developer from Eeps Media, and a learning science researcher from the University of California at Berkeley. We are creating a suite of games for high school science, so students can experience the excitement of data science.

Data science is an emerging discipline that is a partial union of mathematics and statistics, subject matter expertise, and computational thinking. It is rapidly gaining recognition as critical to all fields of endeavor. But middle and high school students currently do not get very much experience working with data. Furthermore, experience with data is best gained in the context of learning subject-specific matter. The Data Science Games project is developing games and curriculum materials for students to work with data in their science classes.

All data science game scenarios follow a similar design: students take context-specific actions in the game. They store their data, organize it, analyze it, and visualize it in the surrounding CODAP environment. In a chemistry scenario, for example, the game might involve the custom design of reactions to achieve specific goals such as buffering a chemical system so it doesn’t explode. Each reaction students observe generates data about bond strength, activation energy, pressure, temperature, and concentration that they use to solve specific problems or puzzles. A physics game might involve building or altering a structure within certain constraints where the data consist of the forces on all the structural elements. Each game can be used as the focal point for significant learning about subject matter content. With each game students experience new and challenging data science.
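As a rough illustration of how a game can generate one data record per student action, here is a minimal Python sketch. It is not the project's implementation: the `ReactionRecord` fields, the `run_reaction` and `play` helpers, and the toy pressure formula are all hypothetical and are not real chemistry.

```python
import random
from dataclasses import dataclass, asdict


@dataclass
class ReactionRecord:
    """One row of data produced by a single in-game action (hypothetical fields)."""
    trial: int
    temperature_k: float
    concentration_mol_l: float
    pressure_kpa: float


def run_reaction(trial, temperature_k, concentration_mol_l, rng):
    # Toy model, NOT real chemistry: pressure grows with temperature and
    # concentration, plus some random noise so the data need analysis.
    pressure = 0.3 * temperature_k * concentration_mol_l + rng.gauss(0, 2.0)
    return ReactionRecord(trial, temperature_k, concentration_mol_l, round(pressure, 2))


def play(n_trials, seed=0):
    """Simulate a player trying random settings; return one record per action."""
    rng = random.Random(seed)
    records = []
    for trial in range(1, n_trials + 1):
        temperature = rng.choice([280.0, 300.0, 320.0])
        concentration = rng.choice([0.5, 1.0, 1.5])
        records.append(asdict(run_reaction(trial, temperature, concentration, rng)))
    return records


records = play(20)
```

Each dictionary in `records` is the kind of row a student would then organize and graph in the analysis environment to decide which settings to try next.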

Most educational games are self-contained: they provide all the functionality needed to do well at the game. The data science games we are developing are instead embedded in CODAP, a data analysis environment designed for students to explore, visualize, and analyze data. Though the game is embedded in the environment’s browser page, its implementation (language, domain) is completely independent of the environment, so any developer can create a data science game without involving CODAP developers. The game communicates with the CODAP environment, and the two are interoperable. These three characteristics (embeddedness, communication, and interoperability) have only recently become possible in browser-based settings. The goal of the Data Science Games project is to develop this new genre of educational technology and to explore its educational potential.

Principal Investigators

William Finzer
Frieda Reichsman
Tim Erickson
Michelle Wilkerson-Jerde

This material is based upon work supported by the National Science Foundation under Grant No. IIS-1530578. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Foundational Research

We expect Data Science Games to make pre-college students’ reasoning about new data forms and new methods of working with data available for study in ways that have not been possible before. Specifically, we are interested in exploring young people’s conceptions of data structures (flat, hierarchical, tree, directed graph, relational) and data science competencies (cleaning, transforming, restructuring, merging/coordinating, creating measures, and visualizing data in order to address a problem or question). We seek to: (a) expand research on young people’s conceptions of data, especially more complex data structures; (b) identify student competencies unique to contemporary data sciences; and (c) contribute principles for the design and use of integrated data science environments that emphasize these structures and competencies in classroom settings.
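To make the distinction between flat and hierarchical data concrete, the following Python sketch cleans a small flat table, restructures it into a game-to-turns hierarchy, and creates a per-game measure. The sample rows, field names, and helper functions are invented for illustration and do not come from any actual game.

```python
from collections import defaultdict
from statistics import mean

# Flat structure: one row per turn, with the game number repeated on every row.
flat_rows = [
    {"game": 1, "turn": 1, "score": 10},
    {"game": 1, "turn": 2, "score": 14},
    {"game": 2, "turn": 1, "score": None},  # a missing value to clean out
    {"game": 2, "turn": 2, "score": 9},
    {"game": 2, "turn": 3, "score": 12},
]


def clean(rows):
    """Cleaning: drop rows whose score was not recorded."""
    return [row for row in rows if row["score"] is not None]


def to_hierarchy(rows):
    """Restructuring: group turn-level rows under their parent game."""
    games = defaultdict(list)
    for row in rows:
        games[row["game"]].append({"turn": row["turn"], "score": row["score"]})
    return dict(games)


def mean_score_per_game(hierarchy):
    """Creating a measure: one summary value per game."""
    return {game: mean(turn["score"] for turn in turns)
            for game, turns in hierarchy.items()}


hierarchy = to_hierarchy(clean(flat_rows))
measures = mean_score_per_game(hierarchy)
```

Here `hierarchy` maps each game to its list of turns, and `measures` holds one mean score per game, which is the kind of derived value a student might plot to compare games.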

Research Questions

  1. How do students engage with core statistical ideas such as center, spread, distribution, and inference when working with alternative data structures such as hierarchies or networks?
  2. How, and under what conditions, do learners leverage their knowledge about scientific concepts, computation, statistics, and the context of the game to clean, transform, and organize data toward solving problems?
  3. What social, technological, and pedagogical supports are needed to productively engage students in data science explorations in high school classroom contexts?

Structural Research Questions

Structural research focuses on discovering ways that data science games, as a new technological genre, can be used to increase and deepen student encounters with data-rich situations in the classroom. Our partnership with San Francisco Unified School District, a large, diverse public school district, will help connect our findings to broad classroom practice.

  1. To what extent can the subject matter as it appears in the game differ from that in the real world without impeding learning?
  2. How can students be motivated to take on the role of a data scientist, in particular to seek to clean, transform, restructure, and visualize data generated by the game?
  3. What are the characteristics of a successful data science game?
  4. What affordances come from the tight coupling of game and data analysis environments, and do these suggest additional roles that the new genre can take on in classroom learning?
