The State of Data Science Education: Where It’s Headed and Why It Matters

As society has grown more reliant on data to make critical decisions, policymakers and educators have become increasingly aware that students need data fluency to be productive and competitive in today’s workforce. As Finzer and Dorsey observed, our need to prepare learners for a data-saturated future grows along with society’s dependence on data. Data fluency (the ability to explore, interpret, visualize, and transform data into actionable next steps) can help youth become informed citizens and leaders, able to critically and creatively engage with all manner of datasets, make original discoveries, and help solve some of the world’s most pressing problems.

In 2016, the National Science Foundation prioritized Harnessing the Data Revolution as one of its 10 Big Ideas. The following year, the Concord Consortium hosted the Data Science Education Technology (DSET) conference, which brought together over a dozen educational organizations in one of the first gatherings of the data science education community. The DSET conference was both a watershed moment and a sign of growing interest, with over 100 teachers, researchers, technology designers, and others convening around the importance of data science education.

In 2019, we again hosted a critical conference, Designing 2030: Thinking and Doing with Data. This convening further catalyzed a group of education leaders to advance the conversation on how to achieve data fluency and support data science education for all. The goal was to consider how open data and innovative technologies can transform the way we teach and learn science and can broaden participation to include more learners.

Building on the surprisingly strong reaction to an episode of the Freakonomics podcast [2] in 2019, the Data Science for Everyone (DS4E) initiative and coalition was born, created by the University of Chicago Center for RISC and organized in partnership with the Concord Consortium and the Learning Agency. In 2021, DS4E officially launched with a national call to action and gathered nearly 300 organizations to commit to the cause. DS4E supports a growing community that is working to expand K-12 data science education for every student. (An interview with Chad Dorsey and DS4E Director Zarek Drozda follows this article.)

A vision for data science education

With the importance of data science education clearly established, a multitude of efforts have worked to answer the next vital question: What should students know and be able to do with data? In 2020, the National Council of Teachers of Mathematics released the Pre-K-12 Guidelines for Assessment and Instruction in Statistics Education II (GAISE II), detailing a variety of skills necessary for making sense of contemporary data. In 2022, the National Academies workshop Foundations of Data Science for Students in Grades K-12 brought K-12 data science education experts together to identify the goals of data science instruction and the supports necessary to enhance student learning.

However, there was one key challenge to implementation: educators and students needed developmentally appropriate tools, curricula, and datasets. While professional-caliber data science software can be powerful, the high level of programming skill it demands can be a barrier to entry for K-12 students. Back when the “data science education dilemma” was first being discussed, we began developing the Common Online Data Analysis Platform (CODAP), novel browser-based software designed for learning and for data exploration, visualization, and analysis. CODAP, which grew out of KCP Technologies’ Fathom Dynamic Data Software project, reduced many barriers to K-12 data science education: it is free and open source, does not require installation, and is accessible to students in Grades 5–12 with no programming background.

Resources geared toward K-12 data science education have also come a long way. In addition to the many resources that the Concord Consortium has designed to accompany CODAP (see, for example, Figure 1), groups including Tuva Labs, UCLA’s Introduction to Data Science Project, Bootstrap, DataClassroom, Stanford’s YouCubed, CodeHS, and other providers have developed a range of curricula, datasets, and data analysis tools aimed at K-12 classrooms.

Figure 1. The Four Seals CODAP document developed in collaboration with EDC’s Oceans of Data Institute.

New challenges and directions

One of the most significant obstacles facing K-12 data science education is the availability of appropriately curated datasets. While real-world datasets are available from many trusted sources, they can be large and unwieldy, with attributes that are added over time, units that are missing, or attributes that are difficult to understand (e.g., complex rates).

We are currently exploring the challenge of how best to support teachers and students with appropriate resources and tools. Properly designed data portals could greatly aid teachers in searching through collections of datasets to identify topics and related datasets of appropriate size and complexity, and to discover questions to investigate with students. Scaffolds for learners such as pre-made data visualizations can help them more readily learn to explore data on their own. Similarly, students can benefit from first interpreting data to answer a provided question, then deepening their inquiry by devising and answering their own questions when engaging with data. Creating data experiences and datasets along this spectrum is key for supporting the accelerating uptake of data science education across grades and subject areas.

We are also re-engineering CODAP’s underlying source code using a modern web application architecture to ensure its availability for years to come, and we are expanding the CODAP community.

We are further supporting engagement by outside contributors to CODAP’s open-source code base, as well as by collaborators who integrate CODAP into curriculum development and educational research, K-12 students and teachers, participants in citizen science projects, higher education faculty, and others.

Lastly, this quickly growing field needs a strong foundation of research upon which to base everything from pedagogical techniques to assessments and new curricular approaches. We are actively supporting the data science education research community and recently launched the Data Science Education Research Community of Practice Database as a way to help make work visible and encourage connections. Leveraging empirical research and proven best practices is key to answering questions about how to best support students’ development of data fluency.

The Concord Consortium is a founding member of Data Science for Everyone (DS4E), a coalition created by the University of Chicago Center for RISC and organized in partnership with the Learning Agency in order to expand K-12 data science education for every student. We sat down with Concord Consortium President and CEO Chad Dorsey and DS4E Director Zarek Drozda to discuss the importance of K-12 data science education and keys for improving data fluency across the country.

Q. Why is data science education so important?

Dorsey: All students need to understand how to work with data. We have realized for some time now that there is a deluge of data and that data are important outside of just mathematics or statistics courses. Data need to be in classrooms everywhere, and learning about and with data should be an interdisciplinary enterprise in a way that reflects the actual work of data scientists.

Drozda: Technologies are moving quickly and students need relevant educational experiences. For example, when students hear the word “data,” they think about cellular service. We need to show that data—quantitative information in a data table—can help them solve problems that are relevant to their community or explore things they find exciting, whether it’s Spotify trends, NBA scores, or a local policy issue. We also need students to know the value of working with data. Data are increasingly part of every industry sector, from agriculture to advanced manufacturing, health care, finance, small business management, and more. Every student will need to know the basics of data before they graduate high school or they will not be able to access 21st-century jobs.

Q. Where do you think data science belongs in K-12 education?

Dorsey: Data need to be incorporated into all subjects. Data scientists see themselves as floating across disciplines in many ways. K-12 data science education should take the same view.

Drozda: Introductory data science deserves explicit time in the school schedule. Currently it fits best in the math classroom because it connects math, statistics, and computer science. There are also many opportunities to integrate data science into existing school subjects that can empower and enliven content.

Dorsey: The larger goal should be that students come to see data analysis as a lens they can use to study phenomena everywhere.

Q. What resources are available for educators who want to bring data science into their teaching?

Dorsey: CODAP is a free and powerful tool for exploring data with students. Concord has worked extensively with many other researchers and curriculum developers to integrate CODAP into data science education approaches and activities and to create resources that help educators and students learn to use CODAP. Over 50 CODAP example documents with datasets on a wide range of topics are available.

Drozda: Data Science for Everyone has a large resource hub, which includes CODAP as well as other tools and curriculum resources. CODAP is a great tool for classroom use, especially for students who aren’t necessarily looking for an R or Python scripting experience. CODAP provides a powerful and approachable way to engage in learning data science without a high barrier to entry.

Q. What is the future of K-12 data science education?

Drozda: The field needs a common learning framework that states, districts, researchers, and assessment developers can draw upon to define priority learning outcomes for data science, so that all high school graduates are data literate. That said, there is not a one-size-fits-all approach to data science education. Seventeen states now have official data science education programs, and we expect that number to increase.

Dorsey: The need for data science education and data literacy will continue to be critical with ever-growing datasets and continually transforming technologies, as we have seen over the past year with the rise of big-data-driven large language models like ChatGPT. Concord will continue to create resources and foster the K-12 data science education community. We are excited about our work to overhaul CODAP and related resources as well as our new Data Science Education Research Community of Practice specifically designed for networking and sharing resources. This is an ongoing effort to create a research-based framework for data science education learning progressions, and work toward larger conferences for practitioners and researchers. Overall, I’m very optimistic about the continued growth of data science education.

1. Finzer, W. (2013). The data science education dilemma. Technology Innovations in Statistics Education, 7(2).

2. Levitt, S., & Dubner, S. (2019, October 2). America’s math curriculum doesn’t add up (No. 391) [Audio podcast episode]. In Freakonomics Radio. https://freakonomics.com/podcast/americas-math-curriculum-doesnt-add-up-ep-391/

Zac Opps (zopps@concord.org) is a project manager.
Jacob Sagrans (jacob@tumblehomelearning.com) is a research associate at Tumblehome, Inc.

This material is based upon work supported by the Valhalla Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funder.