I think you would have to be living under a rock in today’s world to believe that there is not a ton of data about you available for consumption by all sorts of agencies and corporations. According to Simplilearn (2019), one person using a smartphone generates about 40 exabytes of data per month. I had never heard the term exabyte before watching this video, so I searched for a definition. Teradata (n.d.) defined it as “an extraordinarily large unit of digital data, one exabyte (EB) is equal to . . . one billion gigabytes (GB). Some technologists have estimated that all the words ever spoken by mankind would be equal to five exabytes.” It is mind-boggling to think that, by these estimates, each of us could be creating a data stream equivalent to all of those words in less than four days!
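To get a feel for the scale, here is a rough check of the two figures quoted above, taken at face value and not independently verified. The 30-day month is a simplifying assumption, and the variable names are only illustrative.

```python
# Figures as quoted in the text (taken at face value, not verified).
EB_PER_PERSON_PER_MONTH = 40   # Simplilearn (2019)
EB_ALL_WORDS_EVER_SPOKEN = 5   # estimate quoted by Teradata (n.d.)
GB_PER_EB = 1_000_000_000      # Teradata (n.d.): one billion gigabytes per exabyte
DAYS_PER_MONTH = 30            # assumption: a 30-day month

# How many "all words ever spoken" one person would generate in a month.
multiples_per_month = EB_PER_PERSON_PER_MONTH / EB_ALL_WORDS_EVER_SPOKEN

# How many days it would take one person to generate 5 EB at that rate.
days_to_match = DAYS_PER_MONTH / multiples_per_month

# The monthly figure expressed in gigabytes.
gb_per_month = EB_PER_PERSON_PER_MONTH * GB_PER_EB

print(f"Multiples of 'all words ever spoken' per month: {multiples_per_month}")
print(f"Days needed to match 5 EB: {days_to_match}")
print(f"Gigabytes per month: {gb_per_month:,}")
```

If both quoted numbers hold, the monthly total is eight times the five-exabyte estimate, so the equivalent of "all the words ever spoken" would take under four days.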
These massive quantities of data are known as big data. Five characteristics are used to classify data as big data: 1) volume, 2) velocity, 3) variety, 4) veracity, and 5) value (Simplilearn, 2019). The data has to have a large volume, be generated at a high speed, come in a variety of types and from various sources, be accurate and trustworthy, and have value to its users. All of this data has to be stored, processed, and analyzed before it can be used effectively (Simplilearn, 2019; Aldamen, 2021). Big data can also be grouped into three levels: micro, meso, and macro.
Micro-level data can be used to understand how students learn on their own. Several components contribute to this. The knowledge component uses automated detectors to show how student interaction and knowledge transfer happen. Metacognition and self-regulation data can be gathered with the LMS as students are observed navigating their courses. Sensor-based and interaction-based detectors help gather data on affective states, such as facial expressions and gestures that could reveal frustration or boredom. This data can be used to find ways to improve engagement or implement interventions. Another component is clustering the data in ways that allow for personalization for each student (Aldamen, 2021). This is an interesting concept, and I would like to learn more about it.
Meso-level data can be divided into cognitive, social, behavioral, and affective components. Cognitive data is used to support and assess students. It includes automated grading, which provides immediate feedback to students and valuable information to the teacher. Social data is gathered from written work in discussion boards and from video transcripts from intelligent tutoring systems. Behavioral data is gathered by watching how students interact with course material and how that behavior demonstrates their understanding. Affective data, such as student self-concept and motivation, is important for determining how the student feels about the course (Aldamen, 2021).
Macro-level data is institutional data that is only occasionally updated. It can be used for early warning systems needed for interventions and for course guidance. It includes demographic information. This type of data can also help institutions evaluate their learning environment (Aldamen, 2021).
Here is a video (MashableBrandX, 2014) that describes how big data can be used for interventions that reduce dropouts and improve test scores.
The educational system is known for producing a large amount of data, beginning in preschool and continuing through college and beyond. To make the most effective use of all of the data collected on students for data-driven decisions and policy, data systems should be linked from early childhood all the way through college and into the workforce. To do this, agencies need to coordinate on the definitions, collection methods, and types of data collected (SREB, 2018). One issue in attempting to link this big data across various school, government, and research entities is that there is not a common set of data definitions. SREB (2015) describes the consequences of ill-defined data definitions:
“If the most common data elements are not defined the same by school districts, state education and work force agencies, and colleges and universities, analysts cannot interpret the data. Having commonly defined data elements and systematic ways to reconcile differences built into their data systems allow multiple data systems to match or merge records more accurately.”
This is a critical area that educational systems need to address through training, investment in technology, and collaboration with others if we are going to benefit from all of the data that is available.
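As a rough illustration of what "commonly defined data elements" and a "systematic way to reconcile differences" could look like in practice, here is a minimal sketch. Everything in it is hypothetical: the two systems, their field names, the grade codes, and the student records are invented for the example and do not come from SREB or any real data system.

```python
# Hypothetical crosswalk: each system's local field names mapped to shared names.
FIELD_MAP = {
    "district": {"stu_id": "student_id", "grade": "grade_level"},
    "state": {"StudentID": "student_id", "GradeLvl": "grade_level"},
}


def normalize_grade(value):
    """Recode a grade to one shared convention: 'KG' or a two-digit string."""
    value = value.strip().upper()
    if value in {"K", "KG"}:
        return "KG"
    return value.zfill(2)


def to_common(record, system):
    """Rename fields and recode values so a record follows the shared definitions."""
    common = {}
    for local_name, value in record.items():
        shared_name = FIELD_MAP[system].get(local_name)
        if shared_name is None:
            continue  # no shared definition, so the field is left out of the merge
        if shared_name == "grade_level":
            value = normalize_grade(value)
        common[shared_name] = value
    return common


def find_grade_mismatches(district_rows, state_rows):
    """Match records on student_id and report grade levels that still disagree."""
    district = {r["student_id"]: r for r in (to_common(x, "district") for x in district_rows)}
    state = {r["student_id"]: r for r in (to_common(x, "state") for x in state_rows)}
    mismatches = {}
    for student_id in district.keys() & state.keys():
        d_grade = district[student_id]["grade_level"]
        s_grade = state[student_id]["grade_level"]
        if d_grade != s_grade:
            mismatches[student_id] = (d_grade, s_grade)
    return mismatches


# Invented sample records from two systems that label the same things differently.
district_rows = [{"stu_id": "A17", "grade": "K"}, {"stu_id": "B42", "grade": "3"}]
state_rows = [{"StudentID": "A17", "GradeLvl": "KG"}, {"StudentID": "B42", "GradeLvl": "04"}]

mismatches = find_grade_mismatches(district_rows, state_rows)
print(mismatches)
```

In this sketch, "K" and "KG" only match because both systems are translated to one shared code first. The record that still disagrees after translation is a real discrepancy that someone would need to look into, not just a labeling difference.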
Another critical area of concern in big data in education is the source of the data. Academic background, demographic, and educational progress data can be gathered from student information systems (SIS). Learning management systems (LMS) can provide data about student behavior through clickstream data, student engagement within online courses, and writing in various settings. LMSs were not designed to be data collectors for research, which makes it difficult to get the types of data that are needed (Aldamen, 2021). I have experienced this while using Schoology. The analytics component of Schoology only shows whether a student has logged in and accessed an assignment. I can see how much time was spent on an assignment, but I can’t see what was done. The assessment feature only gives question-level and whole-class data, so it is hard to get the individual-level data needed to create personalized interventions for students.
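To make the gap concrete, here is a small sketch of the kind of individual-level view that would help with interventions. It assumes that response-level rows (one row per student per question) could be exported, which is an assumption for illustration and not a description of any actual Schoology export; the students, questions, and field names are made up.

```python
from collections import defaultdict

# Hypothetical export: one row per student response (not a real LMS format).
responses = [
    {"student": "Ana", "question": "Q1", "correct": True},
    {"student": "Ana", "question": "Q2", "correct": False},
    {"student": "Ana", "question": "Q3", "correct": False},
    {"student": "Ben", "question": "Q1", "correct": True},
    {"student": "Ben", "question": "Q2", "correct": True},
    {"student": "Ben", "question": "Q3", "correct": False},
]


def per_student_scores(rows):
    """Return the fraction of attempted questions each student answered correctly."""
    totals = defaultdict(lambda: [0, 0])  # student -> [correct, attempted]
    for row in rows:
        totals[row["student"]][1] += 1
        if row["correct"]:
            totals[row["student"]][0] += 1
    return {student: correct / attempted for student, (correct, attempted) in totals.items()}


def missed_questions(rows):
    """Return, for each student, the list of questions answered incorrectly."""
    missed = defaultdict(list)
    for row in rows:
        if not row["correct"]:
            missed[row["student"]].append(row["question"])
    return dict(missed)


scores = per_student_scores(responses)
missed = missed_questions(responses)
print(scores)
print(missed)
```

A whole-class report on this data would only show that Q3 was missed by everyone and Q2 by half the class. The per-student view shows which student needs help with which question, which is the information a personalized intervention depends on.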
Limited ability to analyze data is another critical issue relating to big data in education. Analyzing data requires certain skills and programs, and many educators do not have access to the programs or adequate training (Aldamen, 2021). I can attest to this as a teacher. I have taken one statistics course, but I haven’t learned how to apply those analysis skills to the types of data that are gathered in a classroom. This is an area where I would like to grow.
Data security and privacy are also areas of critical concern regarding big data. Student data needs to be protected so that it is not used inappropriately in marketing or profiling students (Aldamen, 2021). Student data also needs to be protected to maintain its integrity and compliance with federal and state laws that mandate privacy protections. In order to do this, schools should have plans for storage, security, access permissions, retention, third-party systems, and employee data security training (SREB, 2018).
References
Aldamen, H. (2021, April 6). Big data: Shaping the future of Education. YouTube. Retrieved January 21, 2022, from https://www.youtube.com/watch?v=8dm6YHVIXFM
MashableBrandX. (2014, September 3). Big Data's making education smarter. YouTube. Retrieved January 15, 2022, from https://www.youtube.com/watch?v=K_wAHEHTy-g
Simplilearn. (2019, December 10). Big data in 5 minutes | What is big data? | Introduction to big data | Big data explained | Simplilearn. YouTube. Retrieved January 21, 2022, from https://www.youtube.com/watch?v=bAyrObl7TYE
SREB. (2015, December 10). Data definitions. Southern Regional Education Board. Retrieved January 21, 2022, from https://www.sreb.org/data-definitions
SREB. (2018). 10 Issues in Educational Technology, 2108. Retrieved December 7, 2019, from https://www.sreb.org/sites/main/files/file-attachments/10issues_v8-web_version_accessible.pdf?1521568731
Teradata. (n.d.). What is an exabyte? Retrieved January 21, 2022, from https://www.teradata.com/Glossary/What-is-an-Exabyte