
Uncovering insights from big data: change point detection of classroom engagement

Abstract

Expectations of big data are increasing across many fields, including education. However, uncovering valuable insights from big data is like locating a needle in a haystack, and it is difficult for teachers to use educational big data on their own. This study aimed to understand changes in student participation rates during classes and teachers' teaching styles by analyzing educational big data. The analysis used data from 120 students and two mathematics teachers at a public junior high school in Japan. We applied the pruned exact linear time (PELT) algorithm to automatically identify significant changes in student participation during class. Based on this information, we analyzed the interaction logs of teachers' e-book readers and clarified the relationship between student participation rates and teacher behavior patterns. Change point detection using the PELT algorithm achieved a high F1-score of 0.7929, indicating good overall performance. We also investigated whether teachers' actions differed between classes with and without such changes and found a statistically significant difference. The results provide clues for improving student learning engagement and teachers' teaching styles, and automatically identifying notable cases from educational big data is expected to improve the quality of education. However, further research is required to improve the data analysis methods, for example by adjusting algorithm parameters to the situation.

Introduction

The use of big data in education has attracted considerable attention recently (Daniel, 2015; Fischer et al., 2020; Stojanov & Daniel, 2023). Educational big data has the potential to provide new evidence-based research into learning and education (Gibson, 2017). Big data analysis is expected to provide deep insights into improving the efficiency of the educational process and student learning outcomes (Mor et al., 2015). For example, using educational big data enables the extraction of knowledge, such as student learning patterns and trends, for teachers to gain a new understanding (Ndukwe & Daniel, 2020). Big data, therefore, has the capacity to play a vital role in education.

Big data, collected through diverse sensing technologies, has illuminated previously unknown aspects of classroom environments. Multimodal analysis leverages audio, video, and interaction log data to uncover intricate patterns of student engagement and interaction (e.g., Ahuja et al., 2019; Sümer et al., 2021). This comprehensive approach not only enriches understanding of the learning process but also paves the way for innovative educational strategies tailored to diverse learning needs. While sensing technologies can gather extensive classroom data, integrating them effectively in a way that reflects the diversity of teaching methods and learning content in educational settings can be challenging (Prieto et al., 2017). Thus, there has been growing interest in harnessing the insights available from log data. For instance, the log data generated by e-book readers and Learning Management Systems (LMS) contain important information. These tools not only serve as alternatives to traditional textbooks and instruction but can also act as sensing devices integrated seamlessly into the learning process. In other words, teachers can capture classroom activity simply by using digital tools, without installing special equipment.

The interaction log data output from e-book readers is a primary source of educational big data. Interaction log data record technology-mediated interactions between students and teachers during learning activities (Kim et al., 2019), and e-book readers are increasingly used in educational settings as an alternative to paper textbooks because of their portability and interactive tools (Boticki et al., 2019). As a result, e-book readers produce log data in such volume that it is referred to as big data (Ogata et al., 2017). Analyzing log data from e-book readers can yield new knowledge and reveal learning patterns that have not yet been identified (Mouri & Yin, 2017). One benefit of this approach is that it can clarify how teachers' instruction affects students' use of digital tools. Even when teachers and students are in the same classroom, it is difficult for teachers to tell whether students have used the digital tools as expected. Exploring students' interaction logs is thus a worthwhile way for teachers to reflect on how their instruction reached students.

However, extracting meaningful information from big data resembles searching for a needle in a haystack (Neuman et al., 2019). It is difficult to screen large amounts of data efficiently to obtain valuable insights (Lin et al., 2020; Mian & Ronson, 2019). Change point detection is one method for addressing this challenge. For example, Boubaker et al. (2021) used change point detection to analyze the correlation between stock prices and news in the financial domain. In education, change point detection has been employed to analyze educational big data (Park et al., 2017; Shimada et al., 2018), but the approach remains underutilized.

Therefore, to support teachers' reflection, this study investigated how to automatically identify classes in which the participation rate changed significantly during class. The participation rate in class indicates the level of participation of the whole class and affects individual student engagement (Froiland & Worrell, 2016; Wang et al., 2018). Through changes in participation rates, teachers can learn about learners' interests and motivations, which can encourage self-reflection and improve instruction (Lee & Reeve, 2012). Traditionally, researchers have measured participation not with log data but with questionnaires (Archambault et al., 2013; Wang et al., 2014) or direct observation of hand-raising and other student behavior (Froiland & Oros, 2014). As a result, participation rates during class were assessed subjectively, and comparing multiple classes required considerable effort. In this study, we investigated a method for automatically extracting classes that teachers should review, using educational big data generated by an e-book reader. We also explored the relationship between instruction and participation rates by comparing the classes flagged for review with other classes.

Literature review

Educational big data from e-book log data

Electronic books (e-books) have become popular learning tools in educational settings (Boticki et al., 2019). Interaction logs from e-book readers have accumulated into large-scale educational big data (Ogata et al., 2017), which can help improve and optimize educational practices (Mouri & Yin, 2017). For example, based on an analysis of teaching records collected from e-book reader logs, Zhao et al. (2021) reported that learning improved when teachers emphasized essential points before an activity. Ma et al. (2022) studied the jump-back phenomenon in e-book readers and suggested that the resulting knowledge could be used for instructional improvement and lesson design. Analyzing educational big data collected from e-book readers is therefore expected to improve education.

In learning analytics research using e-book readers, engagement is a widely used indicator. Engagement is an important measure of the effort and active participation that learners put into a particular learning environment (Dixson, 2015), and the participation rate in learning activities is accordingly used as an indicator of engagement (Wang et al., 2018). Using this metric in data analysis, teachers can receive valuable feedback on student learning behaviors. For example, Kuromiya et al. (2022) used engagement indicators derived from e-book reader log data to investigate learner motivation during the COVID-19 pandemic. In the literature, engagement indicators are attracting attention as a key concept for analyzing educational big data from e-book readers (Chen et al., 2023).

Classroom engagement and participation rates

The participation rate in learning activities is an indicator used to measure students' classroom engagement (Subramainan & Mahmoud, 2020). In prior research, participation rates have sometimes been measured through surveys, some of them based on teacher observations. For instance, Archambault et al. (2013) quantified active classroom participation through a 10-item survey. Similarly, Wang et al. (2014) viewed classroom engagement as participation in learning activities and measured it with a survey covering four aspects. Although such questionnaire data are easy to collect and interpret, they are subjective and carry a risk of bias.

Against this background, some studies have measured participation rates objectively by sensing learning activities. For example, Xie et al. (2023) proposed a method that uses image recognition to measure class participation from students' gestures during online learning and to provide feedback to teachers. Duggal et al. (2021) incorporated gamification into learning and measured classroom participation by tracking changes in the coins awarded for learning activities. Rather than introducing sensing separate from normal learning activities, another approach uses log data from devices already used for learning, such as an LMS. Avci and Ergun (2022), for example, used LMS interaction log data to identify students' participation levels when analyzing how students' LMS activities in online learning environments related to engagement, information literacy, and academic performance. In this study, we captured classroom engagement through e-book reader interaction log data. Interaction logs directly record students' and teachers' operations on specific devices, making it possible to collect objective data. Moreover, they can be collected with ordinary ICT tools, without special technology. Collecting interaction log data therefore enables more naturalistic observation of classroom dynamics and preserves the integrity of the educational environment.

The importance of change point detection in big data

Mean and variance are the most common ways to summarize changes in classroom engagement from educational big data. However, as statistical indicators, they capture only broad changes, such as those occurring over long intervals, and cannot reveal when within a class a change occurred. Change point detection is therefore needed to focus on the process and find notable changes in the classroom as they unfold over time. Change point detection methods are divided into online and offline methods. Online change point detection is used when changes must be detected with immediate feedback, whereas offline change point detection is applied retrospectively after data collection (Truong et al., 2020). Offline change point detection is often used for the retrospective analysis of time series and is sometimes referred to as signal segmentation (Jackson et al., 2005). Because our aim was to analyze completed classes retrospectively, we used offline change point detection.
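To illustrate why summary statistics can miss within-class dynamics, the following minimal Python sketch compares two hypothetical per-minute participation series (invented for illustration, not data from this study). Both have the same mean and variance, yet only one contains a lasting shift, which a simple single-split search with a squared-error criterion locates.

```python
from statistics import mean, pvariance

# Hypothetical per-minute participation counts for two 50-minute classes.
step = [10] * 25 + [30] * 25      # one lasting shift at minute 25
alternating = [10, 30] * 25        # fluctuates every minute, no lasting shift

# Mean and variance are identical, so they cannot distinguish the two classes.
print(mean(step), pvariance(step))
print(mean(alternating), pvariance(alternating))


def sse(segment):
    """Sum of squared deviations from the segment mean."""
    m = mean(segment)
    return sum((v - m) ** 2 for v in segment)


def best_single_split(y):
    """Index t that minimizes the squared error of y[:t] plus that of y[t:]."""
    return min(range(1, len(y)), key=lambda t: sse(y[:t]) + sse(y[t:]))


print(best_single_split(step))  # 25: the minute at which participation shifts
```

A change point method answers "when" as well as "how much", which is the information a teacher needs to locate the relevant part of a lesson.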

In offline change point detection tasks, the observed data are divided into segments at certain points, and the presence or absence of changes is verified (Jackson et al., 2005; Park et al., 2017). An appropriate algorithm must therefore be selected based on whether the number of changes is known in advance. In traditional signal segmentation, hypotheses based on prior knowledge from experts or analysts are used to predefine the number of changes. However, when detecting change points in large-scale big data, the complexity of the data often means that such prior knowledge is unavailable, making it difficult to know the number of changes in advance. The pruned exact linear time (PELT) algorithm is useful precisely when the presence or number of changes is unclear in advance (Killick et al., 2012). PELT can be applied without predetermining the number of change points, addressing the challenge of analyzing complex data without prior knowledge. Because changes in educational big data are rarely known in advance, we employed the PELT algorithm. Incorporated into big data analysis, PELT can help teachers find the information they need for reflection without the tedious task of specifying, for each class, whether any changes are present.

Pruned exact linear time algorithm for multiple change point detection

Multiple change point detection can be formulated as minimizing the following objective over the number of change points m and their locations \(\tau_1 < \tau_2 < \dots < \tau_m\) in a series \(y_{1:n}\), with \(\tau_0 = 0\) and \(\tau_{m+1} = n\). Here, C is a segment cost function, such as the negative log-likelihood, and \(\beta(m)\) is a penalty function that prevents overfitting:

$$\sum_{i=1}^{m+1}\left[C\left(y_{(\tau_{i-1}+1):\tau_{i}}\right)\right]+\beta(m)$$

A simple way to solve the change point detection problem is to evaluate every possible combination of change points. However, a series of n observations has \(2^{n-1}\) possible segmentations, so the computation grows exponentially with n. It is therefore impractical to try all possible solutions for big data collected from the real world.
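The objective above and the cost of exhaustive search can be made concrete with a small Python sketch. It assumes a squared-error (L2) segment cost and a linear penalty \(\beta(m) = \beta m\), both chosen here for illustration rather than taken from this study, and enumerates all \(2^{n-1}\) segmentations of a short hypothetical series. Segment i is the Python slice `y[tau_{i-1}:tau_i]`, which corresponds to \(y_{(\tau_{i-1}+1):\tau_i}\) in the 1-based notation of the equation.

```python
from itertools import combinations


def segment_cost(y, start, end):
    """Assumed L2 cost C: squared deviations of y[start:end] from its mean."""
    seg = y[start:end]
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)


def penalized_cost(y, taus, beta):
    """Objective for change points taus with a linear penalty beta(m) = beta * m."""
    bounds = [0, *taus, len(y)]  # tau_0 = 0 and tau_{m+1} = n
    total = sum(segment_cost(y, a, b) for a, b in zip(bounds, bounds[1:]))
    return total + beta * len(taus)


def exhaustive_search(y, beta):
    """Evaluate every subset of the n - 1 interior positions (2**(n-1) options)."""
    n = len(y)
    best_cost, best_taus, evaluated = float("inf"), (), 0
    for m in range(n):
        for taus in combinations(range(1, n), m):
            evaluated += 1
            cost = penalized_cost(y, taus, beta)
            if cost < best_cost:
                best_cost, best_taus = cost, taus
    return best_taus, best_cost, evaluated


# Hypothetical series with a rise after index 4 and a fall after index 8.
y = [5, 5, 6, 5, 20, 21, 20, 19, 5, 6, 5, 5]
print(exhaustive_search(y, beta=4.0))  # ((4, 8), 11.5, 2048)
```

Doubling the length from 12 to 24 points would raise the number of evaluated segmentations from 2,048 to more than 8 million, which is why practical methods avoid full enumeration.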

Several efficient algorithms have been proposed for minimizing this objective, among them PELT, binary segmentation, and segment neighborhood (Killick & Eckley, 2014; Killick et al., 2012).

Binary segmentation starts by applying a single change point test to the entire dataset. If a change point is detected, the dataset is split into two segments at that location. This procedure is applied recursively to each resulting segment until no further change points are found (e.g., Edwards & Cavalli-Sforza, 1965; Scott & Knott, 1974; Sen & Srivastava, 1975). The method is considered approximate because the detection of subsequent change points is conditional on the change points identified in previous steps (Killick & Eckley, 2014; Killick et al., 2012).
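A minimal recursive sketch of binary segmentation follows. It assumes an L2 cost, a hypothetical series, and a simple stopping rule (split only if the cost reduction exceeds a penalty β); it is meant to show the greedy, conditional structure described above, not to reproduce any particular published test statistic.

```python
def sse(y):
    """Squared deviations of y from its mean (an assumed L2 cost)."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)


def binary_segmentation(y, beta, min_size=2, offset=0):
    """Greedy recursive splitting; returns sorted change point indices.

    A split is accepted only if it lowers the cost by more than beta, and each
    side must keep at least min_size (>= 1) points.
    """
    n = len(y)
    best_gain, best_t = 0.0, None
    for t in range(min_size, n - min_size + 1):
        gain = sse(y) - (sse(y[:t]) + sse(y[t:]))
        if gain > best_gain:
            best_gain, best_t = gain, t
    if best_t is None or best_gain <= beta:
        return []  # no worthwhile split: stop recursing on this segment
    # Later splits are searched only inside the segments created here,
    # which is why the method is approximate.
    left = binary_segmentation(y[:best_t], beta, min_size, offset)
    right = binary_segmentation(y[best_t:], beta, min_size, offset + best_t)
    return left + [offset + best_t] + right


y = [5, 5, 6, 5, 20, 21, 20, 19, 5, 6, 5, 5]  # hypothetical series
print(binary_segmentation(y, beta=4.0))  # [4, 8]
```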

Segment neighborhood is an exact method that uses dynamic programming to systematically explore all possible segmentations up to a given maximum number of change points, denoted by Q. It minimizes the cost function exactly by reusing calculations from previous steps, reducing the computational burden from \(O(2^{n})\) to \(O(Qn^{2})\) (e.g., Auger & Lawrence, 1989; Bai & Perron, 1998). However, this comes at the cost of greater computational complexity than binary segmentation, making it slower but more accurate in identifying change points (Killick & Eckley, 2014; Killick et al., 2012).
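The dynamic program behind segment neighborhood can be sketched as follows, again under an illustrative L2 cost and a hypothetical series. For each q up to Q it stores the best cost of splitting every prefix of the series into q + 1 segments, so the maximum number of change points has to be fixed before running it. The segment cost is recomputed naively here for readability; with cumulative sums each cost would take O(1) time and the loops would match the \(O(Qn^{2})\) bound.

```python
def sse(y, a, b):
    """Assumed L2 cost of y[a:b] (recomputed naively for readability)."""
    seg = y[a:b]
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)


def segment_neighbourhood(y, Q):
    """Exact DP returning {q: (cost, change points)} for q = 0..Q change points."""
    n = len(y)
    INF = float("inf")
    # cost[q][t]: minimal cost of splitting y[:t] into q + 1 segments.
    cost = [[INF] * (n + 1) for _ in range(Q + 1)]
    last = [[0] * (n + 1) for _ in range(Q + 1)]
    for t in range(1, n + 1):
        cost[0][t] = sse(y, 0, t)
    for q in range(1, Q + 1):
        for t in range(q + 1, n + 1):
            for s in range(q, t):  # s = position of the last change point
                c = cost[q - 1][s] + sse(y, s, t)
                if c < cost[q][t]:
                    cost[q][t], last[q][t] = c, s
    results = {}
    for q in range(Q + 1):
        taus, t = [], n
        for k in range(q, 0, -1):  # backtrack from the end of the series
            t = last[k][t]
            taus.append(t)
        results[q] = (cost[q][n], sorted(taus))
    return results


y = [5, 5, 6, 5, 20, 21, 20, 19, 5, 6, 5, 5]  # hypothetical series
for q, (c, taus) in segment_neighbourhood(y, Q=3).items():
    print(q, round(c, 3), taus)
```

Choosing among the returned q values still requires a penalty or another model-selection rule, which is the step PELT folds into a single pass.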

Binary segmentation, while fast, compromises accuracy owing to its dependence on earlier segmentations, whereas segment neighborhood offers higher precision but requires prior knowledge of the maximum number of change points. PELT minimizes the same penalized cost as exact methods but with greater computational efficiency (Li & Diao, 2023). This is because PELT combines dynamic programming with a pruning strategy, which improves efficiency while preserving exactness in detecting multiple change points. Specifically, rather than evaluating all potential options as the segment neighborhood approach does, pruning discards candidate change points that can never lead to an optimal solution, reducing the required computation. Additionally, PELT can handle an unknown number of change points without specifying the number of changes in advance (Killick & Eckley, 2014). This characteristic is helpful for big data, in which the occurrence and frequency of changes cannot be predicted in advance. Hence, the PELT algorithm has been applied to financial and oceanographic data (e.g., Killick & Eckley, 2014), climate change data (e.g., Wang & Fan, 2021), and transportation data (e.g., Liu et al., 2021). In this study, we implemented the PELT algorithm using the Python library ruptures (Truong et al., 2020).
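A compact PELT sketch in Python shows how the pruning works. It assumes an L2 cost computed from prefix sums and a constant penalty β per change point, and it omits the minimum segment length and the kernel cost used in this study, so it is an illustration rather than the ruptures implementation. After each step t, any candidate τ with \(F(\tau) + C(y_{(\tau+1):t}) > F(t)\) is discarded for good; for a cost like this one, the constant K in the pruning condition of Killick et al. (2012) can be taken as 0.

```python
def pelt(y, beta):
    """PELT with an L2 cost and a constant penalty beta per change point.

    Returns (change points, F(n)), where F(n) is the minimized objective
    (total segment cost + beta * number of change points).
    """
    n = len(y)
    # Prefix sums make every segment cost an O(1) computation.
    s1, s2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a, b):
        """Squared deviations of y[a:b] from its mean."""
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    F = [0.0] * (n + 1)
    F[0] = -beta          # offsets the penalty added for the first segment
    last = [0] * (n + 1)  # optimal position of the previous change point
    candidates = [0]
    for t in range(1, n + 1):
        scores = {tau: F[tau] + cost(tau, t) + beta for tau in candidates}
        best = min(scores, key=scores.get)
        F[t], last[t] = scores[best], best
        # Pruning step (K = 0 for this cost): a candidate whose score without
        # the penalty already exceeds F(t) can never be optimal later.
        candidates = [tau for tau in candidates if scores[tau] - beta <= F[t]]
        candidates.append(t)
    change_points, t = [], n
    while last[t] > 0:
        t = last[t]
        change_points.append(t)
    return sorted(change_points), F[n]


y = [5, 5, 6, 5, 20, 21, 20, 19, 5, 6, 5, 5]  # hypothetical series
print(pelt(y, beta=4.0))  # ([4, 8], 11.5)
```

The result matches the exhaustive optimum for the same series and penalty, but only the surviving candidates are revisited at each step. With the ruptures library cited above, a comparable analysis would use a call such as `rpt.Pelt(model="rbf", min_size=5).fit(signal).predict(pen=1.5)`, where `model="rbf"` selects a kernel cost and the returned breakpoint list ends with the series length.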

Significance of this study and research questions

Classroom e-book readers generate and accumulate extensive interaction log data from students and teachers every day. This growing body of educational big data offers an unprecedented opportunity to quantitatively assess student engagement during class sessions. To take advantage of it, our study uses an engagement metric, measured as participation rates, together with the novel application of change point detection. Among change point detection techniques, the PELT algorithm is particularly noteworthy for its efficacy in identifying fluctuations in student engagement levels to enhance educational practices.

Building on the literature reviewed above, this section summarizes the prior research and sets out the research questions that guide our study. Specifically, we address the following questions:

RQ1: How can change point detection algorithms be applied to analyze educational big data?

RQ2: What type of instruction did teachers provide in classes where participation rates changed?

By positing these questions, we aim to elucidate the potential of big data analytics in monitoring student engagement and informing instructional strategies. Our methodology, centered around the PELT algorithm, enables the detection of notable changes in student participation rates. This capability is not just academic; it has practical implications for teachers seeking to adapt their teaching approaches in real-world classrooms, enhancing the educational experience. The use of change point detection in this context offers actionable insights, paving the way for targeted interventions that can dynamically improve teaching quality and student learning outcomes.

Method

E-book reader as a behavior sensor and participation rates

In this research, we used BookRoll, an e-book reader. BookRoll functions as a sensor that collects data about learning behavior, saving student and teacher activities as logs (Majumdar et al., 2021). In BookRoll, learning activities take place when students and teachers access teaching materials uploaded in advance (Kuromiya et al., 2022). Each log is saved in the database with a label that depends on the operation, such as "Memo," "HW_MEMO," "Recommend," "NEXT," "PREV," and "PAGE JUMP" (Fig. 1). The logs collected by BookRoll are stored in a Learning Record Store (LRS) database and can be analyzed as big data (Flanagan et al., 2018).

Fig. 1 Example Behaviors on BookRoll

In this study, we defined the participation rate during class using the log data of students who performed operations in BookRoll during class. Specifically, we counted the number of users who performed at least one operation in each minute of a class and analyzed how this per-minute count changed. Previous studies using log data adopted 1 min as the minimum unit for analyzing lessons (e.g., Lim et al., 2023; Shimada et al., 2018), so this study used the same unit. The number of users who interacted with BookRoll each minute thus served as our basic indicator of engagement. In time series analysis, it is common practice to aggregate events into fixed-width time bins rather than use raw timestamps directly, which reduces the impact of missing values. This approach makes the analysis more robust by smoothing over gaps in the data and ensuring that short-term fluctuations do not unduly influence the results.
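The per-minute indicator described above can be sketched in Python as follows. The log format (a list of `(user_id, timestamp)` pairs), the class start time, and the 50-minute default are assumptions for illustration; BookRoll's actual LRS schema is not described here. The function counts distinct users per 1-min bin and fills minutes without events with zero.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def participation_per_minute(logs, class_start, n_minutes=50):
    """Number of distinct users with at least one operation in each 1-min bin.

    logs: iterable of (user_id, timestamp) pairs (a hypothetical log format).
    """
    active = defaultdict(set)
    for user_id, ts in logs:
        minute = int((ts - class_start).total_seconds() // 60)
        if 0 <= minute < n_minutes:  # ignore events outside the class period
            active[minute].add(user_id)
    # Minutes without any events become 0 instead of being dropped.
    return [len(active[m]) for m in range(n_minutes)]


start = datetime(2023, 4, 10, 9, 0)  # hypothetical class start time
logs = [
    ("s01", start + timedelta(seconds=5)),
    ("s02", start + timedelta(seconds=40)),
    ("s01", start + timedelta(seconds=50)),  # same user, same minute: counted once
    ("s03", start + timedelta(minutes=2, seconds=10)),
]
print(participation_per_minute(logs, start, n_minutes=4))  # [2, 0, 1, 0]
```

Counting distinct users rather than raw events keeps one very active student from inflating the class-level indicator.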

Target school context and data collection

The data analyzed in this study were obtained from 120 students and two teachers who accessed the e-book reader during the target period in three first-year mathematics classes at a public junior high school in Japan. The students were aged 12 to 13 years. At the school, teachers held face-to-face classes for each subject, and learning progressed in a typical Japanese junior high school class style. Each student and teacher was provided with a tablet device, which they brought to class; how the tablets were used was left to each student and teacher. Interaction log data were generated whenever users operated their tablets, and the logs were stored in the LRS database (Ogata et al., 2023). The following subsection describes how we extracted the data to answer the research questions.

Parsing process

As one aspect of classroom engagement, we measured participation rates in learning content using e-book data stored in the LRS. The students targeted for analysis generated a total of 1,396,225 logs over one year. These logs included activities outside class hours, so we extracted only the logs recorded during school class hours. In addition, e-book access varied with the learning activity: the number of students who logged in during a single class ranged from 1 to 40. We considered that tablets and e-book readers would play a different role in a class in which only one student used BookRoll than in one in which 40 students used it. To make the influence of learning activities as comparable as possible, we analyzed only classes in which 35 or more students logged in, yielding 300,799 logs from 105 classes as the data for Analysis 1. For Analysis 2, which examined teacher log data, we further excluded 10 classes for which no teacher logs remained, leaving 95 classes with 270,486 student logs and 2,516 teacher logs (Fig. 2).
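The two selection steps above can be expressed as a small filter. The sketch assumes hypothetical dictionaries that map a class identifier to its in-class log records (already restricted to class hours); only the threshold of 35 students and the teacher-log requirement come from the text.

```python
def select_classes(student_logs, teacher_logs, min_students=35):
    """Apply the two extraction criteria.

    student_logs, teacher_logs: hypothetical dicts mapping a class id to its list
    of (user_id, timestamp) records, already restricted to class hours.
    Returns (ids for Analysis 1, ids for Analysis 2).
    """
    analysis1 = sorted(
        class_id
        for class_id, records in student_logs.items()
        if len({user_id for user_id, _ in records}) >= min_students
    )
    # Analysis 2 additionally requires at least one teacher log in the class.
    analysis2 = [c for c in analysis1 if teacher_logs.get(c)]
    return analysis1, analysis2


# Toy example (timestamps omitted): class A has 36 students and teacher logs,
# B has 40 students but no teacher logs, and C has only 10 students.
students = {
    "A": [(f"s{i}", None) for i in range(36)],
    "B": [(f"s{i}", None) for i in range(40)],
    "C": [(f"s{i}", None) for i in range(10)],
}
teachers = {"A": [("t1", None)], "B": []}
print(select_classes(students, teachers))  # (['A', 'B'], ['A'])
```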

Fig. 2 Criteria for Data Extraction and the Number of Analyzed Logs

Analysis 1

In Analysis 1, we used the number of active student accounts per minute as the participation rate, an indicator of students' classroom engagement during class. We adopted the PELT algorithm to detect significant changes in student participation rates during classes.

Adjusting parameters

The PELT algorithm allows users to set a cost function and a penalty based on the observed data. Following previous studies (Arlot et al., 2019; Celisse et al., 2018), we adopted a kernel-based cost function, which can also be used with nonparametric observational data. We also set the minimum segment length to 5 min, so a 50-min class can be divided into at most 10 segments (that is, at most nine change points). However, there is currently no straightforward way to determine the optimal penalty value other than experimental adjustment (Haynes et al., 2017a). Therefore, we narrowed down the penalty candidates using the elbow method proposed by Haynes et al. (2017b). Specifically, for each penalty value, we calculated the average number of change points and the average cost across the 105 classes. We then plotted the average number of change points on the x-axis and the average cost on the y-axis (Fig. 3). Finally, we narrowed the candidates down to the inflection points, where the rate of change in the number of detected change points decreased markedly. Based on Fig. 3, we selected 1.5, 2.0, 2.5, and 3.0 as candidate penalty values.
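To make the kernel cost and the elbow procedure concrete, the following sketch assumes an RBF kernel \(k(x, x') = \exp(-\gamma (x - x')^2)\) with a fixed, hypothetical bandwidth γ, a 5-min minimum segment length, and exact optimal partitioning (dynamic programming without pruning, which is affordable for 50-point series). For each candidate penalty it returns the average number of change points and the average cost over a set of classes, that is, the coordinates of one point on an elbow plot. It is an illustration with invented series, not the ruptures code used in the study.

```python
import math


def rbf_prefix(y, gamma):
    """2-D prefix sums of the RBF Gram matrix k(y_i, y_j) = exp(-gamma (y_i - y_j)^2)."""
    n = len(y)
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            k = math.exp(-gamma * (y[i] - y[j]) ** 2)
            P[i + 1][j + 1] = k + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def rbf_cost(P, a, b):
    """Kernel cost of y[a:b]: sum_i k(y_i, y_i) - (1 / len) * sum_{i,j} k(y_i, y_j)."""
    block = P[b][b] - P[a][b] - P[b][a] + P[a][a]
    return (b - a) - block / (b - a)  # k(y_i, y_i) = 1 for the RBF kernel


def optimal_partition(y, pen, gamma, min_size=5):
    """Exact penalized segmentation (no pruning) with a minimum segment length."""
    n = len(y)
    P = rbf_prefix(y, gamma)
    F = [math.inf] * (n + 1)
    F[0] = -pen
    last = [0] * (n + 1)
    for t in range(min_size, n + 1):
        for s in range(0, t - min_size + 1):
            if F[s] == math.inf:
                continue  # y[:s] cannot be split into segments of >= min_size
            c = F[s] + rbf_cost(P, s, t) + pen
            if c < F[t]:
                F[t], last[t] = c, s
    change_points, t = [], n
    while last[t] > 0:
        t = last[t]
        change_points.append(t)
    change_points.sort()
    return change_points, F[n] - pen * len(change_points)  # total segment cost


def elbow_points(classes, penalties, gamma=0.01, min_size=5):
    """(penalty, mean number of change points, mean cost) for each penalty.

    gamma = 0.01 is an arbitrary illustrative bandwidth.
    """
    points = []
    for pen in penalties:
        results = [optimal_partition(y, pen, gamma, min_size) for y in classes]
        mean_cps = sum(len(cps) for cps, _ in results) / len(results)
        mean_cost = sum(cost for _, cost in results) / len(results)
        points.append((pen, mean_cps, mean_cost))
    return points


# Two hypothetical 50-minute classes: one with a shift at minute 25, one flat.
classes = [[10] * 25 + [30] * 25, [20] * 50]
for pen, mean_cps, mean_cost in elbow_points(classes, [1.5, 2.0, 2.5, 3.0]):
    print(pen, mean_cps, round(mean_cost, 3))
```

On real class data, plotting these pairs across a wider range of penalties and looking for the bend is what narrows the candidates; with only two toy classes the curve is flat.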

Fig. 3 Elbow Plot for Analysis 1

Creating and evaluating correct answer labels

A class was labeled "change" if at least one change point was detected; otherwise, it was labeled "no change." First, three independent annotators classified the classes. Agreement among the annotators' judgments, measured by Fleiss' kappa, was 0.6800, which indicates substantial agreement (Landis & Koch, 1977). Therefore, majority voting was used to establish the final label for each class. We then labeled each class in the same way based on the output of the PELT algorithm and compared the annotators' labels with the algorithm's labels using a confusion matrix.
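Fleiss' kappa and the majority vote can be computed as below. The label strings and the toy ratings in the usage lines are hypothetical; the function assumes every class was rated by the same number of annotators.

```python
from collections import Counter


def fleiss_kappa(ratings, categories=("change", "no change")):
    """Fleiss' kappa for per-class label lists; every class needs the same number
    of raters. Undefined (division by zero) if all labels are identical."""
    N, n = len(ratings), len(ratings[0])
    counts = [[Counter(labels)[c] for c in categories] for labels in ratings]
    # Observed agreement for each class, then averaged over classes.
    P_i = [(sum(x * x for x in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    # Agreement expected by chance from the overall label proportions.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(len(categories))]
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)


def majority_label(labels):
    """Final label by majority vote (unambiguous for three raters, two labels)."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical labels from three annotators for four classes.
ratings = [
    ["change", "change", "change"],
    ["no change", "no change", "no change"],
    ["change", "change", "no change"],
    ["change", "no change", "no change"],
]
print(round(fleiss_kappa(ratings), 3))      # 0.333
print([majority_label(r) for r in ratings])  # ['change', 'no change', 'change', 'no change']
```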

Measurement by Confusion Matrix

A confusion matrix is commonly used in binary classification problems; here, it compares the algorithm's predictions with the human annotators' classifications. In Table 1, the rows represent the labels predicted by the algorithm, and the columns represent the labels assigned by the human annotators. True positives (TP) are the number of classes labeled as having changed by both the annotators and the algorithm, and true negatives (TN) are the number of classes labeled as having no change by both.

Table 1 Confusion Matrix and Evaluation Metrics

Meanwhile, false positives (FP) are the number of classes that the annotators labeled as having no change but the algorithm predicted as having a change, and false negatives (FN) are the number of classes that the annotators labeled as having a change but the algorithm predicted as having no change. In Analysis 1, we used the evaluation metrics shown in Table 1 to assess how closely the algorithm's predictions matched the annotators' labels: accuracy, precision, recall (true positive rate), true negative rate, false negative rate, false positive rate, and F1-score.

Analysis 2

In Analysis 2, we examined differences between classes with and without changes in terms of teachers' use of the e-book reader BookRoll. Specifically, we focused on the actions teachers performed in the e-book reader and investigated the relationship between these actions and changes in engagement during class.

Data selection and preprocessing

Of the 105 classes examined in Analysis 1, 95 were targeted for Analysis 2 after excluding 10 classes for which no teacher log data remained. These 95 classes contained 270,486 student logs and 2,516 teacher logs. Only classes judged to have changed both by the PELT algorithm run with the selected penalty and by the three annotators were classified as "classes with changes"; all other classes were treated as "classes without changes."

Finally, the 95 classes were divided into "classes with changes" (N = 50) and "classes without changes" (N = 45) (Fig. 4). Restricting "classes with changes" to those on which the algorithmic and human evaluations agreed was intended to ensure the quality and reliability of the analysis.

Fig. 4 Analysis Design for Analysis 2

Applying statistical methods

The focus of Analysis 2 was to gain insight into the characteristics of teachers' instruction in "classes with changes" (those on which the algorithm and human evaluations agreed) and "classes without changes." We aimed to verify whether teachers' actions in the e-book reader differed significantly between the two groups. First, we counted the number of classes in which each type of teacher action occurred and created a cross-tabulation. We then classified each action type as characteristic (observed only in "classes with changes") or common (observed in both groups). Finally, we conducted Fisher's exact test to examine whether the occurrence of characteristic actions differed between "classes with changes" and "classes without changes."

Results and interpretation

RQ1: How can change point detection algorithms be applied to analyze educational big data?

This subsection compares the labels obtained from the PELT algorithm with those obtained from annotator evaluation. Table 2 presents the evaluation of the algorithm for each of the penalty values narrowed down in advance, using several metrics (accuracy, precision, recall, true negative rate, false negative rate, false positive rate, and F1-score).

Table 2 PELT Algorithm Execution Results Based on Candidate Penalty Values

As shown in Table 2, with penalties of 3.0, 2.5, and 2.0, accuracy, recall, and F1-score were lower than with a penalty of 1.5. The model with a penalty of 3.0 had low recall and a relatively high false negative rate, indicating a limited ability to capture positive samples.

By contrast, the relatively accurate model with a penalty of 1.5 achieved an accuracy of 0.6667 and a precision of 0.6700, meaning that 67% of its positive predictions were correct. Its recall was 0.9710, so it detected almost all positive samples, and its false negative rate was very low at 0.0290, indicating that the probability of missing a positive sample was very low. However, its true negative rate was low at 0.0833, and its false positive rate was high at 0.9167. Nevertheless, its F1-score, which summarizes overall performance, was high at 0.7929.

Figure 5 shows example class data analyzed with a penalty of 1.5. Time is shown on the x-axis and the participation rate on the y-axis. Detected segments are shaded pink and blue, with segment boundaries marked by thick red lines. At a penalty of 1.5, the model correctly identified 67 true-positive cases (Fig. 5a) and three true-negative cases (Fig. 5b), and it produced 33 false positives (Fig. 5c) and 2 false negatives (Fig. 5d). Despite the high false-positive rate, a penalty of 1.5 enabled the model to detect almost all positive samples in classes with 35 or more participants while still identifying some true negatives. Therefore, in this study, we used PELT with a penalty of 1.5 and conducted Analysis 2 on the resulting classes with and without changes.
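The metrics reported for the penalty of 1.5 follow directly from these four counts. The short Python check below recomputes them from the standard definitions (for example, precision = TP / (TP + FP) and recall = TP / (TP + FN)); rounded to four decimals, the printed values match the figures given above.

```python
def binary_metrics(tp, tn, fp, fn):
    """Evaluation metrics from a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),  # true positive rate
        "true_negative_rate": tn / (tn + fp),
        "false_negative_rate": fn / (fn + tp),
        "false_positive_rate": fp / (fp + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Counts reported above for the penalty of 1.5 (105 classes in total).
for name, value in binary_metrics(tp=67, tn=3, fp=33, fn=2).items():
    print(f"{name}: {value:.4f}")
# accuracy 0.6667, precision 0.6700, recall 0.9710, f1 0.7929
```

With only 36 classes labeled "no change" by the annotators, three correct negatives give a true negative rate of 3/36, which is why that rate is so low even though the F1-score is high.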

Fig. 5 Example of Change Point Detection Using the PELT Algorithm

RQ2: What type of instruction did teachers provide in classes where participation rates changed?

In Analysis 2, we investigated teacher behavior in 95 classes (2,516 teacher logs). We compared the 50 classes judged to have changed by both the annotators and the PELT algorithm with the other 45 classes by cross-tabulating teachers' e-book actions for the two groups. Table 3 presents the results.

Table 3 Cross-tabulation of E-Book Actions with and without Changes

The cross-tabulation in Table 3 provides insights into teachers' e-book use in the classroom. First, eight types of action were observed only in classes with changes. ADD_MEMO was observed in 11 classes and ADD_HW_MEMO in 12 classes. CHANGE_MEMO, DELETE_MEMO, and OPEN_RECOMMENDATION were each observed in three classes, and CLOSE_RECOMMENDATION, TIMER_PAUSE, and UNDO_HW_MEMO were each observed in two classes.

Six types of action were observed in both classes with and without changes. CLOSE was observed in 30 classes with changes and 24 classes without changes. NEXT was observed in 33 classes in each group. OPEN was observed in 49 classes with changes and 42 classes without changes. PAGE_JUMP was observed in 10 classes with changes and 8 classes without changes. PREV was observed in 23 classes with changes and 27 classes without changes. REGISTER_CONTENTS was observed in three classes with changes and four classes without changes.
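The grouping into characteristic and common actions can be written as a simple rule over these class counts. The counts below are transcribed from the text (NEXT is read as 33 classes in each group); an action is labeled characteristic if it appears only in classes with changes.

```python
# Number of classes (with changes, without changes) in which each teacher action
# occurred, transcribed from the text above (NEXT read as 33 in each group).
class_counts = {
    "ADD_MEMO": (11, 0), "ADD_HW_MEMO": (12, 0), "CHANGE_MEMO": (3, 0),
    "DELETE_MEMO": (3, 0), "OPEN_RECOMMENDATION": (3, 0),
    "CLOSE_RECOMMENDATION": (2, 0), "TIMER_PAUSE": (2, 0), "UNDO_HW_MEMO": (2, 0),
    "CLOSE": (30, 24), "NEXT": (33, 33), "OPEN": (49, 42),
    "PAGE_JUMP": (10, 8), "PREV": (23, 27), "REGISTER_CONTENTS": (3, 4),
}


def classify_actions(counts):
    """Split actions into characteristic (only in classes with changes) and common.

    Actions seen only in classes without changes (none here) fall in neither list.
    """
    characteristic = sorted(a for a, (w, wo) in counts.items() if w > 0 and wo == 0)
    common = sorted(a for a, (w, wo) in counts.items() if w > 0 and wo > 0)
    return characteristic, common


characteristic, common = classify_actions(class_counts)
print(len(characteristic), "characteristic:", characteristic)
print(len(common), "common:", common)
```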

Next, we investigated how these characteristic and common behaviors related to class type. We conducted Fisher's exact test to determine whether the occurrence of characteristic behaviors was associated with whether participation changed during a class. The test yielded a p-value of 0.0001226, indicating a statistically significant relationship between class type and behavior type. Specifically, characteristic behaviors were observed 13 times in the "classes with changes" but never in the "classes without changes." This suggests that teachers' e-book actions differed depending on whether changes occurred during a class.
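The two-sided Fisher's exact test can be sketched with the hypergeometric distribution. The usage line assumes that the 2×2 table counts classes: 13 of the 50 "classes with changes" contained at least one characteristic behavior and none of the 45 "classes without changes" did. This table is a reading of the text rather than one reported by the authors, but under that assumption the test gives p ≈ 0.0001226, consistent with the reported value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # probability that the top-left cell equals x
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_observed * (1 + 1e-7):  # tolerance guards against rounding
            p_value += p
    return min(p_value, 1.0)


# Rows: classes with / without changes; columns: with / without a characteristic
# behavior. This table is an assumed reading of the counts in the text.
print(fisher_exact_two_sided(13, 37, 0, 45))  # approximately 0.0001226
```

An exact test suits this table because one cell is zero, which makes large-sample approximations such as the chi-squared test unreliable.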

Discussion

Key findings

This study aimed to automatically identify changes in participation rates during classes from educational big data and to gain insight into teachers’ teaching styles. Analysis 1 used the PELT algorithm to identify changes in student participation during class. In Analysis 2, we investigated the relationship between teachers’ operations on e-book readers and class type (classes with and without changes).

Optimizing the PELT algorithm for educational big data analysis (RQ1)

In Analysis 1, applying the PELT algorithm effectively detected changes in student participation rates during class. This approach opens new possibilities for large-scale data analysis in education. However, the false-positive rate of change point detection was relatively high. More research is needed on how to set the optimal penalty and choose the cost function for the PELT algorithm (Truong et al., 2020). Domain knowledge should therefore be used to tune these parameters to the information a given analysis requires (Truong et al., 2017). In this study, we set a small penalty to extract classes with changes. Because the penalty is added once per detected change point, a small penalty favors sensitivity and may also have contributed to the relatively high false-positive rate. Conversely, a larger penalty may be needed to extract classes with no change. These findings can inform parameter choices when extracting change points from educational big data.
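The penalty trade-off described above can be illustrated with a compact PELT implementation. This is a minimal sketch under stated assumptions: it uses a squared-error (mean-shift) cost, one common choice of cost function, and toy integer sequences in place of real participation-rate series. The cost function, penalty values, and any minimum segment length used in this study are not given in this section. Each detected change point adds the penalty once to the objective, so lowering the penalty lets the algorithm accept smaller, possibly spurious, shifts.

```python
from itertools import accumulate


def pelt(signal, penalty):
    """Return change point indices for `signal` via PELT with a squared-error cost.

    A change point at index k means a new segment starts at signal[k].
    """
    n = len(signal)
    # Prefix sums give O(1) segment costs.
    s1 = [0.0] + list(accumulate(signal))
    s2 = [0.0] + list(accumulate(x * x for x in signal))

    def cost(a, b):
        # Sum of squared deviations of signal[a:b] from its own mean.
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [-penalty] + [0.0] * n  # best[t]: optimal penalized cost of signal[:t]
    last = [0] * (n + 1)           # last[t]: start of the final segment in that optimum
    candidates = [0]
    for t in range(1, n + 1):
        scored = [(best[tau] + cost(tau, t) + penalty, tau) for tau in candidates]
        best[t], last[t] = min(scored)
        # PELT pruning (K = 0 holds for this cost): a start tau whose unpenalized
        # score already exceeds best[t] can never be optimal for any later t.
        candidates = [tau for score, tau in scored if score - penalty <= best[t]]
        candidates.append(t)

    # Backtrack from the end to recover the segment boundaries.
    change_points, t = [], n
    while t > 0:
        t = last[t]
        if t > 0:
            change_points.append(t)
    return sorted(change_points)
```

Here `pelt([0] * 10 + [5] * 10, penalty=1.0)` returns `[10]`, while raising the penalty above 125 (the squared-error cost of fitting a single mean to this toy series) returns `[]`. Prefix sums keep each segment cost O(1), and the pruning step is what distinguishes PELT from plain optimal partitioning.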

Linking teacher behaviors and change in classroom engagement (RQ2)

In Analysis 2, we designated the classes that both the algorithm and the human annotators judged to have changed as “classes with changes,” labeled the remaining classes as “classes without changes,” and analyzed teachers’ behavior patterns in each group. This revealed differences between the two groups. In classes with significant changes in student participation rates, we observed additional e-book interactions, such as ADD_MEMO, that did not appear in classes without changes. This result suggests that teachers’ use of technology tools may support student engagement. Teachers use technology to increase student engagement (Gebre et al., 2014); however, keeping students engaged with technology requires effective instructional practices (Henrie et al., 2015). Our method shows how features of educational big data, such as log data, that are associated with changes in classroom engagement can be identified. Further analysis of teachers’ operational patterns at the level of individual actions may help teachers support students’ engagement in learning activities.

Automatic extraction of datasets for teachers’ reflections from big educational data

Teachers need to reflect on their past actions in class to plan future ones (Olteanu, 2017). Supporting teachers in reflecting on their daily classes can help achieve high-quality education (Ndukwe & Daniel, 2020). To do so, teachers must find relevant information accurately and quickly (Van Leeuwen et al., 2019). Educational big data can expand teachers’ opportunities for reflection and offer valuable insights for improving the quality of education. However, it can also create stress as teachers decide which past classes to review. The PELT algorithm makes it possible to automatically extract, from educational big data, classes in which participation rates changed markedly. This helps teachers identify which classes to revisit and increases their opportunities for reflection. In addition, comparing teachers’ activities in “classes with changes” and “classes without changes” yields insights for designing the next class. In other words, this method can help automatically extract meaningful datasets for teachers, akin to finding a needle in a haystack, from the interaction logs generated daily in digital learning environments.

Overall, our results provide a practical approach to help teachers improve their classes using educational big data. For example, by analyzing the changes in participation rates identified by the PELT algorithm, teachers can develop specific strategies to improve student engagement. Furthermore, the results of Analysis 2 pave the way for a detailed understanding of how e-book readers are used in educational settings and for using this information to improve the educational process. In particular, the specific e-book interactions found in classes in which student participation changed may guide teachers in using these technologies effectively.

Ethical considerations regarding the use of log data

An essential aspect of using log data, such as the logs from the e-book reader, is ensuring their ethical handling and interpretation (Romero & Ventura, 2020). The e-book reader collects a wide range of data, including teachers’ and students’ interactions with various educational tools. Although these data are invaluable for educational insight, they raise concerns regarding privacy and ethical use. All data in this study were handled in accordance with strict confidentiality and privacy guidelines.

Furthermore, because of the nature of log data, there is still debate regarding the interpretation of the results obtained (Romero & Romero, 2014). Therefore, the analytical results require careful discussion. Our research used PELT as an algorithm to automatically identify classes for teachers’ reflections from educational big data. We also discussed the utility of PELT as an approach that can be applied across educational big data.

However, interpreting individual-level logs requires caution; for example, a teacher’s access to the system does not necessarily mean that the teacher is in the classroom at that time. Therefore, our analysis focused on broader patterns and trends rather than on individual profiling.

Limitations and future work

This empirical study of an analysis method using educational big data has several limitations that must be considered carefully. The data were limited to specific schools and classes, which makes it difficult to generalize the results to other educational settings. In addition, because the analysis is based on log data, it provides no information on specific teacher roles, locations, or other circumstances. Furthermore, the high false-positive rate of the PELT algorithm may limit the scope of the analysis. The limited size and diversity of the datasets may also have affected the insights generated with the PELT algorithm. Future research should collect and analyze data from a wider range of schools and teaching styles to address these challenges.
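Because precision, recall, F1-score, and the false-positive rate are all computed from the same confusion counts (TP, FP, FN, TN), a short sketch shows how extra false positives pull the F1-score down even when recall stays the same. The sketch assumes the standard definitions of these metrics; the counts are hypothetical, not the study’s results, and how detections were matched to annotated change points in this study is not described in this section.

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": false_positive_rate,
    }


# Hypothetical counts: same recall, different numbers of false positives.
balanced = detection_metrics(tp=9, fp=1, fn=1, tn=9)
many_false_alarms = detection_metrics(tp=9, fp=5, fn=1, tn=5)
```

Both hypothetical detectors have a recall of 0.9, but the second one’s false-positive rate of 0.5 lowers its precision to 9/14 and its F1-score from 0.9 to 0.75, which is why a penalty chosen for sensitivity can limit how far the detected changes can be trusted.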

Despite these limitations, this study highlights the importance of big data applications in education. Big data analytics can be used to discover, monitor, and evaluate learning processes, thereby yielding new insights (Fischer et al., 2020). In addition, understanding changes in classes through big data is expected to make teachers’ daily reviews more efficient. Teachers are generally known to make decisions intuitively, even when they are aware of the value of using data (e.g., Vanlommel et al., 2017). Therefore, systems that support data use are needed (Van Leeuwen et al., 2019). Using the PELT algorithm to identify which classes to revisit may help teachers make use of such data. Future research should verify the usefulness of such a system by presenting teachers with classes in which changes have occurred. In addition, teachers guide students by giving various instructions depending on the situation (Van Leeuwen & Janssen, 2019). It is therefore meaningful to analyze both student behavior and teachers’ instructions around these changes. In future research, we aim to develop methods to analyze students’ and teachers’ behavior in classes where changes occurred.

Conclusion

This study proposed an effective method for using big data in education and demonstrated its potential to improve the quality of classes. The results of change point detection using the PELT algorithm provide concrete, vital insights to help teachers improve their teaching methods. These insights are expected to contribute to developing more effective educational approaches through application in educational settings. However, it is crucial to note that the effectiveness claimed might be overestimated without considering the constraints posed by the size and diversity of the datasets used. The findings highlight the need for comprehensive testing across varied educational settings to confirm these results. These efforts are vital to fully assess the potential and limitations of change point detection methods in educational big data analytics. The ongoing development of these analytical techniques is expected to significantly advance educational practices by providing more accurate and reliable methods of interpreting big data. Ultimately, this research underscores the transformative potential of big data analytics in education, paving the way for more personalized and effective teaching methodologies.

Availability of data and materials

The data of this study are not open to the public owing to participant privacy.

Abbreviations

PELT:

Pruned exact linear time

LRS:

Learning record store

TP:

True positive

FN:

False negative

FP:

False positive

TN:

True negative

References

Acknowledgements

The authors acknowledge the cooperation of the participating Japanese junior high and high schools, the instructors who used BookRoll in their daily classroom activities, and all others who contributed to this research.

Funding

This study was supported by JSPS KAKENHI JP22K20246 and JP23H00505.

Author information

Contributions

Nakamura designed the overall research, analyzed data, and drafted the manuscript. Horikoshi made significant contributions to the research question, supervised all aspects of the research and manuscript, and provided guidance throughout the process. Manabu Ishihara reviewed the research design, contributing to the overall planning and flow of this paper. Ogata, as the PI of the Let research unit, oversaw all the research conducted with the LEAF platform, including this study, and reviewed the research design and manuscript. He also provided supervision for this research paper.

Corresponding authors

Correspondence to Kohei Nakamura or Izumi Horikoshi.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Nakamura, K., Ishihara, M., Horikoshi, I. et al. Uncovering insights from big data: change point detection of classroom engagement. Smart Learn. Environ. 11, 31 (2024). https://doi.org/10.1186/s40561-024-00317-6

  • DOI: https://doi.org/10.1186/s40561-024-00317-6

Keywords