Predicting students’ flow experience through behavior data in gamified educational systems

The flow experience (i.e., challenge-skill balance, action-awareness merging, clear goals, unambiguous feedback, concentration, sense of control, loss of self-consciousness, transformation of time, and autotelic experience) is an experience highly related to the learning experience. One of the current challenges is to identify whether students are managing to achieve this experience in educational systems. The methods currently used to identify students’ flow experience are based on self-reports or equipment (e.g., eye trackers or electroencephalograms). The main problem with these methods is the high cost of the equipment and the impossibility of applying them massively. To address this challenge, we used behavior data logs produced by students during the use of a gamified educational system to predict the students’ flow experience. Through a data-driven study (N = 23) using structural equation modeling, we identified possibilities to predict the students’ flow experience through the speed of students’ actions. With this initial study, we advance the literature, especially contributing to the field of student experience analysis, by bringing insights showing how to step towards automatic students’ flow experience identification in gamified educational systems.

This article is an extension of the conference paper conducted by . Previous studies of this project have been published: Oliveira et al. (2018) conducted a systematic literature review about Flow Theory and Educational Technologies; Oliveira (2019) presented the project overview; Oliveira et al. (2019) proposed a theoretical model relating students' data logs and their flow experience in educational systems;  conducted a qualitative study analyzing students' data logs and their flow experience in an educational system; and  conducted a data-driven study modeling students' flow experience based on their data logs in a gamified educational system.
Considering the relevance of students' flow experience in online learning environments, there is growing interest in investigating the effects of different approaches, such as virtual reality, game-based learning (Chan et al., 2021), and gamification, on students' flow experience in the educational context. Specifically, in the gamification field, previous studies have pointed out that gamified scenarios can lead learners to experience flow (Kocadere & Çağlar, 2015; Sillaots, 2014; Zhao & Li, 2020), and that flow experience in gamified learning environments has a highly significant impact on motivation and increases students' academic success (Özhan & Kocadere, 2020).
However, measuring flow experience in online learning systems (e.g., gamified educational systems) is still a challenge (Jackson & Marsh, 1996; Hamari & Koivisto, 2014; Lee et al., 2014; Oliveira et al., 2019; Semerci & Goularas, 2020). The instruments most often adopted to measure flow experience in educational settings are questionnaires, observations, interviews, focus groups, eye trackers, and electroencephalograms (EEG) (Perttula et al., 2017; Oliveira et al., 2018). These methods are limited because they are costly, remove students from the activity, and cannot be applied at a massive scale (Oliveira et al., 2018; Oliveira, 2019).
At the same time, while the methods currently used have several limitations (Perttula et al., 2017; Oliveira et al., 2018), the increasing use of online education systems generates more and more behavior data logs (i.e., interaction data that represent the behavior of students when using an educational system) (Eberle & Hobrecht, 2021; Chaku et al., 2021). These data logs (e.g., number of mouse clicks, time taken to complete a given activity, and average of correct activities, among others) make it possible to predict different user experiences from the behavioral patterns they reveal (Lee et al., 2014; Oliveira, 2019).
To meet the challenge of identifying students' flow experience in educational systems, we conducted a study (N = 23) aiming to predict the students' flow experience based on their behavior during system usage and to answer the following research question: what students' behaviors can be used to predict their flow experience in a gamified educational system? The main results show a promising use of students' behavior (represented by data logs), mainly the speed of students' actions, to predict their flow experience. Consequently, this study advances the state of the art and contributes to future studies related to the prediction and automatic measurement of students' flow experience in gamified learning systems.
This article is structured as follows: the 'Background' section provides a background on flow theory in education and gamification in education, and presents related works that investigated data logs to measure flow experience. The 'Study design' section depicts the study design, and the 'Results' section describes and discusses the main results obtained. Finally, the 'Concluding remarks' section presents the concluding remarks of this work.

Background
In this section, we present the main topics addressed in this article (i.e., Flow Theory in education and gamified education). We also present the main related works. Csikszentmihalyi and Csikszentmihalyi (1975) first introduced the concept of flow to define an "optimal experience", that is, an experience in which individuals get into an optimal state during a certain activity while the mind becomes effortlessly focused and engaged. According to Trevino and Webster (1992), flow experience is a significant element in understanding human-technology interactions. Accordingly, Flow Theory has been extensively explored in different computer-based contexts, including educational technology contexts (Zhao et al., 2021).
In the educational context, during the learning process, flow occurs as a feeling of pleasure that translates into achieving realistic goals and overcoming prescribed challenges (Csikszentmihalyi, 1990). The literature points out that the flow state is particularly recurrent in the context of learning, and educational settings benefit from the state of flow (Cesari et al., 2021). For example, Hamari et al. (2016) showed that flow and engagement had a positive association with student learning in the game-based learning context. At the same time, results achieved by Klein et al. (2010) also show that flow affected student perceived learning and student satisfaction.
Over the years, different instruments were developed and adopted to measure flow experience (Perttula et al., 2017; Oliveira et al., 2018). According to a secondary study conducted by Oliveira et al. (2018), in the educational learning context, the questionnaire is the most used instrument to measure student flow experience during the learning process in computer-based learning activities. Moreover, although in a small proportion, user data logs, interviews, and recordings of users' faces are also instruments that have been investigated by researchers in the educational technology context (Oliveira et al., 2018). Among these methods, student data logs (representing the students' behavior in the system) have shown promising results in detecting student flow experience in online learning environments (Lee et al., 2014; Oliveira et al., 2019; Semerci & Goularas, 2020).

Gamified education
Gamification represents the idea of "transforming systems, service, and activities to better afford similar motivational benefits as games often do" (Hamari, 2019). According to Koivisto and Hamari (2019), information systems that apply the gamification technique aim to afford similar experiences and motivations as games do, and consequently, attempt to affect user behavior. This technique has been successfully adopted in the last decade and has been applied in different fields, such as marketing, commerce, health, and education (Koivisto & Hamari, 2019; Klock et al., 2020; Zainuddin et al., 2020; Kalogiannakis et al., 2021).
In the online learning field, there is growing evidence suggesting that gamification is an effective learning tool (Zainuddin et al., 2020; Sailer & Homner, 2020). Researchers and practitioners are successfully applying gamification in online learning environments to overcome the challenges these systems face, such as lack of student motivation, engagement, and low student performance (Tenório et al., 2016; Lopez & Tucker, 2019; Groening & Binnewies, 2019). For example, Tenório et al. (2016) show that gamification positively affected students' outcomes, increasing the number of their accesses and registering a higher percentage of activities performed and corrected in a gamified online system.
These positive effects of gamification on students' outcomes in online learning environments could be related to the positive impact of gamification on student flow experience, since the flow experience leads to better learning outcomes in educational settings (Özhan & Kocadere, 2020; Erhel & Jamet, 2019; Yen & Lin, 2020). Nonetheless, it is still a challenge to effectively measure student flow experience during the learning process in gamified learning systems (Jackson & Marsh, 1996; Hamari & Koivisto, 2014; Lee et al., 2014; Oliveira et al., 2019; Semerci & Goularas, 2020). Instruments previously adopted (e.g., EEG, eye trackers, interviews, and questionnaires) present limitations for measuring flow experience, such as high cost and the impossibility of conducting a massive application (Lee et al., 2014; Oliveira et al., 2018; Oliveira, 2019). To overcome these challenges, more recent studies are investigating the effectiveness of measuring flow experience through student data logs in gamified learning systems, an approach that is showing promising results.

Related works
Different methods were developed and adopted over the years to measure flow experience (Perttula et al., 2017; Oliveira et al., 2018). In the educational context, two methods have been increasingly investigated: EEG and data logs (Wang & Hsu, 2014; Lee et al., 2014; Oliveira et al., 2019). Wang and Hsu (2014) explored whether brainwave signal data collected using an EEG headset could help examine flow experience using a psychophysiological approach in the educational context. Another work that investigated the use of EEG to measure flow experience in the educational context is Wu et al. (2021), whose results show that EEG could be used to detect students' flow experience in e-learning. However, combining flow measurement and EEG can be a complex process, which may hinder adoption by teachers and professional practitioners (Wu et al., 2021).
The use of student data logs (which represent the behavior of students when using a certain system) to measure student flow experience is another method that is being increasingly adopted in the educational technology context, and it has been presenting promising results (Lee et al., 2014; Oliveira et al., 2019; Semerci & Goularas, 2020). One of the first studies to investigate the relationship between students' data logs and their flow experience in learning systems was Lee et al. (2014), which presented an automated detector in a step-based tutoring system, using a step regression approach, to identify student flow experience during the learning process. Oliveira et al. (2019) also investigated students' data logs in online environments to measure flow experience, proposing a theory-driven conceptual model to associate student interaction data logs with the nine flow experience dimensions.
In another work, Oliveira et al. (2020) adopted a qualitative research approach, the think-aloud protocol, to associate users' data logs with their flow experience in an educational system. In turn, structural equation modeling has been used to model students' flow experience through data logs in a gamified learning environment. Moreover, Semerci and Goularas (2020) proposed a new method for evaluating the flow state of students in educational systems. The study used students' grades and their interaction data, by means of activity heatmaps and deep neural networks, in an e-learning environment to calculate their flow state (Semerci & Goularas, 2020).
Despite the evolution of the studies that investigate the use of behavior data logs to measure students' flow experience during the learning process in educational systems, there is a lack of studies investigating the potential of data logs produced by students in gamified learning systems to predict their flow experience. Therefore, this study aims to predict students' flow experience using behavior data logs produced by students during the learning process in gamified educational systems.

Study design
Our main goal in this study is to investigate which behavior data logs can be used to predict students' flow experience in gamified educational systems. To achieve this goal, we conducted a data-driven study (i.e., research based on the analysis of users' data) (Dhar, 2013).

Materials and procedure
To conduct this study, we used a gamified educational system called "bombsQuery" (Pastushenko et al., 2018), which is a tool for teaching the basics of JavaScript/jQuery. It is a gamified system with 11 different missions, each one devoted to a different topic. Each mission has some theory and examples and a free text area where students need to insert their proposed solutions (i.e., write a particular part of jQuery code). The code inserted by the students is automatically checked by the system. The system uses three different gamification elements (i.e., level, progress bar, and immediate feedback).
The playful goal of the missions is to clear the minefield of all bombs. If the student's solution is wrong, they have an unlimited number of attempts to correct it; if their answer is correct, the next level starts. The students can always come back to any of the already solved levels. This might be useful if they want to check the accepted answer for inspiration or go through the theory once again (Pastushenko et al., 2020). The tool was chosen because it allowed the implementation of a module to collect the students' data logs. Moreover, it has already been validated and used by other researchers (Pastushenko et al., 2020). Figure 1 illustrates an example of the system presenting a "welcome message" and tutorial for new users, and Fig. 2 illustrates an example of a mission in the system.
To collect the students' behavior, we implemented a module in the tool (described below). Data logs were collected based on the theoretical model proposed by Oliveira et al. (2019), which presents a series of behaviors (data logs) theoretically associated with the nine original flow experience dimensions. The module proposes nine different data logs that can be related to the nine flow experience dimensions. The collected data logs in our study are:
• Average students' response time after a feedback (ArtAF): This data represents the student's behavior after receiving feedback, that is, how long it took the student to process certain feedback from the system and enter a response.
• Number of mouse clicks (NumCOB): This data represents the student's general behavior when using the system concerning the number of mouse clicks that the student made, from the moment they entered the system until the moment they stopped using it.
• Proportion of wrong steps/responses (ProWS): This data represents the proportion of wrong answers by the student compared to the total number of answers provided, that is, the number of wrong answers/total number of answers provided.
• Received feedback (RF): Amount of feedback received by the student while using the system.
• Total unique session views (TotUSV): This data represents the number of times a student has seen the same phase/level in the gamified system.
• Used time to finish a step/mission (UsdTFS): This data represents the time it took a student to finish each activity (regardless of the final result).
• Active time in the system (ActTS): This data represents the total time a student spent using the system.
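The seven data logs listed above can be derived from a time-ordered stream of interaction events. The following is a minimal sketch in Python, not the collection module used in the study: the `Event` schema, the event kinds, and the exact operationalization of each log (in particular TotUSV as the total number of mission views, and UsdTFS as the time from the first view of a mission to its last submission) are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Event:
    """Hypothetical log record; the real module's schema is not described in the text."""
    t: float                        # seconds since the session started
    kind: str                       # "click", "view", "submit", or "feedback"
    mission: Optional[int] = None
    correct: Optional[bool] = None  # only meaningful for "submit"


def extract_logs(events):
    """Compute the seven data logs for one student session."""
    events = sorted(events, key=lambda e: e.t)
    submits = [e for e in events if e.kind == "submit"]

    # ArtAF: delay between each feedback and the next submission.
    delays, pending = [], None
    for e in events:
        if e.kind == "feedback":
            pending = e.t
        elif e.kind == "submit" and pending is not None:
            delays.append(e.t - pending)
            pending = None

    # UsdTFS: per mission, time from first view to last submission (assumed reading).
    first_view, last_submit = {}, {}
    for e in events:
        if e.kind == "view":
            first_view.setdefault(e.mission, e.t)
        elif e.kind == "submit":
            last_submit[e.mission] = e.t
    durations = [last_submit[m] - first_view[m] for m in last_submit if m in first_view]

    wrong = sum(e.correct is False for e in submits)
    return {
        "ArtAF": mean(delays) if delays else None,
        "NumCOB": sum(e.kind == "click" for e in events),
        "ProWS": wrong / len(submits) if submits else None,
        "RF": sum(e.kind == "feedback" for e in events),
        "TotUSV": sum(e.kind == "view" for e in events),
        "UsdTFS": mean(durations) if durations else None,
        "ActTS": events[-1].t - events[0].t if events else 0.0,
    }


# Made-up session for illustration only; not data from the study.
session = [
    Event(0.0, "view", 1),
    Event(5.0, "click"),
    Event(20.0, "submit", 1, False),
    Event(21.0, "feedback", 1),
    Event(30.0, "click"),
    Event(41.0, "submit", 1, True),
    Event(42.0, "feedback", 1),
    Event(50.0, "view", 2),
]
logs = extract_logs(session)
print(logs)
```

Keeping the extraction as a pure function over an event list makes it straightforward to recompute the logs under a different operationalization if the assumed readings above do not match the original module.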
The collected data logs represent two students' behaviors (i.e., speed of students' action and frequency of students' action), organized as follows. The frequency of students' actions is represented by the (i) number of mouse clicks; (ii) proportion of wrong steps/responses; (iii) received feedback; and (iv) total unique session views. The speed of students' actions is represented by the (i) average students' response time after a feedback; (ii) used time to finish a step/mission; and (iii) active time in the system.
To analyze the students' flow experience while using the system, we used the short flow state scale (short FSS) proposed by Jackson and Eklund (2002). This scale was chosen because it was previously validated by Hamari and Koivisto (2014) for use in gamified settings and because, according to Oliveira et al. (2018), it is the most popular scale in studies in the area of educational technologies. As the data collection was done after a quick activity (lasting less than an hour), following the recommendation of "The Manual for the Flow Scales" (Jackson et al., 2011), we chose to use the short scale composed of nine questions (one for each dimension of the flow experience) instead of the complete scale composed of 36 questions. The scale was presented as a five-point Likert scale (Likert, 1932). To ensure the quality of the responses, inspired by recent studies (Orji et al., 2018; Hallifax et al., 2019; Santos et al., 2021), we included an "attention-check" question (i.e., if you are filling out the form carefully, answer 3*) to eliminate responses from students who were not paying due attention when answering the questions. The scale used in this study is presented in the Appendix (see Table 6).
The study was organized in five general steps: (i) selection of the gamified system, (ii) students' invitation, (iii) system usage, (iv) self-experience report, and (v) data analysis. In the first step (selection of the gamified system), the gamified educational system used in the study was selected. Once the system was chosen, in the second step (students' invitation), an invitation was sent to the institution's students through official email lists, describing the system and the stages of the study. Students who agreed to participate were then able to read and sign the free informed consent form. Then, in the third step (system usage), students could use the system freely. During this step, the students' behavior data logs were collected. In the fourth step (self-experience report), immediately after using the system, students answered the FSS so that their flow experience when using the system was collected. In the fifth step (data analysis), the data collected in the previous steps were processed and analyzed. Figure 3 presents the study design.

Participants
The participants were (initially) 31 bachelor students of Brno University of Technology (Czech Republic), who volunteered to participate in the experiment. Five responses were excluded because the students spent less than 5 min on the assignment, which is an indication that they had not used the system. Three responses were excluded because the students answered the attention-check question (previously described) incorrectly. We, therefore, included 23 participants (mean age = 21.54 years old, SD = 1.33; 6 women, 13 men, 0 non-binary, 4 preferred not to disclose gender). To participate in the study, students received a link to the questionnaires and the assignment, and they could work on it online, at their own pace and preferred time.
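The two exclusion rules described above (less than 5 min of use, and a wrong answer to the attention-check item) can be expressed as a simple screening step. The sketch below is illustrative only: the `Response` record and its field names are hypothetical, the sample records are made up, and the order in which the two rules are applied is an assumption.

```python
from dataclasses import dataclass

MIN_ACTIVE_SECONDS = 5 * 60   # exclusion threshold reported in the text
ATTENTION_CHECK_ANSWER = 3    # expected answer to the attention-check item


@dataclass
class Response:
    """Hypothetical per-student record; field names are assumptions."""
    student_id: str
    active_seconds: float
    attention_answer: int


def screen(responses):
    """Split responses into kept, excluded for short use, and excluded for the check."""
    kept, too_short, failed_check = [], [], []
    for r in responses:
        if r.active_seconds < MIN_ACTIVE_SECONDS:
            too_short.append(r)
        elif r.attention_answer != ATTENTION_CHECK_ANSWER:
            failed_check.append(r)
        else:
            kept.append(r)
    return kept, too_short, failed_check


# Made-up records for illustration only; not data from the study.
sample = [
    Response("s01", 1800.0, 3),
    Response("s02", 120.0, 3),   # under 5 min of use
    Response("s03", 2400.0, 5),  # wrong attention-check answer
    Response("s04", 900.0, 3),
]
kept, too_short, failed_check = screen(sample)
print(len(kept), len(too_short), len(failed_check))
```

Returning the excluded groups separately, rather than only the retained responses, makes it easy to report how many participants each rule removed.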

Results
In our study, we analyzed which behaviors (data logs) can be used to predict students' flow experience in a gamified educational system. We focused on using students' data logs to predict their flow experience during system usage, independent of the students' learning experience when using the system. We used behavior data logs as predictive variables and flow experience (identified through the short FSS proposed by Jackson and Eklund (2002) and validated by Hamari and Koivisto (2014) for the gamification domain) as the response variable. Initially, before making any decisions regarding data analysis, we analyzed the data distribution. As our sample consisted of fewer than 30 subjects, following the recommendations of Wohlin et al. (2012), we used the Shapiro-Wilk test (Shapiro & Wilk, 1965).
The results indicate that our data do not follow a normal distribution. After analyzing the distribution of the data, we analyzed the direct correlation between each behavior data log and the nine flow experience dimensions. As our data do not follow a normal distribution, also following the recommendations of Wohlin et al. (2012), we used Kendall's τ correlation test, a non-parametric hypothesis test for statistical dependence based on the τ coefficient (Kendall, 1938). Table 1 presents the results. Significant correlations were identified between different data logs and six of the nine flow experience dimensions.
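To make the correlation step concrete, the following sketch computes Kendall's τ from pairwise concordance. It implements the tau-b variant, which corrects for ties and is a common choice for Likert-type responses; the text does not state which variant was used, so this choice is an assumption. The sketch returns only the coefficient, not the significance test, and the example values are made up.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty)).

    C and D count concordant and discordant pairs; Tx and Ty count pairs
    tied only in x and only in y. Pairs tied in both are ignored.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Made-up values for illustration only; not data from the study.
active_minutes = [12.0, 25.0, 31.0, 40.0, 55.0]
csb_scores = [5, 4, 4, 2, 1]  # five-point Likert responses
tau = kendall_tau_b(active_minutes, csb_scores)
print(round(tau, 3))
```

The quadratic pairwise loop is adequate for samples of this size (N = 23); for large data sets a statistics library would be preferable.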
Table 1 Bivariate correlation coefficients (Kendall's τ) and significance between each data log and flow experience dimensions
Red color indicates a negative correlation, and green color indicates a positive correlation. The intensity of the color indicates the level of correlation. ActTS: active time in the system; UsdTFS: used time to finish a step/mission; ProWS: proportion of wrong steps/responses; RF: received feedback; ArtAF: average students' response time after a feedback; TotUSV: total unique session views; NumCOB: number of mouse clicks; CSB: challenge-skill balance; MMA: action-awareness merging; G: clear goals; F: unambiguous feedback; C: total concentration on the task at hand; CTRL: sense of control; LSC: loss of self-consciousness; T: transformation of time; A: autotelic experience. * p < 0.05; ** p < 0.01
Oliveira et al. Smart Learning Environments (2021) 8:30

To answer the RQ, we modeled the students' behaviors as latent variables and their flow experience in the gamified educational system as the response variable. Behavior data logs were organized into two different behaviors (i.e., speed of students' action and frequency of students' action), as demonstrated in the section "Materials and procedure". We then conducted two different analyses: one analysing the relationships between the students' behavior and the overall flow experience, and another analysing the relationships between the students' behavior and the flow experience dimensions. To analyse our data and explore the relationships, we used Partial Least Squares Path Modeling (PLS-PM) analysis (Hair et al., 2016). PLS-PM was used because it is a reliable method for estimating relationship models with latent variables (Hair et al., 2016), even with small samples (as in our case) (Henseler et al., 2009). To run the analyses, we used the software SmartPLS.
To ensure the quality of the instrument used (i.e., whether the study data are correctly suited to the instrument), we analyzed the Construct Reliability and Validity (CRV), based on Cronbach's α, Jöreskog's rho, Composite reliability, and Average Variance Extracted (AVE). The results of both analyses show values that are acceptable or very close to acceptable. Table 2 presents the results. In the same way, we also calculated the discriminant validity for the data, finding acceptable values, since the square roots of all the variables' AVE were larger than the correlations that each variable had with the other variables (Fornell & Larcker, 1981). In Table 3, we present the discriminant validity for our data.
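The reliability and validity indices mentioned above follow standard textbook formulas, which can be sketched as below. This is not SmartPLS's implementation, and the scores, loadings, and construct correlation in the example are made-up placeholders rather than values from Tables 2 and 3.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per indicator, all of the same length."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of a construct."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings) ** 2
    return s / (s + sum(1 - l * l for l in loadings))


def fornell_larcker_ok(ave_by_construct, correlations):
    """True if sqrt(AVE) of each construct exceeds its correlations with the others.

    correlations maps (construct_a, construct_b) pairs to their correlation.
    """
    return all(
        abs(r) < min(sqrt(ave_by_construct[a]), sqrt(ave_by_construct[b]))
        for (a, b), r in correlations.items()
    )


# Placeholder numbers for illustration only; not values from the study.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 2, 4, 5], [1, 3, 3, 5]])
ave = {
    "Speed": average_variance_extracted([0.8, 0.8, 0.8]),
    "Frequency": average_variance_extracted([0.7, 0.7]),
}
print(round(alpha, 3), fornell_larcker_ok(ave, {("Speed", "Frequency"): 0.5}))
```

The Fornell-Larcker check mirrors the criterion stated in the text: the square root of each construct's AVE is compared against that construct's correlations with every other construct.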

Finally, we analysed the relationship between students' behavior and (overall) flow experience, as well as the relationships between students' behavior and the flow experience dimensions. Regarding the overall flow experience, the results demonstrate a negative relationship (β = −0.634, p = 0.038) between speed of students' action and their flow experience. Regarding the relationships between students' behavior and the flow experience dimensions, the results indicate a negative relationship between frequency of students' action and clear goals (β = −0.414, p = 0.074) and also between frequency of students' action and loss of self-consciousness (β = −0.672, p = 0.094). It was also possible to identify a negative relationship between speed of students' action and challenge-skill balance (β = −0.815, p = 0.004). Tables 4 and 5 present the results of our analysis, and Fig. 4 presents the study path model summarizing the general results.
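As a rough intuition for what a standardized path coefficient β between a behavior construct and flow expresses, the sketch below builds an equal-weight composite from standardized indicators and computes the standardized slope of the outcome on that composite. This is a deliberately simplified stand-in and not PLS-PM: PLS-PM estimates indicator weights iteratively and assesses significance by resampling, neither of which is done here. All values are made up.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of values (population SD; fails on constant input)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite(indicators):
    """Equal-weight composite: mean of the standardized indicators per student."""
    cols = [zscores(col) for col in indicators]
    return [mean(row) for row in zip(*cols)]


def standardized_slope(x, y):
    """With a single predictor, the standardized slope equals Pearson's r."""
    zx, zy = zscores(x), zscores(y)
    return mean(a * b for a, b in zip(zx, zy))


# Made-up values for illustration only; not data from the study.
speed_indicators = [
    [10.0, 20.0, 30.0, 40.0],  # e.g., a response-time indicator
    [1.0, 2.0, 3.0, 4.0],      # e.g., a time-per-mission indicator
]
flow_scores = [5.0, 4.0, 3.0, 2.0]
beta = standardized_slope(composite(speed_indicators), flow_scores)
print(round(beta, 3))
```

A negative value, as in the relationships reported above, means that higher scores on the behavior composite go together with lower flow scores.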

Discussion
Predicting students' experience in educational systems is a contemporary and complex challenge. One of the possible alternatives for facing this challenge is the use of behavior data logs produced by students when using educational systems. In this article, we predicted the students' flow experience based on their behavior data logs when using a gamified educational system. The results demonstrate that it is possible to predict the students' overall flow experience based on the speed of students' actions in the system, as well as to predict clear goals and loss of self-consciousness based on the frequency of students' actions, and challenge-skill balance based on the speed of students' actions. Analyzing the correlation between the individual data logs and the nine flow experience dimensions, it was possible to identify some significant correlations (see Table 1). The first correlation identified was a negative correlation between active time in the system and challenge-skill balance (τ = −.407). This result suggests that a student who does not spend much time using the system possibly did not have an experience in which their ability level was balanced with the challenge level of the task. This result is in agreement with different studies (Csikszentmihalyi, 2014; Lee et al., 2014; Oliveira et al., 2019): for a student to be motivated to use an educational system for a longer time, the difficulty of the system must be balanced with their skill level.

Table 5 Relationships between data logs and flow experience dimensions
β: regression coefficient; CI: confidence interval; CSB: challenge-skill balance; MMA: action-awareness merging; G: clear goals; F: unambiguous feedback; C: total concentration on the task at hand; CTRL: sense of control; LSC: loss of self-consciousness; T: transformation of time; A: autotelic experience; Frequency: frequency of students' action; Speed: speed of students' action. * p < 0.1; ** p < 0.05

At the same time, we identified negative correlations between the clear goals dimension and both students' active time in the system and used time to finish a step/mission (τ = −.452 and τ = −.468). All data logs observed in these correlations are related to the time the user spends in the system (i.e., speed of students' action). Confirming the theoretical expectations regarding flow and education (Csikszentmihalyi, 2014), this indicates that when students do not identify which goals should be fulfilled in the system, they possibly tend to leave it. At the same time, this result presents a new contribution, since other recent studies (e.g., Lee et al., 2014; Semerci & Goularas, 2020) do not analyze this relationship, and neither does the theoretical model proposed by Oliveira et al. (2019).

Fig. 4 Path model
The correlation results also show a negative correlation between total unique session views and the clear goals dimension (τ = −.504). At the same time, a positive correlation was also identified between total unique session views and students' concentration (τ = −.504). This result can occur because, on the one hand, if a student does not identify the objectives of the system, they tend to leave the system and, consequently, make fewer attempts to view a particular page or mission in the system. On the other hand, if the student manages to concentrate highly when using the system, they tend to do more activities and consequently view the same page or section of the system more often. Although we hypothesize this justification, recent related studies (e.g., Lee et al., 2014; Wang & Hsu, 2014) have not analyzed this relationship, which indicates the need for new studies that re-investigate it. The result obtained in our study corroborates the results of other recent studies conducted using other systems or in other domains (Lee et al., 2014).
We advance the literature by demonstrating that it is possible to step towards predicting students' overall flow experience (see Table 4 and Fig. 4) based on the speed of students' actions in the system (β = −.634). Our results demonstrate that system usage time negatively affects the overall flow experience of students, which means that those students who quickly abandoned the system were not able to achieve a high flow experience. This result, in addition to demonstrating that it is possible to predict students' overall flow experience based on the speed of students' actions, also brings insights into the need for mechanisms that bring students who left the system back to using it.
Regarding the prediction of each flow experience dimension (see Table 5 and Fig. 4), our results indicate that it is possible to use the speed of students' actions to predict challenge-skill balance (β = −.815). Similar to the correlation result, the analysis results demonstrate that, as proposed in other studies (Lee et al., 2014; Oliveira, 2019), if the system's activity level is not balanced with a student's skill level, the student tends to leave the system more quickly. Our results also demonstrate that the frequency of students' actions negatively affects clear goals and loss of self-consciousness. This result may be directly connected to the fact that if a student cannot clearly understand the goals of a task, they will not be able to achieve a loss of self-consciousness experience (Csikszentmihalyi & Csikszentmihalyi, 1975, 1988; Csikszentmihalyi, 1997a). Consequently, we believe that they perform fewer activities or exit the system faster.
Our results yield guidelines related to how to predict students' flow experience based on the behavior data logs produced by these students when using a given educational system. Based on the results presented in this article, it is possible to advance the currently existing literature and take another step towards the automatic identification of the flow experience in educational systems.

Threats to validity and limitations
The study presented in this article has some limitations, which we sought to mitigate throughout the study. The experience measured in the study (i.e., flow experience) is a complex construct to measure. To mitigate this limitation, we used only previously validated methods (i.e., the short FSS validated by Hamari and Koivisto (2014) for the gamification domain and the theoretical model proposed by Oliveira et al. (2019) to collect data logs in educational systems). At the same time, to ensure the quality of responses and to avoid external threats (e.g., lack of attention from students), we inserted an attention-check question within the scale and used other methods (e.g., removing responses from students who used the system for less than five minutes) to avoid data set inconsistencies. Another limitation is related to our small sample size (i.e., 23 students). To mitigate this limitation, we used a statistical method suited to analyzing data from small samples (i.e., PLS-PM) (Hair et al., 2016). However, we highlight the importance of replicating the experiment with larger samples to allow greater generalization of the results, and we believe that this paper can serve as a basis for such future research. Finally, the R² values for some predictions were considered low, which may reflect the sample size and limited predictive power. The p values are also affected by the sample size. This limitation also suggests replicating the study with larger samples.

Research agenda
Based on the results obtained in our study, it is possible to advance the literature through insights related to the prediction of students' flow experience based on their behavior data logs when using a gamified educational system. At the same time, our results allow us to propose a research agenda that can be followed in the coming years. First, our study was conducted with a relatively small sample (i.e., 23 undergraduate students), which does not allow us to generalize the results to other contexts and systems. Therefore, we recommend that future research replicate our study with a larger and more heterogeneous sample (e.g., with students from different educational levels).
In addition, our study was based on a single session of system use. While this design allowed us to understand student behavior when using the system, it leaves open the question of whether that behavior remains consistent when the same group of students uses the system more than once. Thus, we recommend that future research replicate our study while allowing students to use the system on several different days, with data collected on each day of use.
Our study was conducted using a gamified educational system. However, we did not collect data logs related to the system's gamification elements (e.g., points, badges, and ranking position). Although these data belong to a specific context (i.e., gamification), they may also yield new insights for this area of research. Therefore, we recommend that future research explore data logs beyond those examined in this study, such as logs of student interaction with gamification elements.
In our study, we first performed a correlational analysis and then used advanced statistical techniques (i.e., PLS-PM) to analyze our data. This choice is justified because PLS-PM is appropriate for this type of analysis, even with small samples. However, other techniques combined with a larger sample can help to deepen the results. Therefore, we suggest that future research use other types of data analysis, such as data mining and machine learning.
Finally, our study presents insights into how to predict students' flow experience based on their behavior data logs, but we do not provide practical instruments that can, for example, be plugged into educational systems to predict students' flow experience. Therefore, we recommend that future studies propose practical approaches, such as algorithms that, based on student data logs, automatically identify which students are or are not in a flow experience.
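As a rough illustration of what such a pluggable instrument might look like, the sketch below summarizes each student's session from action timestamps and passes the summary to a predictor. Everything in it is an assumption made for illustration: the feature names, the use of the mean gap between consecutive actions as a proxy for the speed of actions, and the predictor itself, which is a placeholder to be supplied by future work. This study does not provide a validated predictor.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional

Features = Dict[str, float]


def extract_features(timestamps: List[float]) -> Optional[Features]:
    """Summarize one student's session from action timestamps (in seconds).

    Returns None when there are fewer than two actions, because the gap
    between actions cannot be estimated in that case.
    """
    if len(timestamps) < 2:
        return None
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return {
        "n_actions": float(len(ts)),
        "session_seconds": ts[-1] - ts[0],
        # Assumed proxy for "speed of actions": mean time between actions.
        "mean_gap_seconds": mean(gaps),
    }


def classify_students(
    logs: Dict[str, List[float]],
    predictor: Callable[[Features], bool],
) -> Dict[str, Optional[bool]]:
    """Apply a caller-supplied predictor to every student's features.

    The predictor is hypothetical: it stands in for a model that future
    research would need to train and validate.
    """
    results: Dict[str, Optional[bool]] = {}
    for student_id, timestamps in logs.items():
        features = extract_features(timestamps)
        results[student_id] = predictor(features) if features is not None else None
    return results
```

Keeping the predictor as a separate argument means the log-processing step can be reused unchanged once a validated model becomes available.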

Concluding remarks
Predicting students' flow experience in educational systems is a contemporary challenge. In this article, we addressed this challenge by predicting students' flow experience in a gamified educational system through their behavior (data logs) during system usage. Our results show that it is possible to use students' data logs to predict different flow experience dimensions (and the overall flow experience), but the predictive power is still low in some cases, which can be better evaluated by replicating the study with a larger sample. In future work, we aim to replicate the study with a larger sample size and to propose an algorithm that automatically identifies students' flow experience in educational systems.