
Evaluating computer science students reading comprehension of educational multimedia-enhanced text using scalable eye-tracking methodology


In this research, a mixed-method approach was employed to conduct large-scale eye-tracking measurements, which are traditionally associated with high costs and extensive time commitments. Using consumer-grade webcams in conjunction with open-source software, data was collected from a large cohort of students, demonstrating the scalability and cost-effectiveness of this methodology. The primary objective was to identify differences in reading behaviour when students were presented with standard text accompanied by illustrations, compared to the same text with highlighted key terms. The participants, all first-year university students, completed a questionnaire and an introductory test to ascertain their knowledge level. Subsequently, they were divided into two groups and participated in two reading sessions, during which their eye movements were recorded. The collected data underwent both qualitative analysis, facilitated by visualizations, and quantitative analysis, employing statistical measures on the data and test results. Notably, no significant difference was observed in the gaze patterns or test results between the experimental and control groups. However, a significant divergence in gaze patterns was identified between high-achieving students and those experiencing difficulties, as evidenced by the averaged composite heatmaps generated from the data. The findings underscore two pivotal points. Firstly, the feasibility of conducting large-scale eye-tracking experiments is demonstrated. Traditional studies in this field often employ small population samples due to the time and financial constraints associated with methods that rely on specialized eye-tracking hardware. In contrast, our methodology is scalable, relying on low-end hardware and enabling students to record data on their personal devices. Secondly, while eye-tracking may not provide substantial benefits for fine-tuning text already optimized for readability, it could serve as a valuable tool for identifying and assisting learners who are struggling. This mixed-method approach holds significant potential to change how eye-tracking studies are conducted and interpreted within educational settings.


For decades, educators and researchers have debated whether modern technologies are beneficial or detrimental to the quality of learning. In an environment where computers and the internet play a crucial role in facilitating communication between students and teachers, the best way to teach has been a topic of discussion in pedagogical, didactic, and psychological research. However, our focus is not on analyzing educational systems, but rather on evaluating our students' ability to comprehend written material and acquire the knowledge provided by higher education institutions. To assess the quality of our learning materials and our students' capabilities, we selected Logic Systems of Computers, a compulsory subject that students completed partly remotely due to mobility restrictions aimed at preventing the further spread of COVID-19. A significant portion of their learning occurred through live broadcasts of lectures and an e-learning course distributed via our university-hosted Learning Management System (LMS), Moodle, which contained chapters composed of both text and multimedia.

In our research, we focus on improving students' reading comprehension and deep understanding of subject matter. Due to the current situation, teaching has been carried out via distance learning using information and communication technology. Therefore, it is crucial to reorganize e-learning courses, specifically the combination and location of text and multimedia elements, to provide information that is easily understood and helps build knowledge in our students. Distance learning often requires that students work independently to a greater extent than they may be accustomed to. Tutors should not limit their teaching methods to providing only text but should also include visual aids such as illustrations, charts, tables, maps, videos, animations, etc. in their lectures. Empirical studies have shown that texts accompanied by visual imagery are more effective than non-illustrated texts (Ariasi & Mason, 2011; Mason et al., 2013). Research has demonstrated that students' metacognitive judgments suggest a preference for learning from texts with diagrams over texts alone, even when visuals may not always be effective (Serra & Dunlosky, 2010; Kuhlmann & Fiorella, 2022; Pijeira-Díaz et al., 2023). Visual aids, such as knowledge maps and pictorial illustrations, influence the types of conceptual relationships students focus on during learning, with maps promoting hierarchical relationships and illustrations emphasizing temporal relationships (van de Pol et al., 2020). Engaging in generative activities like diagramming has been shown to enhance monitoring and regulation accuracy by providing more diagnostic cues for comprehension (Yu Cin, 2021). Additionally, text-diagram reading instruction has been found to improve reading comprehension, with students spending more time integrating textual and pictorial information when diagrams are present (Castillo-Diaz et al., 2022). 
These findings collectively highlight the significance of visuals, like diagrams, in enhancing students' learning experiences and metacognitive judgments.

Ineffective knowledge transfer in educational e-learning materials for distance learning can stem from various factors identified in the research. These include ontogenical, didactical, and epistemological obstacles that hinder students' understanding and application of concepts (Kuosa et al., 2016). Additionally, challenges such as mismatched expectations between knowledge providers and receivers, cultural differences, and disruptions during events like the COVID-19 pandemic can impede effective knowledge transfer in e-learning environments (Lim et al., 2022; Sukri et al., 2023). Lack of a well-thought-out e-learning strategy and the absence of a centralized repository for educational materials can also contribute to ineffective knowledge transfer, leading to redundant content creation and resource wastage (Wojcik et al., 2020). Therefore, addressing these obstacles and implementing strategies to enhance interactivity, accessibility, and completeness of learning materials is crucial for improving knowledge transfer efficacy in distance learning settings.

One of the main issues we have noticed in our practice is that although students are able to read and understand words and sentences individually, they often do not reach the level of proper reading comprehension. Inspired by Adler and Van Doren's (2014) description of the four levels of reading, we have attempted to develop a method of identifying which level of reading our students are capable of. From our previous experience, we suspected that students pursuing undergraduate degrees are capable of the inspectional level of reading but are often unable to approach the text with analytical or even syntopical thinking.

In educational settings, a common issue observed is that students can decode words and sentences but struggle with overall reading comprehension (Nitzkin et al., 2014; Scott & Balthazar, 2013; Zargar et al., 2020). This discrepancy highlights the importance of effective comprehension monitoring, where readers evaluate and regulate their understanding (Zipoli, 2017). Middle school educators face challenges in enhancing reading skills, emphasizing the critical role of vocabulary instruction in improving comprehension (Fu, 2015). Understanding sentence structures is crucial for reading comprehension, especially for students with language impairments or learning disabilities. To address these issues, educators should focus on developing students' abilities to accumulate visual and non-visual information while cultivating good reading habits. By integrating strategies to enhance comprehension monitoring, vocabulary instruction, and knowledge of sentence structures, educators can better support students in achieving comprehensive reading skills.

Inoue and Paracha (2016) were also concerned about the deficiencies in reading comprehension skills and conducted experiments very similar to ours, although with smaller samples of students. In their article, they described three experiments designed to provide insights into how different students work with text and visual materials. In the first experiment, students were divided into three groups, with one group reading the text alone, the second reading the text with an image, and the third studying illustrations. In the second experiment, participants read a longer text without illustrations, and researchers compared fixation durations between better and poorer readers. In the last experiment, students read a comic strip. Here, differences in fixations between better and poorer readers could be observed, with better readers directing their attention not only to speech bubbles but also to characters' facial expressions. Based on these experiments, the authors made several observations.

Reading is not a smooth process of eye movement from left to right across the text or image. Better readers often scan the entire content, identify keywords that capture their attention, and devote more time to them. Weaker readers do not understand the text at first glance and must read word by word, often exerting considerable effort to comprehend multisyllabic words. They focus on individual words and overlook broader contexts or objects outside the main text.

Eye-tracking (ET) has been explored as a less invasive and potentially more accurate method for assessing cognitive processes during reading compared to traditional methods like verbalizing thoughts (Pattemore & Gilabert, 2023; Sidhawara et al., 2023). Studies have shown that eye-tracking can provide valuable insights into how individuals engage with multimedia learning materials, revealing gender-based disparities in information processing preferences (Mézière et al., 2023). Additionally, research has demonstrated the effectiveness of eye-tracking in analyzing attention patterns of young individuals towards environmental advertising and journalistic texts, highlighting the cognitive effects of different media formats (Lobodenko et al., 2023; Saini et al., 2022). Overall, the use of eye-tracking technology offers a promising alternative for measuring cognitive processes during reading comprehension and multimedia learning, providing detailed and objective data for understanding individual differences in information processing strategies. Jamet (2014) used eye-tracking as a tool to study another principle of multimedia learning described by Mayer (2014). They recorded and compared eye movements of users in a digital learning environment with the addition of visual cues that indicated to students which part of the material they should focus on at any given moment. Such guidance had a positive effect on the retention of information, whether verbal or visual in nature. However, it did not result in improvements in test tasks aimed at the transfer (application) of knowledge.

Cheng et al. (2020) examined the understanding of reading text through an interesting combination of data visualization obtained from eye-tracking and quantitative electroencephalogram (EEG) analysis. They selected reading speed in a specific Area of Interest (AOI), reading time in AOI, and the frequency of transitions between AOIs as eye-tracking metrics for their study. During the experiment, students were shown visualizations of how their teacher approached the study text. By subconsciously imitating their teacher's thought processes, students achieved a better understanding of the text.

While most of the literature we reviewed focused on using ET to investigate the learning process, Mason et al. (2017) describe using ET as a tool for modeling the learner's perspective based on an expert reader model, the Expert Model of Eye Movements (EMME).

Data mining and eye-tracking are fundamental tools in our research. In the pre-pandemic period, when we could rely on in-person learning and establish close contact with our students, we had success using a head-mounted eye-tracking device. However, because such devices require close physical contact between researchers and large groups of students, their use was impractical during our research. After considering many approaches, we decided to use existing web technologies and the eye-tracking library WebGazer.js to collect eye-tracking data using off-the-shelf hardware. Based on this library, we developed a web application that allows us to conduct research with a greater number of students without the need for close physical proximity or in-person meetings.

Equally important to identifying problem areas in e-learning courses and adjusting their content and format is understanding how students read and comprehend professional texts provided in e-courses using the LMS Moodle environment. One of our research questions aims to determine whether reading the specified professional text increases students' level of knowledge.

Research gap and questions

The literature we reviewed shared one common trait: a relatively small number of participants. Key reasons for this are the relatively high price of state-of-the-art eye-tracking hardware and the long time required to perform the measurements and evaluate the data. The aim of this publication is to describe our methods for identifying problematic content areas in current e-learning courses and modifying them to enhance knowledge transfer to students. To do so, we needed to gather a large amount of data in a relatively short time, in a manner that would not obstruct the teaching process. For this reason, we developed a toolset that enables us to perform these measurements using off-the-shelf hardware and the open-source library WebGazer.js. Following this approach, our research aims to answer the following questions:


Which parts of our educational e-learning materials result in ineffective knowledge transfer to students participating in distance learning?


Does highlighting important keywords enhance reading comprehension, and if so, are there measurable changes in reading behaviour?


To what extent can we experimentally verify whether the modifications we made to the existing content of the Logic Systems of Computers (LSC) subject e-courses caused the desired increase in the quality level of study materials based on evaluation of student results?

Eye-tracking technology is a fundamental tool used in our research.

Literature review

The results of prior research indicate that students who consume educational text in combination with illustrations perform better than students who only have plain text. The integration of text and illustrations in educational materials has been shown to enhance learning outcomes, as it can enrich vocabulary, improve comprehension, and motivate students, thereby positively influencing their achievement (Tim & Stefan, 2023). This is supported by findings that illustrations can serve as an effective means of communication, helping children understand what they have read (Kuznetsov, 2023). However, the effectiveness of combining text and visual elements can be influenced by various factors, including the learner's specific characteristics such as intelligence, memory, and visual attention (Arbresha & Rabije, 2022), as well as the emotional content of the pictures, which can serve as a boundary condition for the multimedia principle (Qian & Wei, 2022). Despite these benefits, the potential for a split-attention effect exists, where the learner's attention is divided between text and illustration, potentially impairing learning due to increased cognitive load (Yu Cin, 2021). This effect suggests that spatial separation between text and illustrations can hinder the integration of information, although the impact on learning outcomes might not be directly proportional to the spatial distance between these elements (Lalić-Vučetić & Ševa, 2021). To mitigate this, instructional strategies that emphasize the decoding of diagrams and the integration of relevant textual and pictorial information have been developed, showing immediate benefits in reading comprehension and learning processes (Nurjanah Mohd & Siew Ming, 2020). Moreover, the role of teachers in supporting students with text-picture integration is crucial, as their instructional strategies can significantly affect students' skill improvement in this area (Faustin & Samuel Nyock, 2019). 
This highlights the importance of not only the design of educational materials but also the pedagogical approaches employed to facilitate effective learning. In conclusion, while integrating text and visual elements into educational materials can improve learning outcomes, addressing the split-attention effect through carefully designed instructional strategies and effective teacher support is essential to maximize the benefits of multimedia learning materials (Sweller & Chandler, 1992; Britta et al., 2019; Wim et al., 2019).

The cognitive theory of multimedia learning (Magdin et al., 2021; Mayer, 2005) builds on the theory of dual coding and emphasizes the involvement of multiple senses for a more effective learning process. Pupils recall knowledge from memory more readily when they can picture it. Conjuring up a mental image is a key factor that largely determines whether information is remembered or not. Mayer published several papers concerning various aspects of using multimedia in education and established several widely recognized theories. The basic premise of this work is that students achieve a deeper understanding of the subject matter when receiving information from a combination of words and images than they would from verbal discourse alone. This premise is often called the “multimedia learning theory” (Mayer, 2014).

Eye-tracking technology plays a significant role in enhancing motivation and learning in smart learning environments. By monitoring students' eye movements during e-learning activities, such as reading texts or engaging with multimedia content, eye-tracking can provide valuable insights into attention levels, cognitive differences, and individual learning behaviours. Overall, the integration of eye-tracking technology in educational settings offers valuable insights that can positively impact student motivation and learning outcomes (Sharma et al., 2020).

The eye-mind theory posits a direct link between where we look (gaze) and what we are thinking or processing cognitively. This theory is supported by various studies across different cognitive tasks and populations, indicating that eye movements can indeed reflect underlying cognitive processes. For instance, eye-tracking technology has been used to explore the relationship between gaze direction and cognitive activities, revealing that complex questions elicit more eye movements, suggesting a deeper level of cognitive processing (Carmo & Alice, 2023). Similarly, in reading, computational models suggest a tight coupling between lexical processing and eye movements, indicating that the eyes move in response to cognitive processing demands (Vanessa et al., 2023). Moreover, gaze behaviour changes during mind wandering, showing a strategic shift in visual processing when attention is diverted internally, further supporting the eye-mind theory by demonstrating how cognitive states influence gaze patterns (Myrthe et al., 2020). In social cognition, modifications in gaze strategies have been observed in individuals with cognitive impairments, such as those with frontotemporal dementia, Alzheimer's, and Parkinson's diseases, indicating a link between cognitive processing of social cues and gaze direction (Polet et al., 2022). Eye movements also reflect cognitive abilities beyond specific tasks, as shown in studies relating gaze metrics to fluid reasoning, planning, and working memory (Paulo Guirro et al., 2023). Additionally, the presence of an audience can modulate gaze and prosocial behaviour, suggesting that cognitive processing of social signals influences gaze behaviour (Paulo Guirro et al., 2023). Research on gaze following and theory of mind (ToM) further illustrates how attributions of mind influence gaze perception and judgments about gaze direction (Paulo Guirro et al., 2023). 
However, challenges in interpreting eye movement data, such as the dissociation between fixation location and cognitive processing locus, highlight the complexity of the relationship between gaze and cognition (Paulo Guirro et al., 2023). Despite these challenges, the study of eye movements in interface design has shown that gaze patterns can infer cognitive processing, supporting the eye-mind theory's premise (David, 2004). In summary, the eye-mind theory is substantiated by evidence from various fields, demonstrating a significant, though complex, link between gaze behaviour and cognitive processes across different contexts and tasks (Quoc Hao et al., 2010).

In the study by Ben Khedher et al. (2018), static and dynamic eye movement metrics were utilized to assess students' performance in a medical serious game. The researchers analyzed how students visually explored the learning environment and examined the impact of these metrics on students' reasoning performance. Static eye movement metrics focused on fixed points of gaze and duration of fixations, while dynamic metrics considered the movement patterns of the eyes, such as saccades and smooth pursuits. The results indicated significant associations between these eye movement metrics and students' outcomes, with dynamic metrics particularly reflecting students' analytical reasoning abilities. This study underscores the importance of eye tracking as a valuable tool for understanding students' learning experiences and performance in educational settings (Ben Khedher et al., 2018).

Lohmeyer et al. (2013) propose the use of eye-tracking as a tool to examine how novice and experienced engineering designers differ in their strategies for studying technical illustrations, especially to isolate approaches commonly referred to as intuition and experience. Schindler and Lilienthal (2017) used eye-tracking to observe students' creativity in solving mathematical problems. They described three ways to visualize and process gaze data: gaze-overlaid videos, heatmaps and scanpaths, and statistical measures. In view of the previous research, we conclude that the eye-tracking method can be used as a tool to verify the quality of educational texts in terms of their readability and usability as a source of the information needed to improve them. Furthermore, we aim to show that it can also reveal different reading behaviours, potentially helping us to identify students who only superficially skim the text without properly processing it, thus allowing us to reach out to them and offer them assistance. Underwood et al. (2004) found that both when a sentence is displayed simultaneously with images and when sentences precede images, fixation duration on images was longer than on sentences. On the other hand, when a sentence can be read first, followed by inspecting the image, there are fewer fixations on the image compared to the scene. In this case, processing is easier when the text is read first.

These findings suggest that the order in which text and images are presented can influence the way learners process information. By understanding these characteristics of eye movements, educators and instructional designers can create more effective multimedia learning materials that cater to the natural processing tendencies of learners. This can ultimately lead to better comprehension and retention of the material being taught.



To conduct our measurements, we reached out to students in the Applied Informatics study program. A total of 80 full-time students were directly involved in the experiment. The students were randomly divided into two equally sized groups: the experimental group, which was enrolled in a variant of the e-course containing chapters with visually distinguished important terms, and the control group, which received the same subchapters without the highlighted terms. A total of 68 students took part in both measurements; however, only 59 produced usable results in the first measurement and 61 in the second. Taking the intersection of these groups, we found that only 42 students successfully completed all measurements and post-tests, 18 from the control group and 24 from the experimental group. At the end of the semester, 29 subjects successfully completed the course (17 from the experimental group, 12 from the control group), while 13 did not meet the necessary conditions for successful completion (7 from the experimental group, 6 from the control group).
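The participant filtering described above amounts to a set intersection followed by simple bookkeeping. The following is a minimal sketch, not the authors' actual procedure: the function name `complete_cases` and the student identifiers in the usage note are hypothetical, and only the counts in `reported` are taken from the text.

```python
def complete_cases(*usable_sets):
    """Return the IDs that appear in every set of usable results."""
    return set.intersection(*(set(s) for s in usable_sets))


# Counts reported in the text for the 42 students with complete data.
reported = {
    "control": {"completed_course": 12, "did_not_complete": 6},
    "experimental": {"completed_course": 17, "did_not_complete": 7},
}

# Group sizes implied by the course outcomes; they should match the
# 18 control and 24 experimental students named in the text.
group_sizes = {group: sum(outcomes.values()) for group, outcomes in reported.items()}
total_students = sum(group_sizes.values())
```

With hypothetical identifiers, `complete_cases({"s01", "s02", "s03"}, {"s02", "s03", "s04"})` keeps only `s02` and `s03`, the students with usable data in both measurements.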


The study was structured to occur throughout the semester. Initially, students completed a questionnaire to capture their motivations and prior experiences. This was followed by an introductory test to assess their baseline knowledge levels. During the first several weeks, we conducted two experimental measurements, each accompanied by post-tests to monitor progress. The semester concluded with a final test to evaluate the overall effectiveness of the educational interventions.


Questionnaires and Tests: These tools were used at various points (beginning, during, and end of semester) to gauge the students' knowledge and engagement. The questionnaires contained questions about prior knowledge of the subject and basic demographic information: age, sex, previous education, etc. For the tests, we reused our standard tests, which are used to measure students' knowledge as part of the curriculum of the subject. All of them were administered within the course in LMS Moodle.

Educational Materials: The materials included three subchapters on electric current and basic properties of semiconductor components, with a focus on bipolar transistors. The first subchapter provided an introduction with minimal new information, the second featured a brief text and a simple illustration of electric current in a conductor, and the third presented an illustration of a bipolar transistor with accompanying explanatory text.

Gaze Tracking Technology: Eye movements were recorded using a web-based application developed with the Webgazer.js library. This technology allowed us to capture data on saccades, fixations, and regressions, which are indicative of cognitive processes such as confusion or difficulty in understanding.


The experiment was conducted under hybrid classroom conditions due to health and safety restrictions related to ongoing uncertainties. This hybrid approach allowed us to adapt to changing class schedules and attendance modes seamlessly.

Data analysis tools

We developed a secondary application to analyze the gaze data. This application enabled us to create numerical representations and visual outputs, such as heatmaps and scanpaths, to further understand the engagement and learning patterns of the students. These visualizations were adjustable in terms of fixation window size, transparency, and heatmap sensitivity, allowing for detailed analysis of user interaction with the educational content.

The experiment we conducted with computer science students was significant because it yielded results on the cognitive abilities of the examined students under non-standard conditions. These conditions arose from the situation during the studied period (COVID-19). We also based our research questions on this situation. We took into account the mode and form of education, which was delivered remotely online and greatly affected both communication with students and the implementation of the eye-tracking method. Based on this, our first research question focused on identifying which parts of the specialized texts of current e-courses present the greatest challenge to students beginning their studies in computer science subjects in distance learning formats.

Our experiment concentrated on comparing reader behavior across related specialized texts on informatics topics. The tested students were required to demonstrate knowledge in the specified area, a condition we checked with an entrance questionnaire completed by all participants. The outcome was qualitative data on the students' knowledge in the assigned fields. These data were later used to answer the subsequent research question, for which the participants' knowledge of the basics of informatics needed to be balanced. This requirement stemmed from the questionnaire results, which showed that the responses depended on the participants' previous studies.

During our gaze tracking measurements, we observed the eye movements of subjects as they read the educational text and studied the visual aids. The human eye typically moves in distinguishable patterns, including saccades and fixations. Saccades are rapid movements from one object of interest to another (e.g. between syllables in a word), usually lasting less than 100 ms, while fixations are longer periods during which the eye is almost motionless, typically lasting from 100 to 600 ms. While reading, the eyes move in a continuous alternation of saccades and fixations, rather than smooth tracking. Previous studies have consistently linked these movements to cognitive processes, identifying various states of mind, such as confusion or difficulty in understanding the text being read. While the eye is fixated on a single point, the reader can perceive objects in a small window around it (Rayner & Duffy, 1986; Rayner & Reingold, 2015; Rayner et al., 2010). Another important type of gaze behavior is regressions, which are backward jumps to previous words or to the beginning of sentences, often indicating confusion or misunderstanding. An increase in the frequency of regressions may imply a higher complexity of the text or reflect the cognitive state of the reader (Walker, 2021).
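To make these definitions concrete, the sketch below groups raw gaze samples into fixations and counts regressions. It is a simplified dispersion-based approach under stated assumptions, not the algorithm used in the study: the sample format `(time_ms, x, y)`, the function names, and the `line_height` parameter are hypothetical, the default window of 100 × 50 pixels follows the fixation zone size reported for the experiment, and the text is assumed to read left to right.

```python
from dataclasses import dataclass
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (time in ms, x in px, y in px)


@dataclass
class Fixation:
    start_ms: float
    end_ms: float
    x: float  # centroid of the grouped samples
    y: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def detect_fixations(samples: List[Sample], window_w: float = 100.0,
                     window_h: float = 50.0,
                     min_duration_ms: float = 100.0) -> List[Fixation]:
    """Group consecutive samples whose bounding box fits inside the window."""
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        xs, ys = [samples[i][1]], [samples[i][2]]
        j = i
        # Extend the group while the next sample keeps the box within the window.
        while j + 1 < n:
            nx, ny = samples[j + 1][1], samples[j + 1][2]
            too_wide = max(xs + [nx]) - min(xs + [nx]) > window_w
            too_tall = max(ys + [ny]) - min(ys + [ny]) > window_h
            if too_wide or too_tall:
                break
            xs.append(nx)
            ys.append(ny)
            j += 1
        if samples[j][0] - samples[i][0] >= min_duration_ms:
            fixations.append(Fixation(samples[i][0], samples[j][0],
                                      sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j + 1
        else:
            i += 1  # too short to be a fixation; treat as part of a saccade
    return fixations


def count_regressions(fixations: List[Fixation], line_height: float = 50.0) -> int:
    """Count backward jumps: leftward on the same line, or up to an earlier line."""
    count = 0
    for prev, cur in zip(fixations, fixations[1:]):
        same_line = abs(cur.y - prev.y) < line_height / 2
        moved_up = cur.y < prev.y - line_height / 2
        if (same_line and cur.x < prev.x) or moved_up:
            count += 1
    return count
```

A return sweep to the start of the next line moves left and down, so it is not counted as a regression here. Webcam-based gaze estimates are noisy, so the thresholds would likely need tuning on real recordings.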

Due to the ongoing restrictions, health and safety measures and the related uncertainty in class scheduling and organization, we were motivated to develop a method to perform eye-tracking measurements without the need for physical contact or expensive equipment. Our process consists of two related applications. Gaze data can be captured remotely using web-based technologies like WebGazer.js, enabling researchers to record gaze coordinates through web cameras, including those integrated into portable devices (Boels, 2023). WebGazer.js is a webcam-based eye-tracking library that shows promise for remote eye-tracking research (Papoutsaki et al., 2016; Steffan et al., 2024). It has been utilized in studies involving young children to capture goal-based action predictions (Wong et al., 2023). The library aims to address the limitations of traditional eye-tracking devices by offering a cost-effective and scalable solution for assessing gaze behaviors in educational settings, particularly for neurodivergent students (Asghari et al., 2023). However, webcam-based eye tracking, in general, faces challenges related to spatial accuracy, calibration validity, and gaze prediction methods (Faura-Pujol et al., 2023). Efforts have been made to improve data quality by estimating spatial offset and enhancing the correlation with Areas of Interest (AOIs) defined over stimuli (Abhaya et al., 2022). Overall, WebGazer.js and similar technologies hold promise for facilitating eye-tracking research, especially in scenarios where traditional eye-tracking devices may not be feasible or cost-effective. To facilitate the evaluation of measurements, we have developed a secondary application to help us process the data, creating numerical representations of the reading process and drawing visualizations, such as heatmaps and scanpaths, from the data files.

Figure 1 shows the graphical interface of this application, specifically the initial screen with visualization settings. Here the user can select the data file and the associated screenshot. A second option is to load any number of data files (.csv) and generate a composite heatmap from them. The application compensates for differences in screen resolutions and scales all heatmaps to a uniform resolution. We can also enter the time interval we want to visualize or leave the default setting to generate visualizations from the total reading time. The user can also enter the approximate location and dimensions of the illustration Area of Interest to count the number and durations of fixations on the text, illustrations and the transitions between these two AOIs.
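The compensation for different screen resolutions described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Webgazer Visualiser implementation: gaze points are normalized by each participant's screen size, binned into a common grid, and averaged across recordings. The function name, the dictionary keys `resolution` and `points`, and the 64 × 36 grid are hypothetical.

```python
from typing import Dict, List


def composite_heatmap(recordings: List[Dict],
                      grid_w: int = 64,
                      grid_h: int = 36) -> List[List[float]]:
    """Average per-recording gaze densities on a common grid.

    Each recording is a dict with hypothetical keys:
      "resolution": (width, height) of the participant's screen in pixels
      "points":     list of (x, y) gaze samples in that screen's pixels
    """
    grid = [[0.0] * grid_w for _ in range(grid_h)]
    used = 0
    for rec in recordings:
        width, height = rec["resolution"]
        inside = [(x, y) for x, y in rec["points"]
                  if 0 <= x < width and 0 <= y < height]
        if not inside:
            continue              # skip recordings with no on-screen samples
        used += 1
        share = 1.0 / len(inside)  # every recording carries total weight 1
        for x, y in inside:
            # Normalize by the screen size, then map onto the shared grid.
            col = min(int(x / width * grid_w), grid_w - 1)
            row = min(int(y / height * grid_h), grid_h - 1)
            grid[row][col] += share
    if used:
        grid = [[cell / used for cell in row] for row in grid]
    return grid
```

Giving every recording the same total weight keeps a long reading session from dominating the composite; whether that weighting is desirable depends on the research question.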

Fig. 1

Webgazer Visualiser settings

In the heatmap settings, the user can adjust the size of the radius for each gaze point, as well as the minimum and maximum transparency of the heatmap. Additionally, the user can specify the blur level, maximum number of data points to be plotted in a single area, and data point weight for a single fixation. The last two options have the greatest impact on the sensitivity of the heatmap. The user can also input the relative size of the fixation window. Based on our observed recordings, we determined the fixation zone size for our experiment to be 100 × 50 pixels—approximately the size of an average word on our students' screens.
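To make the role of the fixation window concrete, the sketch below groups consecutive gaze samples into a fixation while they stay inside a 100 × 50 pixel window and keeps groups lasting at least 100 ms. It is a simplified dispersion-threshold approach written for illustration, not necessarily the algorithm used in our application; the function name, the sample format `(timestamp_ms, x, y)`, and the minimum duration are assumptions.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (timestamp_ms, x, y)


def detect_fixations(samples: List[Sample],
                     win_w: float = 100.0,
                     win_h: float = 50.0,
                     min_duration_ms: float = 100.0):
    """Return fixations as (start_ms, end_ms, centre_x, centre_y)."""
    fixations = []
    group: List[Sample] = []

    def flush():
        # Keep the group only if it lasted long enough to be a fixation.
        if group and group[-1][0] - group[0][0] >= min_duration_ms:
            cx = sum(s[1] for s in group) / len(group)
            cy = sum(s[2] for s in group) / len(group)
            fixations.append((group[0][0], group[-1][0], cx, cy))

    for sample in samples:
        candidate = group + [sample]
        xs = [s[1] for s in candidate]
        ys = [s[2] for s in candidate]
        if max(xs) - min(xs) <= win_w and max(ys) - min(ys) <= win_h:
            group = candidate     # still inside the fixation window
        else:
            flush()               # window exceeded: close the current group
            group = [sample]
    flush()
    return fixations
```

Samples that never settle inside the window for the minimum duration are discarded, which roughly corresponds to treating them as part of a saccade.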

The primary outputs of our application are the numerical results obtained by analyzing the gaze data file (Fig. 2) and a screenshot overlaid with transparent heatmap and scanpath layers (Fig. 3).

Fig. 2

Numerical output of the application

Fig. 3

Heatmap and Scanpath generated by the application

Once the visualizations have been generated, the user can export them as .png images, which preserve the transparency useful for further evaluation. The Webgazer Visualiser source code is available on GitHub (Turcani et al., 2024).


According to our original hypothesis, we expected statistically significant differences between students enrolled in the control group (reading text without highlighted keywords) and students in the experimental group (reading text with visually distinguished keywords). Despite our expectations, our hypothesis was not confirmed by the test results; however, during the gaze data analysis and visualizations, other interesting patterns emerged. Table 1 shows the knowledge background (based on the questionnaire) and students' results in the entrance test, experimental post-tests (P1, P2), and the final test at the end of the semester. We observed that the knowledge background, the entrance test, and the first post-test have a normal or borderline normal distribution. However, the results of the experimental group in the second post-test no longer have a normal distribution and are slightly inclined towards a higher score. The final mark at the end of the semester is also not normally distributed and, in both groups, it is inclined towards a better final mark.

Table 1 Descriptive statistics—Students’ knowledge, C Control group, E Experimental group

In summary, the experimental group generally performed better than the control group in the post-tests and the final test. However, there is also a higher variation in scores among the students in the experimental group. The distribution of scores is not consistent across tests, with some tests showing deviations from normality.

Tables 2 and 3 show positive correlations between test scores and various eye-tracking measures, most of which are statistically significant. This suggests a relationship between these eye-tracking measures and students' performance on the tests. However, correlation does not necessarily imply causation, and further research is needed to investigate these relationships more deeply.

Table 2 Spearman’s correlation between P1 test score and eye tracking data from first two readings (T1 and T2)
Table 3 Spearman’s correlation between P2 test score and eye tracking data from third reading
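For readers who wish to reproduce this kind of analysis, Spearman's correlation can be computed as the Pearson correlation of the rank vectors, with tied values sharing their mean rank. The sketch below uses only the Python standard library; the function names are hypothetical and the code is not the statistical software used in this study.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for a constant sample")
    return cov / (var_x * var_y) ** 0.5
```

The function returns only the coefficient; a significance test for rho would have to be added separately.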

While studying the reading process, visual analysis of heatmaps and scanpaths can be a useful but time-consuming tool. Figure 4 presents a heatmap generated from the gaze data of the students after reading the third subchapter of the experimental text.

Fig. 4

Screenshot with a heatmap overlay

In terms of static images, we have found composite heatmaps created by combining many measurements over a specific subchapter to be very telling. Figure 5 contains a composite heatmap of the control group reading the third subchapter.

Fig. 5

Composite heatmap—control group

It is evident from the composite heatmap that most students in the control group focused their fixations on the image rather than the text, indicating a possible split-attention effect. On the other hand, the averaged composite heatmap of the experimental group in Fig. 6 reveals a slight decrease in attention on the image compared to the control group. However, there was no significant shift of attention towards the text, as expected.

Fig. 6

Composite heatmap—experimental group

We discovered an intriguing pattern when we analyzed the results without the previous group distinction. After dividing the students into three groups based on their final test scores, we made an unexpected observation. The control and experimental groups had equal representation of failing and average students (6:6 and 9:9, respectively). However, considerably more top-performing students were in the experimental group, with nine students placing in the top 12 of the class, compared to only three in the control group. To investigate reading behavior patterns, we sorted the students by overall point score and divided them into tertiles. Figure 7 displays the composite heatmap of the first tertile, comprising students who obtained the highest scores during the semester.

Fig. 7

Composite heatmap—top tertile

Conversely, Fig. 8 displays the composite heatmap of the last tertile of students, all of whom finished the course with a failing grade. It is evident that these two heatmaps are noticeably different, with the top tertile students exhibiting more compact and focused attention on both the text and images. In contrast, the failing students' group had their gaze wandering across the screen, with less intense fixations on the illustration in the center of the page.

Fig. 8

Composite heatmap—bottom tertile
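The tertile split used for the heatmaps above can be sketched as a sort followed by a three-way cut. The function name, the student identifiers, and the rule for cohorts whose size is not divisible by three are hypothetical choices made for illustration.

```python
from typing import Dict, List


def split_into_tertiles(scores: Dict[str, float]) -> Dict[str, List[str]]:
    """Sort students by score (highest first) and cut into three groups.

    When the cohort size is not divisible by three, the earlier groups
    receive the extra students; this rule is an assumption.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    base, extra = divmod(len(ranked), 3)
    sizes = [base + (1 if i < extra else 0) for i in range(3)]
    groups, start = {}, 0
    for name, size in zip(("top", "middle", "bottom"), sizes):
        groups[name] = ranked[start:start + size]
        start += size
    return groups
```

Each group's recordings can then be combined into one composite heatmap per tertile.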

As mentioned in the previous section, students' knowledge was assessed throughout the semester, including an introductory test at the beginning of the year (which was not counted towards the final grade). Analyzing these results, we can observe slight differences between the three tertiles.

After the first experimental reading the differences between the tertiles were slightly smaller, with the median result of the middle tertile being almost the same as that of the top tertile. The only students to get a perfect score in the P1 post-test were in the top tertile. In summary, the introductory test results are divided into three tertiles, with increasing median and mean scores from the bottom to the top tertile. There is more variation in the scores among students in the mid and bottom tertiles compared to the top tertile. The distribution of scores is approximately normal for each tertile, with some differences in skewness and kurtosis across the tertiles (Table 4, Fig. 9).

Table 4 Descriptive statistics—Introductory test results by tertiles
Fig. 9

Introductory test results by tertiles
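Skewness and kurtosis, mentioned above as the measures that differ across tertiles, can be computed from the central moments of a sample. The sketch below uses the uncorrected (population) definitions; statistical packages often apply small-sample corrections, so their output may differ slightly. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def skewness_and_kurtosis(values: Sequence[float]) -> Tuple[float, float]:
    """Return (skewness, excess kurtosis) from uncorrected central moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("moments are undefined for a constant sample")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0            # 0 for a normal distribution
    return skewness, excess_kurtosis
```

Values near zero for both measures are consistent with an approximately normal distribution, while a clearly positive or negative skewness indicates a longer tail towards higher or lower scores, respectively.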

Interestingly, in both the P1 and P2 post-tests, we observe only a slight difference between students from the top and middle tertiles, while students from the bottom tertile visibly lag behind in their results immediately after reading a text. This reinforces our presumption that reading comprehension is a strong predictor of overall results (Tables 5, 6; Figs. 10, 11).

Table 5 Descriptive statistics – P1 results by tertiles
Table 6 Descriptive statistics – P2 results by tertiles
Fig. 10

P1 results by tertiles

Fig. 11

P2 results by tertiles


Utilizing Eye-Tracking technology is not a novel method for investigating how readers interact with user interfaces or process the text they read. The authors have developed a procedure that not only offers a fresh perspective on its application but also describes a simple and rapid method for collecting valuable measurements, as well as proposing a qualitative evaluation process for the gathered data. In the following discussion, we will attempt to confront our findings obtained through research in the field of reading comprehension using the Eye-Tracking method with the approaches and results of researchers dealing with similar issues.

By conducting a qualitative analysis, we verified a change in reader behavior: students attended to the specialized text significantly more than to the accompanying illustration. After multiple readings, the speed of eye movement stabilized in a text-to-image ratio. In the professional community, terminology has been established that divides those with different perspectives on reading integrated text into visualizers and verbalizers. Koć-Januchta et al. (2017) examined the differences between visualizers and verbalizers in two categories of students with distinct study strategies. The students were assigned to categories based on a questionnaire, and eye-tracking was then used for a deeper analysis of their visual behavior. The results of their experiments showed that visualizers spent considerably more time attentively examining images, while verbalizers read the accompanying text more thoroughly.

In our experiment, the results of the aforementioned authors were confirmed. These differences were clearly visible in the heatmap visualization, which is the outcome of the established research question.

Significant findings in this area were also achieved by Porta et al. (2012), who focused on the use of eye-tracking, specifically recording changes in pupil size, in combination with other physiological sensors to measure cognitive load and emotional state during studying. However, as they noted, measuring pupil size under natural conditions is unreliable and easily influenced by external factors (room lighting properties, screen brightness settings, etc.).

Research by Erdogan et al. has shown that eye tracking can provide valuable insights into users' cognitive processes and engagement levels. By analyzing eye movements, educators can better understand how students interact with the platform, leading to improvements in design and instructional strategies (Erdogan et al., 2023). Zhao et al. (2014) investigated differences in reading strategies using eye-tracking. They focused on the quantitative analysis of fixations, the number of visits to individual AOIs, time to first fixation, and transitions between images and text. They did not observe a statistically significant change in the number of transitions between different reading strategies. The integration of photography with scientific texts, as explored by Barcelos et al., suggests that visual aids can significantly enhance the comprehension and engagement of readers, a principle that can be extended to the interpretation of heatmap data (Ghosh et al., 2021). Our study also builds upon the understanding that reading behavior and decision-making can be predicted by eye movement patterns, emphasizing the role of expertise levels and acquired information in shaping these patterns (Usée et al., 2020).

For our research and obtaining results, the findings of Lin et al. (2017) were essential. They compared the influence of detailed and simplified illustrations on students' retention of information, using both quantitative and qualitative analysis of the obtained results. In the qualitative part, they focused on metrics that we could also measure and analyze with our cost-effective solution. Some metrics, such as the probability of first fixation and search time, were not included in our experiment due to the details of the experimental design, which reduced the relevance of these metrics in our case. However, their qualitative analysis served as an example, using line and circle visualizations to represent saccades and fixations, respectively. In our experiment, we omitted circle sizes representing fixation durations and instead focused solely on color heatmaps, which better represent the intensity and duration of fixation on a specific area.

Given the multimedia nature of our texts, the confirmation of the multimedia effect by Lindner et al. (2017) was crucial for our research. They primarily focused on the quantitative analysis of fixation times on individual AOIs, while only using the scanpath visualization method as an illustration of these measurements. From our perspective, this visualization method appears to be a suitable tool for qualitative analysis as well, although it is time-consuming with a large sample size. The statistical analysis of the measured data, despite problems and shortcomings caused by the current situation, provided the authors with enough information for the conclusions of their research.

In the second research question, we aimed to investigate the influence of highlighting keywords in the text on changes in reading comprehension speed and other measurable changes in reader behavior. In fact, we conducted two different reading sessions with three different texts. Students were divided into an experimental and control group, with the experimental group reading the highlighted text and the control group reading the plain text.

The reading sessions were divided as follows: In the first session, both groups read essentially the same text, followed by a post-test. The second session consisted of two texts, and after reading them, the participants took a follow-up post-test. This design allowed us to compare the differences in reading comprehension and other measurable changes in reader behavior between the experimental and control groups.

Visualization techniques were essential for understanding the collected data. From the total number of measurements, we generated several composite maps that demonstrate the differences between the experimental and control groups across the different reading sessions and texts. These composite heatmaps clearly illustrate the phenomena that are observable from statistics but are much more explicit and evident when conveyed through visualizations. By comparing the results of the experimental and control groups, we could assess the impact of highlighting keywords in the text on the reading process and comprehension.

The description of the method by Ozcelik et al. (2009) was significant for confirming the results of our research for the given research question. They examined the positive impact on learning based on color highlighting in multimedia materials. They compared students of the experimental and control groups, with the experimental group learning using materials where some essential parts of black-and-white images were color-highlighted. The control group only had access to black-and-white materials. The results of their experiments show that color differentiation of essential parts of the studied material has a positive effect on student outcomes. In our case, we evaluated the choice of colors or highlighted multimedia text at the same level. Our research results are comparable.

To assess the level of readers and their abilities in reading comprehension, we used the outputs of Abundis-Gutiérrez et al. (2018), who conducted psychological research on groups of less and more skilled readers. However, their experiments did not find a causal relationship between the number of regressions and reading comprehension. Again, they focused only on the quantitative analysis of the measured data, without mentioning the use of qualitative analysis through any visualization methods. The results of our qualitative analysis, however, suggest a possible link between the number of regressions and of transitions between AOIs on the one hand and the level of students' reading skills on the other.

Our research culminated in a third research question: whether the adjustments we made to the existing content of the LSC subject lessons would raise the quality of the study materials in the given e-courses, as evaluated through student outcomes.

After verifying our methodology in a pilot experiment, we decided to apply the knowledge gained in the design and execution of a new, more extensive experiment. This time, in addition to eye movement measurements using ET, we also focused on monitoring the study results of our subjects.

The work of Ponce et al. (2018) is relevant to our third research question. They carried out eye-tracking experiments to verify Mayer's thesis that combining multiple study strategies improves information retention, which is in turn reflected in better results in knowledge tests and exams. Ponce and Mayer assumed, and our research in this area confirmed, that the number of transitions from one AOI to another signals an integrative process, while fixation duration is interpreted as a measure of the cognitive processing necessary for organizing and integrating knowledge.
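The two metrics discussed here, transitions between AOIs and fixation durations per AOI, can be derived from a fixation list as in the following sketch. It assumes a single rectangular illustration AOI and treats every fixation outside it as falling on the text, which is a simplification; the function name and data layout are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, width, height)


def aoi_metrics(fixations: List[Tuple[float, float, float]],
                illustration: Rect):
    """Summarise fixations on the illustration AOI versus the text.

    `fixations` is a list of (duration_ms, x, y). Anything outside the
    illustration rectangle is counted as text (a simplifying assumption).
    """
    left, top, width, height = illustration
    stats = {"text": {"count": 0, "duration_ms": 0.0},
             "illustration": {"count": 0, "duration_ms": 0.0}}
    transitions = 0
    previous = None
    for duration, x, y in fixations:
        in_image = left <= x < left + width and top <= y < top + height
        label = "illustration" if in_image else "text"
        stats[label]["count"] += 1
        stats[label]["duration_ms"] += duration
        if previous is not None and label != previous:
            transitions += 1      # gaze moved from one AOI to the other
        previous = label
    return stats, transitions
```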

An interesting observation in the study of literature was that most of the referenced studies were conducted with a relatively small experimental sample and used relatively expensive and demanding technical equipment—dedicated eye-tracking devices, either attached to the screen (e.g., Tobii Pro X2) or worn as glasses. Such an approach certainly provides higher measurement quality but also carries time and financial constraints. The authors proposed a procedure that is practically infinitely scalable and can be used "en masse" for the inclusion of entire cohorts of students in both face-to-face and distance education. We attempted to verify this thesis in the winter semester of the academic year 2021/22 when we included the entire cohort of first-semester students of the Logic Systems of Computers subject in the subsequent experiment. With such a large sample, we see greater potential for statistical analysis of the measured data, especially in combination with ongoing student assessment using autotests in our LMS Moodle education system.

Increasing visual attention to relevant images generally led to higher performance in the study. Therefore, it is extremely important to focus students' attention on images, especially in multimedia education. As a design solution, various text highlights or other forms of indicating the importance of words in the text can significantly direct students' attention to related images that are part of the studied texts. In this way, students can make more pronounced corresponding transitions between text and images.

In addition to measuring eye movement in the spatial dimension, the least frequently used measurements in eye-tracking research of multimedia learning were fixation position, fixation sequence, and scanpath patterns. In this dimension, these measurements can display in detail the spatial sequences of visual attention over time. However, there is a limited number of studies that investigate the processing of multimedia information with such measurements obtained from scanpaths (Krejtz et al., 2016).

Regarding our research questions, we were unable to demonstrate the influence of visual differentiation of essential terms in lessons. Differences between students assigned to the control and experimental groups were neither consistent nor statistically significant, and these relationships were not discovered even by a detailed examination of individual or composite heatmaps.

On the contrary, we managed to discover a considerable difference in the oculomotor behavior of students with the best study results and their less capable peers. This difference is visible in both the numerical data and their visual representation. Based on this result, we assume that it is possible to create a method for detecting exceptional students or, conversely, classifying students who have difficulty understanding the material we present.


This paper systematically evaluates research in the field of multimedia learning and explores the use of widely available equipment for eye-tracking measurements. We approached students enrolled in the applied informatics study program to participate in a series of experiments. Our findings support the relationships between cognitive processes, gaze patterns, and behaviors. The growing need for high-quality multimedia learning materials emphasizes the development of new methods for evaluating text readability and tools to assist students with lower reading proficiency. Our research not only describes our approach to optimizing text for readability but also introduces a novel method for identifying struggling students who may fail to reach their potential without appropriate intervention.

While reviewing the literature, we found increasing support for using eye-tracking technology in educational research, allowing for detailed examination of learning processes and generating data for various psychological research in the context of higher education. Due to the low statistical significance of our quantitative results, we shifted our focus to qualitative analysis using heatmap and scanpath visualizations. These revealed changes in students' behavior after highlighting keywords in the text. Students initially fixated on the image while reading the unmodified, unhighlighted text but quickly switched to reading the text with minimal transitions between text and images.

No significant difference in learning outcomes was observed between the experimental and control groups. However, we found significant differences in gaze patterns and behaviour between students who achieved the best results at the end of the semester and those who finished with a failing grade. We assume that the better results of the top group of students may be due to their more effective reading, which is visible in their generated heatmaps. The eye-tracking method may have limited use for optimizing well-written texts, but it could prove useful in identifying students who need additional help to keep up with their peers. Early discovery of students in the bottom tertile allows for a positive impact on their learning process, helping them thrive in their studies with extra motivation and support.


This study has several limitations that should be acknowledged:

  • Accuracy and Precision of Consumer-Grade Webcams:

    • Technical Limitations: The use of consumer-grade webcams, while cost-effective and scalable, may not provide the same level of accuracy and precision as specialized eye-tracking equipment. This can affect the reliability of gaze data, particularly for fine-grained analyses.

    • Calibration Issues: Calibration of consumer-grade webcams can be less reliable, leading to potential inconsistencies in data collection. Variations in lighting, camera quality, and user positioning can introduce noise into the data.

  • Sample size and diversity:

    • Limited sample size: Although the study involved a relatively large cohort of students, the final sample size of participants who completed all measurements and post-tests was reduced. This can limit the generalizability of the findings.

    • Homogeneity of participants: The study focused on first-year university students in a specific program (Applied Informatics). The results may not be generalizable to students in other disciplines or education levels.

  • Experimental conditions:

    • Remote Data Collection Challenges: Conducting the study remotely introduced variables that are harder to control, such as the students' environments, which could affect their reading behavior and the accuracy of eye-tracking data.

    • Hybrid Learning Context: The unique conditions imposed by the COVID-19 pandemic, including the shift to hybrid learning, may have influenced the results. These conditions are not representative of traditional or entirely online learning environments.

  • Design and Implementation of Interventions:

    • Limited Scope of Interventions: The study only explored the impact of highlighting keywords within the text. Other potentially impactful interventions, such as interactive multimedia elements or different types of visual aids, were not examined.

    • Single Mode of Enhancement: Focusing solely on visual keyword highlighting might overlook other effective methods of enhancing text comprehension, such as interactive elements or adaptive learning technologies.

  • Measurement and Analysis Limitations:

    • Simplified Metrics: While the study used heatmaps and scanpaths for qualitative analysis, more sophisticated metrics and analyses could provide deeper insights. For example, metrics such as dwell time on specific areas of interest or pupil dilation could offer additional information about cognitive load and engagement.

    • Potential Bias in Self-Reporting: The use of questionnaires to assess students' knowledge and motivations may introduce self-reporting biases, affecting the accuracy of the collected data.

  • Short-Term study:

    • Lack of Longitudinal Data: The study was conducted over a single semester. Long-term effects of the interventions on reading comprehension and academic performance were not assessed. Future research should consider longitudinal studies to examine the sustained impact of such methodologies.

  • Generalizability of results:

    • Specific Content and Context: The study focused on a particular subject (Logic Systems of Computers) and specific educational materials. Results may differ with other subjects or types of content. Thus, caution should be exercised when generalizing these findings to other educational contexts or disciplines.

By acknowledging these limitations, future research can be better designed to address these issues, improve the robustness of findings, and extend the applicability of scalable eye-tracking methodologies in educational settings.


Overall, the implications of this study highlight the potential of scalable eye-tracking methodologies to enhance our understanding of reading behaviors, improve educational materials, and provide targeted support to students. These advancements can significantly contribute to the field of educational technology and learning sciences, promoting more effective and inclusive learning environments.

The findings of this study have several important implications for educational research and practice:

  • Scalability and accessibility of eye-tracking research

    • Cost-effective methodology: The study demonstrates that using consumer-grade webcams and open-source software can make large-scale eye-tracking studies feasible and affordable. This approach can democratize access to eye-tracking research, allowing more institutions, especially those with limited budgets, to conduct such studies.

    • Remote Data Collection: The ability to collect eye-tracking data remotely without specialized equipment is particularly relevant in the context of increasing online and distance learning scenarios, as evidenced by the COVID-19 pandemic. This method can be applied in various educational settings to monitor and enhance learning processes without geographic constraints.

  • Understanding reading behaviors

    • Differentiation Between Student Performance Levels: The study reveals distinct differences in gaze patterns between high-achieving students and those struggling with reading comprehension. This finding suggests that eye-tracking can be a valuable diagnostic tool to identify students who may need additional support.

    • Limited Impact of Highlighted Keywords: The lack of significant difference in test results between the experimental and control groups indicates that simply highlighting keywords may not substantially enhance reading comprehension for texts that are already well-structured. This suggests that other factors, such as the quality of illustrations and overall text design, might play a more crucial role.

  • Applications in educational design

    • Targeted Interventions: The ability to identify students with reading difficulties through their gaze patterns can help educators develop targeted interventions. For example, students who exhibit less focused gaze patterns might benefit from additional reading strategies or tailored instructional materials.

    • Improving Multimedia Integration: The study underscores the importance of effectively integrating text and visual elements in educational materials. While highlighting keywords did not show a significant impact, the overall design and placement of multimedia elements are crucial for effective learning. This can inform best practices for creating more engaging and comprehensible e-learning content.

  • Future research directions

    • Further exploration of reading strategies: The findings suggest the need for further research into various reading strategies and how different instructional designs can impact comprehension. Future studies could explore more sophisticated multimedia enhancements or different types of text-visual combinations.

    • Longitudinal studies: Conducting longitudinal studies to observe how reading behaviors and comprehension evolve over time with continued use of eye-tracking technology could provide deeper insights into the long-term benefits of such methodologies.

Availability of data and materials

The data will be available after the acceptance of the paper on public repository with You can access the data in the draft version: Alternatively, the data is also available in a public repository at GitHub:



AOI: Area of interest

ET: Eye tracking

Expert model of eye movements

LSC: Logic systems of computers

LMS: Learning management system




This work was supported by the Scientific Grant Agency of the Ministry of Education of the Slovak Republic (ME SR) and of Slovak Academy of Sciences (SAS) under the contract No. VEGA 1/0385/23.

Author information

Authors and Affiliations



The authors contributed equally to the manuscript.

Corresponding author

Correspondence to Zoltan Balogh.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Turčáni, M., Balogh, Z. & Kohútek, M. Evaluating computer science students reading comprehension of educational multimedia-enhanced text using scalable eye-tracking methodology. Smart Learn. Environ. 11, 29 (2024).
