- Review
- Open access
Exploring the application of ChatGPT in ESL/EFL education and related research issues: a systematic review of empirical studies
Smart Learning Environments volume 11, Article number: 50 (2024)
Abstract
ChatGPT, a sophisticated artificial intelligence (AI) chatbot capable of providing personalised responses to users’ inquiries, has recently had a substantial impact on education. Many studies have explored the use of ChatGPT in English as a second language (ESL) and English as a foreign language (EFL) education since its release on 30 November 2022. However, there has been a lack of systematic reviews summarising both the current knowledge and the gaps in this research area. This systematic review analyses 70 empirical studies related to the use of ChatGPT in ESL/EFL education within a 1.5-year period following its release. Using the Technology-based Learning Model, we provide a comprehensive overview of the domains in which ChatGPT has been applied, the methodological approaches, and associated research issues. The included studies collectively provide solid evidence regarding the affordances (e.g., increased learning opportunities, personalised learning, and teacher support) and potential drawbacks (e.g., incorrect information, privacy leakage, and academic dishonesty) of ChatGPT use in ESL/EFL education. However, our findings indicate that the majority of studies have focused on students’ use of this AI tool in writing, while few studies have quantitatively examined its effects on students’ performance and motivation. In addition, the impact of ChatGPT on other language skills, such as reading, speaking, and listening, remains under-researched. Therefore, we recommend that longer-term studies with rigorous research designs (e.g., quasi-experimental designs) and objective data sources (e.g., standardised tests) be conducted to provide more robust evidence regarding the influence of ChatGPT on students’ English language acquisition.
Introduction
On 30 November 2022, OpenAI launched ChatGPT, an artificial intelligence (AI) chatbot. Using natural language processing technologies, ChatGPT interacts with users in real time and provides personalised responses to their queries (OpenAI, 2022). The initial release of ChatGPT was based on the third iteration of the Generative Pre-trained Transformer (GPT-3) series developed by OpenAI. GPT-3 is a significant improvement over its predecessor (GPT-2), featuring an expanded training dataset, enhanced fine-tuning and other capabilities, and the ability to generate even more human-like text (Brown et al., 2020). However, the limitations of its initial release include an inability to process images and the potential to yield inaccurate or false information (Bozkurt et al., 2023; Lo et al., 2024; Tlili et al., 2023). To address these shortcomings, OpenAI introduced an updated version of ChatGPT on 14 March 2023. This release was based on GPT-4, which can process both text and images. OpenAI asserts that GPT-4 has improved the accuracy and overall performance of the tool compared with the initial version (OpenAI, 2023). As of 13 May 2024, the latest version, GPT-4o, includes expanded capabilities for processing text, vision, and even voice conversations (OpenAI, 2024).
ChatGPT has attracted increasing attention in the field of English language education, specifically in the areas of English as a second language (ESL) and English as a foreign language (EFL). The growing popularity of this research topic is reflected in the volume of research published. Within just nine months of its launch, Meniado (2023) found 15 articles related to the use of ChatGPT in ESL/EFL education. His analysis focused on the impact of this tool on English language teaching and learning. In terms of teaching, ChatGPT can support teachers in various aspects, such as lesson planning (Mohamed, 2024), preparation of teaching materials (Jeon et al., 2023), and grading of students’ writing (Mizumoto & Eguchi, 2023). In terms of learning, Meniado (2023) found that ChatGPT facilitated students’ engagement in meaning-focused input, meaning-focused output, language-focused learning, and fluency development, the four crucial components of meaningful and productive English language acquisition (Nation, 2007). Taking fluency development as an example, ChatGPT generated dialogues that helped students to practise spoken English and enhance their language proficiency (Young & Shishido, 2023). However, Meniado (2023) also identified concerns regarding the use of ChatGPT in ESL/EFL education, including occasional inaccurate responses and risks to academic integrity. These concerns echo findings from other systematic reviews of ChatGPT research in the education sector (e.g., Imran & Almusharraf, 2023; Lo et al., 2024; Vargas-Murillo et al., 2023).
Although several systematic reviews have explored the application of ChatGPT, remaining knowledge gaps warrant further investigation, particularly in the context of ESL/EFL education (Meniado, 2023). First, these reviews have focused predominantly on health professions (e.g., Garg et al., 2023; Gödde et al., 2023; Sallam, 2023). At the time of writing, only one systematic review, by Meniado (2023), focused on ESL/EFL education. However, very few relevant articles (n = 15) had been published and could be included in his research synthesis. Meniado (2023) thus acknowledged that the evidence base of his review might not have been robust enough to establish a thorough overview of the application of ChatGPT and its impact in the area of English language teaching and learning. Second, previous reviews have generally focused on analysing the strengths, weaknesses, opportunities, and threats associated with ChatGPT (e.g., Gödde et al., 2023; Lo et al., 2024; Zhang & Tur, 2023). While such analyses are beneficial, a comprehensive review of the key research issues related to the application of this AI tool in ESL/EFL education is lacking.
To inform future studies, it is important to produce a review that includes more studies and thus provides researchers with a global perspective on ChatGPT research in ESL/EFL education (Meniado, 2023). The overarching objective of the present systematic review is to summarise both the current knowledge and the gaps in this research area from multiple angles (Hwang & Chang, 2023; Liu & Hwang, 2023). With this objective, we focus on empirical studies related to the application of ChatGPT in ESL/EFL education within a 1.5-year period after its initial release. To enable a multi-dimensional analysis of these studies, the Technology-based Learning Model (Hsu et al., 2012; Hwang & Chang, 2023; Liu & Hwang, 2023) was used as the theoretical framework for research synthesis. Accordingly, the following research questions (RQ1 to RQ3) were posed to guide our review.
- RQ1: Within a 1.5-year period after its initial release, in which domains of ESL/EFL education was ChatGPT applied?
- RQ2: Within a 1.5-year period after its initial release, which methodological approaches were employed in the studies of ChatGPT in ESL/EFL education?
- RQ3: Within a 1.5-year period after its initial release, what research issues related to ChatGPT in ESL/EFL education were identified?
Theoretical framework
In this review, we used the Technology-based Learning Model proposed by Hsu et al. (2012) for our research synthesis. The researchers emphasised that when exploring future development trends in technology-enhanced learning, it is important to review the literature in the categories of (1) application domains, (2) research methods, and (3) research issues. This model has been adopted in various reviews across research areas (e.g., Hwang & Chang, 2023; Liu & Hwang, 2023). In particular, Hwang and Chang (2023) applied this model to explore trends in research on chatbots in education. From the studies included in their review, they found that languages were the learning domains in which chatbots were most frequently applied, followed by engineering and computers. Regarding research methods, the majority of studies included in their review employed a quantitative approach, followed by those using mixed-methods and qualitative approaches. Most importantly, Hwang and Chang (2023) identified research issues worthy of further investigation, such as exploring the use of effective learning designs or strategies with chatbots.
As shown in Fig. 1, the three constructs of the Technology-based Learning Model were adopted to review and analyse the literature on ChatGPT-supported ESL/EFL education. Using the empirical studies (e.g., Bin-Hady et al., 2023; Mizumoto & Eguchi, 2023; Mohamed, 2024; Yan, 2024; Young & Shishido, 2023) included in Meniado’s (2023) review, a preliminary analysis was conducted to establish a foundation for applying the Technology-based Learning Model in our research synthesis. This groundwork enabled our further efforts to retrieve a more comprehensive set of instances regarding ChatGPT application domains, research methods, and research issues across studies.
Application domains
The review of application domains involved an analysis of the study locations, educational contexts (e.g., primary, secondary, and higher education), and learning domains. Taking learning domains as an example, although several studies (e.g., Bin-Hady et al., 2023; Mohamed, 2024) did not focus on specific learning domains, others predominantly addressed the four core English language skills, namely reading, writing, speaking, and listening. For example, Mizumoto and Eguchi (2023) explored the potential of using ChatGPT to evaluate writing, while Yan (2024) investigated students’ feedback-seeking abilities in writing classes. Therefore, both studies fell within the writing domain. Such analysis enhances our understanding of which domains are under-researched, thereby informing the directions of future studies.
Research methods
The review of research methods involved three levels of analysis encompassing study types, research approaches, and data sources. First, in our preliminary analysis, we identified four study types, namely ChatGPT evaluation, AI detection, human observation, and human intervention (Table 1). In ChatGPT evaluation studies, researchers interacted with ChatGPT and evaluated its performance (e.g., Mizumoto & Eguchi, 2023). In AI detection studies, researchers tested the use of AI detectors to detect ChatGPT-generated text (e.g., Ibrahim, 2023). Studies involving human participants were classified as either human observation or human intervention studies (Thiese, 2014). A study was deemed observational (e.g., Mohamed, 2024) if data were collected solely to explore the participants’ perspectives, without any attempt to interfere with or alter the measured attributes (i.e., an intervention). Conversely, a study was classified as interventional (e.g., Yan, 2024) if some form of intervention was conducted under an experimental condition. Human intervention studies were further classified as having a pre-experimental, quasi-experimental, or true experimental design, as defined by Creswell (2012). Second, the research approaches were broadly categorised as qualitative, quantitative, or mixed methods (Creswell, 2009). Third, we summarised the data sources (e.g., surveys and interviews) employed in the empirical studies.
Research issues
To identify areas lacking in research, the research issues pertaining to the included studies must be understood. In addition to identifying research gaps, our review of research issues enabled us to group similar studies and then compare and contrast their research findings. Thus, we could consolidate existing knowledge regarding the impact of ChatGPT on ESL/EFL education. For example, Liu and Hwang (2023) identified several major research issues in their review of research on touchscreen mobile devices. These research issues included the impacts of these devices on children’s development, as well as teachers’ and parents’ perceptions of their use. Consequently, their research synthesis allowed the researchers to summarise the key findings of the literature according to different research issues and propose further research topics that warranted follow-up investigation.
Methods
This section first outlines our search strategies, followed by the inclusion and exclusion criteria and the study selection process. We then explain the process of data extraction and analysis.
Search strategies
We selected relevant articles according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Moher et al., 2009). The final search was conducted on 1 June 2024. Seven electronic databases were searched: (1) Academic Search Ultimate; (2) ACM Digital Library; (3) Education Source Ultimate; (4) ERIC; (5) IEEE Xplore; (6) Scopus; and (7) Web of Science. The search string with Boolean operators was as follows: (ChatGPT OR GPT-4 OR GPT-4o) AND (ESL OR EFL OR (English AND (L2 OR “second language” OR “foreign language”))). This string was applied to each database to search for relevant articles whose title, abstract, or keywords satisfied the search string. The publication period was specified as December 2022 to May 2024.
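The logic of the search string can be illustrated as a small predicate over a record's title, abstract, and keywords. This is only a sketch of the Boolean structure: the function name `matches_search_string` is hypothetical, and real databases apply their own matching rules (for example, stemming or field-specific syntax), which are not modelled here.

```python
import re


def matches_search_string(text: str) -> bool:
    """Return True if `text` satisfies the review's Boolean search string.

    Illustrative only: whole-term, case-insensitive matching is an assumption,
    not a description of how any particular database evaluates the query.
    """
    lowered = text.lower()

    def has(term: str) -> bool:
        # Match the term only when it is not embedded in a longer token.
        pattern = r"(?<![a-z0-9])" + re.escape(term) + r"(?![a-z0-9])"
        return re.search(pattern, lowered) is not None

    # (ChatGPT OR GPT-4 OR GPT-4o)
    tool = has("chatgpt") or has("gpt-4") or has("gpt-4o")
    # (ESL OR EFL OR (English AND (L2 OR "second language" OR "foreign language")))
    context = has("esl") or has("efl") or (
        has("english")
        and (has("l2") or has("second language") or has("foreign language"))
    )
    return tool and context
```

For instance, a record titled "ChatGPT in EFL writing classes" satisfies both halves of the string, whereas "ChatGPT in medical education" satisfies only the first and would not be retrieved.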
Inclusion and exclusion criteria
The search outcomes were filtered using the following inclusion and exclusion criteria:
(1) Topic and subject area: Studies had to focus on the use of ChatGPT and/or its subsequent releases (i.e., GPT-4 or GPT-4o) in ESL/EFL education. Articles that did not focus on the use of ChatGPT, GPT-4, and/or GPT-4o, or were from other subject disciplines were excluded.
(2) Study type: Included studies were required to report empirical research in which data from ChatGPT, AI detectors, and/or human participants were collected and analysed. Articles that did not discuss empirical studies (e.g., reviews, position papers, and editorials without empirical data) were excluded.
(3) Source: Only journal articles, conference papers, and book chapters were reviewed. Other sources of publications were excluded.
(4) Period: This review covered studies published between 1 December 2022 and 31 May 2024 (i.e., within a 1.5-year period after the release of ChatGPT). Articles outside this time frame were not included.
(5) Language: Only English-language articles were included in the review.
Table 2 summarises the inclusion and exclusion criteria applied when selecting the studies.
Study selection
A total of 202 records were retrieved through a database search on 1 June 2024. Duplicate articles were removed, yielding 121 unique records for screening. During title and abstract screening, 47 records were excluded as outside the scope of this review because they were not empirical studies (n = 27), not related to ESL/EFL education (n = 14), or not related to the use of ChatGPT (n = 6). We then assessed the remaining 74 full-text articles for eligibility. Of these, four articles were excluded because no empirical data were reported. Finally, 70 articles were included in this review. Figure 2 provides an overview of the article selection process.
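The selection counts reported above can be checked with simple arithmetic. The sketch below uses only the figures given in the text; the variable names are illustrative, and the number of duplicates removed is derived here by subtraction and is not reported in the text.

```python
# Counts taken from the study selection paragraph.
records_retrieved = 202
unique_records = 121
excluded_at_screening = {
    "not an empirical study": 27,
    "not related to ESL/EFL education": 14,
    "not related to the use of ChatGPT": 6,
}
excluded_at_full_text = 4  # no empirical data reported

# Derived quantities.
duplicates_removed = records_retrieved - unique_records
full_text_assessed = unique_records - sum(excluded_at_screening.values())
studies_included = full_text_assessed - excluded_at_full_text

print(f"Duplicates removed: {duplicates_removed}")
print(f"Full-text articles assessed: {full_text_assessed}")
print(f"Studies included: {studies_included}")
```

The derived totals agree with the 74 full-text articles and 70 included studies stated in the text.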
Data extraction and analysis
We used the theoretical framework described in the "Theoretical framework" section to guide our data extraction process and analysis. We extracted the author(s) and year of publication from each article. To address RQ1, which is related to the application domains of ChatGPT, we obtained information on (1a) the geographical locations of the studies, (1b) the educational contexts in which they were conducted, and (1c) the learning domains targeted. To address RQ2, which is related to the research methods, we classified the studies by (2a) type, including ChatGPT evaluation, AI detection, human observation, and human intervention (see Table 1). We also categorised (2b) the research approaches as qualitative, quantitative, or mixed methods and identified (2c) the data sources involved, such as surveys and interviews. The data pertinent to RQ1 and RQ2 were summarised using descriptive statistics to provide an overview of empirical research. To address RQ3, which is related to research issues, we conducted a content analysis to code the studies. Relevant themes were identified and categorised inductively (Braun & Clarke, 2006), without a predetermined coding scheme. The themes that emerged were subject to refinement throughout data analysis. The findings of the included studies within each thematic category were compared and contrasted. This analysis facilitated a deeper understanding of the ways in which ChatGPT has influenced ESL/EFL education, according to the literature.
To ensure the reliability of coding, all the included studies were independently coded by the first and third authors. This dual-coding approach allowed us to calculate inter-rater reliability using the percent-agreement method, a technique recommended by Stemler (2004). We achieved an inter-rater reliability exceeding 90%, indicating a high level of agreement between the two coders. When discrepancies arose, the first and third authors re-examined the articles in question to discuss and resolve their differences. This approach to coding and resolving discrepancies supported the integrity and accuracy of our data extraction and analysis.
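As an illustration of the percent-agreement method, the sketch below computes the share of items to which two coders assigned the same code. The function name and the example codes are hypothetical and are not the review's actual coding data.

```python
from typing import Sequence


def percent_agreement(codes_a: Sequence[str], codes_b: Sequence[str]) -> float:
    """Percentage of items on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must code the same number of items.")
    if not codes_a:
        raise ValueError("At least one coded item is required.")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * agreements / len(codes_a)


# Hypothetical example: ten studies coded by study type; the coders differ on one.
coder_1 = ["evaluation", "observation", "intervention", "intervention", "detection",
           "observation", "observation", "intervention", "evaluation", "intervention"]
coder_2 = ["evaluation", "observation", "intervention", "observation", "detection",
           "observation", "observation", "intervention", "evaluation", "intervention"]

agreement = percent_agreement(coder_1, coder_2)
print(f"Percent agreement: {agreement:.1f}%")
```

In this made-up example the coders agree on nine of ten items. Any item on which the codes differ would then be re-examined and resolved through discussion, as described above.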
Findings and discussion
The findings pertaining to each research question are presented and discussed in the following subsections.
RQ1
Within a 1.5-year period after its initial release, in which domains of ESL/EFL education was ChatGPT applied?
The findings of RQ1 are organised and discussed across three aspects: (1a) study locations, (1b) educational contexts, and (1c) learning domains. Table 3 provides an overview of our major findings and their implications for further research.
1a. Study locations
Figure 3 shows that nearly half of the included studies (n = 34; 48.6%) were conducted in East Asia, including China (n = 12), Indonesia (n = 4), South Korea (n = 4), and several other regions. Around one quarter of the included studies (n = 18; 25.7%) were conducted in Middle Eastern regions, such as Iran (n = 6) and Saudi Arabia (n = 6). Ten studies (14.3%) were conducted in European regions and Russia. Three studies (i.e., Mizumoto et al., 2024; Yancey et al., 2023; Yuan et al., 2024) involved ESL learners and/or their work in various locations. Notably, three studies were conducted in locations where English is a common language, but their participants’ first language was not English. The studies by Escalante et al. (2023) and Lee (2024) involved students who learnt English as a new language at the University of Hawaii in the United States. Another study by Liu et al. (2024b) was conducted at a university in New Zealand but involved Chinese international students for whom English was a foreign language. A potential research direction could involve exploring whether ChatGPT can assist international ESL/EFL students in adapting to learning environments in English-speaking countries. Researchers could investigate whether ChatGPT can provide personalised support for overcoming language barriers, becoming familiar with new cultural settings, and improving academic performance. Such a study would offer insights into the applications of ChatGPT for supporting diverse student populations in higher education.
1b. Educational contexts
Figure 4 shows that the majority of the included studies (n = 47; 67.1%) were conducted in higher education settings. Only two studies were conducted in K–12 educational settings: the studies by Allehyani and Algamdi (2023) and Kim and Park (2023) involved early childhood teachers and primary school students, respectively. This distribution of studies shows that there are unexplored areas in the context of K–12 ESL/EFL education. Accordingly, there is a need for research focusing on the implementation and impact of ChatGPT in early childhood, primary, and secondary education settings.
1c. Learning domains
Figure 5 shows that the majority of the included studies focused on three core English language skills, namely writing (n = 29; 41.4%), speaking (n = 5; 7.1%), and reading (n = 2; 2.9%); no studies specifically addressed listening. In addition, we identified other learning domains, namely vocabulary (Malec, 2024; Mugableh, 2024), grammar (Kucuk, 2024), cultural appreciation (Zheng & Stewart, 2024), literature appreciation (Alhammad, 2024), and thinking skills and creativity (Kartal, 2024). The scarcity of studies on reading and the absence of studies on listening indicate a clear need for further research in these learning domains. Investigating how ChatGPT can support the teaching and learning of all four core skills would lead to a more comprehensive understanding of its affordances and limitations in ESL/EFL education.
RQ2
Within a 1.5-year period after its initial release, which methodological approaches were employed in the studies of ChatGPT in ESL/EFL education?
The findings of RQ2 are organised and discussed across two aspects: (2a) study types and (2b) research approaches and data sources. Table 4 provides an overview of our major findings and their implications for further research.
2a. Study types
Figure 6 shows that the included studies were classified into four study types established in Table 1: ChatGPT evaluation (n = 15; 21.4%), AI detection (n = 2; 2.9%), human observation (n = 26; 37.1%), and human intervention (n = 27; 38.6%). Among the 15 studies classified as ChatGPT evaluation, the most common focus was the potential of this AI tool to support teaching and learning in writing (n = 8), followed by speaking (Wang et al., 2023; Young & Shishido, 2023), reading (Shin & Lee, 2023), and vocabulary (Malec, 2024), among others. The two AI detection studies (Alexander et al., 2023; Ibrahim, 2023) both focused on the writing domain. The studies that focused primarily on human participants were classified into approximately equal numbers of observation studies (n = 26) and intervention studies (n = 27). The observation studies largely focused on investigating teachers’ and students’ perspectives on using ChatGPT in ESL/EFL education. The numbers of participants in these studies ranged from four (Marzuki et al., 2023) to 867 (Liu et al., 2024a), M = 150.19, SD = 215.44. In the intervention studies, researchers generally experimented with the use of ChatGPT in ESL/EFL classrooms. Pre-experimental designs were the most common among these studies (n = 16), followed by true experimental designs (n = 7) and quasi-experimental designs (n = 5). Notably, these numbers add up to 28 rather than 27 (i.e., the total number of intervention studies) because Escalante et al. (2023) reported two sub-studies in their article. The numbers of participants in the intervention studies ranged from three (Yan, 2024) to 213 (Han et al., 2023), M = 50.44, SD = 43.07. The durations of these studies varied and could be categorised as a few learning tasks or sessions (n = 7), an interval of one month or four weeks (n = 4), five to 10 weeks (n = 11), or longer than 10 weeks or one semester (n = 6). In general, these studies had short durations. Longer-term studies are required to provide further insights into the effects of consistent interaction with ChatGPT on students’ language acquisition and its sustained impact on learning behaviour.
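The descriptive statistics reported for the study types can be re-derived from the counts in the text. The following sketch uses only those counts; the dictionary and variable names are illustrative.

```python
# Study-type counts as reported for the 70 included studies.
study_types = {
    "ChatGPT evaluation": 15,
    "AI detection": 2,
    "Human observation": 26,
    "Human intervention": 27,
}
total_studies = sum(study_types.values())

# Percentage share of each study type, rounded to one decimal place.
shares = {name: round(100 * n / total_studies, 1) for name, n in study_types.items()}

# Design counts for the intervention studies; one article reported two sub-studies,
# so the designs sum to one more than the number of intervention articles.
intervention_designs = {
    "pre-experimental": 16,
    "true experimental": 7,
    "quasi-experimental": 5,
}
extra_designs = sum(intervention_designs.values()) - study_types["Human intervention"]

for name, share in shares.items():
    print(f"{name}: n = {study_types[name]} ({share}%)")
```

The design counts exceed the number of intervention articles by exactly one, which matches the explanation that a single article contributed two sub-studies.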
2b. Research approaches and data sources
The quantitative approach was the most common among the included studies (n = 26; 37.1%), followed by the qualitative (n = 23; 32.9%) and mixed methods approaches (n = 21; 30.0%). Figure 7 shows the data sources used in the included studies. We first explicated three data sources that emerged specifically in the context of ChatGPT research, namely ChatGPT output (n = 17), AI detector output (n = 2), and user screen recordings of interactions with ChatGPT (n = 2). The use of these data sources was closely related to the study type. Specifically, studies focusing on ChatGPT evaluation collected and analysed ChatGPT output, while those examining AI detection collected and analysed both ChatGPT and AI detector output. The use of user screen recordings is particularly noteworthy. For example, Üstünbaş (2024) used such data and a stimulated-recall interview approach, which enabled users to comment on their experiences as they interacted with ChatGPT. This approach could provide valuable insights into how different users employ ChatGPT as a virtual partner in English language learning.
As shown in Fig. 7, the two most common data sources were participants’ self-reported data, namely surveys (n = 34) and interviews (n = 26). Other types of self-reported data included user journals (n = 6), students’ verbal feedback (n = 1), and online discussions (n = 1). Comparatively, few studies collected and analysed data from objective measures, such as tests (n = 10), participants’ work (n = 4), and observations (n = 2). Future research should incorporate more objective data sources to provide a balanced and comprehensive understanding of the impact of ChatGPT on students’ English language acquisition. For example, standardised tests, student work, and direct observations could be included to increase the robustness of research evidence and complement self-reported data.
RQ3
Within a 1.5-year period after its initial release, what research issues related to ChatGPT in ESL/EFL education were identified?
The research issues identified in the included studies were classified into four major themes. Two themes were specifically related to core English language skills: (3a) writing (n = 30) and (3b) speaking (n = 5). The other two themes were the general perspectives of (3c) teachers (n = 14) and (3d) students (n = 11) regarding the role of ChatGPT in ESL/EFL education. In addition, we identified several (3e) other research issues that had not been explored extensively. The findings of RQ3 are thus organised and discussed across these five areas. Table 5 provides an overview of our major findings and their implications for further research. Table 6 summarises the themes and subthemes of the research issues that emerged from the included studies.
3a. Research issue 1: Writing (n = 30)
Over 40% (n = 30) of the included studies addressed research issues related to writing. Nineteen studies focused on how ChatGPT influenced the teaching and learning of writing, seven examined the use of ChatGPT in assessments of student writing, two evaluated ChatGPT’s capabilities in writing, and two investigated methods used to detect ChatGPT-generated writing. Regarding the teaching and learning of writing, multiple studies provided evidence that ChatGPT could assist students with generating ideas and materials for consultation (Al-Obaydi et al., 2023; Mahapatra, 2024; Nugroho et al., 2024; Üstünbaş, 2024), organisation and structure (Lee, 2024; Mahapatra, 2024; Nugroho et al., 2024; Tsai et al., 2024), spelling and grammar (Lee, 2024; Mahapatra, 2024; Nugroho et al., 2024; Tseng & Lin, 2024), and vocabulary and word choice (Lee, 2024; Nugroho et al., 2024; Tsai et al., 2024; Üstünbaş, 2024). As students noted, “it guides us in obtaining the required information, arranging our ideas, and writing correctly” and “explains the grammar issues when asked” (Mahapatra, 2024, p. 9). Similar to the present review, Meniado (2023) found that ChatGPT’s ability to help students notice and correct errors, along with its support in organising ideas and adhering to genre-specific structures, contributed to improved writing performance. Both sets of findings highlight the potential of ChatGPT to scaffold the writing process.
However, the findings associated with ChatGPT-assisted writing were not overwhelmingly positive. Consistent with the findings of Meniado (2023), students in several studies expressed dissatisfaction with ChatGPT for various reasons, including inaccuracy (Hieu & Thao, 2024; Nugroho et al., 2024; Yuan et al., 2024), technical problems (Hieu & Thao, 2024), and an inability to provide desirable responses (Han, 2023; Yan, 2024). One student described it as “a powerful yet demanding tool with diversified and unpredictable outcomes” (Yan, 2024, p. 11). Ahmed (2023) reported that a majority of students in an EFL writing class were dissatisfied with ChatGPT. The students noted that the opportunities for interaction were more frequent and satisfying in a teacher-mediated writing class. Although ChatGPT can supplement the teaching and learning of writing, it cannot fulfil the role of a teacher (Ahmed, 2023; Escalante et al., 2023; Üstünbaş, 2024).
Table 7 shows that six included studies (Boudouaia et al., 2024; Escalante et al., 2023; Ghafouri et al., 2024; Mahapatra, 2024; Silitonga et al., 2023; Song & Song, 2023) quantitatively compared ChatGPT-assisted writing (experimental condition) with traditional classroom instruction (control condition). These studies focused on students’ writing performance and/or motivation. Except for the study by Escalante et al. (2023), the study results generally indicated that students in the experimental groups significantly outperformed those in the control groups (Table 7). The study by Song and Song (2023) further provided a breakdown of students’ writing performance, showing significant improvements in their experimental group in terms of content, organisation, and language use. Similarly, Boudouaia et al. (2024) reported significant improvements in students’ task achievement, coherence and cohesion, grammatical range and accuracy, and lexical range and accuracy. Regarding students’ writing motivation, Silitonga et al. (2023) and Song and Song (2023) reported that students in the experimental groups had higher levels of motivation than those in the control groups (Table 7). However, too few studies were available to conduct a meta-analysis of the overall effect of ChatGPT on students’ writing performance and motivation. Furthermore, all comparison studies were conducted in higher education settings. Therefore, these results might not be generalisable to other contexts, such as primary and secondary education. Clearly, more comparison studies are needed in both higher and K–12 education settings.
In addition to the influence of ChatGPT on the teaching and learning of writing, research has provided insights into the role of this AI tool in supporting ESL/EFL teachers’ assessments of student writing (n = 7). Some of its functionalities (i.e., automated scorer and feedback provider) were also identified in the review by Meniado (2023). Mizumoto and Eguchi (2023) and Mizumoto et al. (2024) demonstrated that the accuracy and reliability of ChatGPT-based automated essay scoring could complement human evaluations. In addition, ChatGPT was able to distinguish between native and non-native English sentences and provide suggestions for correction (Cho, 2023). Yancey et al. (2023) provided further evidence suggesting that GPT-4 could achieve a nearly optimal writing evaluation performance. However, Algaraady and Mahyoob (2023) cautioned that while ChatGPT excelled at identifying surface-level errors, it struggled to detect deeper structural and pragmatic issues. The researchers therefore emphasised the irreplaceability of human teachers. Similarly, Obata et al. (2023) noted the challenges associated with relying solely on AI models for writing assessment. Regarding feedback on writing, Guo and Wang (2024) observed that ChatGPT generated more feedback than human teachers. Furthermore, this feedback was distributed evenly across the content, organisation, and language aspects of writing and could potentially lessen teachers’ burden of providing writing feedback. However, Guo and Wang (2024) also revealed teachers’ concerns about the length, readability, and relevance of ChatGPT-generated feedback. These studies collectively highlight the importance of integrating both AI and human expertise into ESL/EFL education to effectively evaluate and provide feedback on writing.
Regarding ChatGPT’s capabilities in writing, Wang (2023) and Zindela (2023) conducted a syntactic complexity analysis of ChatGPT-revised and ChatGPT-generated essays, respectively. Wang (2023) tasked ChatGPT with revising students’ essays and found that the ChatGPT-revised essays better matched the characteristics of high-level argumentative writing compared to students’ original essays. Similarly, Zindela (2023) found that ChatGPT-generated essays used more sophisticated and varied vocabulary compared to students’ argumentative writing. These results indicate the potential use of ChatGPT to support the teaching and learning of writing by improving the quality of students’ essays and providing high-quality sample essays.
Regarding the detection of ChatGPT-generated writing (n = 2), Ibrahim (2023) examined the effectiveness of two AI-detection platforms in identifying machine-generated text within a dataset of 240 human-written and ChatGPT-generated essays. His findings revealed that while both detectors could identify AI-generated content, they performed inconsistently across the dataset. The study highlighted the need for a more reliable detection mechanism to address AI-assisted plagiarism in the context of ESL/EFL education. In addition to AI detector capabilities, Alexander et al. (2023) investigated the challenges faced by ESL teachers when identifying ChatGPT-generated texts. They found that the teachers often lacked awareness of the characteristics and metrics used by ChatGPT and did not focus enough on fact-checking the content. Their findings highlighted teachers’ needs for enhanced digital and AI literacy, professional development (PD), advanced detection tools, and updated assessment policies to maintain academic integrity, as also emphasised by other researchers (e.g., Bozkurt et al., 2023; Meniado, 2023; Tlili et al., 2023).
3b. Research issue 2: Speaking (n = 5)
Although few studies addressed speaking, these studies covered three areas, namely the use of ChatGPT in generating dialogue materials (n = 2), its role as a learning partner (n = 2), and its use as an assessment tool (n = 1). First, Young and Shishido (2023) examined the effectiveness of ChatGPT in terms of generating dialogue materials suitable for EFL students. They concluded that the materials were suitable for students in primary education settings. Kim and Park (2023) compared students’ perceptions of role-playing scripts derived from textbooks with those of scripts generated by ChatGPT. Their students consistently rated the ChatGPT-generated scripts as more interesting than the book-derived scripts. While both studies highlighted the potential use of ChatGPT-generated materials for practising speaking in primary education settings, further research is needed to explore its efficacy in other educational contexts, such as secondary and higher education.
Second, two studies explored the potential use of ChatGPT as a partner in learning to speak English. Muniandy and Selvanathan (2024) instructed their students to use ChatGPT to simulate various roles (e.g., TED speakers) when developing persuasive speeches, generating outlines and presentation slides, and comparing these with their own work. Although ChatGPT did not support voice conversations at the time of that study, the students used an extension called ChatGPT Voice Master from the Chrome Web Store to address this limitation. Muniandy and Selvanathan (2024) found that ChatGPT boosted students’ confidence and speaking skills. However, inaccurate information, difficulty in using the correct prompts, and technical issues were the major challenges encountered when using ChatGPT. In another study, Lee et al. (2023) integrated ChatGPT into Augmented Reality glasses to enhance students’ speaking skills and provide a contextual language learning experience. This integrated approach led to improvements in students’ perceptions of task competence and aesthetic appeal compared with traditional English language learning.
Third, Wang et al. (2023) developed new ways to use ChatGPT to evaluate how well ESL learners placed pauses in their speech. Their results indicated that ChatGPT partially understood punctuation breaks but tended to overlook slight pauses between semantic groups. Nevertheless, Wang et al. (2023) recognised the potential of this AI tool in speech assessment and encouraged further exploration of strategies for optimising prompt design to enhance its performance.
3c. Research issue 3: Teachers’ general perspectives (n = 14)
Solid evidence indicates that ESL/EFL teachers’ general perspectives of ChatGPT comprised a mixture of affordances and concerns (Allehyani & Algamdi, 2023; Derakhshan & Ghiasvand, 2024; Mabuan, 2024; Mohamed, 2024; Ulum, 2024). We discuss three major affordances (i.e., increased learning opportunities, personalised learning, and teacher support) identified in the included studies. First, teachers recognised that ChatGPT could increase ESL/EFL students’ opportunities to practise their language in real time (Allehyani & Algamdi, 2023; Mohamed, 2024; Ulla et al., 2023). One teacher stated that “ChatGPT may allow students to have the opportunity to actively formulate questions, request further explanations, and produce replies, which may foster an active engagement in practicing their language skills, resulting in enhanced language proficiencies” (Ulla et al., 2023, p. 175). Second, ChatGPT could offer personalised learning experiences by tailoring content to a student’s proficiency level (Alenizi et al., 2023; Mohamed, 2024; Yeh, 2024). In the study by Alenizi et al. (2023), for example, teachers confirmed that “ChatGPT can provide personalized learning experiences for special education students” (p. 17) and that it “adapts to the student’s learning style, pace, and need” (p. 18). Third, ChatGPT could assist teachers in generating and refining lesson plans, exercises, and activities based on specific learning objectives in ESL/EFL classrooms (Alenizi et al., 2023; Dehghan, 2024b; Ulla et al., 2023; Yeh, 2024). These findings aligned with those of Meniado (2023), who also identified ChatGPT’s potential to serve as both a lesson planner and an instructional material developer. In the study by Yeh (2024), for example, teachers used ChatGPT not only to create vocabulary exercises but also to refine educational song lyrics and align them with the lesson objectives, thus making instructional materials more accessible to their students.
Despite these affordances, the included studies collectively pinpointed five major concerns regarding ChatGPT, namely the occasional provision of incorrect information (Gao et al., 2024; Mohamed, 2024), privacy leakage when using it (Gao et al., 2024; Mohamed, 2024), academic dishonesty (Cong-Lem et al., 2024; Hieu & Thao, 2024), students’ over-reliance on this AI tool (Cong-Lem et al., 2024; Dehghan, 2024a), and hindered real-life communication (Alenizi et al., 2023; Mohamed, 2024). The first three concerns have been well documented in the review by Meniado (2023) as well as previous reviews of ChatGPT research (see Gödde et al., 2023; Imran & Almusharraf, 2023; Lo et al., 2024; Mohamed, 2024; Zhang & Tur, 2023 for a review). Regarding students’ over-reliance, across the included studies (Cong-Lem et al., 2024; Dehghan, 2024a; Gao et al., 2024; Mohamed, 2024; Ulla et al., 2023), teachers expressed concerns that students might become overly dependent on AI assistance instead of developing their language skills. This over-reliance could also impair students’ development of critical thinking skills (Cong-Lem et al., 2024; Mohamed, 2024; Ulum, 2024) and creativity (Cong-Lem et al., 2024; Dehghan, 2024a; Derakhshan & Ghiasvand, 2024). Regarding the hindrance of real-life communication, ChatGPT could not provide the same level of nonverbal cues as a human (Alenizi et al., 2023). As one teacher noted, “ChatGPT may become a crutch, hindering effective communication in real-life situations, and may not capture the nuances of human interaction” (Mohamed, 2024, p. 3206). However, follow-up studies are required because the latest release of GPT-4o supports voice conversations (OpenAI, 2024), potentially enhancing the effectiveness of real-life communication.
Finally, teachers expressed that they would need training to effectively integrate ChatGPT into their teaching practices (Alenizi et al., 2023; Allehyani & Algamdi, 2023; Mabuan, 2024). In efforts to inform teachers’ PD, Alrishan (2023) and Dehghani and Mashhadi (2024) used the Technology Acceptance Model to investigate pre-service and in-service teachers’ intention to use ChatGPT in ESL/EFL education. Both studies showed that perceived usefulness and ease of use were crucial in shaping teachers’ intention to use ChatGPT. Therefore, PD programmes should ensure that teachers find ChatGPT both useful and easy to use. Through such training, teachers could learn to effectively integrate ChatGPT into their teaching practices, become empowered to evaluate and modify its outputs, and learn strategies to mitigate the potential negative impacts of ChatGPT on ESL/EFL education.
3d. Research issue 4: Students’ general perspectives (n = 11)
Research focused on students’ general perspectives was related to whether they found ChatGPT to be useful, their satisfaction with it, and their motivation to learn English. Across studies (Bin-Hady et al., 2023; Klimova et al., 2024; Liu & Ma, 2024; Liu et al., 2024a; Shaikh et al., 2023; Vo & Nguyen, 2024), students generally expressed that ChatGPT was a useful tool for English language learning. Their perceived usefulness influenced their intention to use ChatGPT for English language learning (Liu et al., 2024a; Xu & Thien, 2024). It scaffolded their learning process by acting as a partner in practising language, providing feedback on language use, and recommending additional activities for practising (Bin-Hady et al., 2023; Liu et al., 2024a). In the study by Klimova et al. (2024), the students shared several useful applications of ChatGPT, including explaining, writing, and copy-editing. Consequently, the student participants in several studies (Klimova et al., 2024; Markus et al., 2023; Shaikh et al., 2023) were generally satisfied with the use of ChatGPT. Furthermore, using ChatGPT could increase students’ motivation to learn English (Markus et al., 2023; Muthmainnah et al., 2024). As students in one study put it, “The material provided is easier to understand when interacting with ChatGPT” and “I feel that my motivation to learn English has increased with ChatGPT” (Muthmainnah et al., 2024, p. 34).
Despite its positive influence on their learning experiences, some ESL/EFL students expressed concerns about ChatGPT. Most of these concerns mirrored those of teachers, including issues with information accuracy, academic dishonesty, and over-reliance on ChatGPT (Klimova et al., 2024; Marjanovikj-Apostolovski, 2024; Xiao & Zhi, 2023). In addition, some students reported that it was difficult to obtain a desirable output from ChatGPT (Liu et al., 2024a; Xiao & Zhi, 2023). Notably, this challenge might have stemmed from both ChatGPT’s limited understanding of the students’ input and the students’ lack of knowledge regarding the use of appropriate prompts. As one EFL student of computer science explained, “ChatGPT still has many limitations, especially in terms of understanding the users’ input. People need to give appropriate prompts so that they can find it enjoyable and effective to use ChatGPT” (Liu et al., 2024a, p. 16). Therefore, training should be provided to improve students’ ability to formulate effective prompts for eliciting responses from ChatGPT and their understanding of its input-processing limitations.
3e. Other research issues (n = 10)
As shown in Table 6, we also identified several under-explored research issues that may hold implications for a broader understanding and application of ChatGPT in ESL/EFL education. The following highlights the major findings from individual studies in three learning domains, namely reading, vocabulary, and grammar.
- Reading: Rees and Lew (2024) investigated the effectiveness of ChatGPT-generated definitions in helping students resolve uncertainties about vocabulary during a reading task. They found no significant difference in performance between students who used the AI materials and those who used definitions from the Macmillan English Dictionary. Shin and Lee (2023) compared ChatGPT-generated reading comprehension tests with those created by human experts. In terms of naturalness, they found that the flow and expressions in the AI-generated materials were comparable to those in human-created materials. However, the expert-created reading passages and test items appeared to be superior in terms of attractiveness and completeness.
- Vocabulary: Mugableh (2024) explored the potential use of ChatGPT to create vocabulary exercises. His findings indicated that students using ChatGPT-generated exercises significantly outperformed those using traditional exercises. However, Malec (2024) discovered that ChatGPT’s performance in generating distractors for multiple-choice vocabulary questions was unsatisfactory.
- Grammar: Kucuk (2024) examined the effectiveness of integrating ChatGPT in the teaching and learning of grammar. The results of his grammar test indicated that students with ChatGPT support scored significantly higher than those without.
Conclusion and limitations
This systematic review analysed 70 empirical studies related to the use of ChatGPT in ESL/EFL education within a 1.5-year period after its initial release. Compared to the previous review by Meniado (2023), there has been a substantial increase in the volume of relevant research in recent months. This growing trend in research is likely to continue. However, researchers should first identify gaps in the fast-growing literature to avoid overlooking previous efforts in this research area.
Using the Technology-based Learning Model, we provided an overview of the application domains, methodological approaches, and research issues that have emerged from research on ChatGPT in ESL/EFL education. We found that the majority of existing studies addressed the writing domain. However, the effect of ChatGPT use in writing courses remains under-evaluated. Very few comparison studies (e.g., ChatGPT-supported vs. traditional approaches) have been conducted, which has hindered the use of a meta-analytical approach to summarising the influence of this AI tool on students’ writing performance and motivation. The efficacy of ChatGPT in supporting the teaching and learning of other language skills (i.e., reading, speaking, and listening) is also under-researched. Therefore, we recommend that further studies with more rigorous research designs, such as quasi-experimental and true experimental designs, be conducted to explore these areas. It will also be necessary to include more objective data sources (e.g., standardised tests and student work) to offer more robust research evidence. In light of the rapid advancements in AI technology, the capabilities of ChatGPT are likely to have improved further since the time of writing. Future research should continue to evaluate the evolving capabilities and potential affordances and concerns associated with the use of ChatGPT in ESL/EFL education.
Finally, several limitations of this review must be acknowledged. First, our research synthesis was constrained by the information reported by the study authors. The absence of an entity (e.g., data source) or a theme (e.g., participants’ perceptions) did not necessarily imply the absence of a specific category. Instead, it indicated only that the authors did not explicitly report such information in their articles. Second, although we summarised findings regarding major research issues, these findings were primarily based on studies conducted in higher education settings. Therefore, some findings of this review might not be generalisable to other educational contexts, such as primary and secondary education. Further studies are required to investigate the impact of ChatGPT on ESL/EFL education in K–12 settings. Third, most of the empirical studies included in this review were of short duration, which may have led to a novelty effect. Longer-term studies are necessary to determine whether the impact of ChatGPT on students’ English language acquisition is sustainable.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
References
*References marked with an asterisk indicate studies included in the review.
*Ahmed, M. A. (2023). ChatGPT and the EFL classroom: Supplement or substitute in Saudi Arabia’s eastern region. Information Sciences Letters, 12(7), 2727–2734. https://doi.org/10.18576/isl/120704
*Alenizi, M. A. K., Mohamed, A. M., & Shaaban, T. S. (2023). Revolutionizing EFL special education: How ChatGPT is transforming the way teachers approach language learning. Innoeduca: International Journal of Technology and Educational Innovation, 9(2), 5–23. https://doi.org/10.24310/innoeduca.2023.v9i2.16774
*Alexander, K., Savvidou, C., & Alexander, C. (2023). Who wrote this essay? Detecting AI-generated writing in second language education in higher education. Teaching English with Technology, 23(2), 25–43. https://doi.org/10.56297/BUKA4060/XHLD5365
*Algaraady, J., & Mahyoob, M. (2023). ChatGPT’s capabilities in spotting and analyzing writing errors experienced by EFL learners. Arab World English Journal, Special Issue on CALL, 9, 3–17. https://doi.org/10.24093/awej/call9.1
*Alhammad, A. I. (2024). The impact of ChatGPT in developing Saudi EFL learners’ literature appreciation. World Journal of English Language, 14(2), 331–338. https://doi.org/10.5430/wjel.v14n2p331
*Allehyani, S. H., & Algamdi, M. A. (2023). Digital competences: Early childhood teachers’ beliefs and perceptions of ChatGPT application in teaching English as a Second Language (ESL). International Journal of Learning, Teaching and Educational Research, 22(11), 343–363. https://doi.org/10.26803/ijlter.22.11.18
*Al-Obaydi, L. H., Pikhart, M., & Klimova, B. (2023). ChatGPT and the general concepts of education: Can artificial intelligence-driven chatbots support the process of language learning? International Journal of Emerging Technologies in Learning, 18(21), 39–50. https://doi.org/10.3991/ijet.v18i21.42593
*Alrishan, A. M. H. (2023). Determinants of intention to use ChatGPT for professional development among Omani EFL pre-service teachers. International Journal of Learning, Teaching and Educational Research, 22(12), 187–209. https://doi.org/10.26803/ijlter.22.12.10
*Bin-Hady, W. R. A., Al-Kadi, A., Hazaea, A., & Ali, J. K. M. (2023). Exploring the dimensions of ChatGPT in English language learning: A global perspective. Library Hi Tech. https://doi.org/10.1108/LHT-05-2023-0200
*Boudouaia, A., Mouas, S., & Kouider, B. (2024). A study on ChatGPT-4 as an innovative approach to enhancing English as a foreign language writing learning. Journal of Educational Computing Research. https://doi.org/10.1177/07356331241247465
Bozkurt, A., Junhong, X., Lambert, S., Pazurek, A., Crompton, H., Koseoglu, S., Farrow, R., Bond, M., Nerantzi, C., Honeychurch, S., Bali, M., Dron, J., Mir, K., Stewart, B., Costello, E., Mason, J., Stracke, C. M., & Romero-Hall, E. (2023). Speculative futures on ChatGPT and generative artificial intelligence (AI): A collective reflection from the educational landscape. Asian Journal of Distance Education, 18(1), 53–130. https://doi.org/10.5281/zenodo.7636568
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., & Amodei, D. (2020). Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.), Advances in neural information processing systems (Vol. 33, pp. 1877–1901). Neural Information Processing Systems Foundation, Inc.
*Cho, H. (2023). Analyzing ChatGPT’s judgments on nativelikeness of sentences written by English native speakers and Korean EFL learners. Multimedia-Assisted Language Learning, 26(2), 9–32. https://doi.org/10.15702/mall.2023.26.2.9
*Cong-Lem, N., Tran, T. N., & Nguyen, T. T. (2024). Academic integrity in the age of generative AI: Perceptions and responses of Vietnamese EFL teachers. Teaching English with Technology, 24(1), 28–47. https://doi.org/10.56297/FSYB3031/MXNB7567
Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches. SAGE Publications Inc.
Creswell, J. W. (2012). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (4th ed.). Pearson.
*Dehghan, F. (2024a). Demystifying the unknown: ChatGPT and foreign language classrooms in the voices of EFL teachers. In Z. Ç. Köroğlu & A. Çakır (Eds.), Fostering foreign language teaching and learning environments with contemporary technologies (pp. 70–90). IGI Global. https://doi.org/10.4018/979-8-3693-0353-5.ch004
*Dehghan, F. (2024b). The use of AI by EFL teachers to address individual differences: A case study. In T. Q. Tran & T. M. Duong (Eds.), Addressing issues of learner diversity in English language education (pp. 149–162). IGI Global. https://doi.org/10.4018/979-8-3693-2623-7.ch009
*Dehghani, H., & Mashhadi, A. (2024). Exploring Iranian English as a foreign language teachers’ acceptance of ChatGPT in English language teaching: Extending the technology acceptance model. Education and Information Technologies. https://doi.org/10.1007/s10639-024-12660-9
*Derakhshan, A., & Ghiasvand, F. (2024). Is ChatGPT an evil or an angel for second language education and research? A phenomenographic study of research-active EFL teachers’ perceptions. International Journal of Applied Linguistics. https://doi.org/10.1111/ijal.12561
*Escalante, J., Pack, A., & Barrett, A. (2023). AI-generated feedback on writing: Insights into efficacy and ENL student preference. International Journal of Educational Technology in Higher Education, 20, 57. https://doi.org/10.1186/s41239-023-00425-2
*Gao, Y., Wang, Q., & Wang, X. (2024). Exploring EFL university teachers’ beliefs in integrating ChatGPT and other large language models in language education: A study in China. Asia Pacific Journal of Education, 44(1), 29–44. https://doi.org/10.1080/02188791.2024.2305173
Garg, R. K., Urs, V. L., Agarwal, A. A., Chaudhary, S. K., Paliwal, V., & Kar, S. K. (2023). Exploring the role of ChatGPT in patient care (diagnosis and treatment) and medical research: A systematic review. Health Promotion Perspectives, 13(3), 183–191. https://doi.org/10.34172/hpp.2023.22
*Ghafouri, M. (2024). ChatGPT: The catalyst for teacher-student rapport and grit development in L2 class. System, 120, 103209. https://doi.org/10.1016/j.system.2023.103209
*Ghafouri, M., Hassaskhah, J., & Mahdavi-Zafarghandi, A. (2024). From virtual assistant to writing mentor: Exploring the impact of a ChatGPT-based writing instruction protocol on EFL teachers’ self-efficacy and learners’ writing skill. Language Teaching Research. https://doi.org/10.1177/13621688241239764
Gödde, D., Nöhl, S., Wolf, C., Rupert, Y., Rimkus, L., Ehlers, J., Breuckmann, F., & Sellmann, T. (2023). A SWOT (strengths, weaknesses, opportunities, and threats) analysis of ChatGPT in the medical literature: Concise review. Journal of Medical Internet Research, 25, e49368. https://doi.org/10.2196/49368
*Guo, K., & Wang, D. (2024). To resist it or to embrace it? Examining ChatGPT’s potential to support teacher feedback in EFL writing. Education and Information Technologies, 29, 8435–8463. https://doi.org/10.1007/s10639-023-12146-0
*Han, J., Yoo, H., Kim, Y., Myung, J., Kim, M., Lim, H., Kim, J., Lee, T. Y., Hong, H., Ahn, S. Y., & Oh, A. (2023). RECIPE: How to integrate ChatGPT into EFL writing education. In Proceedings of the Tenth ACM Conference on Learning @ Scale (pp. 416–420). New York, NY: Association for Computing Machinery. https://doi.org/10.1145/3573051.3596200
*Hieu, H. H. (2024). Exploring the impact of AI in language education: Vietnamese EFL teachers’ views on using ChatGPT for fairy tale retelling tasks. International Journal of Learning, Teaching and Educational Research, 23(3), 486–503. https://doi.org/10.26803/ijlter.23.3.24
Hsu, Y. C., Ho, H. N. J., Tsai, C. C., Hwang, G. J., Chu, H. C., Wang, C. Y., & Chen, N. S. (2012). Research trends in technology-based learning from 2000 to 2009: A content analysis of publications in selected journals. Educational Technology & Society, 15(2), 354–370.
Hwang, G. J., & Chang, C. Y. (2023). A review of opportunities and challenges of chatbots in education. Interactive Learning Environments, 31(7), 4099–4112. https://doi.org/10.1080/10494820.2021.1952615
*Ibrahim, K. (2023). Using AI-based detectors to control AI-assisted plagiarism in ESL writing: “The terminator versus the machines.” Language Testing in Asia, 13, 46. https://doi.org/10.1186/s40468-023-00260-2
Imran, M., & Almusharraf, N. (2023). Analyzing the role of ChatGPT as a writing assistant at higher education level: A systematic review of the literature. Contemporary Educational Technology, 15(4), ep464. https://doi.org/10.30935/cedtech/13605
Jeon, J., Lee, S., & Choi, S. (2023). A systematic review of research on speech-recognition chatbots for language learning: Implications for future directions in the era of large language models. Interactive Learning Environments. https://doi.org/10.1080/10494820.2023.2204343
*Kartal, G. (2024). The influence of ChatGPT on thinking skills and creativity of EFL student teachers: A narrative inquiry. Journal of Education for Teaching. https://doi.org/10.1080/02607476.2024.2326502
*Kim, S., & Park, S. H. (2023). Young Korean EFL learners’ perception of role-playing scripts: ChatGPT vs. Textbooks. Korean Journal of English Language and Linguistics, 23, 1136–1153. https://doi.org/10.15738/kjell.23.202312.1136
*Klimova, B., Pikhart, M., & Al-Obaydi, L. H. (2024). Exploring the potential of ChatGPT for foreign language education at the university level. Frontiers in Psychology, 15, 1269319. https://doi.org/10.3389/fpsyg.2024.1269319
*Kucuk, T. (2024). ChatGPT integrated grammar teaching and learning in EFL classes: A study on Tishk international university students in Erbil, Iraq. Arab World English Journal, Special Issue on ChatGPT, 100–111. https://doi.org/10.24093/awej/ChatGPT.6
*Lee, H., Hsia, C. C., Tsoy, A., Choi, S., Hou, H., & Ni, S. (2023). VisionARy: Exploratory research on contextual language learning using AR glasses with ChatGPT. In Proceedings of the 15th Biannual Conference of the Italian SIGCHI Chapter, Article No. 22 (pp. 1–6). New York, NY: Association for Computing Machinery. https://doi.org/10.1145/3605390.3605400
*Lee, S. (2024). Utilizing a ChatGPT workshop to foster ethical awareness and enhance L2 English writing revision processes in university academic settings: ChatGPT workshop for effective and ethical L2 English writing. In F. Pan (Ed.), AI in language teaching, learning, and assessment (pp. 269–299). IGI Global. https://doi.org/10.4018/979-8-3693-0872-1.ch013
Liu, C., & Hwang, G. J. (2023). Roles and research trends of touchscreen mobile devices in early childhood education: Review of journal publications from 2010 to 2019 based on the technology-enhanced learning model. Interactive Learning Environments, 31(3), 1683–1702. https://doi.org/10.1080/10494820.2020.1855210
*Liu, G. L., Darvin, R., & Ma, C. (2024a). Exploring AI-mediated informal digital learning of English (AI-IDLE): a mixed-method investigation of Chinese EFL learners’ AI adoption and experiences. Computer Assisted Language Learning. https://doi.org/10.1080/09588221.2024.2310288
*Liu, G., & Ma, C. (2024). Measuring EFL learners’ use of ChatGPT in informal digital learning of English based on the technology acceptance model. Innovation in Language Learning and Teaching, 18(2), 125–138. https://doi.org/10.1080/17501229.2023.2240316
*Liu, M., Zhang, L. J., & Biebricher, C. (2024b). Investigating students’ cognitive processes in generative AI-assisted digital multimodal composing and traditional writing. Computers & Education, 211, 104977. https://doi.org/10.1016/j.compedu.2023.104977
Lo, C. K., Hew, K. F., & Jong, M. S. Y. (2024). The influence of ChatGPT on student engagement: A systematic review and future research agenda. Computers & Education, 219, 105100. https://doi.org/10.1016/j.compedu.2024.105100
*Mabuan, R. A. (2024). ChatGPT and ELT: Exploring teachers’ voices. International Journal of Technology in Education, 7(1), 128–153. https://doi.org/10.46328/ijte.523
*Mahapatra, S. (2024). Impact of ChatGPT on ESL students’ academic writing skills: A mixed methods intervention study. Smart Learning Environments, 11, 9. https://doi.org/10.1186/s40561-024-00295-9
*Malec, W. (2024). Investigating the quality of AI-generated distractors for a multiple-choice vocabulary test. In Proceedings of the 16th International Conference on Computer Supported Education, vol. 1 (pp. 836–843). SciTePress. https://doi.org/10.5220/0012762400003693
*Marjanovikj-Apostolovski, M. (2024). ChatGPT as a learning tool in an undergraduate advanced academic English course at South East European university. Journal of Teaching English for Specific and Academic Purposes, 12(1), 243–254. https://doi.org/10.22190/JTESAP240131020M
*Markus, A. M., Ovinova, L. N., Dmitrusenko, I. N., & Shraiber, E. G. (2023). Application of artificial intelligence technology in teaching English language to engineering bachelors. In Proceedings of the 2023 International Conference on Quality Management, Transport and Information Security, Information Technologies (pp. 147–151). IEEE. https://doi.org/10.1109/ITQMTIS58985.2023.10346594
*Marzuki Widiati, U., Rusdin, D., & Indrawati, I. (2023). The impact of AI writing tools on the content and organization of students’ writing: EFL teachers’ perspective. Cogent Education, 10(2), 2236469. https://doi.org/10.1080/2331186X.2023.2236469
Meniado, J. C. (2023). The impact of ChatGPT on English language teaching, learning, and assessment: A rapid review of literature. Arab World English Journal, 14(4), 3–18. https://doi.org/10.24093/awej/vol14no4.1
*Mizumoto, A., & Eguchi, M. (2023). Exploring the potential of using an AI language model for automated essay scoring. Research Methods in Applied Linguistics, 2(2), 100050. https://doi.org/10.1016/j.rmal.2023.100050
*Mizumoto, A., Shintani, N., Sasaki, M., & Teng, M. F. (2024). Testing the viability of ChatGPT as a companion in L2 writing accuracy assessment. Research Methods in Applied Linguistics, 3(2), 100116. https://doi.org/10.1016/j.rmal.2024.100116
*Mohamed, A. M. (2024). Exploring the potential of an AI-based chatbot (ChatGPT) in enhancing English as a foreign language (EFL) teaching: Perceptions of EFL faculty members. Education and Information Technologies, 29(3), 3195–3217. https://doi.org/10.1007/s10639-023-11917-z
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & the PRISMA Group. (2009). Reprint-preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Physical Therapy, 89(9), 873–880. https://doi.org/10.1093/ptj/89.9.873
*Mugableh, A. I. (2024). The impact of ChatGPT on the development of vocabulary knowledge of Saudi EFL students. Arab World English Journal, Special Issue on ChatGPT, 265–281. https://doi.org/10.24093/awej/ChatGPT.18
*Muniandy, J., & Selvanathan, M. (2024). ChatGPT, a partnering tool to improve ESL learners’ speaking skills: Case study in a public university, Malaysia. Teaching Public Administration. https://doi.org/10.1177/01447394241230152
*Muthmainnah, M., Apriani, E., Seraj, P. M. I., Obaid, A. J., & AlYakin, A. M. (2024). Nudging motivation to learn English through a ChatGPT smartphone-based hybrid model. In A. J. Obaid, B. Bhushan, S. Muthmainnah, & S. S. Rajest (Eds.), Advanced Applications of Generative AI and Natural Language Processing Models (pp. 26–42). IGI Global. https://doi.org/10.4018/979-8-3693-0502-7.ch002
Nation, P. (2007). The four strands. International Journal of Innovation in Language Learning and Teaching, 1(1), 2–13. https://doi.org/10.2167/illt039.0
*Nugroho, A., Andriyanti, E., Widodo, P., & Mutiaraningrum, I. (2024). Students’ appraisals post-ChatGPT use: Students’ narrative after using ChatGPT for writing. Innovations in Education and Teaching International. https://doi.org/10.1080/14703297.2024.2319184
*Obata, A., Tagawa, T., & Ono, Y. (2023). Assessment of ChatGPT’s validity in scoring essays by foreign language learners of Japanese and English. In Proceedings of the 2023 15th International Congress on Advanced Applied Informatics Winter (pp. 105–110). IEEE. https://doi.org/10.1109/IIAI-AAI-WINTER61682.2023.00028
OpenAI. (2022). Introducing ChatGPT. Retrieved from https://openai.com/blog/chatgpt
OpenAI. (2023). GPT-4. Retrieved from https://openai.com/research/gpt-4
OpenAI. (2024). Introducing GPT-4o and more tools to ChatGPT free users. Retrieved from https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-free/
*Rees, G. P., & Lew, R. (2024). The effectiveness of OpenAI GPT-generated definitions versus definitions from an English learners’ dictionary in a lexically orientated reading task. International Journal of Lexicography, 37(1), 50–74. https://doi.org/10.1093/ijl/ecad030
Sallam, M. (2023). ChatGPT utility in healthcare education, research, and practice: Systematic review on the promising perspectives and valid concerns. Healthcare, 11(6), 887. https://doi.org/10.3390/healthcare11060887
*Shaikh, S., Yayilgan, S. Y., Klimova, B., & Pikhart, M. (2023). Assessing the usability of ChatGPT for formal English language learning. European Journal of Investigation in Health, Psychology and Education, 13(9), 1937–1960. https://doi.org/10.3390/ejihpe13090140
*Shin, D., & Lee, J. H. (2023). Can ChatGPT make reading comprehension testing items on par with human experts? Language Learning & Technology, 27(3), 27–40.
*Silitonga, L. M., Hawanti, S., Aziez, F., Furqon, M., Zain, D. S. M., Anjarani, S., & Wu, T. T. (2023). The impact of AI chatbot-based learning on students’ motivation in English writing classroom. In Y. M. Huang, & T. Rocha (Eds.), Innovative Technologies and Learning. ICITL 2023. Lecture Notes in Computer Science, vol. 14099 (pp. 542–549). Switzerland AG: Springer. https://doi.org/10.1007/978-3-031-40113-8_53
*Song, C., & Song, Y. (2023). Enhancing academic writing skills and motivation: Assessing the efficacy of ChatGPT in AI-assisted language learning for EFL students. Frontiers in Psychology, 14, 1260843. https://doi.org/10.3389/fpsyg.2023.1260843
Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research & Evaluation, 9, 4. https://doi.org/10.7275/96jp-xz07
*Tang, Z., & Zhang, Y. (2023). Application of generative artificial intelligence in English education: Taking ChatGPT system as an example. In Proceedings of the 2023 3rd International Conference on Educational Technology (pp. 42–46). IEEE. https://doi.org/10.1109/ICET59358.2023.10424297
Thiese, M. S. (2014). Observational and interventional study design types: An overview. Biochemia Medica, 24(2), 199–210. https://doi.org/10.11613/BM.2014.022
Tlili, A., Shehata, B., & Adarkwah, M. A. (2023). What if the devil is my guardian angel: ChatGPT as a case study of using chatbots in education. Smart Learning Environments, 10, 15. https://doi.org/10.1186/s40561-023-00237-x
*Tsai, C. Y., Lin, Y. T., & Brown, I. K. (2024). Impacts of ChatGPT-assisted writing for EFL English majors: Feasibility and challenges. Education and Information Technologies. https://doi.org/10.1007/s10639-024-12722-y
*Tseng, Y. C., & Lin, Y. H. (2024). Enhancing English as a foreign language (EFL) learners’ writing with ChatGPT: A university-level course design. Electronic Journal of e-Learning, 22(2), 78–97. https://doi.org/10.34190/ejel.21.5.3329
*Ulla, M. B., Perales, W. F., & Busbus, S. O. (2023). ‘To generate or stop generating response’: Exploring EFL teachers’ perspectives on ChatGPT in English language teaching in Thailand. Learning: Research and Practice, 9(2), 168–182. https://doi.org/10.1080/23735082.2023.2257252
*Ulum, Ö. G. (2024). Unveiling the layers: Analyzing ChatGPT implementations in Turkish State Universities. Base for Electronic Educational Sciences, 5(1), 114–134. https://doi.org/10.29329/bedu.2024.651.7
*Üstünbaş, Ü. (2024). EFL learners’ views about the use of artificial intelligence in giving corrective feedback on writing: A case study. In Z. Ç. Köroğlu & A. Çakır (Eds.), Fostering Foreign Language Teaching and Learning Environments with Contemporary Technologies (pp. 115–133). IGI Global. https://doi.org/10.4018/979-8-3693-0353-5.ch006
Vargas-Murillo, A. R., de la Asuncion, I. N. M., & de Jesús Guevara-Soto, F. (2023). Challenges and opportunities of AI-assisted learning: A systematic literature review on the impact of ChatGPT usage in higher education. International Journal of Learning, Teaching and Educational Research, 22(7), 122–135. https://doi.org/10.26803/ijlter.22.7.7
*Vo, T. K. A., & Nguyen, H. (2024). Generative artificial intelligence and ChatGPT in language learning: EFL students’ perceptions of technology acceptance. Journal of University Teaching and Learning Practice, 21, 6. https://doi.org/10.53761/fr1rkj58
*Wang, C. (2023). A syntactic complexity analysis of revised composition through artificial intelligence-based question-answering systems. In Proceedings of the 2023 2nd International Conference on Artificial Intelligence and Computer Information Technology (pp. 1–3). IEEE. https://doi.org/10.1109/AICIT59054.2023.10277827
*Wang, Z., Mao, S., Wu, W., Xia, Y., Deng, Y., & Tien, J. (2023). Assessing phrase break of ESL speech with pre-trained language models and large language models. In Proceedings of the Interspeech 2023 (pp. 4194–4198). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2023-910
*Xiao, Y., & Zhi, Y. (2023). An exploratory study of EFL learners’ use of ChatGPT for language learning tasks: Experience and perceptions. Languages, 8(3), 212. https://doi.org/10.3390/languages8030212
*Xu, X., & Thien, L. M. (2024). Unleashing the power of perceived enjoyment: Exploring Chinese undergraduate EFL learners’ intention to use ChatGPT for English learning. Journal of Applied Research in Higher Education. https://doi.org/10.1108/JARHE-12-2023-0555
*Yan, D. (2024). Feedback seeking abilities of L2 writers using ChatGPT: A mixed method multiple case study. Kybernetes. https://doi.org/10.1108/K-09-2023-1933
*Yancey, K. P., Laflair, G., Verardi, A., & Burstein, J. (2023). Rating short L2 essays on the CEFR scale with GPT-4. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (pp. 576–584). Toronto, Canada: Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.bea-1.49
*Yeh, H. C. (2024). The synergy of generative AI and inquiry-based learning: Transforming the landscape of English teaching and learning. Interactive Learning Environments. https://doi.org/10.1080/10494820.2024.2335491
*Young, J. C., & Shishido, M. (2023). Investigating OpenAI’s ChatGPT potentials in generating chatbot’s dialogue for English as a foreign language learning. International Journal of Advanced Computer Science and Applications, 14(6), 65–72. https://doi.org/10.14569/IJACSA.2023.0140607
*Yuan, Y., Li, H., & Sawaengdist, A. (2024). The impact of ChatGPT on learners in English academic writing: Opportunities and challenges in education. Language Learning in Higher Education, 14(1), 41–56. https://doi.org/10.1515/cercles-2023-0006
Zhang, P., & Tur, G. (2023). A systematic review of ChatGPT use in K–12 education. European Journal of Education. https://doi.org/10.1111/ejed.12599
*Zheng, Y. D., & Stewart, N. (2024). Improving EFL students’ cultural awareness: Reframing moral dilemmatic stories with ChatGPT. Computers and Education: Artificial Intelligence, 6, 100223. https://doi.org/10.1016/j.caeai.2024.100223
*Zindela, N. (2023). Comparing measures of syntactic and lexical complexity in artificial intelligence and L2 human-generated argumentative essays. International Journal of Education and Development Using Information and Communication Technology, 19(3), 50–68.
Acknowledgements
Nil.
Funding
This work was supported by The Education University of Hong Kong (Project No. #04A45, #CB382, and #CB383) and by the Department of Mathematics and Information Technology (Departmental Research Grant; MIT/DRG02/24-25), The Education University of Hong Kong.
Author information
Contributions
Conceptualisation and design of the work: CKL; acquisition, analysis, and interpretation of data, and creation of resources used in the work: CKL, PLHY, SX; drafting and substantive revision of the work: CKL, PLHY, SX, DTKN, MSYJ. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lo, C.K., Yu, P.L.H., Xu, S. et al. Exploring the application of ChatGPT in ESL/EFL education and related research issues: a systematic review of empirical studies. Smart Learn. Environ. 11, 50 (2024). https://doi.org/10.1186/s40561-024-00342-5
DOI: https://doi.org/10.1186/s40561-024-00342-5






