
Table 1 Statistics of the dialogue corpus

From: Non-intrusive assessment of learners’ prior knowledge in dialogue-based intelligent tutoring systems




total_time: the total duration of the dialogue in minutes


avg_time_per_turn: the average length of a student turn in minutes


dialogue_size: total length of the student dialogue (#words, excl. punctuation)


avg_dialogue_size_per_turn: average length of a student turn (#words, excl. punctuation)


dialogue_length_div_voc: dialogue_size divided by student’s vocabulary size


#chunks: total number of syntactic constituents or chunks


#sentences: total number of sentences


content_vocSize: the vocabulary size of content words


non_content_vocSize: the vocabulary size of non-content words


vocSize: total vocabulary size


%physicsTerms: percentage of physics related terms out of all the words used


%longWords: percentage of long words out of those used


%punctuation: percentage of punctuation out of all tokens used


%articles: percentage of articles (such as "an" or "the") out of all the words used


%pronouns: percentage of non-self-reference pronouns (such as "you" or "they") out of all the words used


%self-references: percentage of self-reference pronouns (such as "me" or "we") out of all the words used


totalIC: total Information Content of the dialogue




positiveness: text positiveness computed based on SentiWordNet


negativeness: text negativeness


#turns: total number of student’s turns


#normalized total number of student turns


#c_turns: number of student turns classified as contributions (no questions)


%pos_fb: percentage of turns for which student received positive feedback


%neg_fb: percentage of turns for which student received negative feedback


pos_div_pos+neg: positive feedback divided by (positive+negative) feedback


#shownHints: total number of shown hints


#shownPrompts: total number of shown prompts, a type of hints


#shownPumps: total number of shown pumps, a type of hints
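
As an illustration of how a few of the simpler features above could be computed from the definitions given, here is a minimal Python sketch. It is not the authors' implementation. The tokenizer, the article and self-reference word lists, the function names (`tokenize`, `lexical_features`, `pos_div_pos_neg`), and the choice to report percentages on a 0-100 scale are all assumptions made for the example.

```python
import re

# Assumed word lists; the article does not enumerate them.
ARTICLES = {"a", "an", "the"}
SELF_REFERENCES = {"i", "me", "my", "myself", "we", "us", "our", "ourselves"}


def tokenize(text):
    """Return lowercased word tokens; punctuation is dropped (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def percentage(count, total):
    """Percentage on a 0-100 scale (assumed); 0.0 when there are no words."""
    return 100.0 * count / total if total else 0.0


def lexical_features(student_turns):
    """Compute a handful of the table's features from a list of student turns."""
    words = [w for turn in student_turns for w in tokenize(turn)]
    n_turns = len(student_turns)
    dialogue_size = len(words)      # #words, excl. punctuation
    voc_size = len(set(words))      # number of distinct words
    return {
        "#turns": n_turns,
        "dialogue_size": dialogue_size,
        "avg_dialogue_size_per_turn": dialogue_size / n_turns if n_turns else 0.0,
        "vocSize": voc_size,
        "dialogue_length_div_voc": dialogue_size / voc_size if voc_size else 0.0,
        "%articles": percentage(sum(w in ARTICLES for w in words), dialogue_size),
        "%self-references": percentage(
            sum(w in SELF_REFERENCES for w in words), dialogue_size
        ),
    }


def pos_div_pos_neg(n_pos, n_neg):
    """Positive feedback divided by (positive + negative) feedback."""
    total = n_pos + n_neg
    return n_pos / total if total else 0.0


if __name__ == "__main__":
    turns = ["The force is equal.", "I think the ball falls."]
    for name, value in lexical_features(turns).items():
        print(name, value)
    print("pos_div_pos+neg", pos_div_pos_neg(3, 1))
```

Features that depend on external resources or annotations (for example %physicsTerms, totalIC, positiveness, or the hint counts) are left out, since they would need a term list, an information-content model, SentiWordNet, or tutor logs that are not described here.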