Table 1 Statistics of the dialogue corpus

From: Non-intrusive assessment of learners’ prior knowledge in dialogue-based intelligent tutoring systems

Category Features
Time-on-task features
  total_time: the length of the dialogue in minutes
  avg_time_per_turn: the average length of a student turn in minutes
Generation features
  dialogue_size: total length of the student dialogue (#words, excl. punctuation)
  avg_dialogue_size_per_turn: average length of a student turn (#words, excl. punctuation)
  dialogue_length_div_voc: dialogue_size divided by student’s vocabulary size
  #chunks: total number of syntactic constituents or chunks
  #sentences: total number of sentences
  content_vocSize: the vocabulary size of content words
  non_content_vocSize: the vocabulary size of non-content words
  vocSize: total vocabulary size
  %physicsTerms: percentage of physics related terms out of all the words used
  %longWords: percentage of long words out of those used
  %punctuation: percentage of punctuation out of all tokens used
  %articles: percentage of articles such as an or the out of all the words used
  %pronouns: percentage of non-self-reference pronouns (e.g., you, they) out of all words
  %self-references: percentage of self-reference pronouns (e.g., me, we) out of all words
  totalIC: total Information Content of the dialogue
  positiveness: text positiveness computed based on SentiWordNet
  negativeness: text negativeness
Scaffolding features
  #turns: total number of student’s turns
  #normalized total number of student turns
  #c_turns: number of student turns classified as contributions (no questions)
  %pos_fb: percentage of turns for which student received positive feedback
  %neg_fb: percentage of turns for which student received negative feedback
  pos_div_pos+neg: positive feedback divided by (positive+negative) feedback
  #shownHints: total number of shown hints
  #shownPrompts: total number of shown prompts, a type of hints
  #shownPumps: total number of shown pumps, a type of hints
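
As an illustration only, a few of the features above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the whitespace tokenizer, the ARTICLES and SELF_REFERENCES word lists, and the zero-division handling are all hypothetical choices made for this example.

```python
import string

# Assumed word lists; the table only gives "an or the" and "me or we" as examples.
ARTICLES = {"a", "an", "the"}
SELF_REFERENCES = {"i", "me", "my", "we", "us", "our"}


def generation_features(student_turns):
    """Compute a handful of Generation features from a list of student turns."""
    words = []
    for turn in student_turns:
        for token in turn.lower().split():
            # Strip leading/trailing punctuation so counts exclude it.
            word = token.strip(string.punctuation)
            if word:
                words.append(word)

    n_words = len(words)
    n_turns = len(student_turns)
    vocabulary = set(words)

    def pct(count):
        # Percentage of all words; 0.0 for an empty dialogue (assumption).
        return 100.0 * count / n_words if n_words else 0.0

    return {
        "dialogue_size": n_words,
        "avg_dialogue_size_per_turn": n_words / n_turns if n_turns else 0.0,
        "vocSize": len(vocabulary),
        "dialogue_length_div_voc": n_words / len(vocabulary) if vocabulary else 0.0,
        "%articles": pct(sum(w in ARTICLES for w in words)),
        "%self-references": pct(sum(w in SELF_REFERENCES for w in words)),
    }


def pos_div_pos_neg(n_positive, n_negative):
    """Scaffolding feature pos_div_pos+neg: positive / (positive + negative)."""
    total = n_positive + n_negative
    # Returning 0.0 when no feedback was given is an assumption of this sketch.
    return n_positive / total if total else 0.0
```

For example, generation_features(["The ball falls.", "I think the force is gravity!"]) counts nine words over two turns with a vocabulary of eight distinct words. A real pipeline would likely need a proper tokenizer and part-of-speech tagger for features such as #chunks or content_vocSize, which this sketch does not attempt.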