Intelligent Tutoring Systems derive much of their power from a student model that describes the learner's competencies. However, constructing a student model is challenging for computer tutors that take automated speech recognition (ASR) as input, because ASR output is inherently inaccurate. We describe two highly simplified models of how word-decoding skills develop, and we explore whether ASR output contains enough information to determine which model fits student performance better and under what circumstances one model is preferable to the other. The two models are a lexical model, which assumes that students learn words as whole-unit chunks, and a grapheme-to-phoneme (G-to-P) model, which assumes that students learn the individual letter-to-sound mappings that compose the words. Using the data collected by the ASR, we show that the G-to-P model describes student performance better than the lexical model. We then identify the conditions under which each model performs better. On the one hand, the G-to-P model correlates better with student performance data when the student is older or when the word is more difficult to read or spell. On the other hand, the lexical model correlates better with student performance data when the student has seen the word more times.