Think-aloud protocols (TAPs) are frequently used in research on essay rating processes. However, very few empirical studies have examined the completeness of TAP data or the effects of this technique on rater performance (i.e., rating processes and outcomes). This study begins to address this research gap. As part of a larger study on rater decision-making behaviors, 11 novice and 14 experienced raters rated a sample of ESL essays, both analytically and holistically, under two conditions: silently and while thinking aloud. The raters were then interviewed about their perceptions of thinking aloud and its effects. Essay scores were submitted to FACETS analyses, while TAP and interview data were analyzed qualitatively. Together, the score and qualitative analyses provided evidence and explanations concerning the veridicality and reactivity of TAPs across rater groups (novice vs. experienced) and rating scales (holistic vs. analytic). The paper concludes with several theoretical and methodological implications and questions for future studies that use TAPs to build models of, and compare, essay rating processes across individuals, groups and contexts.