November 25, 2007

Effectiveness of Paraeducator-Supplemented Individual Instruction: Beyond Basic Decoding Skills

By Vadasy, Patricia F; Sanders, Elizabeth A; Tudor, Sarah

Abstract: A total of 46 children in Grades 2 and 3 with low word-level skills were randomly assigned to 1 of 2 groups that received supplemental phonics-based reading instruction. One group received intervention October through March (21.5 hours), and one group served as a control from October through March and later received intervention March through May (17.5 hours). Paraeducators trained in a standard treatment protocol provided individual instruction for 30 min per day, 4 days per week. At the March posttest, the early treatment (ET; n = 23) group outperformed the controls (late treatment, LT; n = 20) on reading accuracy and passage fluency. Across both groups, second graders outperformed third graders on these same measures. At the 3-month follow-up, the ET group showed no evidence of decline in reading accuracy, passage fluency, or words spelled; however, third-grade ET students had significantly higher spelling skills than second graders. The LT group demonstrated significant growth during their intervention in reading accuracy and spelling, but not in passage fluency. When we compared the ET and LT groups on their gains per instructional hour, we found that the ET group made significantly greater gains than the LT group across all 3 measures. The results support the value of paraeducator-supplemented reading instruction for students below grade level in word identification and reading fluency.

Far too many students in our public schools are not developing the reading skills needed to negotiate a world with increasing literacy demands. In 2004, only 36% of U.S. fourth graders performed at or above the proficient level in reading, with marked disparities by race: the rate was 41% for White students and 13% for Black students (Perie, Grigg, & Donahue, 2005). One early obstacle to becoming a proficient reader is learning to identify words easily and rapidly, a complex task that is even more difficult for students with processing deficits and for students who receive poor instruction or have limited reading experience. Even students fortunate enough to receive rigorous, well-implemented, research-based reading instruction may need additional explicit instruction in the complexity of the code (Berninger & Traweek, 1991) to establish the sound foundation in word-level skills necessary to become skilled readers.

The process of learning to read words involves acquiring alphabetic and decoding skills (Blachman, Ball, Black, & Tangel, 1994; Bradley & Bryant, 1983; Byrne & Fielding-Barnsley, 1991; Ehri, 1998; Fielding-Barnsley, 1997; Foorman, Novy, Francis, & Liberman, 1991). Explicit and systematic phonics approaches to teaching these skills have been well validated (Foorman, Francis, Fletcher, Schatschneider, & Mehta, 1998; Torgesen, Wagner, & Rashotte, 1997; Vellutino et al., 1996). Regrettably, not all teachers yet possess the technical knowledge believed necessary to teach phonological awareness and phonics effectively (Cunningham, Perry, Stanovich, & Stanovich, 2004; Spear-Swerling, Brucker, & Alfano, 2005). Furthermore, inequities in access to well-trained teachers and science-based reading instruction mean that many children do not establish a foundation of strong word reading skills in the early grades (Lee & Burkam, 2002). Inconsistencies in early reading instruction also mean that in public schools with high rates of student mobility, second- and third-grade teachers often receive students who lack essential decoding skills because these skills were not previously well taught. A frequent challenge for schools is how to supplement core reading instruction for these students and help them catch up, an objective supported by a compensatory model of reading development (Parrila, Aunola, Leskinen, Nurmi, & Kirby, 2005).

Considerable research has demonstrated the value of teacher-implemented systematic and explicit instruction in early reading skills for at-risk students (Elbaum, Vaughn, Hughes, & Moody, 2000). However, teacher resources are limited in many public schools that serve growing numbers of students from limited-English-speaking, minority, and impoverished backgrounds, all of which increase the risk for reading difficulties. These schools sometimes choose to supplement reading instruction for students with below-grade-level skills by using paraeducators more strategically. A growing body of research now supports the efficacy of kindergarten reading interventions implemented by instructional assistants (Blachman et al., 1994; Gunn, Biglan, Smolkowski, & Ary, 2000; Gunn, Smolkowski, Biglan, & Black, 2002; Gunn, Smolkowski, Biglan, Black, & Blair, 2005; Simmons, Kame'enui, Stoolmiller, Coyne, & Harn, 2003; Torgesen et al., 1999). We have previously reported on the effects of supplemental paraeducator instruction in alphabetic and phonics skills for high-risk kindergarten and first-grade students (Vadasy, Jenkins, & Pool, 2000; Vadasy, Sanders, Jenkins, & Peyton, 2002; Vadasy, Sanders, & Peyton, 2005; Vadasy, Sanders, & Peyton, 2006a). Critical features of such effective supplemental instruction delivered by nonteachers include training and research-based standard treatment protocols that paraeducators can learn to use with fidelity.

However, a number of questions about paraeducator-supplemented reading instruction remain to be fully addressed. First, how early must this instruction be provided to be effective? Ideally, students establish proficient decoding skills in kindergarten and first grade (Snow, Burns, & Griffin, 1998). Second, to what extent can paraeducators effectively supplement instruction beyond basic decoding skills for second and third graders, who must master the complexity of the alphabetic system? After first grade, instruction in the code is less constrained: there is a larger and more challenging set of letter-sound relations to teach and learn, and words become increasingly complex. Finally, do students maintain enough of their gains after instruction ends to warrant allocating scarce school resources to such supplemental instruction?

Our goal in this study was to address these questions. The intervention was designed in response to schools that had experienced the benefits of paraeducator tutoring in phonics skills for younger students in kindergarten and first grade. These schools wanted to continue using paraeducators to work with older students who either had not received adequately explicit phonics instruction by the end of Grade 1 or needed continued targeted instruction in word-level skills to reach or stay at grade level.

Effects of Supplemental Intervention for Older Students

Although there is strong research support for the benefits of explicit and supplemental early reading intervention in kindergarten and Grade 1, a growing body of research also supports the benefits of similar interventions for older students. Most of these interventions have been implemented by trained teachers or researchers. Abbott and Berninger (1999) trained graduate students and school psychologists to provide instruction in structural analysis and alphabetic skills to students in Grades 4 to 7 with below-grade-level word reading skills. Students receiving 16 hours of individual instruction made reliable growth in word identification, decoding, and spelling. Berninger et al. (2003) compared approaches to supplemental reading instruction for second graders with poor reading skills. Graduate students provided 8 hours of instruction to pairs of students in word recognition and comprehension skills. Students in this treatment group significantly outperformed controls in phonological decoding skills (d = 1.74). McGuinness, McGuinness, and McGuinness (1996) tested an experimenter-designed and -implemented intervention for students ages 6 to 16 years with reading difficulties. Students received an average of 9 hours of individual instruction in alphabetic, phonemic, and code-oriented skills, tailored to their level of reading development. Instruction included reading and spelling multisyllable words. The authors reported average standard score gains of 2.57 points per hour for word attack and 1.70 points per hour for word identification, and mean posttest scores for both measures were at grade level. Working with students in Grades 6 to 10 with below-average word reading skills, Bhattacharya and Ehri (2004) provided 3 hours of individual instruction in a syllable segmentation strategy with practice in reading multisyllable words.
Students trained by experimenters in this graphosyllabic analysis significantly outperformed students in a whole-word comparison group in decoding nonwords and real words and in syllable analysis skills, supporting the benefits of decoding instruction for these older students. In a treatment comparison study of students with learning disabilities ages 8 to 10 years, Torgesen et al. (2001) evaluated the benefits of explicit phonics instruction in spelling and syllable patterns and inflections. The teacher-implemented intervention produced large gains in reading growth (effect size for slope of 3.9 on a standardized cluster score for word identification and passage comprehension). Rashotte, MacPhee, and Torgesen (2001) also demonstrated the benefits of teacher-implemented, small-group instruction in phonologically based reading intervention for students in Grades 1 through 6 with below-average decoding and word identification skills. In a treatment-comparison design in which students received an average of 35 hours of instruction, effect sizes across grade levels were very strong for phonemic decoding (ranging from 1.67 to 2.20 across grade-level groups) and moderate for both word-level reading and reading rate. Furthermore, gains were maintained at a 2-month follow-up. In several studies, Lovett and her colleagues (Lovett et al., 1994; Lovett & Steinbach, 1997) compared the effectiveness of two remedial, small-group, teacher-implemented approaches to word identification training for students with severe reading disabilities in Grades 4 through 6. One approach explicitly taught phonological analysis and blending skills, and the other taught word identification strategies. The researchers found that both interventions were effective, with older students making gains comparable to those made by younger students.
Wise, Ring, and Olson (1999) also compared several types of phonological awareness training for students with reading difficulties in Grades 2 through 5. Trained teachers provided 40 hours of instruction to small groups of three students. Students who received training made significant gains in phonological and reading skills and maintained these gains at 1-year follow-up. Finally, in unpublished reports, Archer and her colleagues (Archer, Gleason, Vachon, & Hollenbeck, 2001; Vachon & Gleason, 2001) have described benefits of teacher-implemented, small-group instruction in flexible syllabication, blending, and vowel flexing strategies for low-skilled readers in Grades 4 through 6. Taken together, these studies provide evidence that intensive and explicit interventions for core word reading deficits also benefit older students.

Effectiveness of Paraeducators

Interventions that can be effectively provided by teachers are not necessarily as effective when used by paraeducators with less training and experience. Furthermore, it might be expected that paraeducators can more easily implement a standard treatment protocol with kindergarten and first-grade students, whose skill levels are less varied than those of older students with reading deficits. Yet effective word reading interventions for older students have been implemented by nonteacher tutors. Brown, Morris, and Fields (2005) recently documented the effectiveness of paraeducator tutoring in an adapted version of the Howard Street (Morris, Shaw, & Perney, 1990) tutoring model. Paraeducators provided individual instruction for second and third graders, including word study activities, for 45 min, twice a week, for an average of 40 hours of intervention. Tutored students significantly outperformed controls on measures of word reading, passage reading, and comprehension. Earlier, we (Vadasy et al., 2006b) reported on two cohorts of second and third graders who received supplemental reading instruction from paraeducator tutors. In a quasi-experimental field test of individual instruction in structural analysis (multiletter spelling units, common inflections and affixes, reading and spelling multisyllable words), averaging 42 hours, tutored students significantly outperformed controls on measures of decoding efficiency (d = .85), word identification (d = .80), passage comprehension (d = .75), passage reading rate (d = .82), and spelling (d = 1.0). In a subsequent randomized design, a revised intervention implemented by paraeducators (averaging 36 hours) yielded significant group differences in word attack (d = 1.31), passage reading rate (d = 1.09), and passage reading accuracy (d = .97).
In part, the present study extends this line of investigation by using shorter interventions to examine the intensity of instruction (i.e., the number of hours) required to remediate word-level and fluency skills in older students.

Explicit Instruction in Advanced Decoding Skills

In past intervention research, we have found that paraeducators can be trained to effectively deliver an explicit and systematic early reading intervention for kindergarten and first-grade students, all at the earliest stage of reading acquisition (Vadasy et al., 2000; Vadasy et al., 2002; Vadasy, Sanders, & Peyton, 2005, 2006b), and, as noted earlier, others have also used paraeducators in effective early interventions. Although the aforementioned studies have provided evidence on the efficacy of teacher- or experimenter-implemented interventions that feature flexible strategies for decoding multisyllable words (Archer et al., 2001; Bhattacharya & Ehri, 2004; Henry, 1989, 1993), caution is advised in generalizing these effects to paraeducators. Working with older students presents certain instructional challenges to nonteachers. First, supplemental reading instruction for younger students can be more narrowly focused on alphabetic and decoding skills, whereas older students present a wider range of skill levels. By second grade, some students may still lack basic phonological decoding skills, whereas others may have difficulty decoding multisyllable words with variant vowel patterns or affixes. Furthermore, phonics instruction for older students requires deeper knowledge of, and more confidence in working with, the inconsistencies in English orthography, whereas teaching the alphabetic principle and a basic decoding strategy to K-1 students is more straightforward. We were interested in whether paraeducators would be able to effectively supplement reading instruction for students with varying levels of sight word recognition (Ehri, 1995).



Students. Twenty-six second- and third-grade classroom teachers from nine urban, public elementary schools in the Northwest were asked to refer students whom they considered to have below-grade-level word reading skills (including poor reading fluency and difficulty with reading multisyllable words) for study participation. Referred students who (a) had no record of retention, (b) had not received previous phonics-based supplemental instruction (a concern because several schools had also participated in our previous research), and (c) had active parent consent were screened using the Word Identification subtest from the Woodcock Reading Mastery Test-Revised/Normative Update (WRMT-R/NU; Woodcock, 1987, 1998). Students whose performance on the subtest ranged between the 10th and 37th percentiles (standard score between 81 and 95) were considered eligible for participation and were then further assessed on the full pretest battery. A floor at the 10th percentile was established to help ensure that students had at least some basic decoding ability (as this study considers the efficacy of instruction beyond basic decoding skills), whereas a ceiling at the 37th percentile was employed to exclude students who were likely not at risk for reading problems.
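The correspondence between the standard-score window (81 to 95) and the percentile window (10th to 37th) can be checked by treating standard scores as normally distributed with mean 100 and SD 15, the scale described under the measures below. The sketch assumes a normal model; published norm tables may differ slightly, and the function names are illustrative, not part of any test publisher's software.

```python
from statistics import NormalDist

# Assumption: standard scores follow a normal distribution, mean 100, SD 15.
SCORE_SCALE = NormalDist(mu=100, sigma=15)

def percentile_rank(standard_score: float) -> float:
    """Percentile rank implied by the normal model for a given standard score."""
    return 100 * SCORE_SCALE.cdf(standard_score)

def eligible(standard_score: float, low: float = 81, high: float = 95) -> bool:
    """Screening window used in the study: standard scores from 81 to 95 inclusive."""
    return low <= standard_score <= high

for score in (81, 95):
    print(score, round(percentile_rank(score)))  # 81 -> 10, 95 -> 37
```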

The 46 students meeting study eligibility criteria were randomly assigned within schools to one of two groups: early treatment or late treatment. Whereas the early treatment (ET) group received approximately 15 weeks of supplemental instruction from October to March, the late treatment (LT) group served primarily as a no-treatment comparison group during the first phase of the study, receiving approximately 12 weeks of supplemental instruction from March to May. Attrition during the first phase of the study (ET vs. LT) included 3 (7%) students who moved from their schools (2 ET and 1 LT). Attrition during the second phase of the study included 4 additional students (9%): 1 ET student who moved and 3 LT students who did not continue participation in the study due to scheduling conflicts. Final sample sizes were ET, n = 23, and LT, n = 20, for the first phase of the study (treatment vs. no treatment), and ET, n = 22, and LT, n = 17, for the second phase of the study (during which we examined follow-up treatment maintenance for ET and pretest-posttest changes for LT). Chi-square tests of independence (see Table 1) of demographic and status variables, including grade, gender, minority status, limited English proficiency, special education, and Title I status, revealed no reliable differences between groups.
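The attrition figures above can be reconciled with a little bookkeeping. This sketch uses only the counts reported in the paragraph; the group sizes at randomization (25 ET, 21 LT) are implied by those counts rather than stated directly, and the percentages are taken relative to all 46 randomized students.

```python
N_RANDOMIZED = 46
phase1_n = {"ET": 23, "LT": 20}     # analyzed in Phase 1
phase1_losses = {"ET": 2, "LT": 1}  # moved from their schools
phase2_losses = {"ET": 1, "LT": 3}  # 1 ET moved; 3 LT scheduling conflicts

# Group sizes implied at randomization (not reported directly in the text)
randomized_n = {g: phase1_n[g] + phase1_losses[g] for g in phase1_n}
# Sample sizes remaining for Phase 2
phase2_n = {g: phase1_n[g] - phase2_losses[g] for g in phase1_n}

def pct_of_randomized(losses: dict) -> float:
    """Attrition as a percentage of all randomized students."""
    return 100 * sum(losses.values()) / N_RANDOMIZED

print(randomized_n, phase2_n)
print(round(pct_of_randomized(phase1_losses)), round(pct_of_randomized(phase2_losses)))  # 7 9
```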

Paraeducators. Once random assignment was complete, students were assigned to a paraeducator tutor based on coordination of classroom and tutoring schedules. Eleven paraeducators recruited from their school communities were hired as district employees and paid by their schools with funds budgeted in the research grant. All but 2 of the 11 paraeducators served as tutors for both phases of the study (at one site, 1 paraeducator served only the ET group and 1 served only the LT group), and typically 1 paraeducator served all students at a given site (one site had 2 paraeducators available). More than 60% of the paraeducators (n = 7) were new hires at their schools; the remainder were already employed at the schools in a tutoring capacity. Nevertheless, all but 1 paraeducator had some previous experience in early reading instruction (range = 0-11 years; M = 4, SD = 4.1), and all but 2 had experience working with students in Grades 2 and 3 (range = 0-9 years; M = 3, SD = 3.5). Paraeducators' education levels ranged from 12 to 17 years and averaged 15 years (SD = 2); minimum requirements for school district employment included a high school diploma or equivalent. Finally, all paraeducators were women, and most were nonminority (the 2 paraeducators who self-identified as having minority backgrounds served in both phases of the study).


Study Design. To provide tutoring for all eligible students while simultaneously retaining a viable treatment comparison group, we employed a randomized treatment-control design for Phase 1 of the study, and a treatment-only repeated-measures design for Phase 2 of the study (similar to a quasi-experimental, multiple-baseline across-subjects design). Specifically, in Phase 1 of the study, one group (early treatment; ET) received treatment, and one group (late treatment; LT) served as the no-treatment control group for the former; this second group (LT) also received intervention, but only after the first intervention ended (i.e., during Phase 2 of the study). During their respective intervention periods, both ET and LT students were scheduled to receive individual tutoring for 30 min per day, 4 days per week. The ET group received approximately 15 weeks and the LT group received approximately 12 weeks of tutoring. Classroom teachers reported that during the pullout sessions, about half of the students in the ET group missed instruction in reading or writing (n = 11; 45%), and the other half missed recess, physical education, math, history, or some mixture of non-reading/writing instruction (n = 12; 55%). For the LT group, during the second phase of the study, the proportion of missed instruction was similar: About half missed reading or writing instruction (n = 9; 53%), and half missed non-reading/writing instruction (n = 8; 47%). Due to scheduling delays at the startup of the first phase of the study, the lengths of the interventions for each group were not equivalent; therefore, we compared gains per instructional hour in the results. At the end of their respective intervention periods, ET students completed a mean of 43 (SD = 6.15) sessions (October-March), with instruction averaging 21.5 hours. Correspondingly, LT students completed a mean of 35 (SD = 7.93) sessions (March-May), with instruction averaging 17.5 hours.
An F test of the hours of instruction received revealed a significant difference between groups, F(1,38) = 14.34, p
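Because the two groups received different amounts of instruction, comparisons were made on gains per instructional hour. The sketch below shows that arithmetic under stated assumptions: each 30-min session counts as 0.5 hour, and a student's gain is posttest minus pretest divided by that student's hours. The article does not say whether the division was done per student or on group means, and the example scores are hypothetical.

```python
SESSION_MINUTES = 30  # scheduled session length reported in the study

def instructional_hours(sessions: float, minutes_per_session: int = SESSION_MINUTES) -> float:
    """Convert completed sessions into hours of instruction."""
    return sessions * minutes_per_session / 60

def gain_per_hour(pretest: float, posttest: float, hours: float) -> float:
    """Raw gain divided by hours of instruction received (assumed per-student rate)."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (posttest - pretest) / hours

et_hours = instructional_hours(43)  # mean ET sessions -> 21.5 hours
lt_hours = instructional_hours(35)  # mean LT sessions -> 17.5 hours

# Hypothetical pretest/posttest standard scores for one LT student
print(gain_per_hour(pretest=88.0, posttest=93.0, hours=lt_hours))
```

Expressing gains as a rate puts the two groups on a common footing when, as here, the length of the intervention differs for reasons unrelated to the treatment itself.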

Intervention. Paraeducators used a set of scripted lessons that called for 15 min of phonics instruction and 15 min of oral passage reading. During the first 5 weeks of intervention, the lessons provided review of letter-sound correspondences for single letters and 2-letter spelling patterns. Paraeducators administered a placement test to each student, and students began this phonics review at the point where they scored less than 90% on the placement test. During the first 5 weeks, paraeducators were instructed to review basic decoding skills, target weak areas (e.g., letter sounds, blending skills), and move quickly through lesson content that the student had mastered. For example, many students required review or added instruction in decoding silent-e words and words with vowel teams. The lessons used in the first 5 weeks of instruction included phonics activities previously described in fuller detail (Vadasy et al., 2006b) and outlined more briefly hereafter. Instruction in all skills required paraeducator modeling of new skills, guided practice, and independent student practice with paraeducator scaffolding.

1. Letter-sound correspondences included instruction in single-letter and 2-letter sound correspondences. Students practiced discriminating a set of 12 to 16 letter sounds in each lesson.

2. Decoding involved paraeducators' modeling a continuous blending strategy and students' imitating and practicing blending sets of 6 to 15 words that were decodable according to the scope and sequence for introducing letter sounds.

3. Sight word reading involved paraeducators' introducing high-frequency sight words that students were called upon to read in their oral reading passages. Students also practiced discriminating sets of 4 to 15 sight words that had been previously introduced.

4. Spelling involved paraeducators' dictating 3 previously introduced decodable words and 3 previously introduced sight words for student spelling practice.

5. Additional phonics generalizations in later lessons featured brief but explicit instruction in a small group of high-frequency spelling-sound relationships, some of which often presented obstacles to word reading. These included silent-e words, common inflections, common two-letter spelling patterns, nasalized consonant blends, affricated blends, alternate spellings for vowel sounds, and contractions.

Paraeducators administered a mastery test every 10 lessons to monitor student acquisition of skills and guide the pace of instruction. Paraeducators added review and moved more slowly through the lessons for students whose skills were not yet adequate to move forward.
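The placement rule described at the start of this section (begin the phonics review at the first point where the student scores below 90% on the placement test) can be sketched as a scan over placement results in lesson order. The unit names and scores are hypothetical; the article does not describe how the placement test was divided into sections.

```python
from typing import Optional, Sequence, Tuple

MASTERY_CUTOFF = 0.90  # review starts where the student scores below 90%

def placement_start(results: Sequence[Tuple[str, int, int]]) -> Optional[str]:
    """Return the first section scored below the cutoff, or None if all are mastered.

    Each entry is (section_name, items_correct, items_total), in lesson order.
    """
    for section, correct, total in results:
        if correct / total < MASTERY_CUTOFF:
            return section
    return None

# Hypothetical placement results for one student
results = [
    ("single-letter sounds", 26, 26),
    ("silent-e words", 9, 10),  # exactly 90%: treated as mastered
    ("vowel teams", 6, 10),
]
print(placement_start(results))  # vowel teams
```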

During the second 5 weeks of intervention, instruction focused entirely on reading and spelling words with two-letter spelling patterns (digraphs, r-controlled vowels, consonant blends, vowel teams, diphthongs, and silent letters). The instructional routine remained similar: The paraeducator modeled new sounds, and the students imitated and practiced discriminating a set of 20 letter pairs. Then students practiced reading and spelling sets of 15 to 20 words that featured the taught spelling patterns. For 5 min in each session, the student practiced reading sight word phrases composed of high-frequency sight words to build fluency in reading these words in common contexts. Two mastery tests were used to monitor student progress through these lessons. Paraeducators added review if students did not demonstrate adequate mastery.

Across the first 10 weeks of intervention, the last 15 min of each session were devoted to oral reading practice. Students read short, 50- to 60-word passages written to feature a high proportion of words with taught spelling patterns. Later, as time and student reading level permitted, texts matching the student's reading level were added from the Quick Reads (Hiebert, 2003) fluency program. The Quick Reads program consists of short (80- to 120-word) passages on science and social science topics matched to grade-level content area curriculum. Text characteristics are designed to support fluency. For example, the second-grade-level texts are written to include primarily high-frequency or decodable words, with single-syllable and regular short- and long-vowel phonics patterns. Each book includes sets of five related passages to build depth of content knowledge on a small number of topics. Topics for second-grade passages included the five senses, the seasons, how things are measured, Americans who had a dream, places where people work, and toys of long ago. During this 15 min of oral reading practice, students read each passage three times: (a) independently, with paraeducator scaffolding and correction, (b) together with the paraeducator at a smooth and fluent pace, and (c) independently and as fluently as possible. Paraeducators were instructed to direct students to word features that had been taught in the phonics instruction, and word reading accuracy was the immediate objective of this reading practice. Once students were able to read the passage accurately, the paraeducator encouraged the student to read it more fluently.

If students demonstrated adequate mastery of the phonics skills covered in the first 10 weeks of instruction, the tutoring sessions during the final 5 weeks of intervention were devoted solely to oral reading practice to consolidate word reading skills. (This part of the intervention was truncated and averaged only 2 weeks for the LT group.) Paraeducators followed the prescribed Quick Reads repeated reading procedures. Students read each passage three times. On the first read, paraeducators activated background knowledge of the topic with a brief introduction or question, and the student read the passage aloud. On the second read, the student and paraeducator read the passage together, and the paraeducator modeled prosodic reading. On the third read, the paraeducator timed the student reading the passage for 1 min and recorded the student's reading time. The student and paraeducator then read together and discussed two comprehension questions for each passage. During these last 5 weeks, paraeducators continued to encourage students' word reading accuracy during the passage reading. Paraeducators reminded students to look for word features that had been taught in isolation in the previous sessions and to correct their errors by recognizing these features (e.g., using knowledge of the au spelling pattern to decode the word author). Paraeducators also encouraged students to refer to a letter-sound card to help them remember how to decode taught letter patterns.

Training. Researchers provided 3 hours of initial training to introduce and model instructional procedures and to supervise practice on each instructional component. Initial training also presented explicit error correction, scaffolding procedures, pacing, and use of specific praise. Ongoing coaching and follow-up training were provided throughout the year during biweekly visits to sites. Typically, each paraeducator received an added 60 to 90 min of individual on-site training.

Treatment Fidelity

Across 176 on-site observations over the course of both interventions (averaging 16 observations per paraeducator; only one site had a change in paraeducator from the first to the second phase of the study), one of five researchers rated paraeducators' fidelity to treatment protocols using a 5-point scale (0 = never does this activity; 4 = always does this activity) on 21 instructional criteria. For all criteria, researchers scored only those lesson parts observed during the visit. Mean ratings on 43 paired observations, using four pairs of raters (one researcher-rater was used as a baseline for comparison with the other four), were significantly correlated. When combined across rater pairs, the ratings were significantly correlated at r = .91, p .05.

Test Assessments

Students were assessed in September-October prior to the commencement of the first intervention, again in March at the conclusion of the first intervention (ET), and then again a third time in May-June after the second intervention (LT) concluded. Measures administered in March and May were specifically selected to assess the three central components of instruction: reading accuracy, fluency, and spelling.

Receptive Language. Receptive language was measured at pretest only with the Peabody Picture Vocabulary Test-IIIA (PPVT-IIIA; Dunn & Dunn, 1997) for the purpose of describing our sample. This test requires students to select a picture that best illustrates the meaning of an orally presented stimulus word. Testing is discontinued after the student misses 8 out of 12 items within a set. The raw score is adjusted for the student's age and standardized so that a mean of 100 corresponds to the 50th percentile, and scores 1 SD (15 points) below and above the mean correspond to the 16th and 84th percentiles. Test-retest reliability is reported in the test manual as .93 for 6- to 10-year-olds. Internal consistency for this sample is .97 (139 items).
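The set-based discontinue rule can be sketched as follows. This models only the rule stated above (stop after a 12-item set in which 8 or more items are missed); it deliberately ignores the age-based start point and basal rule of the published administration procedure, and the response pattern is hypothetical.

```python
from typing import List, Sequence

SET_SIZE = 12       # items per set
CEILING_ERRORS = 8  # errors within one set that end testing

def administered_sets(responses: Sequence[bool]) -> List[List[bool]]:
    """Group item responses (True = correct) into 12-item sets, stopping after
    the first set in which the student misses 8 or more items."""
    sets: List[List[bool]] = []
    for start in range(0, len(responses), SET_SIZE):
        item_set = list(responses[start:start + SET_SIZE])
        sets.append(item_set)
        if item_set.count(False) >= CEILING_ERRORS:
            break  # ceiling set reached; later sets are not given
    return sets

# Hypothetical responses: set 1 has 1 error, set 2 has 8 errors, set 3 is never given
responses = [True] * 11 + [False] + [False] * 8 + [True] * 4 + [True] * 12
print(len(administered_sets(responses)))  # 2
```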

Classroom Behavior. Students' classroom behavior was measured in February using the Behavior scale from the Multigrade Inventory for Teachers (MIT; Shaywitz, 1987), again for the purpose of describing our sample. For this measure, teachers rated students' classroom performance on two items with scales ranging from 0 to 5 (higher scores indicating worse performance). Agronin, Holahan, Shaywitz, and Shaywitz (1992, pp. 98-99) reported an internal consistency of .91; internal consistency computed for our sample was .89 (2 items).
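Internal consistencies are reported for every measure, here for a scale of only 2 items. The article does not name the coefficient it computed; assuming it is Cronbach's alpha, a minimal implementation is sketched below with hypothetical teacher ratings. Alpha compares the summed item variances with the variance of students' total scores: k/(k-1) × (1 - Σ item variances / total-score variance).

```python
from statistics import variance
from typing import Sequence

def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; item_scores[i][j] is student j's score on item i."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(student) for student in zip(*item_scores)]  # per-student totals
    item_var = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical 0-5 ratings on the two Behavior items for six students
item1 = [0, 1, 3, 4, 2, 5]
item2 = [1, 1, 3, 5, 2, 4]
alpha = cronbach_alpha([item1, item2])
print(round(alpha, 2))
```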

Reading Accuracy. Reading accuracy was measured at each assessment period using the average of the Word Attack and Word Identification subtests of the WRMT-R/NU (Woodcock, 1998; alternate form was used at the second testing period). The Word Attack subtest requires the student to read a list of pseudowords that increase in difficulty. The Word Identification subtest requires the student to read increasingly difficult real words. For each subtest, testing was discontinued after 6 consecutive items were missed. Split-half reliability reported for third graders in the test manual averages .96 for Word Attack and .99 for Word Identification. Internal consistencies computed for our sample's raw scores were .90 (34 items, Form H), .91 (43 items, Form G), and .93 (41 items, Form H) for Word Attack at the first, second, and third assessment periods, respectively. For Word Identification, internal consistencies computed for our sample's raw scores were .94 (58 items, Form H), .94 (61 items, Form G), and .96 (72 items, Form H) at the first, second, and third assessment periods, respectively. For both Word Attack and Word Identification, the raw score was adjusted for the student's age and standardized so that a mean of 100 corresponded to the 50th percentile, and ±1 SD (15 points) corresponded to the 16th and 84th percentiles. For each student, we averaged Word Attack and Word Identification standard scores to create a composite score for reading accuracy in which both measures were equally weighted.
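The reading accuracy composite is the simple mean of the two subtest standard scores. The sketch below illustrates it with hypothetical scores for one student; because both subtests share the same scale (mean 100, SD 15), averaging them gives each subtest the same coefficient in the composite.

```python
def reading_accuracy_composite(word_attack_ss: float, word_id_ss: float) -> float:
    """Equally weighted composite: the mean of the two age-based standard scores."""
    return (word_attack_ss + word_id_ss) / 2

# Hypothetical standard scores for one student
print(reading_accuracy_composite(92, 86))  # 89.0
```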

Passage Fluency. The number of words read correctly per minute was measured by selecting the students' median performance across three grade-level passages for each assessment period (alternate forms used at pretest, midtest, and posttest) from the Dynamic Indicators of Basic Early Literacy Skills (DIBELS; Good & Kaminski, 2002). Specifically, second graders read the following second-grade passages:

First test period: "Mom's New Job," "My Handprints," and "Meals on Wheels";

Second test period: "Riding the Rollercoaster," "Moving Day," and "Stars of the Sea";

Third test period: "If I Had a Robot," "My Grandpa Snores," and "My Drift Bottle."

Third graders read the following third-grade passages:

First test period: "My Friend," "Going to the Family Camp," and "Planting a Garden";

Second test period: "Field Trip," "Keiko the Killer Whale," and "Getting E-mail";

Third test period: "Pots," "Animal Tracks," and "My Parents."

Students read each passage aloud while the tester recorded errors; testing was discontinued after 1 minute. Words omitted, words substituted, and hesitations of more than 3 seconds were scored as errors (words self-corrected within 3 seconds were scored as accurate). The number of words read correctly for each passage was recorded. Next, we selected the median passage performance for each student, for each assessment period, for analysis. Median alternate-form reliability for the second-grade DIBELS oral reading fluency passages has been reported as .94 (Good, Kaminski, Smith, & Bratten, 2001). For our sample, internal consistencies for the nine second-grade passages ranged from .96 to .99; the nine third-grade passages ranged from .97 to .99.
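A minimal sketch of the scoring steps just described, assuming each passage is summarized as a pair of counts (words attempted in the minute, errors). The function names are hypothetical, and the error rules (omissions, substitutions, hesitations over 3 seconds) are assumed to have been applied by the tester before the counts reach the code.

```python
from statistics import median


def words_correct(words_attempted, errors):
    """Words read correctly in a 1-minute timing.

    `errors` counts omissions, substitutions, and hesitations over 3 seconds;
    words self-corrected within 3 seconds are not counted as errors.
    """
    if not 0 <= errors <= words_attempted:
        raise ValueError("errors must be between 0 and words attempted")
    return words_attempted - errors


def fluency_score(passages):
    """Median words correct per minute across one period's three passages.

    `passages` is a list of (words_attempted, errors) pairs.
    """
    return median(words_correct(attempted, errors) for attempted, errors in passages)


# Hypothetical counts for one student at one assessment period.
print(fluency_score([(38, 4), (45, 3), (30, 6)]))  # 34
```

Taking the median of three passages, rather than the mean, keeps one unusually easy or hard passage from dominating a student's score.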

Words Spelled. Spelling was assessed with the Wide Range Achievement Test-Revised (WRAT-R; Jastak & Wilkinson, 1984) Spelling subtest, Level 1 (same form used at all testing periods), which typically requires students to copy marks, write their names, and spell dictated words; testing is discontinued after 10 consecutive missed items. The raw score used to compute a standard score typically includes copied marks and name spelling in addition to the number of words correctly spelled; internal consistency reported in the test manual for 7- to 8-year-olds is .93. However, similar to Juel (1988), we computed a raw score based on the number of words correctly spelled. Testing was discontinued after 10 consecutive misspelled words instead of items; the test includes 45 possible words. Internal consistencies computed for our sample were .83 (22 items), .84 (23 items), and .89 (29 items) at the first, second, and third assessment periods, respectively.
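Both the WRMT-R/NU subtests (6 consecutive missed items) and the modified WRAT-R spelling score (10 consecutive misspelled words) rely on a discontinue rule. A hypothetical helper for counting correct responses under such a rule might look like the following; it is a sketch only, and it omits basal rules as well as the WRAT-R credits for copied marks and name writing, which the modified spelling score excludes anyway.

```python
def raw_score_with_discontinue(responses, stop_after):
    """Number correct, with testing stopped after `stop_after` consecutive misses.

    `responses` lists True (correct) / False (missed) in administration order.
    Items after the discontinue point are treated as not administered.
    """
    correct = 0
    consecutive_misses = 0
    for is_correct in responses:
        if is_correct:
            correct += 1
            consecutive_misses = 0
        else:
            consecutive_misses += 1
            if consecutive_misses == stop_after:
                break  # discontinue: later items would not be administered
    return correct


# Words spelled: discontinue after 10 consecutive misspelled words.
spelling = [True] * 5 + [False] * 10 + [True] * 3
print(raw_score_with_discontinue(spelling, stop_after=10))  # 5
```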


SPSS 13.0 for Windows (SPSS, 1989-2004) was used for all data analyses. For analyses of variance, between-groups effect sizes were computed as the difference between group means, divided by the pooled estimate of the standard deviation (square root of the mean square error term; Cohen, 1988); for analyses of covariance, effect sizes were computed as the difference between adjusted group means, divided by the pooled estimate (Cohen, 1988). Partial eta-squared (percentage of variance in the scores accounted for by the effect) was used to describe the magnitude of interaction and within-subjects effects.
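The effect-size definitions above can be written out directly. In this sketch, d divides a (possibly covariate-adjusted) mean difference by the square root of the error mean square; the adjusted means use the textbook pooled within-group slope formula for a single grouping factor, which is an assumption about how the adjusted means could be obtained rather than a description of SPSS internals; and partial eta-squared is SS_effect / (SS_effect + SS_error). All input values are hypothetical, not taken from the study's tables.

```python
from math import sqrt


def cohens_d(mean_1, mean_2, mse):
    """Mean difference divided by the pooled SD, taken as sqrt(mean square error)."""
    return (mean_1 - mean_2) / sqrt(mse)


def ancova_adjusted_mean(group_mean, group_covariate_mean, grand_covariate_mean, pooled_slope):
    """Posttest mean adjusted to the grand pretest mean via the pooled within-group slope."""
    return group_mean - pooled_slope * (group_covariate_mean - grand_covariate_mean)


def partial_eta_squared(ss_effect, ss_error):
    """Share of effect-plus-error variation attributable to the effect."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical ANCOVA values.
adj_et = ancova_adjusted_mean(95.0, 84.0, 85.0, 0.8)  # 95.8
adj_lt = ancova_adjusted_mean(90.0, 86.0, 85.0, 0.8)  # 89.2
print(round(cohens_d(adj_et, adj_lt, 56.25), 2))      # 0.88
print(partial_eta_squared(20.0, 80.0))                # 0.2
```

The adjustment moves a group that started below the grand pretest mean upward (and vice versa), so groups are compared as if they had begun at the same pretest level.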


Phase 1

Pretest (October). Intercorrelations among the five pretests (receptive language, classroom behavior, reading accuracy, passage fluency, and words spelled) ranged from r = -.19 (receptive language and passage fluency, p > .05) to r = .73 (passage fluency and words spelled, p .05, unadjusted for multiple tests of significance). Differences between grades were detected, however, on our raw score measures of passage fluency (Grade 2, M = 20.4, SD = 8.67; Grade 3, M = 34.9, SD = 15.16), F(1, 39) = 14.85, p

Though not shown in Table 2, classroom behavior did not differ reliably between groups (ET, M = 1.7, SD = 1.56; LT, M = 1.0, SD = 1.41), F(1, 38) = 2.22, p > .05, or between grades (Grade 2, M = 1.5, SD = 1.67; Grade 3, M = 1.1, SD = 1.27), F(1, 38) = 1.14, p > .05, and there was no significant interaction between group and grade, F(1, 38) = 0.07, p > .05. (Note that classroom teachers rated their students at midyear, with higher scores indicating worse performance; one ET student was missing data for this measure.) Moreover, classroom behavior was not correlated with any of the pretest measures (ranging from r = -.12 to r = .10, ps > .05).

Posttest (March). Correlations among the three March posttests were moderate, ranging from r = .41 (reading accuracy and passage fluency, p .05).

Although posttests were moderately correlated with one another, we analyzed the outcomes separately rather than in a multivariate framework, because we chose the three outcome measures to reflect the training tasks and because we did not feel that a linear combination of the three outcomes made theoretical sense (a multivariate analysis of variance tests a linear combination of the outcomes for significance). Furthermore, a multivariate analysis does not necessarily provide protection from Type I error inflation when conducting multiple follow-up analyses of variance (see Jaccard & Guilamo-Ramos, 2002; Stevens, 2002, pp. 181-182). We therefore conducted separate 2 (ET vs. LT) x 2 (Grade 2 vs. Grade 3) analyses of covariance (ANCOVAs) on each posttest, using the respective pretest as covariate, and controlled Type I error at .05 for each set of statistical tests using the Holm sequential step-down method (Holland & Copenhaver, 1988; Holm, 1979). We selected the Holm method because the Bonferroni procedure is overly conservative and the Hochberg procedure is inappropriate for correlated statistical tests (see Benjamini & Hochberg, 1995; Hochberg, 1988). In the Holm method, the p values are tested in order, and each decision to reject or retain a null hypothesis depends on whether its observed p value is smaller than a critical value adjusted for the number of tests remaining (and conditional on the preceding tests having been significant). For 3 statistical tests, the process is as follows:

1. For a given set of tests, sort the F test p values from smallest to largest.

2. Compare the smallest p value with the adjusted critical value .05/k = .05/3 = .0167 (where k = 3, the total number of tests within the set of measures). If this first test's observed p value is smaller than .0167, declare it significant and proceed to the second test; otherwise, this and all remaining tests are considered nonsignificant.

3. Compare the second smallest p value with .05/(k-1) = .05/2 = .0250. If it is smaller than this critical adjusted (and conditional) value, declare it significant and proceed to the next test; otherwise, this and all remaining tests are considered nonsignificant.

4. Compare the third (largest) p value with .05/(k-2) = .05/1 = .0500. If it is smaller than this critical adjusted (and conditional) value, declare it significant and stop; otherwise, the test is considered nonsignificant.
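The four steps above translate into a short loop. This is a generic sketch of Holm's procedure for any number of tests, not code from the study; with k = 3 its critical values are exactly .0167, .0250, and .0500. The function name is hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Holm sequentially rejective (step-down) procedure.

    Returns booleans in the original order of `p_values`, marking which
    null hypotheses are rejected at familywise level `alpha`.
    """
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * k
    for step, i in enumerate(order):
        critical = alpha / (k - step)  # .05/3, .05/2, .05/1 when k = 3
        if p_values[i] < critical:
            reject[i] = True
        else:
            break  # this and all remaining (larger) p values are retained
    return reject


print(holm([0.020, 0.010, 0.300]))  # [True, True, False]
```

In this example the test with p = .020 is rejected under Holm but would be retained under a Bonferroni criterion of .0167 applied to every test, which illustrates why the step-down method is less conservative.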

Table 2 reports the ANCOVA results, including the unadjusted p values as well as whether the effect is significant at the .05 alpha level after adjusting for 3 statistical tests. The results from our analyses revealed significant main effects for group and grade on reading accuracy and passage fluency posttest performance (adjusted ps

We then examined whether differences in the ET group's outcomes depended on whether the intervention was more supplemental (i.e., students were typically pulled out of the classroom for tutoring during reading/writing instruction) or less supplemental (i.e., students were typically pulled out during non-reading/writing instruction). Separate 2 (ET vs. LT) x 2 (more vs. less supplemental) ANCOVAs of the three measures (with the respective pretest as covariate) showed that mean ET outcomes were not influenced by whether the students missed classroom reading/writing instruction during tutoring (all Fs .05).

In summary, for both reading accuracy and passage fluency, receipt of treatment provided an advantage of more than three fourths of a standard deviation beyond the performance of no-treatment students (d = .87 and .80, respectively). The results also revealed large advantages for second graders over third graders (across both ET and LT groups) in posttest performance, with effect sizes of d = .82 and 1.09 for reading accuracy and passage fluency, respectively. The more pronounced grade effect on passage fluency (d = 1.09) was likely due to the nonstandardized nature of the passage fluency measure. Finally, the lack of interaction between group and grade indicated that the treatment effects did not depend on grade level.

Phase 2

Early Treatment Follow-Up (May). Follow-up test correlations (Posttest 2) for the ET group ranged from r = .29 (reading accuracy and passage fluency, p > .05) to r = .47 (passage fluency and words spelled, p

To find out whether the ET group maintained their Posttest 1 performance level 3 months postintervention, students were assessed again at the end of May on reading accuracy, passage fluency, and words spelled. (During that time, one student moved, resulting in a follow-up sample size of n = 22 for each measure.) We conducted a series of 2 (Posttest 1 vs. Posttest 2; within-subjects) x 2 (Grade 2 vs. Grade 3; between-subjects) mixed ANOVAs, again using the Holm method to control Type I error at .05. Grade level was included as a between-subjects factor because grade main effects had been detected in the Phase 1 analyses. The results (see Table 3) showed no evidence that ET students' performance changed significantly from Posttest 1 (March) to Posttest 2 (May) (all adjusted ps > .05). Nevertheless, on average there appeared to be a trend toward a 1.6-point decline in ET students' reading accuracy (performance at the 37th rather than the 42nd percentile 3 months earlier). Only for words spelled did we detect a significant grade effect, indicating that third graders outperformed second graders by approximately 3 words across both posttests. Overall, the ET group's follow-up results suggest that students maintained their previous posttest performance.
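When the within-subjects factor has only two levels, as in the Posttest 1 vs. Posttest 2 comparisons here, the mixed-ANOVA F tests can be reproduced from simple per-student summaries: the time and Time x Grade tests are tests on change scores, and the grade test is a test on each student's average across the two posttests. The sketch below assumes this standard equivalence, unweighted means for the time effect (the distinction disappears with equal group sizes), and hypothetical data. It returns F ratios only; p values would require an F distribution (for example, SciPy's `scipy.stats.f.sf`), which is not included here.

```python
from statistics import mean


def pooled_variance(groups):
    """Within-group variance pooled over groups (df = N - number of groups)."""
    ss = 0.0
    for g in groups:
        m = mean(g)
        ss += sum((x - m) ** 2 for x in g)
    df = sum(len(g) for g in groups) - len(groups)
    return ss / df


def mixed_2x2_f(time1, time2):
    """F ratios for a 2 (time, within) x 2 (grade, between) mixed ANOVA.

    time1[g] and time2[g] hold paired scores for grade group g (g = 0, 1).
    Every F has df = (1, N - 2).
    """
    change = [[b - a for a, b in zip(t1, t2)] for t1, t2 in zip(time1, time2)]
    level = [[(a + b) / 2 for a, b in zip(t1, t2)] for t1, t2 in zip(time1, time2)]
    n0, n1 = len(change[0]), len(change[1])
    inv_n = 1 / n0 + 1 / n1

    # Time: is the unweighted mean change different from zero?
    c0, c1 = mean(change[0]), mean(change[1])
    var_change = pooled_variance(change)
    f_time = ((c0 + c1) / 2) ** 2 / (var_change * inv_n / 4)
    # Time x Grade: does the mean change differ between grades?
    f_interaction = (c0 - c1) ** 2 / (var_change * inv_n)

    # Grade: does the average level across both posttests differ between grades?
    l0, l1 = mean(level[0]), mean(level[1])
    f_grade = (l0 - l1) ** 2 / (pooled_variance(level) * inv_n)
    return {"time": f_time, "grade": f_grade, "interaction": f_interaction,
            "df": (1, n0 + n1 - 2)}


# Hypothetical March and May scores for two grade groups of three students.
march = [[1, 2, 3], [5, 6, 7]]
may = [[2, 4, 4], [6, 8, 8]]
print(mixed_2x2_f(march, may))
```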

Late Treatment Efficacy. The LT group began intervention after the first phase of the study had been completed (i.e., after Posttest 1) and ended intervention before Posttest 2. Although we lacked a no-treatment comparison for this group, we examined student growth on each measure. Three students assigned to receive treatment during this period were unable to attend tutoring sessions because of scheduling conflicts, leaving a sample size of n = 17 for analysis.

Correlations among the three "pretest" measures (March; Posttest 1) for this group were moderate, ranging from r = .34 (reading accuracy and words spelled, p > .05) to r = .57 (passage reading and words spelled, p .05) to r = .52 (passage fluency and words spelled, p

Although we lacked a control for the LT group, we wished to know whether the LT group made any significant gains during their shorter intervention period. To answer this question, we conducted separate 2 (Posttest 1 vs. Posttest 2; within-subjects) x 2 (Grade 2 vs. Grade 3; between-subjects) mixed ANOVAs, again using the Holm method for controlling Type I error at .05 per set of tests. The results from these analyses (see Table 4) revealed that, on average, LT students grew approximately 2.5 standard score points on reading accuracy during their intervention period (adjusted p

What was markedly different from our findings for the first intervention was that LT students grew significantly on words spelled (an increase of about 1 word; adjusted n .05).

Comparison of Gains per Instructional Hour

Although ET and LT intervention gains were not directly comparable, because the two groups received interventions of significantly different duration and timing, we computed and compared the gains per instructional hour for each individual student (computed as the raw or standard score point gain or loss between the student's respective "pretest" and "posttest," divided by the number of intervention hours derived from the student's attendance record; see Table 5). ET students made more gains per intervention hour received than did the LT group across all measures (all adjusted ps
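The per-hour rate described above is simple arithmetic. The hypothetical helpers below compute it for each student from that student's own attended hours and then average within a group; the scores and hours are invented for illustration and are not the study's data.

```python
from statistics import mean


def gain_per_hour(pretest, posttest, hours_attended):
    """Score change between a student's own 'pretest' and 'posttest', per hour attended."""
    if hours_attended <= 0:
        raise ValueError("hours_attended must be positive")
    return (posttest - pretest) / hours_attended


def mean_gain_per_hour(students):
    """Group mean of individual rates; `students` holds (pretest, posttest, hours) tuples."""
    return mean(gain_per_hour(pre, post, hours) for pre, post, hours in students)


# Hypothetical students (standard scores and attended hours).
early = [(88, 95, 21.0), (90, 96, 22.5)]
late = [(93, 96, 17.0), (92, 94, 18.0)]
print(round(mean_gain_per_hour(early), 3), round(mean_gain_per_hour(late), 3))  # 0.3 0.144
```

Dividing by each student's attended hours, rather than the planned schedule, keeps absences from being counted as instruction.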


The present study extends our previous findings on the beneficial effects of supplemental instruction, as provided by paraeducator tutors, for low-skilled second and third graders (Vadasy et al., 2006b). In the current study, students with poor word-level skills in second and third grade were randomly assigned to one of two groups: early treatment (ET), who received tutoring from October to March, or late treatment (LT), who received tutoring from March to May. ET students, who received an average of 21.5 hours of individual instruction, performed significantly better than their as yet untutored peers (LT) at Posttest 1 on measures of reading accuracy and passage fluency; similarly, second graders significantly outperformed third graders in both of these areas. ET students continued to maintain their posttest performance levels at 3-month follow-up. LT students, who received an average of 17.5 hours of instruction, made significant pretest-posttest gains in reading accuracy and words spelled, but not in passage fluency. Furthermore, no grade effects or Posttest x Grade interactions were evident for LT students.

Across both ET and LT groups, students' rates of gain (points gained per instructional hour) were related to the intensity of the intervention across all three outcome measures: ET students, who received intervention across a longer time period than LT students, had reliably higher rates of gain in reading accuracy, passage fluency, and words spelled than LT students. Only for one measure, reading accuracy, did there appear to be a grade effect: Second graders had a significantly higher rate of gain (.33 standard score points per hour) compared to third graders (.19 standard score points per instructional hour). The absence of Group x Grade interactions suggests that students derived benefits from the intervention regardless of grade level.

Individual tutoring in alphabetic and phonics skills enhanced both word-level and fluency skills for struggling second- and third-grade readers. Second-grade ET and LT students attained, on average, word-level skills at or near grade level after intervention, although these skills declined slightly (albeit not significantly) for ET students at follow-up. Although there is no evidence that ET students declined or grew in passage fluency at 3-month follow-up, they remained quite far below grade level in reading rate at the end of the school year.

These findings on sustained word-level effects are similar to those reported by Rashotte et al. (2001) for a 2-month follow-up of a similar intervention. In the present study, there were no significant group differences for spelling, which was a secondary focus of treatment due to limited instructional time. Students likely require more targeted instruction and practice to correctly produce irregular and more complex spellings. As Rashotte et al. (2001) also observed, a spelling measure that credits phonologically acceptable spelling attempts (e.g., wether for weather, stock for stalk, approov for approve, sower for sour) may better capture transfer effects to spelling.

Although students' reading rates improved, their fluency remained seriously impaired. Reading rate was a secondary intervention target for those students who attained accuracy rates high enough to support fluency building practice. Students in both groups remained between the 10th and 25th percentile in fluency at their respective posttests. Others have documented the difficulty of remediating fluency deficits (Ehri & Wilce, 1983; Rashotte, Torgesen, & Wagner, 1997; Rashotte et al., 2001; Torgesen et al., 2001).

Student outcomes and our observations confirmed that paraeducators were able to effectively supplement instruction in second and third graders' word reading skills. Compared to the consistently large effect sizes in our earlier reports on similar instruction by paraeducators for younger kindergarten and first-grade students at risk for reading difficulties (Vadasy et al., 2000; Vadasy et al., 2002; Vadasy et al., 2005, 2006a), effect sizes in this intervention were moderate to large. Effect sizes in this study were also smaller than those reported in longer interventions for older students that were implemented by teachers or researchers (e.g., Berninger et al., 2003; Rashotte et al., 2001; Torgesen et al., 2001). However, schools like these research sites, which serve large numbers of students from minority and low socioeconomic backgrounds, often identify older students who have not yet established a foundation of strong alphabetic and decoding skills. These schools often serve many students not meeting state reading standards, and their teachers often have limited time after first grade to teach phonologically based reading skills. These schools may consider strategically using paraeducators to administer a relatively brief, intensive, and explicit phonics intervention to help students master the mechanics of word reading while, in their classroom reading instruction, they concurrently take on the more difficult task of reading to learn (Chall, 1995). As we have cautioned in earlier reports (Vadasy et al., 2000; Vadasy et al., 2002), and as others have also advised (Allor & McCathren, 2004; Wasik, 1998), training is a prerequisite for effective paraeducator- or tutor-implemented reading interventions. Initial training that described, modeled, and allowed for practice of the instructional formats, together with ongoing individual coaching, enabled the paraeducators in this study to attain a high level of implementation.

We believe these findings warrant considering how paraeducators are used to assist in reading instruction. For one thing, the number of paraeducators employed in U.S. schools has increased about 60% since 1990. In 2002, there were 664,385 paraeducators in the country, with 72,000 in California, 59,000 in Texas, and 42,000 in New York (National Center for Education Statistics, 2004). One longer term approach to using these staff members more effectively in schools with student needs that exceed the available teacher resources is to improve training for paraeducators and to provide opportunities for joint planning time with teachers and appropriate teacher supervision (French, 2003). This goal requires changes in paraeducator training requirements, which are being affected by No Child Left Behind regulations, as well as changes in teacher preparation programs to include formal training in paraeducator supervision. Less attention has been given to identifying instructional practices and programs with modest training requirements that paraeducators can use more immediately and effectively. Paraeducator positions are filled by individuals with varied career ambitions. For example, some paraeducators certainly are candidates for a career ladder program of training leading to teacher certification. We have learned over the past decade that most of the paraeducators we have worked with in our research sites prefer and are drawn to working with individual or small groups of students on prescribed reading skills that they can more easily master and proficiently teach. We believe that schools have not recognized the diversity and the potential of this group of individuals to fill very prescribed roles in effectively supplementing instruction in critical early reading skills.


Our findings should be interpreted in light of several limitations. First, our sample was drawn from a restricted range of children from the 10th to 37th percentiles; thus, the findings may only generalize to children in this performance range. Second, the groups received unequal interventions: The LT group began intervention with higher word-level scores than the ET group, and the direct comparison of ET and LT groups through their gains per instructional hour must be interpreted with caution. Third, the small grade-level groups in this study may have limited our ability to detect stronger grade-treatment interactions, which were limited to a trend we described for second graders to benefit more than third graders in reading accuracy. Fourth, conclusions about the maintenance of gains are tentative due to the relatively short follow-up period.

Implications for Research and Practice

The consistent findings in these two groups on reading accuracy effects and gains and the findings on fluency effects for the ET group raise the question of whether a longer intervention with extended oral reading practice might have moved fluency rates closer to grade-level benchmarks. In our earlier report on a similar intervention for second and third graders (Vadasy et al., 2006b), we reported an effect size for fluency of d = .82 for a 42-hour intervention, and d = 1.09 for a 36-hour intervention. Our design does not allow us to know how much intervention students with poor reading skills require to attain fluency benchmarks, but it is safe to say that fluency gains are not easily won for these poor readers. In both studies, students have considerably more distance to cover to reach fluency benchmarks, yet across the 42-hour, 36-hour, and even 21-hour interventions we reported, fluency effects were not trivial. Furthermore, we observed students and paraeducators to be very engaged and to enjoy the oral reading part of the tutoring sessions. Both students and tutors looked forward to this portion of the tutoring sessions, and tutors were observed to effectively apply scaffolding and correction procedures that offered students incidental word-level instruction. Oral reading practice combined with targeted instruction in phonics skills appears to be a powerful fluency treatment protocol for poor readers. Optimal timing for this type of intervention may be informed by the higher rates of fluency and word-level gains made by the second graders in the ET group. Finally, if more intervention time had been available, we believe that added spelling and writing activities would have helped students to consolidate their orthographic skills.

The gains described in Table 5 suggest several implications for research and practice. First, the greater gains per hour made by the ET group suggest that the timing of instruction (i.e., earlier in the academic year) may influence the response to reading intervention. Across our decade of intervention research, we have found the focus and press on instruction to be increasingly difficult to maintain after spring break of the academic year. In this study, the timing of instruction was confounded with the length of intervention, which prevented us from testing this timing hypothesis for the larger gains per hour for the ET group. Second, the ET gains also raise the question of the contribution of distributed practice to student outcomes. Although the absolute difference between the ET and LT groups in hours of instruction was about 4 hours, the ET intervention actually extended over a 20-week period, whereas the LT intervention extended over a 12-week period. Both interventions were interrupted by typical school breaks, yet the ET students had more extended time to consolidate skills, both during the intervention sessions and in classroom reading instruction. Third, although there may be a critical threshold in the number of hours of instruction required for optimal gains, our design did not permit us to estimate it, as intervention intensity was confounded with timing.
For the 21-hour ET intervention, however, gains in standard score points per hour of instruction for reading accuracy were comparable to phonemic decoding and word identification gains previously reported by others for longer interventions that were implemented by teachers or experimenters (Rashotte et al., 2001: .50 and .19 points per hour, respectively, for a 30-hour small-group intervention; Torgesen et al., 2001: .36 and .21 points per hour, respectively, for a 67-hour one-on-one intervention; Wise et al., 1999: .31 and .22 points per hour, respectively, for a 40-hour small-group intervention). Finally, it should be noted that, as grade level significantly influenced rates of gain for reading accuracy only, it may be that this type of standard protocol implemented by nonteachers has a "ceiling effect." In this study, the intervention appeared more effective for second graders, who are more typically beginning to acquire the targeted word-level skills (and are also having these skills reinforced more often in their classroom instruction). Third graders may be less likely to have their phonics skills reinforced in classroom instruction, and they may also require more skilled and differentiated instruction than paraeducator tutors were able to provide with a standard treatment protocol. In this study, supplemental literacy instruction delivered by paraeducators boosted students in the ET group to nearly grade-level reading accuracy. These findings suggest that extended supported reading practice with paraeducators might have moved students closer to grade-level fluency. The study describes one type of carefully defined role in which paraeducator staff can effectively contribute to school-level efforts to help all students meet basic reading standards.


Abbott, S. P., & Berninger, V. W. (1999). It's never too late to remediate: Teaching word recognition to students with reading disabilities in Grades 4-7. Annals of Dyslexia, 49, 223-250.

Agronin, M. E., Holahan, J. M., Shaywitz, B. A., & Shaywitz, S. E. (1992). The Multi-Grade Inventory for Teachers (MIT): Scale development, reliability, and validity of an instrument to assess children with attention deficits and learning disabilities. In S. E. Shaywitz & B. A. Shaywitz (Eds.), Attention deficit disorder comes of age: Toward the twenty-first century (pp. 98-116). Austin, TX: PRO-ED.

Allor, J., & McCathren, R. (2004). The efficacy of an early literacy tutoring program implemented by college students. Learning Disabilities Research & Practice, 19, 116-129.

Archer, A. L., Gleason, M. M., Vachon, V., & Hollenbeck, K. (2001). Instructional strategies for teaching struggling fourth and fifth grade students to read long words. Unpublished manuscript.

Benjamini, Y.,