September 4, 2005
Effects of Explicit Reading Strategies Instruction and Peer Tutoring on Second and Fifth Graders’ Reading Comprehension and Self-Efficacy Perceptions
The authors evaluated the effectiveness of explicit reading comprehension strategies instruction, followed by practice in teacher-led whole-class activities (STRAT), reciprocal same-age (STRAT + SA) peer-tutoring activities, or cross-age peer-tutoring activities (STRAT + CA) on 2nd and 5th graders' reading comprehension and self-efficacy perceptions. For 2nd graders, multilevel analyses revealed significant STRAT and STRAT + CA effects; however, the effects did not last after finishing the program. Fifth graders in all 3 experimental conditions performed significantly better on the posttest than their control group peers. Results also showed continued growth for the STRAT and STRAT + CA conditions until at least 6 months after students finished the program. Moreover, on both the posttest and retention test, 5th graders in the STRAT + CA condition reported significantly fewer negative thoughts related to their reading proficiency.
Key words: elementary education, multilevel modeling, peer tutoring, reading comprehension, reading strategies, self-efficacy
IN READING RESEARCH, decoding instruction has had a long and continuous history of attention and debate. However, a hiatus can be recorded in the study of reading comprehension. Two decades ago, strategy intervention research was in vogue, but only recently has comprehension instruction received renewed attention, with current studies building on what was accomplished in the 1980s. Now, the challenge in reading comprehension research is to increase the efficacy of instruction in elementary schools by identifying the instructional practices and activities that best serve to develop children's self-monitoring for comprehension (Snow, Burns, & Griffin, 1998).
Previously, reading comprehension was considered to be a process of mastery: Once students could decode, comprehension was assumed to occur automatically (Dole, 2000). Research, however, has shown that good readers are characterized by more than just decoding skills. Cognitively based views of reading comprehension emphasize that proficient readers use a flexible repertoire of comprehension monitoring and regulating activities (Dole, Duffy, Roehler, & Pearson, 1991), which includes both cognitive and metacognitive strategies (Baker & Brown, 1984; Paris, Wasik, & Turner, 1991; Pressley & Allington, 1999; Pressley, Johnson, Symons, McGoldrick, & Kurita, 1989). Cognitive strategies can be defined as mental or behavioral activities that increase the likelihood of comprehension, such as rereading, activating prior background knowledge, and adjusting reading speed (Van Den Broek & Kremer, 2000). Metacognitive strategies can be specified as self-monitoring and regulating activities that focus on the product and the process of reading, support readers' awareness of comprehension, and assist in the selection of cognitive strategies as a function of text difficulty, situational constraints, and the reader's own cognitive abilities (Lories, Dardenne, & Yzerbyt, 1998; Van Den Broek & Kremer; Weisberg, 1988).
Unfortunately, there is no reason to believe that all elementary students spontaneously develop essential cognitive and metacognitive knowledge and skills (Hartman, 2001; Pressley & Allington, 1999). Research reviews, however, reveal that monitoring and regulating skills and effective application of relevant strategies can be taught (Dole et al., 1991; Pressley, 2000; Pressley et al., 1989). In this respect, recent studies (Baumann, Seifert-Kessell, & Jones, 1992; Block, 1993; Dole et al.; Dole, Brown, & Trathen, 1996; Pearson & Fielding, 1991) and reports of the National Reading Council (U.S.; Snow et al., 1998) underscore the value of explicit cognitive and metacognitive reading strategy instruction, for "comprehension instruction takes the mystery out of the reading process, helping students assume control" (Raphael, 2000, p. 76). As to the practice of teaching reading, however, little has changed since Durkin's (1978-1979) observation research into comprehension instruction. The dominant instructional practice is still very traditional, characterized by questioning students about text content, with little explicit attention to the strategic aspects of processing and comprehending text (Aarnoutse, 1995; Paris & Oka, 1986; Pressley, Wharton-McDonald, Hampston, & Echevarria, 1998; Weterings & Aarnoutse, 1986).
In addition to the importance of explicit reading strategies instruction, research has revealed that the development of reading competence in the elementary grades can be encouraged by interaction with peers (Almasi, 1996; Fuchs, Fuchs, Mathes, & Simmons, 1997; Johnson-Glenberg, 2000; Mathes & Fuchs, 1994; Mathes, Torgesen, & Allor, 2001; Palincsar & Brown, 1984; Rosenshine & Meister, 1994; Simmons, Fuchs, Fuchs, Mathes, & Hodge, 1995). The traditional teacher-led interaction pattern of teacher question-student response-teacher evaluation seems insufficient to achieve an actual increase in comprehension, higher level cognition, and the application of self-regulation strategies (Cazden, 1986). Relying on the teacher's interpretive authority causes students to become passive learners. Conversely, to become self-regulated readers, students need to take an active role and to recognize and resolve their own discrepancies with texts (Almasi; Gourgey, 2001). Research has demonstrated that this kind of active reading behavior is promoted by providing students with opportunities to engage in peer-led interaction about texts. More particularly, it has been shown that, through discussions, peer conferences, peer tutoring, and cooperative activities, students implement, evaluate, and modify strategies, and discuss transfer of strategies (Klingner & Vaughn, 1996; Klingner, Vaughn, & Schumm, 1998; Palincsar & Brown, 1984). Moreover, discussions between peers provide opportunities for metacognitive exchanges and modeling (Palincsar, David, Winn, & Stevens, 1991). In this way, children's knowledge about reading and reading strategies, as well as their ability to apply relevant strategies, increases. Despite these convincing research results, student-centered discussion with regard to reading comprehension is anything but common practice in most classrooms (Alvermann, 2000).
In the present study, we attempted to narrow the gap between prevailing instructional practice and research evidence in the field of reading comprehension instruction. An innovative approach, blending research-based practices from the aforementioned research fields, was designed, implemented, and evaluated. More specifically, the innovations comprised two cornerstones: explicit reading strategies instruction and opportunities to practice strategic reading in peer-tutoring dyads. Peer tutoring was introduced to stimulate student interaction because of the opportunities it creates to practice metacognitive skills. It should be noted that studies of peer tutoring in reading comprehension and thinking skills are relatively rare (Topping, 2001). Following research on peer-assisted learning strategies (e.g., Fuchs, Fuchs, Mathes, et al., 1997), classwide peer tutoring (e.g., Greenwood, 1991; Greenwood, Carta, & Hall, 1988), and studies focusing on practicing reading strategies in small cooperative groups (e.g., Brown, Pressley, Van Meter, & Schuder, 1996; Palincsar & Brown, 1984; Pressley et al., 1992; Stevens, Madden, Slavin, & Farnish, 1987; Stevens, Slavin, & Farnish, 1991), the present study involved training in comprehension strategies rather than tutoring students in word-level oral reading or low-level comprehension activities.
Peer tutoring can be defined as "people from similar social groupings who are not professional teachers helping each other to learn, and learning themselves by teaching" (Topping, 1996, p. 322). This definition covers a series of practices, including peers as one-on-one teachers to provide individualized instruction, practice, repetition, and clarification of concepts (Topping, 1988; Utley & Mortweet, 1997). Peer tutoring is structurally embedded in the curriculum and classroom organization and is characterized by specific role taking: One person has the job of tutor, while the other is the tutee (Topping, 1996). Moreover, effective peer tutoring is characterized by a preceding tutor training (Bentz & Fuchs, 1996; Fuchs, Fuchs, Bentz, Phillips, & Hamlett, 1994; Fuchs, Fuchs, Hamlett, Phillips, Karns, & Dutka, 1997). With regard to the dyad composition, two variants can be distinguished. Cross-age tutoring refers to older students tutoring younger students; in same-age tutoring, children are paired with classmates. The variant in which students alternate regularly between the tutor and tutee role is called reciprocal same-age tutoring (Fantuzzo, King, & Heller, 1992).
Peer tutoring has been successful in a variety of curriculum areas and age groups. Research has indicated positive effects on academic achievement for both tutor and tutee (Cohen, Kulik, & Kulik, 1982; Fantuzzo, Davis, & Ginsburg, 1995; Fantuzzo, Polite, & Grayson, 1990; Fantuzzo et al., 1992; Greenwood et al., 1988; Mathes et al., 2001; Simmons et al., 1995). In this respect, peer tutoring is not only about transmission from the more able and experienced to the less able (Topping, 1996); tutors seem to benefit even more from tutoring than students who receive the individual tuition (Fitz-Gibbon, 1988; Greenwood et al.; Lambiotte et al., 1987). This can be explained by the nature of tutoring a peer: Tutors are challenged to consider the subject fully from different perspectives, to engage in active monitoring to identify and correct errors, to reorganize and clarify their own knowledge and understandings, and to elaborate on information in their explanations (Fuchs & Fuchs, 2000). Because the application of reading strategies requires actively monitoring the reading process, peer tutoring may be considered a powerful learning environment for the acquisition of reading comprehension skills. Monitoring the reading process of another reader might facilitate the acquisition of self-monitoring skills and, hence, the adequate application of reading strategies. From a theoretical perspective, consistent with Vygotsky's (1978) theory of socially mediated learning, the object of the dyadic interaction in the peer-tutoring activities is the joint construction of text meaning by appropriate application of relevant reading strategies to a wide range of texts and, in the long term, the internalization and consistently self-regulative flexible use of strategic processing whenever encountering texts that are challenging to comprehend.
Furthermore, positive effects also have been found on tutors' and tutees' social and emotional functioning, especially with regard to self-efficacy perceptions, self-concepts, social relationships, and attitudes toward the curriculum areas treated in the tutoring sessions (e.g., Cohen et al., 1982; Fantuzzo et al., 1992; Fantuzzo et al., 1995; Greenwood et al., 1988; Mathes & Fuchs, 1994). Regarding reading comprehension, self-efficacy is an especially important construct, given that attention to strategy instruction alone is not sufficient to produce maximum reading growth (Casteel, Isom, & Jordan, 2000). Affective factors result in deeper engagement with text, which translates into superior achievement. Henk and Melnick (1995) asserted that self-efficacy judgments can affect an individual's overall orientation to the process of reading; influence choice of activities; affect continued involvement, amount of effort expended during reading, and the degree of persistence in pursuing text comprehension; and ultimately affect achievement.
Our aim in the present intervention study was to design, implement, and evaluate complex sets of instructional interventions in authentic classrooms to enhance second and fifth graders' reading comprehension achievement and self-efficacy perceptions toward reading. The specific contribution of the present study is the focus on peer-tutoring variants as instructional techniques to practice the use of reading comprehension strategies. More specifically, we concentrated on an explicit comparison of the relative merit of practicing reading strategies in (a) teacher-led whole-class activities, (b) reciprocal same-age peer-tutoring activities, or (c) cross-age peer-tutoring activities within the same study for two different age groups. So far, cross- and same-age tutoring have not been compared within the same study, and there is only indirect reference material from the meta-analysis of Cohen and colleagues (1982) with regard to the differential impact. Furthermore, in the present study, we extend prior research by (a) sampling a larger number of participants than is typically the case in strategies-based comprehension studies; (b) supporting teachers to implement the innovations in the natural classroom context with the participation of all students of all abilities during an entire school year, which represents sensitivity to the interventions' ecological validity; (c) targeting students in the early and intermediate grades, populations that deserve more attention with regard to metacognitive and strategic behavior; (d) including long-term maintenance measures; (e) using standardized reading comprehension tests not directly linked to the treatment; and (f) applying multilevel modeling to take the hierarchical nesting of students in classes into account.
Based on a review of the research literature and the aforementioned lines of reasoning, we formulated the following hypotheses for the study:
Hypothesis 1. Explicit reading strategies instruction, followed by practice in teacher-led whole-class or peer-tutoring activities, enhances second and fifth graders' reading comprehension achievement more than traditional reading comprehension instruction.
Hypothesis 2. Practicing reading strategies in cross-age or reciprocal same-age peer-tutoring activities generates larger positive changes in second and fifth graders' comprehension achievement than more traditional teacher-led practice during whole-class activities.
Hypothesis 3. Improvement in reading comprehension is more obvious for second and fifth graders functioning as tutees and tutors, respectively, in cross-age peer-tutoring activities than for their peers alternating between the tutor and tutee roles in reciprocal same-age activities.
Hypothesis 4. Cross-age and reciprocal same-age peer-tutoring activities improve second and fifth graders' self-efficacy perceptions toward reading more than traditional teacher-led instructional techniques.
Hypothesis 5. Improvement in self-efficacy perceptions toward reading is more obvious for second and fifth graders functioning as tutees and tutors, respectively, in cross-age peer-tutoring activities than for their peers alternating roles in reciprocal same-age activities.
We used a pretest, posttest, and retention test control group design. To ensure the ecological validity of the interventions, we included complete naturally composed classes. Participating classes were assigned to one of four research conditions. In the strategies-only condition (STRAT), the experimental intervention included explicit reading strategies instruction, followed by practice in teacher-led whole-class settings. The experimental same-age (STRAT + SA) and cross-age (STRAT + CA) peer-tutoring conditions included identical instruction in the same strategies, combined with class-wide practice in reciprocal same-age or cross-age dyads, respectively. In this respect, students experienced either same- or cross-age tutoring. Finally, we included a control group, characterized by traditional reading comprehension activities without explicit strategies instruction or peer tutoring. Classes were randomly assigned to the STRAT or tutoring conditions. Within the tutoring conditions, teachers opted in favor of the STRAT + SA or STRAT + CA condition according to the readiness of a colleague to collaborate in the STRAT + CA activities. We selected control group classes to match the experimental teachers and classes. Because the classes were naturally composed and the assignment of classes to the conditions was not completely randomized, the design can be regarded as quasi-experimental.
In total, 444 second and 454 fifth graders from 44 classes in 25 different schools throughout Flanders (Belgium) participated in the study. Except for some small-scale initiatives of individual schools, peer tutoring was fairly unfamiliar at the time of the study. Other cooperative or interactive techniques, such as group work, group discussion, and circle time, were better known and more frequently used.
Except for one inner-city school in the STRAT condition with mainly a low socioeconomic status and ethnic minority population, all schools had a predominantly white, Flemish population. The majority of the children were from middle-class families. Except for one second-grade class including only girls, there was approximately an equal gender distribution: In second- and fifth-grade classes, on average, 53% (SD = 16.54) and 48% (SD = 18.55) of the students were boys. At the beginning of the school year, second graders were aged, on average, 7 years and 4 months, and fifth graders were aged, on average, 10 years and 5 months. The majority of the students (402 in second and 422 in fifth grade) were native Dutch speakers. Because elementary school students in Flanders are not grouped by ability, classes are considered academically heterogeneous, which was confirmed by the pretest reading comprehension measures. Class size ranged from 15 to 28 students, with an average of approximately 21 (SD = 3.50) in the second grade, and from 10 to 30 students in the fifth grade, with an average of approximately 22 (SD = 5.00) students per class. Second- and fifth-grade teachers had, on average, 11 and 20 years of teaching experience, respectively. Four of 22 second-grade and 5 of 22 fifth-grade teachers were men. None of the teachers had previous experience in explicit reading strategies instruction or peer tutoring.
We selected participating teachers from a group of approximately 100 second- and fifth-grade teachers who were willing to take part in a long-term research study. All interested teachers received a questionnaire concerning their teaching practices and opinions regarding learning and instruction. The first step in the teacher-selection procedure was based on this questionnaire. More specifically, we selected student-oriented teachers who were experienced in applying cooperative and interactive instructional techniques and able to build in differentiation according to pace or content. Furthermore, we based the selection on the geographical distribution of the schools throughout Flanders and on the possibility of matching teachers and classes with regard to teachers' teaching experience, beliefs, and instructional practice; class size; students' age; gender distribution; and dominating mother tongue. Table 1 shows the number of participating classes and students per condition.
In the present study, we used standardized tests to measure students' reading comprehension achievement and decoding fluency. We administered questionnaires with respect to reading attitude, perceived competence, and preoccupation with attributions and self-efficacy perceptions toward reading.
Reading comprehension tests. We measured reading comprehension achievement using Dutch standardized test batteries (Staphorsius & Krom, 1996; Verhoeven, 1993), which were selected based on the tests' well-established psychometric characteristics, the built-in adaptation to different student abilities, and the fact that the tests address aspects of comprehension covered by the strategies part of the experimental program. At each measurement occasion, we administered tests with an increasing level of difficulty.
TABLE 1. Number of Participating Classes and Students
The second-grade pretest contained six short stories, each followed by 5 multiple-choice questions asking for the meaning of a word, the meaning of a sentence, the referral relation between words, the connection between sentences, and the theme of a text. We determined the scores by the number of correct answers. The second-grade post- and retention tests consisted of four and three different stories, respectively, each followed by 4 to 10 multiple-choice questions, with a total of 25 questions per test. More specifically, questions concerning the content of a text (demanding a clear understanding of the meaning of words and sentences, the referral relation between words, the connection between sentences, and the theme of the text) and questions concerning the communication between the author and the reader of the text (e.g., objective of the author, intended target group, the author's attitude toward the matter raised) could be distinguished. Both types of questions required integration of information on different textual levels (words, sentences, paragraphs, text) and were more or less equally distributed over the 25 questions per test. After discussing an example, students completed the tests individually. To examine the tests' internal consistency, Cronbach's α coefficients were calculated on our own data, yielding high reliability scores of .90 (n = 432) for the pretest, .84 (n = 422) for the posttest, and .83 (n = 385) for the retention test.
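For readers unfamiliar with the statistic, the following is a minimal sketch of how a Cronbach's α coefficient can be computed from item-level scores. It is an illustration only and not the procedure or software used in the study; the function name `cronbach_alpha` and the small 0/1 score matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student item-score lists.

    scores[s][i] is the score of student s on item i (e.g., 1 = correct,
    0 = incorrect). All rows must have the same number of items.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 students")
    # Sample variance of each item across students.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the students' total scores.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 students answering 3 multiple-choice items.
example = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
alpha = cronbach_alpha(example)
```

Values closer to 1 indicate that the items vary together across students, which is the sense in which the coefficients of .83 to .90 reported above are read as high internal consistency.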
In fifth grade, the tests consisted of three modules of 25 multiple-choice questions each. All students took the first module of the test. Depending on these first results, students further completed an easier or more difficult module. Two types of questions requiring the integration of information on different textual levels could be distinguished: questions concerning the content and questions concerning the communication between the author and the reader. After an example, students completed the tests individually. Scores were determined by summing the correct answers. For the reading comprehension test, IRT-modeled scores were available: Based on Item Response Theory (IRT), a common scale had been developed for different grades and test versions (easy-difficult), allowing us to compare the easier or more difficult part of the test. Because they are all on the same scale, the IRT-modeled scores also allow for direct comparison of the results a student obtained at different measurement occasions. To verify the reliability of the three modules of the pre-, post-, and retention tests, we computed Cronbach's α coefficients on our own data. Table 2 indicates that reliability of all comprehension measures was acceptable.
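The text does not state which IRT model underlies the common scale. Purely as an illustration, and assuming the simplest one-parameter (Rasch) form, the probability that student p answers item i correctly can be written in terms of a student ability θ_p and an item difficulty b_i (both symbols are introduced here for the example and do not appear in the source):

```latex
P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}
```

Under a model of this kind, items from the easier and the more difficult modules are calibrated on one difficulty scale, so the ability estimates of students who took different modules, or the same student at different measurement occasions, are expressed in the same units.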
Decoding fluency test. We included second graders' decoding fluency, which is a combination of accuracy and decoding speed (Chard, Simmons, & Kameenui, 1998), as an additional variable, because fluency can be considered a mediating factor on students' reading comprehension achievement (Pressley, 2000). A standardized test (Brus, 1969) was administered individually to all second graders; students were asked to read unrelated words with an increasing level of difficulty during exactly 1 min. The score was determined by counting the number of words read correctly. We collected fluency data in second-grade classes only because it is recognized that reading fluency is generally well developed at the end of the third grade (Bast & Reitsma, 1998; Sticht & James, 1984) and because it was too time consuming to test all fifth graders individually as well.
Questionnaire on self-efficacy perceptions and related causal attributions. Within the framework of the present study, we developed a questionnaire to measure students' preoccupation with positive or negative thoughts or related causal attributions with regard to their reading ability. Inspired by the work of Ames (1984), we asked children to report how often such thoughts crossed their mind before, during, or after reading. Factor analysis revealed that success attributions and positive thoughts about one's own reading competence on the one hand and failure attributions and negative self-efficacy perceptions on the other hand are very closely related. This result is in line with the findings of Marsh (1984) and Marsh, Cairns, Relich, Barnes, and Debus (1984), who stated that self-attributions can be seen as expressions or indicators of one's self-concept or self-efficacy perceptions. Therefore, we constructed two scales reflecting negative and positive thoughts, respectively, about one's own reading abilities. It should be noted that capturing the incidence of self-efficacy-related thoughts does not give a direct measure of students' self-efficacy perception but rather indicates the degree to which a student is preoccupied with such thoughts. In this respect, the data are more directly related to (meta)cognitive activity than data collected by means of more traditional self-concept questionnaires. However, a high incidence of negative self-efficacy-related thoughts can be considered an indication of a low self-efficacy perception, but such a conclusion cannot be drawn from a low incidence of positive self-efficacy-related thoughts. The latter suggests only that the student is not preoccupied with thoughts about reading proficiency or success. We administered the questionnaire at each measurement occasion. Fifth graders read and completed the questionnaire individually. In second grade, all items were read out loud to and judged individually by the students.
TABLE 2. Cronbach's α Coefficients for the Fifth-Grade Reading Comprehension Tests
As can be seen in Table 3, reliability was high for the negative subscale, but it was somewhat lower for the positive subscale. To investigate the validity of the questionnaire, both scales were correlated with the scholastic competence subscale of a Dutch version of the Self-Perception Profile for Children (Harter, 1985). These analyses revealed that both positive and negative self-efficacy perceptions were significantly (p
TABLE 3. Cronbach's α Coefficients for the Questionnaire Concerning Preoccupation With Attributions and Self-Efficacy Perceptions
Perceived competence scale. Although we mainly focused on students' self-efficacy perceptions directly related to reading activities, we administered an existing self-concept questionnaire (Veerman, Straathof, Treffers, Van den Bergh, & ten Brink, 1997), which is a Dutch version of the Self-Perception Profile for Children (Harter, 1985). Because the questionnaire was not appropriate for second graders, we used the instrument with the fifth-grade group only. To verify the reliability of the different scales, we computed Cronbach's α coefficients. As can be seen in Table 4, the reliability of the measures was acceptable. As to the questionnaire's validity, Veerman and colleagues reported that, compared with other investigations into the validity of self-report scales, the validity can be judged as moderate.
Reading attitude scale. Both second and fifth graders completed a Dutch Reading Attitude Scale (Aarnoutse, 1996) at the pre- and posttest. Fifth graders read and completed the questionnaire individually. In second grade, all 27 items of the scale were read out loud to and completed individually by the students. Reading attitude can be unfolded in three elements: children's knowledge about and experience with reading, their positive or negative appreciation of reading and reading material, and their tendency to read. The Reading Attitude Scale refers to these different elements but focuses chiefly on the affective aspect. The score on the scale is the number of positive judgments on the 27 questions. To examine the internal consistency of the scale, we calculated Cronbach's α coefficients for both second and fifth graders, respectively, yielding high reliability scores of r = .84 (n = 334) and r = .94 (n = 368) for the pretest and r = .92 (n = 353) and r = .92 (n = 395) for the posttest. As to the validity of the Reading Attitude Scale, Aarnoutse reported that research results have supported the expectation that the questionnaire measures what it professes to measure.
TABLE 4. Cronbach's α Coefficients for the Fifth-Grade CBSK Scales
STRAT + SA and STRAT + CA condition. The experimental interventions in the STRAT + SA and STRAT + CA conditions are characterized by explicit strategies instruction, a sound tutor preparation, and practice of the reading strategies in weekly peer-tutoring sessions.
Explicit instruction in reading strategies: Empirical research results have identified a large variety of relevant strategies; however, it is impossible to address all or too many strategies in one intervention study. Analyses of proficient readers' reading behavior have revealed that skilled reading does not involve the use of a single potent strategy but the coordination of multiple strategies (Brown et al., 1996). In this respect, it was necessary to make a considered selection of feasible strategies for both second and fifth graders. The compilation of the reading strategies was inspired by contemporary reading research and recurrent strategies in explicit strategy instruction programs (e.g., Brown et al.; De Corte, Verschaffel, & Van De Ven, 2001; Fuchs, Fuchs, Mathes, et al., 1997; Fukkink, Van der Linden, Vosse, & Vaessen, 1997; Klingner & Vaughn, 1996; Palincsar & Brown, 1984). We selected six essential strategies: (a) activating prior knowledge and connecting it to the text, (b) predictive reading and checking story outcomes, (c) distinguishing main issues from side issues, (d) monitoring and regulating the understanding of words and expressions, (e) monitoring and regulating comprehension by tracing the ideas expressed in difficult and not-understood sentences or passages, and (f) classifying types of text and adjusting reading behavior to it.
Activating prior knowledge, which is generally considered crucial for comprehension (Palincsar & Brown, 1984; Paris & Oka, 1986; Pressley et al., 1992), means that students recall what they already know about the topic of the text before they start reading. Within the scope of the intervention, we instructed students to infer the topic of the text on the basis of the title and accompanying illustrations, ask themselves what they already know about that topic, and note this down in a number of key words. Regarding the second reading strategy, students were taught to make predictions and to compare these predictions to the actual story outcomes. In this way, students were stimulated to look ahead while reading and to verify expectations afterwards. Distinguishing main issues from side issues targets the skills of main idea identification and summarizing. In the present study, students more specifically learned to ask and answer who, what, where, when, and why questions in reference to the text and to restate the main ideas of successive paragraphs or the full text. Regarding monitoring and regulating the understanding of difficult words and expressions, students were taught to identify not-understood words or expressions and to discover the meaning by looking for a definition, a synonym, or a description in the text, by deriving clarification from the context, by referring to a dictionary or computer, or by enlisting someone's help. On the basis of the fifth strategy, students were encouraged to monitor comprehension and regulate understanding of difficult sentences or passages by rereading, adjusting reading speed, or tracing the meaning of unfamiliar words or expressions. Finally, students were alerted to the fact that different types of texts, each with its own characteristics, can be distinguished. They were instructed in classifying types of text and adjusting their reading behavior to them.
To standardize the explicit strategies instruction, we provided teachers with student materials and elaborated lesson scenarios. The materials for the students included a selection of texts to practice the strategies, as well as strategy assignment cards, offering structure and visual support during the reading process by displaying step-by-step how to employ the reading strategies. The texts were not especially written or adapted for the experimental treatment but were selected meticulously from children's literature, matching the students' different reading levels and interests. In designing the lesson scenarios for the explicit strategies instruction, we included components of transactional strategies instruction (Brown et al., 1996; Pressley et al., 1992) and reciprocal teaching (Palincsar & Brown, 1984). For each strategy, we designed a teaching cycle in which three phases could be distinguished, reflecting a gradual transfer from external teacher regulation to self-regulation of strategy use by the students. During a first phase of whole-class instructional presentation, much attention was paid to extensive and direct teacher explanations of each strategy and modeling of strategic reasoning. By using the think-aloud methodology, the teacher explicitly explained and modeled why, how, and when a specific strategy can be helpful in enhancing comprehension. In this respect, teachers imparted declarative (What is the nature of the strategy?), procedural (How to deploy it?), and conditional (When to use it?) knowledge (Paris, Lipson, & Wixson, 1983) of the strategies to the students. During a second phase of practice and coaching using multiple examples, the teacher put the strategies into practice together with the students. This phase was characterized by student assignments, systematic and explicit scaffolding and coaching by the teacher to engage students in applying and reflecting on the strategies, and subsequent whole-class discussions.
Moreover, the teacher cycled back to modeling and re-explanations as needed. In a third phase, the teacher applied the assignment cards more independently to systematize and internalize strategy use. In the STRAT + CA and STRAT + SA conditions, respectively, independent practice took place in cross-age and reciprocal same-age peer-tutoring dyads. With regard to the instruction of the strategies, a "sandwich model" was applied in that the six strategies were not introduced and practiced simultaneously but were phased in at different times throughout the whole school year. The teacher-led lessons in which a new reading strategy was introduced and practiced together with the students were followed by the independent practice of that strategy in different peer-tutoring sessions. For the students in the peer-tutoring conditions, this practice involved at least one tutoring session in which the strategy was practiced when reading one of the selected texts from the manual. Thereafter, a number of tutoring sessions followed in which the strategy was practiced when reading books or texts that students chose themselves from the school or class library. Only after sufficient practice of a strategy did teachers start a new cycle with the introduction and practice of another strategy. At the end of the school year, after all strategies had been introduced and practiced in isolation, students were encouraged to select, apply, and evaluate a repertoire of relevant strategies and the corresponding assignment cards, depending on the difficulties they met in the texts.
Regarding the explicit reading strategies instruction, the key elements in the present study were (a) instructing and practicing a repertoire of reading strategies instead of focusing on one strategy; (b) phasing in the strategies gradually throughout the school year; (c) introducing each strategy and practice in isolation in three steps, representing a transfer from teacher regulation to students' self-regulation (explicit teacher explanations and modeling by thinking aloud, practice characterized by teachers' scaffolding and coaching, and more independent practice to internalize strategy use); (d) following the strategies with a period of practicing the six strategies as a repertoire; and (e) the teacher recursively cycling back to modeling and re-explanations during each phase of the strategies' introduction or practice.
Tutor preparation: To train students to be good tutors, we developed a series of lessons and materials based on research and tutoring programs (Bentz & Fuchs, 1996; Fukkink et al., 1997). Lesson scenarios and student materials were included in the STRAT + CA and STRAT + SA teachers' manual and comprised worksheets and instructions for role-play. The preparatory lessons were scheduled at the start of the intervention and required seven 50-min sessions. In particular, tutors became acquainted with their tasks and responsibilities and learned how to show interest, how to initiate and finish a session, how to give corrective feedback, how to provide praise, and how to offer explanations and assistance.
Peer-tutoring sessions: In the STRAT + CA condition, fifth-grade tutors were paired with second-grade tutees. The teachers assigned children to the dyads. In addition to children's personality, dyad composition was based on reading ability so that poor and good fifth-grade readers were paired with poor and good second-grade readers, respectively. In the STRAT + SA condition, second and fifth graders were paired with classmates. Teachers assigned students to academically heterogeneous and socially compatible pairs, following the procedure of Fuchs, Fuchs, Mathes, and colleagues (1997). More specifically, teachers paired all students in their class by ranking them on reading performance and then splitting the ranked list in half. The top-ranked student in the stronger half was paired with the strongest reader in the weaker half. Next, second-ranked students in each half were paired. This matching process continued until all students had a partner. Teachers were then advised to inspect the pairings to determine whether one or more were socially incompatible. If such a dyad was found, it was changed (Fuchs, Fuchs, Mathes, et al.). Taking into account that same-age partners engaging in role reciprocity make greater reading gains than partners who do not (Simmons, Fuchs, Fuchs, Hodge, & Mathes, 1994), students in the same-age dyads alternated regularly between tutee and tutor roles, so that each served as tutor for an equal amount of time. In principle, dyads remained together for the duration of the school year. However, pairs were changed if they turned out to be socially incompatible.
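The rank-and-split pairing used for the same-age dyads can be illustrated with a minimal sketch. The function name `pair_by_rank`, the example student labels, and the handling of an odd class size (one student is left over for the teacher to place) are assumptions made for illustration only; the procedure as described does not specify them, and the subsequent check for social compatibility remains a teacher judgment that is not modeled here.

```python
def pair_by_rank(students_ranked):
    """Pair students by the rank-and-split procedure.

    students_ranked: list of student labels ordered from the strongest
    to the weakest reader (hypothetical input format).
    Returns (pairs, leftover), where leftover is empty for an even class
    size and holds the single unpaired student otherwise (an assumption).
    """
    half = len(students_ranked) // 2
    stronger = students_ranked[:half]   # top half of the ranked list
    weaker = students_ranked[half:]     # bottom half of the ranked list
    # First of the stronger half with first of the weaker half, and so on.
    pairs = list(zip(stronger, weaker))
    leftover = weaker[half:]
    return pairs, leftover


pairs, leftover = pair_by_rank(["A", "B", "C", "D", "E", "F"])
print(pairs)     # [('A', 'D'), ('B', 'E'), ('C', 'F')]
print(leftover)  # []
```

Each dyad produced this way combines a relatively stronger and a relatively weaker reader while keeping the ability gap within dyads roughly constant across the class.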
Peer tutoring was organized once (50 min) or twice (25 min each) a week, depending on the task and scheduling. Taking into account prior research results revealing that a combination of dyadic peer interaction and structured academic activity does more to enhance cognitive gain than either of the two dimensions separately (Cohen et al., 1982; Fantuzzo, Riggio, Connelly, & Dimeff, 1989; Lambiotte et al., 1987), teachers used the aforementioned strategy assignment cards as a vehicle for structuring the tutoring interaction. Prior to each peer-tutoring session, teachers gave a short briefing to point out the main ideas discussed in the introductory lessons. During the actual peer tutoring, teachers were present and circulated around the classroom to observe and coach the reading dyads. Finally, all tutoring sessions were followed by a reflection period to discuss students' experiences and teachers' observations. During the briefing, the coaching of the dyads, and reflection periods, teachers cycled back to modeling and re-explaining the reading strategies as needed.
In conclusion, both the cross-age and same-age tutoring activities were characterized by the following research-based effective components: (a) preceding tutor preparation; (b) weekly sessions, structurally embedded in the curriculum and classroom organization; (c) structured and task-centered interaction in the tutoring dyads supported by strategy assignment cards; and (d) teacher support and coaching, cycling back to modeling and re-explanations of reading strategy application and tutor skills. The difference between both conditions pertained solely to the composition of the reading dyads.
STRAT condition. The STRAT condition was typified by explicit strategy instruction, identical to the STRAT + SA and STRAT + CA conditions. Teachers were not only provided with a manual including the same lesson scenarios, assignment cards, and texts, they also built in the gradual transfer of responsibility from teacher to students. Practice to systematize strategy use did not occur in tutoring dyads, however, but was characterized by individual seatwork, teacher-led discussions, and no highly interactive peer-mediated instructional techniques.
Control condition. Control group teachers conducted reading instruction in their traditional way, which was without explicit instruction or peer tutoring. Interviews with the teachers more specifically revealed that the lessons typically involved teacher-led whole-class activities, including comprehension-check questions after reading a text, teacher evaluation of students' answers, and presentation of correct answers. We told control group teachers that the purpose of the study was to examine elementary school students' development in the field of reading comprehension achievement and did not inform them that they were part of a control condition.
Support to teachers. Because regular classroom teachers put into practice the experimental interventions, teachers in all three experimental conditions were supported by means of scripted introductory lesson scenarios and regular inservice training sessions to limit variability in treatment fidelity. We provided teachers with an elaborate manual including all materials necessary to conduct the innovation, namely (a) a description of the interventions' rationale and organization; (b) lesson scenarios describing the objectives, materials, and successive phases of each lesson; and (c) additional student materials, such as assignment cards and texts. Teachers were not required to develop additional materials. Moreover, we provided teachers with inservice training and coaching. Prior to the interventions in the classes, we clarified the underlying theoretical background, outlined an overview and the organization of the interventions, and fully discussed the provided manual and additional materials during meetings in the schools. Subsequently, after the start of the intervention, monthly discussions took place to exchange experiences and ideas and to overcome practical or implementation difficulties.
We conducted two structured interviews and frequent observations to document the implementation fidelity. From those data, we concluded that a satisfactory fidelity was reached in all experimental classrooms. All introductory lessons with regard to tutor preparation and the introduction of reading strategies were taught according to the teachers' manual and more than the minimum expected number of tutoring sessions were organized.
Time spent on reading comprehension instruction. The experimental treatments were not meant as an additional program on top of teachers' traditional reading comprehension classes. We coached experimental teachers to substitute their habitual way of comprehension instruction with one of the experimental programs and encouraged them to implement the treatments during the time normally allocated for reading instruction. No differences between the four conditions were intended with regard to the total amount of time spent on reading comprehension instruction and practice. The structured interviews with the experimental teachers yielded information on the total amount of time spent on reading comprehension instruction on a weekly and yearly basis. Similar information was obtained from all control group teachers. One-way analyses of variance (ANOVAs) on these data revealed that the different conditions in both second, F(3, 21) = 1.188, p = 0.342, and fifth grade, F(3, 21) = 0.348, p = 0.791, allocated commensurable amounts of time to comprehension instruction. Post hoc analyses produced no mutual significant differences between the conditions.
Implementation of the experimental interventions was spread out over an entire school year and was conducted with all students during regularly scheduled reading instruction.
We collected data under supervision of the first author within the regular classroom context and during regularly scheduled class sessions. All conditions were measured at three times: a pretest in October (second and fifth grades) before the start of the intervention, a posttest in May or June (second and fifth grades) after the completion of the intervention, and a retention test in December of the school year that followed the intervention year (third and sixth grades). In accordance with the planning of the study, none of the third- and sixth-grade teachers pursued the experimental intervention.
In this study, students were nested within a smaller number of classes; therefore, it can be argued that the problem under investigation had a clear multilevel structure. A consequence of the hierarchical structure is that the observations of individual students are generally not completely independent because of the common history and experiences children share by belonging to the same class (Hox, 1994). In this respect, the use of hierarchical or multilevel models is recommended because these techniques, in contrast to the traditional ordinary least squares regression analysis, take interdependency explicitly into account (Bryk & Raudenbush, 1992; Hox & Kreft, 1994). The application of these models results in more efficient estimates of regression coefficients and more correct standard errors, confidence intervals, and significance tests, which generally are more conservative than the traditional ones obtained by ignoring the presence of clustering (Goldstein, 1995). We distinguished two levels in this study: Students (Level 1) were clustered within classes (Level 2). Because the number of classes within schools ranged from one to two per grade, we did not include an additional school level. Classroom rather than school was preferred as the highest level because the experimental intervention took place at the classroom level.
To build the most appropriate models and to test the research hypotheses, we constructed models from a null model to a model including relevant explanatory variables, whereupon we explored the effects attributable to the experimental interventions. The first step in the analyses was to examine the results of an unconditional two-level null model, with only an intercept term included (Model 0). The null model permits partitioning the total variance into within-class and between-class components. It serves as a baseline with which to compare subsequent more complex models and is unconditional because the variance components are not predicted by any variables.
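In conventional multilevel notation, the null model and the resulting partition of variance can be sketched as follows. The symbols are assumptions introduced here for illustration (student $i$ in class $j$, standardized outcome $y_{ij}$) and are not taken from the original tables.

```latex
% Two-level null model (Model 0): intercept only
y_{ij} = \beta_0 + u_{0j} + e_{ij},
\qquad u_{0j} \sim N(0, \sigma^2_{u0}),
\qquad e_{ij} \sim N(0, \sigma^2_{e})

% Share of the total variance located between classes
\rho = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e}}
```

Here $\sigma^2_{u0}$ is the between-class (Level 2) variance and $\sigma^2_{e}$ the within-class (Level 1) variance, so $\rho$ expresses the proportion of total variance related to differences between classes.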
The second step in the construction of the models concerned the inclusion of conceivably relevant explanatory variables, such as students' pretest measures (Model 1), additional background characteristics, and the class-level total amount of time spent on comprehension instruction (Model 2). Because parsimonious models are preferred, we retained only significant predictors ameliorating the model. To test the research hypotheses, the third and final step in the analyses was to examine the effects attributable to the experimental interventions by adding the categorical variable "condition" to the model, controlling for the relevant predictors (Model 3). To represent the four research conditions, we used three dummy variables, with the control group as the reference category. Because entire classrooms were assigned to the experimental conditions, the three dummy variables were at the classroom level.
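As an illustration only, the final step can be written in the following general form, assuming a single student-level pretest covariate. The variable names are hypothetical shorthand, and the other predictors retained in Model 2 are collected in the term $\mathbf{x}_{ij}'\boldsymbol{\gamma}$ rather than listed.

```latex
% Model 3: condition dummies at the class level, control group as reference
y_{ij} = \beta_0 + \beta_1\,\mathrm{PRE}_{ij} + \mathbf{x}_{ij}'\boldsymbol{\gamma}
       + \beta_2\,\mathrm{STRAT}_{j} + \beta_3\,\mathrm{SA}_{j} + \beta_4\,\mathrm{CA}_{j}
       + u_{0j} + e_{ij}
```

Each dummy equals 1 for classes in the corresponding condition (STRAT, STRAT + SA, or STRAT + CA) and 0 otherwise, so that $\beta_2$, $\beta_3$, and $\beta_4$ estimate the difference between each experimental condition and the control group after controlling for the retained predictors.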
We conducted separate analyses for second and fifth graders. To facilitate the interpretation of the estimates, we calculated standardized scores and included them in the models. We used the iterative generalized least squares estimation procedure of the software MLwiN (Rasbash et al., 1999) to estimate the parameters of the model.
Generally, we found significant intervention effects with regard to second and fifth graders' reading comprehension achievement, as well as with regard to fifth graders' self-efficacy-related thoughts toward reading. No significant findings were discerned in students' self-concept and attitude toward reading.
Effects on Reading Comprehension
Tables 5 to 8 present the results concerning reading comprehension achievement for second and fifth graders' post- and retention tests. The random part of Model 0 provides justification for applying multilevel models, given that the variances at both the class and student level were significantly different from zero. Respectively, 11% and 15% of the total variance in second graders' post- and retention test scores were related to differences between classes. For fifth graders, this was, respectively, 13% and 20%. As standardized scores were included in the analysis, the null models' intercepts, which represent the overall mean score of all students in all classes, were not significantly different from zero.
As Model 1 in Tables 5 to 8 reveals, students' pretest comprehension achievement was an important significant predictor of both second and fifth graders' post- and retention test scores, with effect sizes ranging from 0.59 to 0.76 standard deviation. For second graders, the random part estimates revealed that the great majority of the post- and retention test differences between classes (respectively, 86% and 71%) and between students (respectively, 43% and 33%) were accounted for by pretest differences. We found a similar result for fifth graders. Adding fifth-grade students' pretest scores to the models explained an important part of the posttest and retention test differences between classes (respectively, 61% and 44%) and between students (respectively, 61% and 48%). As it was possible that the Level 1 and 2 variances differed according to students' pretest measures, we allowed this predictor's parameter estimates to vary randomly across all classes and students. Only for fifth graders' posttest scores did the pretest parameter estimate show complex variance, indicating that the differences in posttest comprehension achievement between fifth graders within a class expanded as students performed better on the pretest. In other words, the higher the pretest scores, the less predictable the posttest scores in fifth grade.
Apart from students' pretest comprehension scores, a number of other variables appeared to be significant predictors of second and fifth graders' post- and retention test scores. Model 2 in Tables 5 and 6 indicates that second graders' pretest decoding fluency was positively related to reading comprehension, whereas the number of years children were behind at school and the pretest measure of preoccupation with failure attributions toward reading were negatively connected to comprehension scores. Moreover, second-grade boys appeared to perform significantly lower than girls. Model 2 in Table 6 further indicates that second-grade non-native speakers performed and progressed significantly less than native Dutch-speaking children.
With regard to fifth graders' post- and retention test scores, the fixed part estimates of Model 2 in Tables 7 and 8 indicated that, apart from pretest comprehension achievement, higher pretest measures of personal attitude toward reading and perceived scholastic competence went together with significantly higher post- and retention test scores. With regard to the explanatory variables for the posttest scores, it has to be mentioned that the influence of the average pretest reading attitude per class was not statistically significant in Model 2. However, after adding the dummy variables representing the experimental conditions (Model 3), this variable did have a significant effect. Therefore, we added the variable to the model.
TABLE 5. Model Estimates for the Two-Level Analyses of the Second-Grade Posttest Reading Comprehension Scores
TABLE 6. Model Estimates for the Two-Level Analyses of the Second-Grade Retention Test Reading Comprehension Scores
TABLE 7. Model Estimates for the Two-Level Analyses of the Fifth-Grade Posttest Reading Comprehension Scores
TABLE 8. Model Estimates for the Two-Level Analyses of the Fifth-Grade Retention Test Reading Comprehension Scores
Similar to the analyses for Model 1, we examined the assumption of complex variances for the estimates of all significant predictors of second and fifth graders' scores. Only for the estimates of the effects of pretest decoding fluency did the analyses indicate complex Level 1 variance in second graders' fluency, implying fewer within-class differences for post- and retention test comprehension achievement as students performed better on the fluency test.
As to the effects of the intervention, the parameter estimates of Model 3 in Table 5 reveal significant effects for the STRAT and STRAT + CA conditions, indicating that, by the end of the school year, second-grade students in these conditions had made significantly more progress in reading comprehension as compared with children in the control group. Effect sizes were 0.23 and 0.22 standard deviation, respectively. We found no significant effects for the STRAT + SA condition. Pairwise comparisons revealed no significant differences between the three experimental conditions. Furthermore, no interaction effects occurred on the posttest between condition and gender, nor between condition and initial performance level. These results imply that the significant positive effects of the STRAT and STRAT + CA conditions were equally strong for boys and girls and for initially low-scoring children and for high achievers. Contrary to the posttest results, we found no significant differences between the experimental conditions and the control group for second graders' retention test scores (see Table 6, Model 3).
Regarding fifth graders' posttest scores, Model 3 in Table 7 demonstrates significant positive effects for all experimental conditions with effect sizes ranging from 0.32 to 0.39 standard deviation, indicating that the comprehension growth produced by the experimental interventions was considerably higher than the growth observed in the control group. Pairwise comparisons revealed no mutual significant differences between the experimental conditions. Furthermore, we found no interaction effects between condition and students' initial performance level, implying that the significant positive effect of the experimental conditions was equally strong for initially low-scoring children as compared with high achievers.
The most striking result was found regarding fifth graders' retention test scores (Table 8, Model 3), with a quite large significant effect of the STRAT + CA condition (effect size = 0.60 SD). This result indicates that fifth graders who acted as tutors for second-grade students continued growing more than the control group students, even after the experimental intervention ended. Students in the STRAT condition also reached significantly higher scores on the retention test (effect size = 0.47 SD). We found no significant effect for the STRAT + SA condition. Pairwise comparison of the estimates of the experimental conditions revealed that the STRAT + CA condition outperformed the STRAT + SA condition (effect size = 0.29 SD). The difference was only marginally significant, however. Furthermore, no interaction effects occurred on the retention test between condition and initial performance level, implying equally strong effects for initially low and high achievers.
Effects on Preoccupation With Attributions and Self-Efficacy Thoughts
Regarding second graders' preoccupation with thoughts relating to self-efficacy in reading or success and failure attributions toward reading, we found no statistically significant effects for any of the experimental conditions.
For fifth graders, we found an effect of the experimental interventions on thoughts relating to failure attributions and negative self-efficacy perceptions toward reading. Tables 9 and 10 summarize the construction of the most appropriate model for the post- and retention test measures, respectively. The analyses of the two-level null models (Model 0) showed that only the variance at Level 1 was significantly different from zero at both the post- and retention tests, indicating that the total variance of the dependent variables can be explained by individual differences between students but not by differences between classes.
As Model 1 in Tables 9 and 10 shows, children's pretest preoccupation with failure attributions and negative self-efficacy perceptions was a significant predictor of both post- and retention test measures, with effect sizes of 0.64 and 0.58 standard deviation. Including the pretest measure in the model explained 43% and 33% of the variance between students at the post- and retention tests, respectively. These results indicate that having negative thoughts about one's own reading abilities and associated failure attributions were quite persistent over time. Furthermore, the complex variance at Level 1 for the pretest measure indicated that the differences in post- and retention test reports between fifth graders within classes were larger for students who reported more negative thoughts during reading at the beginning of the school year.
Regarding other relevant explanatory variables, Model 2 in Tables 9 and 10 shows a rather small but significant (p
Model 3 in Table 9 reveals that, by the end of the school year, children in the STRAT + CA condition showed a significantly larger relative decrease in being occupied with failure attributions and negative self-efficacy perceptions during reading. This decrease was not only larger when compared with the control group (effect size = 0.31 SD) but also when compared with the STRAT + SA condition (effect size = 0.29 SD). As can be seen in Table 10, the significant effect of the STRAT + CA condition compared with the control group continued to exist at the retention test. Remarkably, at the retention test, we observed a significant positive effect (p
TABLE 9. Model Estimates for the Two-Level Analyses of the Fifth-Grade Posttest Measures of Preoccupation With Failure Attributions and Negative Self-Efficacy Perceptions During Reading
TABLE 10. Model Estimates for the Two-Level Analyses of the Fifth-Grade Retention Test Measures of Preoccupation With Failure Attributions and Negative Self-Efficacy Perceptions During Reading
Our major aim in the present intervention study was to evaluate the effectiveness of explicit reading strategies instruction and the surplus value of peer tutoring as tools to enhance second and fifth graders' reading comprehension achievement and self-efficacy judgments. We used a pretest-posttest retention test design including three experimental conditions and a comparable control condition. The experimental groups were typified by explicit instruction in six reading strategies, followed by practice in teacher-led whole-class activities, in student-led cross-age peer-tutoring activities, or in reciprocal same-age peer-tutoring activities. In the control condition, we applied a traditional reading comprehension approach characterized by content-specific questions asked by the teacher. The study was situated in the challenging context of intact classes, providing a natural setting for the interventions' implementation. Taking into account this quasi-experimental nature of the study, we used multilevel models to allow for the nesting of students within classes. In interpreting the intervention effects and their effect sizes, one should bear in mind that the application of these models results in more conservative significance tests than the ones obtained by traditional analyses that ignore the presence of clustering (Goldstein, 1995). In this respect, the reported effect sizes between 0.32 and 0.60 standard deviation can be considered as large enough to merit attention to the particular intervention.
Effects on Reading Comprehension
Second graders' comprehension achievement results show that explicit strategies instruction created a significant extra learning gain of approximately one-quarter of a standard deviation. It made no difference, however, whether reading strategies were practiced under direct teacher supervision or in a cross-age setting with well-prepared fifth graders as tutors. Compared with the control condition, both conditions created the same learning gain. Interestingly, poor readers made as much progress as high achievers. In the long term, 6 months after the end of the intervention, the effect of both the STRAT + CA and STRAT condition disappeared. Apparently, at this age long-lasting effects can be obtained only by continuing the intervention. Future research should try to elucidate this assumption.
Second graders practicing reading strategies in reciprocal same-age peer-tutoring dyads did not make extra learning gains compared with the control group. This finding parallels Rosenshine and Meister's (1994) review of reciprocal teaching research, revealing significant effects from Grade 4 through adult education and nonsignificant effects for younger students. It contrasts, however, with other research that applied different age groups, including second graders (Fuchs, Fuchs, Mathes, et al., 1997; Simmons et al., 1995). Regarding those studies, however, it should be noted that the interaction with grade level was not analyzed. The nonsignificant effect of the STRAT + SA condition leads us to the cautious supposition that the surplus value of explicit strategies instruction, as found for the STRAT condition, was counteracted by practicing the strategies in second-grade same-age dyads. Subsequent research is necessary to verify whether the preparatory tutor training was unsatisfactory for this age group; whether the selected strategies were too difficult to practice independently in second-grade reciprocal same-age dyads; whether STRAT + SA second graders needed a larger period to internalize the social, cognitive, and metacognitive demands of practicing reading strategies with classmates; or whether same-age peer tutoring is not an appropriate instructional technique for this age group. Future research should inquire as to how various components of the experimental interventions might be more or less appropriate from a developmental perspective.
For fifth graders, all experimental conditions appeared to create nearly equally large extra learning gains by the end of the school year, with effect sizes between 0.32 and 0.39 standard deviation. Moreover, the interventions appeared to be as effective for poor as for high achievers. These results are congruent with previous research confirming the positive effect of explicit strategies instruction on reading comprehension achievement (e.g., Baumann et al., 1992; Block, 1993; Dole et al., 1991; Dole et al., 1996; Pressley et al., 1989). Furthermore, the differential gains for the experimental students are comparable with those of Palincsar and Brown's (1984) reciprocal teaching. In their review of reciprocal teaching, Rosenshine and Meister (1994) reported a median effect size of 0.32 when standardized tests were used. The similar effect sizes of the three experimental conditions in the present research lead us to suspect that the way th