Mathematical Word Problem Solving in Third-Grade Classrooms

June 14, 2007

By Jitendra, Asha K; Sczesniak, Edward; Griffin, Cynthia C; Deatline-Buchman, Andria

ABSTRACT The authors conducted design or classroom experiments (R. Gersten, S. Baker, & J. W. Lloyd, 2000) at 2 sites (Pennsylvania and Florida) to test the effectiveness of schema-based instruction (SBI) prior to conducting formal experimental studies. Results of Study 1 conducted in 2 3rd-grade, low-ability classrooms and 1 special education classroom indicated mean score improvements from pretest to posttest on word problem solving and computation fluency measures. In addition, student perceptions of SBI according to a strategy satisfaction questionnaire revealed SBI as effective in helping solve word problems. Results of Study 2, which included a heterogeneous (high-, average-, and low-achieving) sample of 3rd graders, also revealed student improvement on the word problem solving and computation fluency measures. However, the outcomes were not as positive in Study 2 as in Study 1. Lessons learned from the 2 studies are discussed with regard to teaching and learning mathematical word problem solving for different groups of students.

Keywords: elementary grade students, mathematics instruction, schema-based instruction, word problem solving

Current education reforms support challenging learning standards and school accountability. In mathematics education, the emphasis is on the development of conceptual understanding and reasoning over memorization and rote learning (Goldsmith & Mark, 1999; Hiebert et al., 1996; National Research Council, 2001). Mathematical problem solving is a central theme in the Principles and Standards for School Mathematics (National Council of Teachers of Mathematics [NCTM], 2000). Although the current emphasis is on solving complex authentic problems situated in everyday contexts, story problems that range from simple to complex represent “the most common form of problem-solving” assignment in school mathematics curricula (Jonassen, 2003, p. 267).

Story problems pose difficulties for many elementary students because of the complexity of the solution process (Jonassen, 2003; Lucangeli, Tressoldi, & Cendron, 1998; Schurter, 2002). Mathematics tasks that involve story-context problems are much more challenging than are no-context problems (Cummins, Kintsch, Reusser, & Weimer, 1988; Mayer, Lewis, & Hegarty, 1992; Nathan, Long, & Alibali, 2002). Although children may know procedures for solving no-context problems, solving story problems requires them to integrate several cognitive processes that are difficult for children with an insufficient knowledge base or limited working memory capacity. When solving story problems, for example, children need to (a) understand the language and factual information in the problem, (b) translate the problem with relevant information to create an adequate mental representation, (c) devise and monitor a solution plan, and (d) execute adequate procedural calculations (Desoete, Roeyers, & De Clercq, 2003; Mayer, 1999). In short, solving word problems relates closely to comprehension of the relations and goals within the problem (e.g., Briars & Larkin, 1984; Cummins et al.; De Corte, Verschaffel, & De Win, 1985; Kintsch & Greeno, 1985; Riley, Greeno, & Heller, 1983).

Story problems are critical for helping children connect different meanings, interpretations, and relationships to mathematics operations (Van de Walle, 2004). How then do educators enhance students’ mathematical word problem solving skill? Traditional textbook problem-solving instruction has not effectively improved the learning of students at risk for mathematics difficulties. Many mathematics textbooks are organized so that the same procedure (e.g., subtraction) is used to solve all problems on a page. As a result, students do not have the opportunity to discriminate among problems that require different solution strategies. Furthermore, traditional instruction teaches students to use keywords (e.g., in all suggests addition, left suggests subtraction, share suggests division; Lester, Garofalo, & Kroll, 1989, p. 84) that are limiting. According to Van de Walle, “Key words are misleading,” “Many problems do not have key words,” and “Key words send a terribly wrong message about doing math” (p. 152). Thus, this approach ignores the meaning and structure of the problem and fails to develop reasoning and making sense of problem situations (Van de Walle).

In contrast, current mathematics reform efforts advocate that teachers act as facilitators to help students construct their own understandings of mathematical concepts and relationships (e.g., developing the abilities of inquiry, problem solving, and mathematics connections). However, the literature is inconclusive about the benefits of that approach for all learners (Baxter, Woodward, Voorhies, & Wong, 2002). As the range of skills that students bring to the classroom increases, the challenge for teachers is synthesizing knowledge regarding mathematics content and processes, student learning, effective instruction models, and appropriate classroom opportunities and experiences (Ma, 1999). Learners who lack sufficient prior knowledge may need more supportive instructional strategies in which the teacher scaffolds information processing by supplying a greater degree of instructional facilitation during the learning process.

To promote elementary students’ mathematical problem-solving skills, we relied on models for understanding and assessing children’s solutions for addition and subtraction word problems derived from schema theories of cognitive psychology (Briars & Larkin, 1984; Carpenter & Moser, 1984; Kintsch & Greeno, 1985; Riley, Greeno, & Heller, 1983). On the basis of those models, we designed an intervention (i.e., schema-based instruction [SBI]) for students with high-incidence disabilities and for those at risk for failure in mathematics (e.g., Jitendra et al., 1998; Jitendra, Hoff, & Beck, 1999). In contrast to earlier investigations of SBI, in which researchers provided individual or small-group instruction in rooms adjacent to the students’ classrooms, we deemed it critical to broaden the learning environment to a typical classroom context wherein the classroom teacher provided all instruction. In addition, we added self-monitoring to strategy use because teaching students to self-regulate their learning has an added positive effect on their mathematics problem-solving performance (L. S. Fuchs et al., 2003; Schunk, 1998; Verschaffel et al., 1999).

Theoretical Framework

Results of a number of research studies on solving addition and subtraction word problems have shown that the semantic or mathematical structure of problems (i.e., specific characteristics of the problem and the semantic relationships among the various problem features) is much more relevant than is syntax (i.e., how a problem is worded; Carpenter, Hiebert, & Moser, 1983; Carpenter & Moser, 1984). Notwithstanding the ease or difficulty of the syntactic structure of problems, students who lack a well-developed semantic structure use a “bottom-up or text-driven approach to comprehend a problem statement” (Kameenui & Griffin, 1989, p. 581). Given the critical role of semantic structure in problem solution, Carpenter and Moser postulated a classification of addition and subtraction word-problem types that includes change, combine, compare, and equalize. Of those problem types, change, combine, and compare problems are characteristic of most addition and subtraction word problems presented in elementary mathematics textbooks, indicating the need for research focusing on these problem types. Table 1 shows the three different problem types and their characteristics. Change problems usually begin with an initial quantity and a direct or implied action that causes either an increase or decrease in that quantity. The three sets of information in a change problem are the beginning, change, and ending. In the change situation, the object identities (e.g., video games) for beginning, change, and ending are the same (see Figure 1). In contrast, combine or group problems involve two distinct groups or subsets that combine to form a new group or set. Group problems require an understanding of part-part-whole relations. The relation between a particular set and its two distinct subsets is static (i.e., no action is implied). Compare problems involve the comparison of two disjoint sets (compared and referent); the emphasis is on the static relation between the two sets (see Table 1). The three sets of information in a compare problem are the compared, referent, and difference. For each problem type, there are three items of information; the position of the unknown in these problems may be any one of the three items, which can be found if the other two items are given.

Also critical to successful mathematics problem solving is domain-specific knowledge (conceptual and procedural; e.g., Hegarty, Mayer, & Monk, 1995). An important aspect of domain-specific concept knowledge is problem comprehension/representation, which involves translating the text of the problem into a semantic representation on the basis of an understanding of the problem structure. Although procedural knowledge (e.g., knowing the series of sequential steps used to solve routine mathematical tasks, such as adding 27 + 19 as well as knowing mathematical symbolism, such as +, =, and >) is also important, it “is extremely limited unless it is connected to a conceptual knowledge base” (Prawat, 1989, p. 10). In fact, a lack of understanding of the problem situation may result in a solution plan that is immaturely developed. Successful problem solvers can translate and integrate information in the problem into a coherent mental representation that mediates problem solution (Mayer, 1999; Mayer & Hegarty, 1996). However, many students have difficulty with problem comprehension and solution and would benefit from instruction in constructing a model to represent the situation in the text, followed by solution planning based on the model (Hegarty et al., 1995). Evidently, schema-based instruction (SBI), with its emphasis on semantic structure and problem representation, is one solution to advancing students’ mathematical problem-solving skills. The goal of SBI is to help students establish and expand on the domain knowledge in which schemata are the central focus. A schema is a general description of a group of problems that share a common underlying structure requiring similar solutions (Chen, 1999; Gick & Holyoak, 1983). According to Marshall (1995), schemata “capture both the patterns of relationships as well as their linkages to operations” (p. 67). SBI analyzes explicitly the problem schema (e.g., the part-part-whole) and the links pertaining to how different elements of the schema are related (e.g., parts make up the whole). Understanding these links is crucial in selecting appropriate operations needed for problem solution. For example, if the whole in the problem is unknown, adding the parts is necessary to solve for the whole; if one of the parts is unknown, subtracting the part(s) from the whole is needed to solve for the unknown part. An important difference between SBI and other instructional approaches is that only SBI emphasizes integrating the various pieces of factual information essential for problem solving. Although factual details are important, they should not be the central focus of instruction and learning. In short, SBI allows students to approach the problem by focusing on the underlying problem structure, thus facilitating conceptual understanding and adequate word-problem-solving skills (Marshall).
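The part-part-whole rule described above (add the parts when the whole is unknown; subtract the known parts from the whole when a part is unknown) can be illustrated with a minimal sketch. The function name and the use of None to mark the unknown are illustrative assumptions, not part of the SBI materials.

```python
from typing import List, Optional


def solve_part_whole(whole: Optional[int], parts: List[Optional[int]]) -> int:
    """Return the single unknown quantity (marked None) in a part-part-whole relation.

    Hypothetical helper for illustration only.
    """
    unknown_parts = [p for p in parts if p is None]
    if whole is None and not unknown_parts:
        # Whole unknown: add the parts.
        return sum(parts)
    if whole is not None and len(unknown_parts) == 1:
        # One part unknown: subtract the known part(s) from the whole.
        return whole - sum(p for p in parts if p is not None)
    raise ValueError("exactly one quantity must be unknown")
```

For instance, solve_part_whole(None, [4, 3]) adds the parts, whereas solve_part_whole(7, [4, None]) subtracts the known part from the whole.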


We conducted design or classroom experiments concurrently at two different geographical sites. Our research team collaborated with teachers to conduct a series of word problem solving teaching sessions using SBI with a small sample of students to understand what works, prior to conducting a formal experiment (Cobb, Confrey, diSessa, Lehrer, & Schauble, 2003). Design experiments involve in-depth study of an innovation on a small scale to understand not only how responsive an intervention is to the dynamics of the classroom but also to understand “the nature of effective adaptations for students with disabilities” (Gersten et al., 2000, p. 29). Specifically, a design experiment is aimed at improving the connections between teaching and learning through the process of continuous cycles of “design, enactment, analysis, and redesign” of lessons in authentic classrooms (Cobb et al., 2003; The Design-Based Research Collective, 2003). Researchers use the processes of iteration and feedback loops to empirically refine the innovation (e.g., curriculum product, tasks, scaffolds) and design an effective and efficient intervention for use in experimental or quasi-experimental studies (Gersten et al., 2000).

The learning environment in our design experiments included a typical classroom context rather than an exceptional instructional setting (i.e., special rooms outside the regular classroom), wherein the teacher conducted all instruction. The purposes of the design experiments were twofold. First, we investigated the effects of SBI on the acquisition of skills for solving mathematical word problems by third graders and explored the differential effects of SBI for different groups of children (e.g., children with learning disabilities; low, average, and high achievers). Second, we examined the influence of word problem-solving instruction on the acquisition of computational skills given the role that word problems play in the development of number operations (Van de Walle, 2004).

The basic methods were similar across both studies. In Study 1, we addressed the issue of the efficacy of SBI for a homogeneous sample of students from low-ability mathematics classrooms and potential differences between students with learning disabilities and students without disabilities who were low achievers. Study 2 focused on the efficacy of SBI for a heterogeneous sample of third graders, including students differing in initial mathematics grade-level achievement.




Participants included 40 third-grade students who attended an elementary school in a suburban school district in Pennsylvania serving 472 students in Grades K-5. Approximately 15% of the student population was African American, Hispanic, or Asian; 17% were economically disadvantaged; and 5% were English language learners (ELLs). Third-grade students were grouped according to ability levels (two low- and two high-ability classrooms). Teachers in the two low-ability third-grade classrooms (Classrooms 1 and 2), as well as the special education teacher (Classroom 3), participated to learn about innovative approaches to improving students’ mathematical problem-solving performance. The final sample, however, comprised 38 students (20 boys and 18 girls) because 2 students moved out of the school district before completing the study. The mean chronological age of students was 102.60 months (range = 91 to 119 months; SD = 5.54). Twenty-eight students (74%) were Caucasian, 3 (8%) were African American, 6 (16%) were Hispanic, and 1 (3%) was Middle Eastern.

The sample included 9 students (6 boys and 3 girls) with learning disabilities (LD). While 3 of the students with LD received instruction in the general education classrooms, the remaining 6 students were instructed in the learning support classroom. In addition, we designated 9 of the 38 students as low achievers (LA) on the basis of their score (below the 35th percentile) on the Computation and Concepts and Applications subtests of the TerraNova mathematics achievement test (CTB/McGraw-Hill, 2001). Tables 2 and 3 show a summary of participant demographic information by classroom and group status.

Separate one-way analyses of variance (ANOVAs) indicated no significant differences between classrooms on the mathematics subtests of the TerraNova: Concepts and Application, F(2, 35) = 0.38, ns, and Computation, F(2, 35) = 0.80, ns; Total Reading subtest, F(2, 35) = 1.03, ns. However, there was a significant difference between classrooms on chronological age, F(2, 35) = 15.99, p

Similarly, separate one-way ANOVAs indicated no significant between-group (LD and LA) differences on the Concepts and Application subtest of the TerraNova, F(1, 16) = 2.84, ns, and on the Total Reading subtest, F(1, 16) = 1.96, ns. However, there was a significant between-group difference on the Computation subtest, F(1, 16) = 5.53, p

Three female teachers participated in Study 1, each teaching in one classroom. The two general education teachers were certified in elementary education and each had more than 25 years of teaching experience; the special education teacher was certified in special education and had 19 years of teaching experience. Teachers attended 1 hr of inservice training on ways to implement the intervention; they received ongoing support from two research assistants (doctoral students in special education) throughout the study.


In all classrooms, teachers taught mathematics five times a week for 50 min with the district-adopted basal text, Heath Mathematics Connections (Manfre, Moser, Lobato, & Morrow, 1994). Teachers taught the word problem-solving unit during regularly scheduled mathematics instruction for 15 weeks. They taught students how to solve one-step addition and subtraction problems involving change, group, and compare problems for 30 min daily, 3 days per week. Instruction was scripted to ensure consistency of information and included an instructional paradigm of teacher modeling with think-alouds, followed by guided practice, paired partner work, independent practice, and homework. In addition, instruction during guided practice emphasized frequent teacher-student exchanges to facilitate problem solving. Word problem-solving instruction involved two phases: (a) problem schema and (b) problem solution. We developed a strategy checklist to help scaffold student learning. The checklist included the following six steps: (a) read and retell the problem to discover the problem type, (b) underline and map important information in the word problem onto the schematic diagram, (c) decide whether to add or subtract to solve the problem, (d) write the mathematics sentence and solve it, (e) write the complete answer, and (f) check the answer.

Schema for Problem-Solving Instruction

During the problem schema instruction phase, students received story situations that did not contain any unknown information. Instruction focused on identifying the problem schema for each of the three problem types (change, group, compare) and on representing the features of the story situation with schematic diagrams (see Figure 1). That is, students learned to interpret and elaborate on the main features of the story situation and map the details of the story onto the schema diagram. Teachers introduced each problem type successively and cumulatively reviewed them to help students discern the three problem types.

Change story situation. Students learned to identify the problem type by using story situations such as, “Jane had four video games. Then her mother gave her three more video games for her birthday. Jane now has seven video games.” Using Step 1 of the strategy checklist, students identified the story situation as change because it initially involved four video games, then an action occurred that increased this quantity by three, which resulted in a total of seven video games. Also, the object identity of the beginning, change, and ending is video games. As such, this story situation is considered a change problem schema. For Step 2, teachers prompted students to use the corresponding diagram (see Figure 1) to organize or represent the information. That step involved identifying and writing the object identity or label (e.g., video games) for the three items of information (i.e., beginning, change, and ending) in the change diagram. Students then read the story to find the quantities associated with the beginning, change, and ending and wrote them in the diagram. Next, students summarized the information in the story with the completed diagram and checked the accuracy of the representation.

Group story situation. Students identified the group story problem type by using story situations such as, “Sixty-eight students at Hillcrest Elementary took part in the school play. There were 22 third graders, 19 fourth graders, and 27 fifth graders in the school play.” Using Step 1 of the strategy checklist, students learned that because the story described a situation in which three small groups (third-, fourth-, and fifth-grade students) combine to form a large group (all students in the play), it is a story situation of the group problem type. For Step 2, teachers prompted students to use the group diagram (see Figure 1) to represent the information. Instruction involved identifying the three small groups and the large group and writing the group names in the diagram. Students then read the story to find the quantities associated with each group and wrote them in the diagram. Next, students summarized the information in the story with the completed diagram and checked the accuracy of the representation.

Compare story situation. Students identified the compare story problem type by using story situations such as, “Joe is 15 years old. He is 8 years older than Jill. Jill is 7 years old.” Using Step 1 of the strategy checklist, students identified the story situation as compare because it required a comparison of Joe’s age to Jill’s age. For Step 2, students used the corresponding diagram (see Figure 1) to organize or represent the information. That step involved reading the comparison sentence (He is 8 years older than Jill) to identify the two sets that were compared in the story, determining the identity of the large (Joe’s age) and small (Jill’s age) sets, labeling them in the diagram, and writing the difference amount in the diagram. Students then read the story to find the quantities associated with the two sets and wrote them in the diagram. Next, students summarized the information in the story with the completed diagram and checked the accuracy of the representation.

Problem-Solution Instruction

During the problem solution phase that followed problem schema instruction, students solved problems with unknowns by using either addition or subtraction; SBI in this phase included six steps.

Change problems. Teachers prompted students to identify and represent the problem (i.e., Steps 1 and 2) with the change schematic diagram similar to the problem schema instruction phase. The only difference was that students used a question mark to represent the unknown quantity in the diagram. Step 3 involved selecting the appropriate operation and transforming the information in the diagram into a number sentence. That is, students learned that they needed to add the parts if the whole was unknown and subtract for the part when the whole was known. Instruction emphasized that when the change action caused an increase, the ending quantity represented the whole; when the change action involved a decrease, the beginning quantity was the whole. For Step 4, students had to write the mathematics sentence and solve for the unknown using the operation identified in the previous step. Step 5 prompted students to write a complete answer. Finally, for Step 6, students had to check the reasonableness of their answer and ensure the accuracy of the representation and computation.

Group problems. Students learned to identify and represent the problem (i.e., Steps 1 and 2) with the group schematic diagram as in the problem schema instruction phase. For Step 3, when selecting the operation to solve for the unknown quantity in group problems, students learned that the large group represents the whole and the small groups are the parts that make up the whole. Steps 4 through 6 were identical to those described for the change problem.

Compare problems. Students identified and represented the problem (i.e., Steps 1 and 2) using the compare schematic diagram as in the problem schema instruction phase. For Step 3, when selecting the operation to solve for the unknown quantity in compare problems, students learned that the larger set is the big number or whole, whereas the smaller set and difference are the parts that make up the larger set. Steps 4 through 6 were identical to those described for the change and group problems.
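The Step 3 rules described for the three problem types can be summarized in a short sketch: each schema designates one role as the whole, and the operation follows from whether the unknown is the whole or a part. The role names, the dictionary layout, and the function below are illustrative assumptions rather than part of the instructional materials.

```python
# Which role plays the "whole" in each schema (hypothetical role names).
WHOLE_ROLE = {
    ("change", "increase"): "ending",     # change caused an increase
    ("change", "decrease"): "beginning",  # change caused a decrease
    ("group", None): "large_group",
    ("compare", None): "larger_set",
}


def solve_schema(schema, quantities, direction=None):
    """Solve for the one quantity marked None in a mapped schematic diagram."""
    whole_role = WHOLE_ROLE[(schema, direction)]
    unknown = [role for role, value in quantities.items() if value is None]
    if len(unknown) != 1:
        raise ValueError("exactly one quantity must be unknown")
    known_parts = [
        value
        for role, value in quantities.items()
        if role != whole_role and value is not None
    ]
    if unknown[0] == whole_role:
        return sum(known_parts)  # whole unknown: add the parts
    return quantities[whole_role] - sum(known_parts)  # part unknown: subtract
```

Using the compare story from the text, solve_schema("compare", {"larger_set": 15, "smaller_set": None, "difference": 8}) subtracts the difference from the larger set to find Jill's age.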

When instructors taught students to solve word problems using SBI, only one type of story situation or word problem with the corresponding schema diagram initially appeared on student worksheets following the instruction of that problem type (e.g., change problem). After students learned how to map the features onto schematic diagrams or solve change and group problems, teachers presented story situations or word problems with both types, along with a discussion of the samenesses and differences between the change and group problems. Later, when students completed instruction for change, group, and compare problem types, teachers distributed worksheets with word problems that included all problem types.

We measured fidelity of treatment with a checklist of critical instructional steps. The checklist included salient instructional features (e.g., providing clear instructions, reading word problems aloud, modeling the strategy application, providing guided practice). Two graduate students in special education collected fidelity data as they observed one of the instructors for approximately 30% of the teaching sessions. Treatment fidelity, estimated as the percentage of steps completed correctly by the instructor, was 93% (range = 85-100%) across the 3 teachers.

Measures and Data Collection

Two research assistants administered and scored the mathematical word problem-solving tests and computation tests with scripted directions and answer keys. All data were collected in a whole-class arrangement.

Word problem-solving criterion-referenced test (WPS-CRT). To assess student growth on third-grade addition and subtraction word problems, students completed the WPS-CRT prior to and at the end of the intervention. The test consisted of 25 one-step and two-step addition and subtraction word problems. Part I of the WPS-CRT comprised 9 one-step word problems derived from the Test of Mathematical Abilities (Brown, Cronin, & McEntire, 1994). Of the 9 problems, 5 items included distracters. Part II of the WPS-CRT included 16 addition and subtraction word problems selected from five commonly used third-grade mathematics textbooks. The items consisted of 12 one-step and 4 two-step problems that met the semantic criteria for change, group, and compare problem types. In addition, 2 of the problems included distracters. The design of the measure included the three problem types and both operations.

Word problems on Parts I and II of the WPS-CRT required applying simple (e.g., single-digit numbers) to complex computation skills (e.g., three- and four-digit numbers; regrouping). Students had 50 min to complete the test. Directions for administering the word problem-solving test required students to show their complete work and to write the answer and label. Scoring involved assigning one point for the correct number model and one point for the correct answer and label, for a possible total score of 2 points for each item. Cronbach’s alpha for each of the pretest and posttest measures was 0.84. Interscorer agreement, assessed by two research assistants independently scoring 30% of the protocols, was .92 at pretreatment and .97 at posttreatment.
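The item-level scoring rule can be expressed as a brief sketch. The function names and the tuple layout are assumptions made for illustration; the study's actual scoring keys are not reproduced here.

```python
def score_item(model_correct: bool, answer_and_label_correct: bool) -> int:
    """One point for the correct number model, one for the correct answer and label."""
    return int(model_correct) + int(answer_and_label_correct)


def score_test(items) -> int:
    """Sum item scores; `items` is a list of (model_correct, answer_and_label_correct)."""
    return sum(score_item(model, answer) for model, answer in items)
```

Under this rule, a test of 25 items scored at 2 points each has a maximum of 50 points.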

Word problem-solving fluency (WPS-F) measure. We developed six forms to represent items in Part II of the WPS-CRT but modified them to include fewer problems, less advanced computation (one- and two-digit numbers only), and no problems with distracters to address the timed nature of the task. The problems for each WPS-fluency probe differed with respect to numbers, context, and position of problems, which were random. Given that each probe included only half the number of problems on the WPS-CRT, the probes were not identical and differed with respect to the unknown quantity to be solved (e.g., result, beginning, compared). We covered all possible combinations of problem types and operations in two probes rather than one probe. To control for the difficulty of the probes, we designed odd- and even-numbered probes to be parallel forms. Teachers administered the probes every 3 weeks to monitor students’ progress in solving word problems. Students had 10 min to complete eight problems. For the purpose of this study, we used the combined score of the first two probes and the combined score of the last two probes in the data analysis. Cronbach’s alphas for the two aggregated probes were 0.83 and 0.80, respectively. Interscorer agreement was .93 at pretreatment and .98 at posttreatment.

Basic mathematics computation fluency measure (Fuchs, Hamlett, & Fuchs, 1998). We monitored student progress toward proficiency on third-grade mathematics computation curriculum prior to and at the completion of the intervention with basic mathematics computation probes. Students had 3 min to complete 25 problems, with a maximum score of 43 correct digits. We scored the performance on the computation probes as the total number of correct digits, which provided credit for correct segments of responses. That assessment system is known to have adequate reliability and validity (see Fuchs, Fuchs, Hamlett, & Allinder, 1989). Interscorer agreement was .99 at pretreatment and 1.00 at posttreatment.

Strategy satisfaction questionnaire. We developed and administered a strategy-satisfaction questionnaire following the intervention to provide information about student perceptions regarding the problem-solving strategy intervention (i.e., SBI). The questionnaire included five items that required students to rate whether they (a) enjoyed the strategy, (b) found the diagrams helpful in understanding and solving problems, (c) improved their problem-solving skills, (d) would recommend using the strategy with other students, and (e) would continue to use it to solve word problems in the classroom. Ratings for the Likert-type items on the questionnaire ranged from a high score of 5 (strongly agree) to a low score of 1 (strongly disagree). Cronbach’s alpha for the questionnaire was .70.
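For readers unfamiliar with the reliability statistic reported for the measures, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The function name and data layout are assumptions; the article does not describe how the coefficient was computed in practice.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of per-student rating lists (one rating per item)."""
    k = len(ratings[0])  # number of items
    # Sample variance of each item across students.
    item_vars = [variance([row[i] for row in ratings]) for i in range(k)]
    # Sample variance of each student's total score.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When every student gives the same rating to all items (perfectly consistent items), the coefficient equals 1; less consistent ratings yield lower values.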

Data Analysis

The unit of analysis was each student’s individual score (n = 16 for each of the two general education classrooms; n = 6 for the special education classroom) rather than the classroom because of sample limitations. On the WPS-CRT, WPS-F, and computation pretest measures, we conducted separate one-way, between-subjects (classroom or group) analyses of variance (ANOVAs) to examine initial classroom or group (LD and LA) comparability. On the WPS-CRT, WPS-F, and computation pretest and posttest scores, we conducted a one between-subjects (classroom or group), one within-subjects (time: pretest vs. posttest) ANOVA. We used the Fisher least significant difference (LSD) post-hoc procedure to evaluate any pairwise comparisons for significant effects for the full sample. If a lack of classroom or group comparability on pretreatment measures occurred, we conducted a one-factor ANOVA on the change scores, with classroom or group as the between-subjects factor. (We used change scores for analyzing the two-wave data because their interpretation is straightforward.) In addition, we analyzed scores from the strategy questionnaire with multivariate analysis of variance (MANOVA) with teacher or group as the between-subjects factor. To estimate the practical significance of effects for classroom and group, we computed effect sizes (ESs) by taking the difference between the posttest means and dividing it by the pooled standard deviation of the posttest. On the computation measure, we calculated improvement effects by dividing the difference between the improvement means by the pooled standard deviation of the improvement divided by the square root of 2(1 - rxy) (Glass, McGraw, & Smith, 1981).
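The two effect size calculations can be sketched as follows. This is an illustration under stated assumptions: the pooled standard deviation uses the usual degrees-of-freedom weighting, and rxy is taken to be the correlation between pretest and posttest scores, neither of which is spelled out in the text. All function and parameter names are hypothetical.

```python
from math import sqrt


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two groups (weighted by degrees of freedom)."""
    return sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def posttest_es(m1, sd1, n1, m2, sd2, n2):
    """Difference between posttest means divided by the pooled posttest SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def improvement_es(gain1, sd1, n1, gain2, sd2, n2, r_xy):
    """Difference between improvement means divided by
    [pooled SD of improvement / sqrt(2 * (1 - r_xy))].

    r_xy is assumed here to be the pretest-posttest correlation.
    """
    denominator = pooled_sd(sd1, n1, sd2, n2) / sqrt(2 * (1 - r_xy))
    return (gain1 - gain2) / denominator
```

With made-up values (means of 12 and 10, a standard deviation of 2 in each group, and 16 students per group), posttest_es(12, 2, 16, 10, 2, 16) gives an effect size of 1.0.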


Table 2 shows the means and standard deviations for all measures for the full sample; Table 3 shows means and standard deviations for the measures by group (LD and LA).

Pretreatment Differences Among Classrooms on Mathematics Performance

Results indicated that differences between classrooms on the WPS-CRT, F(2, 35) = 1.95, ns, and WPS-F, F(2, 35) = 0.99, ns, were not significant prior to the study.

On the computation pretest, however, there was a significant effect for classrooms, F(2, 35) = 4.87, p Classroom 1 > Classroom 2. However, the difference between Classrooms 1 and 2 was not significant. We found large effect sizes of 1.08 and 1.71 for Classroom 3 when compared with Classrooms 1 and 2, respectively.

Posttreatment Differences Among Classrooms on Mathematics Performance

Results of the repeated-measures ANOVA applied to the pretest and posttest scores demonstrated a significant main effect for time on the WPS-CRT, F(1, 35) = 40.90, p

On the computation measure, results of an ANOVA on change scores indicated a significant effect for classrooms, F(2, 35) = 13.19, p Classroom 2 > Classroom 3.) We found large effects of 1.49 and 1.96 for Classroom 1 when compared with Classrooms 2 and 3, respectively.

Posttreatment Differences Between Classrooms on the Strategy Satisfaction Questionnaire

Results of the MANOVA applied to the Strategy Satisfaction Questionnaire posttreatment scores revealed no significant differences among classrooms, Wilks’s lambda = .69, approximate F(2, 35) = 1.29, ns.

Pretreatment Differences Between Groups (LD and LA) on Mathematics Performance

Results indicated that differences between groups on the WPS-CRT, F(1, 16) = 1.05, ns, and WPS-F, F(1, 16) = 0.36, ns, were not significant prior to the study.

On the computation pretest, however, there was a significant main effect for group, F(1, 16) = 15.09, p

Posttreatment Differences Between Groups on Mathematics Performance LD and LA Sample

Results of repeated-measures ANOVA applied to the pretest and posttest mathematics test scores demonstrated a significant main effect for time on the WPS-CRT, F(1, 16) = 26.94, p

Posttreatment Differences Between Groups on the Strategy Satisfaction Questionnaire

On the strategy satisfaction posttreatment scores, we found significant differences between groups, Wilks’s lambda = .43, approximate F(1, 16) = 3.16, p


Results must be interpreted in light of two serious limitations. First, the full sample (N = 38) and group sample sizes (n = 9 each for LD and LA students) were small. Second, the design was unbalanced; classroom sizes ranged from 6 to 16 students. As such, the findings indicate only preliminary evidence regarding the effectiveness of SBI. Within the constraints of the limitations, results from Study 1 provide evidence that SBI led to improvements in word problem-solving performance for the three classrooms. Although the overall treatment fidelity was high, teaching styles varied concerning teachers’ adherence to the scripted curriculum (i.e., read verbatim or used their own explanations). The high level of treatment fidelity suggests that SBI accounted for improved student learning, which is encouraging.

Furthermore, SBI was effective in enhancing the word problem-solving performance of students with LD, whether they received instruction in general education mathematics classrooms or in a special education classroom. When we separated outcomes on the word problem-solving measures for students with LD and their LA peers, the effects of the word problem-solving curriculum on students’ performance were comparable for both groups. Those results are notable because teachers implemented the treatment in a whole-class format and taught students to solve only one-step problems, although students were tested on one-step and two-step problems. The findings support and extend previous research regarding the effectiveness of SBI in solving arithmetic word problems (e.g., Fuchs, Fuchs, Finelli, Courey, & Hamlett, 2004; Jitendra et al., 1998; Zawaiza & Gerber, 1993). Our results indicated that classrooms and groups were not comparable on their computational skills prior to the study and that computational skill improvement varied as a function of classroom and group. In general, students with LD at pretreatment demonstrated better computational skills than did the other students at pretreatment. We expected that result given that many of those students were older students receiving instruction in a special education classroom in which the focus of instruction was on the acquisition of basic skills. However, the general finding that computation improvement was evident for all students was encouraging because opportunities for word problem solving facilitate computation skills. The effect size for posttest over pretest was large for the entire sample (ES = 2.98) and for the LD and LA students (ES = 2.16). Evidently, having students solve story problems enhanced the development of number operations or computational skills. The use of schematic diagrams, which provided meaning to number sentences, had an added value in promoting computation (Van de Walle, 2004).

Finally, the positive evaluation of SBI by students in the study seemed to play a role in enhancing mathematics performance as in several previous investigations (e.g., Case, Harris, & Graham, 1992; Jitendra et al., 1999). Students with learning disabilities especially were more enthused about SBI than were their peers with regard to treatment acceptability and benefits. Wood, Frank, and Wacker (1998) stated that “Student preference is an important factor, because students are not as likely to exhibit effort over time with strategies that they do not like or do not feel are helpful” (p. 336).

Overall, the study helped us learn about effective ways to enhance the problem-solving curriculum as well as facilitate teacher implementation and student learning. Teachers became conversant about modifying the curriculum only when they had completed the problem-solving unit on teaching change and group problems. The lessons that we learned from our observations and teacher input (prior to instruction on compare problems) during SBI implementation allowed us to make several modifications that we organized in the following paragraphs according to curriculum, teacher, and student enhancements.


Although Marshall (1995) discussed the importance of presenting problem schema instruction in concert with the three problem types followed by problem-solution instruction, teachers raised concerns that problem solution was seemingly removed from problem schema instruction for each problem type. Therefore, we revised the curriculum such that instruction for the change problem type began with problem schema instruction, followed by problem solution instruction. We used that same sequence for group and compare problem types.

We redesigned the self-monitoring strategy checklist to include four steps, and used an acronym, FOPS (Find the problem type, Organize information in the problem using the schema diagram, Plan to solve the problem, and Solve the Problem), to help students remember the steps. Furthermore, we elaborated on each step to align with domain-specific knowledge consistent with SBI. For example, to find the problem type (Step 1), we prompted students to examine information pertaining to each set (beginning, change, and ending) in the problem (e.g., change).

Given that the compare problem type was difficult for many students during the problem schema phase, we collaborated with teachers to develop a compare structure that was coherent to third-grade students. For example, we focused on three sets (bigger, smaller, and difference) of information and eliminated reference to difficult terms (e.g., compared, referent) when discussing the problem. In addition, we added oral exercises prior to written work to emphasize the critical features and relations in the problem to ensure that students were familiar with the information needed for problem comprehension, a key aspect of SBI.

We found that teachers were spending considerable amounts of instructional time explaining unfamiliar terms to students who did not have the necessary experiential background, which detracted from the focus on problem solving. Therefore, we modified the textbook problems to meet the needs of the students. For example, unknown words such as “alpaca” were replaced by more familiar terms (sheep).

An important issue that emerged was teacher accountability for student performance on statewide testing. Teachers believed that the use of word problems presented only in text format would not adequately prepare students to generalize to word problems on the state mathematics test. Therefore, revisions to the problem-solving curriculum content entailed inclusion of items that presented information in tables, graphs, and pictographs. Also, because the state test required the use of mathematical vocabulary (e.g., addend) and emphasized written communication, we modified our instruction to include key mathematical terms and provided students with practice in writing explanations for how the problem was solved.

Furthermore, teachers noted that the time required to implement SBI was unrealistic given the need to cover other topics in the school curriculum. In the redesign of the problem-solving curriculum, we reduced problem schema instruction for each problem type from three 30-min lessons to one 50-min lesson and developed four lessons for each problem type that addressed problem solution instruction. In addition, we incorporated fading of schematic diagrams to ensure that students were able to apply learned solution procedures independent of diagrams. Also, we eliminated homework problems from the curriculum because of teacher concerns regarding students’ inconsistent completion of homework.


Our observations revealed that teachers need ongoing support during the initial implementation of a newly developed intervention. Although we provided teaching scripts to ensure consistency in implementing the critical content, we suggested that teachers use them as a framework for instructional implementation. However, our observations indicated that Teacher 1 read the script verbatim and followed it in its entirety, whereas Teacher 2 followed the script inconsistently and required reminders to adhere to the relevant information needed to promote problem solving. In contrast, Teacher 3 (special education teacher) familiarized herself with the script and used her own explanations and elaborations to implement the intervention with ease. Obviously, general education teachers in this study needed more support to implement the intervention. Also, we noticed that some students in one classroom were struggling during partner work and had to wait until whole-class discussion to get corrective feedback, indicating the need for providing teachers with explicit guidelines about directly monitoring student work.

In addition, we found that the general education teachers seemed more at ease in communicating their concerns to the two research assistants who supported them during the course of the project than to the primary researcher, which raised the issue of how and with whom communication should be facilitated.


Our observations indicated that for many students, especially those with LD, scaffolding instruction (modeling, use of schematic diagrams and checklists) was critical as they learned to apply the strategy. Explicit modeling and explanations using several examples enhanced their problem-solving skills. Following teacher-led instruction, we noticed that several students with LD, as well as low-achieving students, were enthused and participated actively, as evidenced by their raised hands in response to teacher questioning.




Students in Study 2 were 56 third-grade students in two heterogeneous classrooms in a parochial school located in a small city in Florida that served 570 students in Grades pre-K-8. Approximately 20% of the student population was African American, Hispanic, or Asian. The total sample included 27 boys and 29 girls. The mean chronological age of the students was 108 months (range = 96 to 113 months). Forty-four of the students (78%) were Caucasian, 5 (8%) were African American, 6 (10%) were Hispanic, and 1 was Asian American (2%). Although the students were not formally identified at the school, the two third-grade classroom teachers reported that 9 of the 56 third-grade students had LD or attention deficit disorder or were considered low achievers. On the basis of scores from the Problem Solving and Data Interpretation (PSDI) mathematics subtest of the Iowa Test of Basic Skills (ITBS), we designated each student’s initial mathematics achievement status as low performing (LO; below the 34th percentile), average performing (AV; between the 35th and 66th percentiles), or high performing (HI; above the 66th percentile). Tables 4 and 5 summarize participant demographic information by the two classrooms and by student type (LO, AV, and HI students).
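The achievement-status designation described above amounts to a simple cut-point rule, sketched below. The function name is hypothetical, and the handling of a score exactly at the 34th percentile is an assumption, because the stated bands (below the 34th, 35th to 66th, above the 66th) do not say where that score falls; the sketch places it in the LO group.

```python
def achievement_status(psdi_percentile):
    """Map an ITBS PSDI percentile rank to LO, AV, or HI.

    Assumption: a percentile of exactly 34 is treated as LO, since the
    published bands leave that boundary unspecified.
    """
    if psdi_percentile > 66:
        return "HI"   # high performing: above the 66th percentile
    if psdi_percentile >= 35:
        return "AV"   # average performing: 35th to 66th percentile
    return "LO"       # low performing


# Illustrative percentile ranks, not study data.
for percentile in (20, 35, 66, 67):
    print(percentile, achievement_status(percentile))
```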

Separate one-way analyses of variance (ANOVAs) indicated no significant differences between classrooms on the mathematics and reading subtests of the ITBS: Problem Solving and Data Interpretation, F(1, 54) = 0.11, ns; Computation, F(1, 54) = 0.04, ns; Vocabulary, F(1, 54) = 0.35, ns; and Comprehension, F(1, 54) = 0.01, ns. In addition, there was no significant difference between classrooms on chronological age, F(1, 54) = 0.31, ns. Chi-square analyses revealed no significant between-classroom differences on gender, χ²(1, N = 56) = .27, ns, and ethnicity, χ²(3, N = 56) = 1.96, ns.

Similarly, separate one-way ANOVAs indicated significant differences between student type (i.e., LO, AV, and HI) on the PSDI, F(2, 53) = 239.81, p AV students > LO students.) In addition, there was no significant between-student type difference on chronological age, F(2, 53) = 0.95, ns. Chi-square analyses revealed no significant between-student type differences on gender, χ²(2, N = 56) = 0.27, ns, or race, χ²(6, N = 56) = 1.96, ns.

The two general education teachers who participated in Study 2 were women; one teacher held a master’s degree with 10 years’ teaching experience, and the other teacher, a bachelor’s degree with 30 years’ experience. The teachers attended 1 hr of inservice training on ways to implement the intervention; they received support from one of the researchers periodically throughout the study.

Procedures and Measures

Procedures and measures were similar to those used in Study 1. In this section, we report only how components of the two studies differed. First, although students received the same amount of instructional time as those in Study 1, teachers began Study 2 later in the school year. Second, teachers in Study 2 used the textbook series, “Mathematics-The Path to Math Success! Grade 3” by Silver, Burdett, and Ginn (as cited in Fennell et al., 1998) for daily mathematics instruction. Third, instructional procedures differed from Study 1 with regard to compare instruction. On the basis of student difficulty with the compare problem type, teachers in Study 1 used the revised compare diagram and instruction. However, teachers in Study 2 decided to implement the original compare instruction and determine how their students, who were predominantly average and high performing, would respond. Fourth, in Study 2, we did not use the student strategy satisfaction questionnaire administered in Study 1 because of time constraints at the end of the school year. Treatment fidelity was 97.5% (range = 91-100%) across the two teachers.


Table 4 shows the means and standard deviations for all measures for the full sample. Table 5 displays means and standard deviations for the measures by group (LO, AV, and HI).

Pretreatment Differences Between Classrooms on Mathematics Performance

Results indicated that differences between classrooms on the WPS-CRT, F(1, 54) = 3.35, ns, WPS-F, F(1, 54) = 0.00, ns, and computation pretest scores, F(1, 54) = 0.92, ns, were not significant prior to the study.

Posttreatment Differences Between Classrooms on Mathematics Performance

Results of ANOVA applied to the pretest and posttest data yielded no significant classroom by time of testing interaction for WPS-CRT, F(1, 54) = 1.39, ns, indicating that classroom had no effect on changes in correct responses from pretest to posttest.

However, the analyses yielded a significant main effect for time of testing for WPS-F, F(1, 54) = 10.93, p

Pretreatment Differences Between Groups (LO, AV, and HI) on Mathematics Performance

Results indicated that, as expected, differences between groups on the WPS-CRT, F(2, 53) = 5.68, p

LSD follow-up tests indicated that mean scores for HI were significantly different than were those for AV (p AV > LO). However, the difference between LO and AV was not significant. We found large effect sizes of 1.14 and 1.15 for HI when compared with AV and LO, respectively. For the WPS-F measure, follow-up tests indicated that the mean scores for HI were significantly different than those for AV (p AV > LO). In addition, the difference between LO and AV was significant (p

Finally, follow-up tests on the computation measure revealed that mean scores for HI were significantly different than were those for LO (p LO), but not between HI and AV (p = .15). The difference between LO and AV on the computation measure was also significant (p = .05); the AV group scored higher. We found large effect sizes of 1.67 and 0.91 for HI compared with the LO, and for AV compared with LO, respectively.

Results of ANOVA applied to the WPS-CRT pretest and posttest data yielded no significant group by time of testing interaction, F(2, 53) = 1.19, ns. That finding indicated that group status (HI, AV, LO) had no effect on changes in correct scores from pretest to posttest. In addition, there was no significant main effect for time, F(1, 53) = 0.26, ns, indicating lack of improvement from pretest to posttest for the entire sample of students, regardless of group status. However, the analysis yielded a significant main effect for group, F(2, 53) = 12.45, p

Results for WPS-F indicated the following significant effects: group by time of testing interaction, F(2, 53) = 3.19, p

The analysis of computation data yielded no significant group by time of testing interaction, F(1, 54) = 0.36, ns, indicating that group status (HI, AV, LO) had no effect on changes in correct scores from pretest to posttest. However, results indicated a significant main effect for time, F(1, 53) = 18.84, p


Students in Study 2 made small-to-moderate improvements in their word problem-solving and computation performance despite unforeseen problems that occurred during this field-based research. The benefits of SBI were particularly apparent for the low-performing students in the two heterogeneously grouped third-grade classrooms. As in Study 1, third-grade teachers delivered instruction in a whole-class arrangement without special instructional adaptations for low-performing students. Despite the lack of individualized instruction, those students showed the greatest amount of growth from pretest to posttest on two of the three measures. Conversely, the results indicated that the high-performing students made the least amount of improvement. The findings support previous studies regarding the importance of explicit instruction for struggling learners (e.g., Swanson, 1999) and suggest that it matters less for high performers. Even young children with high intelligence are capable of using strategies spontaneously to perform given tasks without explicit instruction (Cho & Ahn, 2003).

Results for the WPS-CRT, in particular, were disappointing. Our hypothesis that there would be a significant effect for time from pretest to the posttest was not realized. The lack of improvement over time may be attributed to several reasons. First, the length of the WPS-CRT task (i.e., 25 one-step and two-step word problems) in conjunction with administration of the posttest only 2 weeks before school ended for the summer may not have been motivating for students, in spite of explicit instructions to do their best. Teachers reported students’ less-than-determined attempts to complete the test. As such, that problem could have negatively affected student performance on the WPS-CRT posttest, which is not uncommon in intervention research (Dunst & Trivette, 1994).

Another plausible explanation for a lack of improvement was that the high-performing students may have known how to solve third-grade word problems before the study began and did not require extensive strategy instruction. A closer look at the data suggests that that may be the case. Fifteen of the high-performing students scored close to an accepted criterion level of 80% correct before the study began, indicating possible ceiling effects (M = 39.91, SD = 3.99; range = 34 to 47 on the 50-point WPS-CRT). As such, pretest to posttest improvement was not evident for the students, and ceiling effects limited the ability to ferret out differences among higher scoring students. Results on the WPS-F measures were more encouraging. All students improved their word problem-solving performance on the shorter, 8-item probes from pretest to posttest. The LO group made the most progress on this measure, as evidenced by the large effect size of 0.79 when compared with effect sizes of 0.34 and 0.20 for the AV and HI groups, respectively. The measure included easier items than did the WPS-CRT, was shorter in length, and required only 10 min to complete each probe, thereby reducing fatigue; those features may have positively influenced the results of the WPS-F.
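The ceiling-effect argument for the high-performing students can be checked with simple arithmetic on the pretest figures reported above (M = 39.91, range = 34 to 47, on a 50-point test, against an 80% criterion). The following sketch converts those raw scores to proportions correct; the helper name is hypothetical.

```python
MAX_SCORE = 50      # points available on the WPS-CRT
CRITERION = 0.80    # accepted criterion level (80% correct)


def proportion_correct(raw_score, max_score=MAX_SCORE):
    """Convert a raw WPS-CRT score to a proportion of the maximum score."""
    return raw_score / max_score


pretest_mean = proportion_correct(39.91)   # about 0.80, essentially at criterion
pretest_low = proportion_correct(34)       # 0.68
pretest_high = proportion_correct(47)      # 0.94

print(f"mean: {pretest_mean:.1%}, range: {pretest_low:.0%} to {pretest_high:.0%}")
print(f"room to improve at the mean: {MAX_SCORE - 39.91:.2f} points")
```

With the mean already at about 80% correct and roughly 10 points of headroom, large pretest-to-posttest gains for this subgroup would have been difficult to detect.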

The pretest-to-posttest results for the computation measure revealed a pattern similar to those for the WPS-F. All groups improved significantly over time; however, the LO group demonstrated the largest gains from pretest to posttest, with a medium effect size of .57 when compared with small-to-moderate effect sizes of .36 and .32 for the AV and HI students, respectively. Again, word problem solving seems to play a role in the development of number operations, such as computational skills (Van de Walle, 2004), particularly for low-performing students.

As in Study 1, we learned ways to enhance curriculum, instruction, and measures. For example, whereas teachers from both sites were concerned with the length of the intervention, Study 2 teachers were less concerned about coverage of other topics. Instead, teachers from Study 2 reported the need to reduce what they considered redundancy in the curriculum as well as highly directive instruction because the majority of their students (i.e., average and high performers) occasionally appeared unmotivated during the instructional sessions. Considerable evidence shows that pacing curriculum and instruction to match the needs of the student is one way to ensure that the needs of highly able students are addressed (Rogers, 2002). Attempts to replicate SBI in heterogeneous classrooms may necessitate changes to differentiate the instruction for mixed-ability groupings (e.g., Tomlinson, 1995).

Our observations in Study 2, like those in Study 1, indicated that LO students benefited from the explicit modeling, guided practice, use of schematic diagrams, and checklists that characterize SBI. We observed that LO students tended to participate more than was typical during SBI instruction, and their teachers stated that instruction had clear benefits for them. However, as a group, the performance of LO students on the three measures was inconsistent and highly variable. They performed well on the WPS-F and computation measure but not on the WPS-CRT. On the WPS-CRT (pretest: M = 22.00, SD = 11.18; posttest: M = 19.89, SD = 12.97), performance of LO students was highly variable, as indicated by large standard deviations. That result did not occur for the WPS-F (pretest: M = 12.06, SD = 4.66; posttest: M = 17.67, SD = 8.91) or computation test (pretest: M = 27.00, SD = 4.06; posttest: M = 29.44, SD = 4.50). The length and difficulty of the WPS-CRT appeared to affect the performance of the LO group most directly; variability decreased when the measures were shorter or less challenging.
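The LO-group means and standard deviations reported above can be turned into standardized pretest-to-posttest gains. The sketch below assumes one particular formula: the mean gain divided by the square root of the average of the pretest and posttest variances. The article does not state that the LO effect sizes were computed this way, so the formula and the function name are assumptions; with these inputs, however, it yields values that round to the 0.79 (WPS-F) and .57 (computation) reported for the LO group.

```python
import math


def standardized_gain(pre_mean, pre_sd, post_mean, post_sd):
    """Mean gain divided by the pooled pre/post SD (equal weighting assumed)."""
    pooled = math.sqrt((pre_sd ** 2 + post_sd ** 2) / 2)
    return (post_mean - pre_mean) / pooled


# LO-group summary statistics as reported: (pre M, pre SD, post M, post SD).
lo_group = {
    "WPS-CRT": (22.00, 11.18, 19.89, 12.97),
    "WPS-F": (12.06, 4.66, 17.67, 8.91),
    "Computation": (27.00, 4.06, 29.44, 4.50),
}

for measure, stats in lo_group.items():
    print(f"{measure}: {standardized_gain(*stats):.2f}")
# WPS-CRT: -0.17
# WPS-F: 0.79
# Computation: 0.57
```

The negative value for the WPS-CRT reflects the slight drop in the LO group's mean from pretest to posttest on that measure.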

Finally, although the two teachers who participated in Study 2 reported that they had few problems implementing SBI instruction and had high levels of treatment fidelity (M = 98%), in hindsight, the teachers may have had more to reveal had they received our consistent support. Providing teachers with more frequent opportunities to converse with us may have allowed for instructional adaptations that addressed varying student needs. Instruction that meets the needs of heterogeneous groups of students is not easily designed, and, consequently, may require more structured, collaborative efforts between teachers and researchers.

General Discussion

Design studies have the potential to allow researchers to develop in-depth understandings of teaching and learning in classroom environments that, in turn, can assist them in the design of experimental or quasi-experimental studies (Gersten et al., 2000). Specific examples of those benefits include enhanced knowledge about (a) instructional design features of an intervention, (b) student and teacher responses to the intervention, (c) measures used, and (d) the role of the researcher in facilitating the study. Like pilot studies, design experiments allow researchers to determine the feasibility of an experiment before implementing it on a larger scale and under more carefully controlled conditions (Gersten, in press).

We undoubtedly learned much about SBI for promoting mathematics word problem solving by conducting the two design studies reported here. Although we may have a few unanswered questions, neither the knowledge gained nor the uncertainties revealed would have occurred without carrying out the investigative work of these studies. The most notable insights revealed to us from conversations with the teachers and from our own observations included the benefits of strategy fading to promote independent student use of SBI. Encouraging independent, self-regulated use of strategies has been supported in the reading literature for years and is perceived as an important long-term goal of strategy instruction (Pressley et al., 1992). We suggest that supporting the independent use of strategies is an important aspect of mathematics word problem-solving instruction as well.

We also realized the need for changes in the c
