From Research to Practice: Promoting Academic Competence for Underserved Students
By Shapiro, Edward S
Over many decades, the efforts of researchers to understand key issues in the reading performance of those children most at risk for developing later reading problems have been relentless (e.g., Gersten & Dimino, 2006; Snow, Burns, & Griffin, 1998). During the past several years, since the National Reading Panel released its findings and recommendations (National Institute of Child Health and Human Development [NICHD], 2000) and the No Child Left Behind legislation established lofty goals for student achievement, federally funded programs to improve reading performance through Reading First and Early Reading First have been initiated. As a result, many efforts in the research literature have focused on understanding assessment processes that can identify at the earliest ages those children whose paths to academic success in reading and language development must be altered to avoid long-term failure in learning to read. The three studies related to the special topic of this issue are consistent with these themes and move the literature significantly forward toward a fuller understanding of how we can best identify children at young ages whose difficulties in reading and language development may be leading to problematic outcomes. Consistent with my own views, these studies attack the “big problems” in education and school psychology (Shapiro, 2000). Although these studies indeed move the field forward, two use statistical methodologies that can be somewhat complex for practitioners to fully understand. Developments in data analysis procedures within the past decade have provided new tools for researchers that help in analyzing the real-life data that can be collected in school settings. Any of us who conduct research in schools know how “messy” the data collection process can get. In particular, when we conduct longitudinal research, we face the dilemma of dealing with the natural attrition that invariably arises with this type of design.
At the same time, our statistical methodologies have advanced to the level that, at least statistically, we can overcome some of these hurdles that in the past may have made our data uninterpretable (and probably unacceptable to reviewers). Unfortunately, when we use these more sophisticated statistical methodologies, the interpretation of outcomes can become difficult for the typical practitioner to fully understand. I hope in this commentary to take the somewhat sophisticated findings of these studies, in particular the Baker et al. (2008) study, and provide a broad, meaningful framework for the readers.
Both Vanderwood, Linklater, and Healy (2008) and Edl, Jones, and Estell (2008) examine issues in an underserved population for which much more research is needed. As pointed out by these authors, the population of English language learners (ELLs) in United States schools rose from 5.1% in 1993-1994 to 6.7% in 1999-2000, growth that translates to more than 920,000 students in a 6-year period. Over 80% of these students are of Hispanic backgrounds in which Spanish is their native and first language (McCardle, Mele-McCarty, Cutting, Leos, & D’Emilio, 2005). Donovan and Cross (2002) note that in the past decade the number of ELLs has increased nearly 70% to 5.5 million. Clearly, it is critical to understand factors that can help identify as early as possible ELL students whose reading performance may be at risk for developing later problems in reading. Likewise, attaining a full and deep understanding of how ELL students are perceived by the teaching staff in schools is also crucial to working effectively with these children.
Vanderwood et al. (2008) examined the longitudinal predictive validity and diagnostic accuracy of the nonsense word fluency (NWF) measure for a population of ELL students from California. In particular, they looked at the predictability of end of first-grade performance on NWF to outcomes at the end of third grade on two forms of curriculum-based assessment measures (oral reading fluency [ORF] and maze) as well as on the statewide achievement test (California Achievement Test-6th Edition). The issue of which measures are best predictors for students who are ELLs is an important and critical concern. In particular, there is a substantial gap in the literature related to how to judge the performance of ELL students within the developing response to intervention (RTI) methodology of service delivery. Readers with interest in this area are strongly urged to see the summer 2007 issue of Learning Disability Quarterly, an issue fully devoted to that single topic (Haager, Linan-Thompson, & Calhoon, 2007).
The Vanderwood et al. (2008) study introduces the topic by claiming that “researchers have clearly demonstrated the predictive ability of . . . NWF, phoneme segmentation fluency . . ., and rapid automatized naming … for reading performance among native English speaking students (Torgesen, Wagner, & Rashotte, 1994)” (pp. 5-6). Although there are certainly studies that have found NWF to be predictive of later reading development, there was an important finding reported by Fuchs, Fuchs, and Compton (2004) that raised some serious questions about the NWF measure relative to another early literacy measure, word identification fluency. This particular issue will be addressed in more detail again in commenting about the Baker et al. (2008) article.
One of the really important and positive aspects of the Vanderwood et al. (2008) article is a recognition that although correlational findings related to outcomes are important, correlations alone cannot tell the whole story when it comes to looking at the predictive validity of these measures. Many practitioners falsely believe that if two measures are highly correlated, one can use the correlation to be assured that the predictive accuracy of the measure will also be strong. Each of these statistical processes addresses a different question and measures that are strongly correlated may or may not be effective at diagnostic accuracy.
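The distinction between correlation and diagnostic accuracy can be illustrated with a minimal sketch. The scores, the cut scores, and the function names below are hypothetical and chosen only for illustration; they are not data or procedures from Vanderwood et al. (2008). The sketch computes a Pearson correlation between a screening measure and a later outcome, and then the proportion of students with poor outcomes whom the screening cut score actually flagged.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sensitivity(screen, outcome, screen_cut, outcome_cut):
    """Share of students with a poor outcome (below outcome_cut)
    whom the screen flagged as at risk (below screen_cut)."""
    poor = [s for s, o in zip(screen, outcome) if o < outcome_cut]
    flagged = [s for s in poor if s < screen_cut]
    return len(flagged) / len(poor)


# Hypothetical scores for ten students (assumption, not study data).
screen = [40, 52, 54, 56, 60, 70, 80, 90, 100, 110]
outcome = [10, 18, 19, 20, 30, 35, 40, 45, 50, 55]

r = pearson_r(screen, outcome)
sens = sensitivity(screen, outcome, screen_cut=50, outcome_cut=25)
print(f"correlation = {r:.2f}, sensitivity = {sens:.2f}")
```

With these made-up scores the two measures are correlated above .90, yet the screening cut score flags only one of the four students who later perform poorly. The correlation describes how well the rank ordering is preserved; the accuracy of classification depends on where the cut scores fall.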
Similar to almost all longitudinal data analyses, attrition of the sample played a large role in the Vanderwood et al. (2008) study. In this study, almost 50% of the students completing first grade were no longer at the same school and in the sample by the end of third grade. Given that geographic stability is a known protective factor against risk, the sample remaining in the school over the 3-year period probably was not fully representative of the sample present at the end of first grade, an obvious limitation of the study. However, the strength of the study was that of the 150 students remaining in Grade 3, 134 were classified as ELL, offering a large, stable population with which to examine outcomes. Likewise, the majority (89%) of the students who were ELL had a native language of Spanish.
Although the students in this particular study certainly represented a high percentage of ELLs, it is important to note that the mean level of performance on NWF at the end of Grade 1 exceeded the benchmarks identified by the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) as low risk by a substantial amount (benchmark = 50 sounds per minute; students in the study achieved 70.59 sounds per minute). Indeed, the AIMSweb normative data for over 129,000 students nationwide on NWF would place this score between the 50th and 75th percentile (AIMSweb, 2004). Clearly, a large proportion of students in the Vanderwood et al. (2008) study had ended Grade 1 well above the expected level of performance on NWF; thus, their school may not be representative of schools with a high percentage of lower performing ELL students.
Looking at basic correlations between end of Grade 1 NWF and Grade 3 ORF, Vanderwood et al. (2008) found much higher correlations between the curriculum-based measurements (CBMs; NWF to Grade 3 ORF and Maze), but much lower correlations between NWF and standardized achievement tests both at the end of Grade 1 (Stanford Achievement Test-Ninth Edition) and the end of Grade 3 (California Achievement Test-6th Edition). The lower correlations to standardized achievement tests may be somewhat problematic in that many educational professionals continue to question the relationship between CBM measures and overall measures of reading performance that include evaluations of broader skills in reading. Although statistically significant, the correlations were rather small and offer some support for concerns that the NWF measure may not be very predictive of outcomes on what are typically considered more high-stakes statewide achievement tests. The correlation between third-grade ORF and the third-grade standardized test (.60) is more consistent with what has been reported in many other studies that have examined such relationships between CBMs and statewide achievement tests at third grade (e.g., McGlinchey & Hixson, 2004; Schilling, Carlisle, Scott, & Zeng, 2007; Shapiro, Keller, Lutz, Santoro, & Hintze, 2006).
Beyond the basic correlational analysis, Vanderwood et al. (2008) conducted both a regression analysis to examine the degree of variance explained by the ORF measure beyond the particular level of ELL students, and an examination of predictive accuracy using NWF scores derived at the end of Grade 1 to predict risk levels at the end of Grade 3. Across analyses, their findings were similar. Although the ELL level of students certainly accounted for the majority of the variance in predicting end of Grade 3 scores, NWF at the end of Grade 1 consistently added a small but significant portion of variance. No other measure was significant in the predictive model. A particularly interesting part of their findings was that growth over time in NWF did not significantly contribute to end of third grade outcomes, even though the end of first grade score on NWF did add to the explained variance. In conducting the analyses of predictive accuracy, Vanderwood et al. (2008) used the risk level defined by DIBELS for NWF (i.e., 50 sounds per minute) to dichotomize the sample into “risk” and “not at risk” categories. However, on the outcome variable they set the criterion (“at or above expectations”) at the 25th percentile. It was not clear why this level was set as the criterion for outcome at the end of third grade, but it is important to point out that the criterion recommended by the DIBELS is higher (above the 50th percentile on the 2006-2007 AIMSweb aggregate normative database of over 50,000 students). Setting the criterion at the 25th percentile for the outcome variable may be somewhat lower than the level used on many statewide achievement tests to indicate proficiency.
The outcomes of the analyses conducted by Vanderwood et al. (2008) consistently showed acceptable levels of specificity but problems with sensitivity, which were substantial. Whereas NWF as measured in Grade 1 generally was accurate in predicting relatively good outcomes in Grade 3, the NWF measure failed to predict a relatively large number of children who performed poorly in Grade 3. Had these children been in an RTI model of service delivery, they might not have qualified for additional support if the NWF measure alone had been used for diagnostic purposes. Given that Vanderwood et al. report that the large majority of students who fell into this category were those at the lowest ELL levels, the study demonstrates that NWF alone may be problematic as a predictor for the lowest ELL students. Clearly, one implication here is that the use of a single metric such as NWF may not be sufficient for accurate diagnostic decision making.
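For readers less familiar with these indices, the standard definitions can be written out as follows. The notation is introduced here for illustration and is not taken from the study: TP is the number of students flagged as at risk on the screening measure who later performed poorly, FN the number not flagged who later performed poorly, TN the number not flagged who later performed adequately, and FP the number flagged who later performed adequately.

```latex
\[
\text{sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{specificity} = \frac{TN}{TN + FP}
\]
```

A measure with acceptable specificity but weak sensitivity, the pattern described above, has a small FP count but a large FN count: it seldom flags students who go on to do well, but it misses many of the students who go on to struggle.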
Overall, the Vanderwood et al. (2008) study is important for its focus on ELL students and its demonstration of the strength of CBMs in identifying those who develop later reading difficulties. The study certainly offers important information about the effect of using CBM measures with ELL students, and can provide some guidance to educators who wonder about the validity of using CBMs with the ELL population.
The study by Edl et al. (2008) also examined issues and concerns related to students with ELL backgrounds. In this study, the researchers used a rating scale (Interpersonal Competence Scale-Teacher) to examine teacher perceptions of three groups of students: students of Hispanic background who were in bilingual classrooms, Hispanic students in regular classrooms, and White, non-Hispanic students in regular classrooms. The data were collected over a 2-year period, so a longitudinal perspective was possible. Sample attrition, of course, was a factor to be addressed in the analyses, as the researchers examined whether the perceptions of teachers change over the fourth- to fifth-grade period.
Using a set of discriminant function analyses, Edl et al. (2008) found that popularity, academic competence, and Olympian-like traits during fall and spring of fourth grade were best able to separate the groups. When additional analyses were performed, the outcomes showed that students from Hispanic backgrounds in bilingual classrooms were consistently rated lower than other groups. Although there were some shifts in the specific traits perceived by teachers to be most salient during fifth grade, students in bilingual classrooms were once again rated lower than other groups.
A particular concern not discussed in the research report was the nature of the bilingual classrooms. Bilingual instruction can vary greatly and approaches to bilingual education can be quite distinct (Ochoa, 2005). The degree to which the particular kind of bilingual instruction being used in this school had an influence on the perceptions of teachers was not indicated by the authors. At the same time, the Edl et al. (2008) study certainly points to a growing concern that students in bilingual classrooms may be viewed as less competent and popular compared to peers. An important cautionary note, which is indicated by the authors, is that students from Hispanic backgrounds cannot and should not be considered as a homogeneous group. The cultural and linguistic heritage of students who come from Hispanic backgrounds varies greatly among those from the Caribbean, Mexico, Latin America, South America, or other Hispanic countries. Drawing conclusions about the population from a study conducted in a midwestern part of the United States can be problematic.
Baker et al. (2008) added important information related to the accuracy of using CBM reading performance in the young grades to predict later outcomes in school. In particular, their study focused on schools from the Oregon Reading First program, which by definition would have been based in low-performing and high-poverty environments. Their study examined both the performance level and the growth (slope) of ORF on standardized achievement measures obtained 1 year later using cohorts of students in Grades 1-3. As mentioned earlier in commenting about the Vanderwood et al. (2008) study, Baker et al. (2008) raised an interesting issue by questioning the Fuchs et al. (2004) method of accounting for slope, noting that “the effect of slope is difficult to interpret in a model without the intercept…” (p. 22). Fuchs et al. (2004) indicated that they conducted a dominance analysis (Azen & Budescu, 2003; Schatschneider, Francis, Fletcher, & Foorman, 2004), which is a “pairwise comparison of all predictors (fall nonsense word fluency level, full year nonsense word fluency slope, fall year word identification level, and full year word identification slope) as they relate to the spring criterion (i.e., Woodcock Word Identification, CRAB [Comprehensive Reading Assessment Battery, inserted by author] fluency, CRAB comprehension)” (p. 15). Although there may be some disagreement among statisticians, Fuchs et al. (2004) viewed this type of procedure as sufficient to account for the intercept in their analyses.
Baker et al. (2008) employed a series of longitudinal data analyses with a large, diverse data set. Their analyses examined students across cohorts, some of which were collected over 2 years, and some only in a single year (i.e., the students were in third grade in the first year and no longer part of Reading First as they moved to fourth grade). In their study, they examined the outcomes on the DIBELS ORF measure as well as the Stanford Achievement Test-Tenth Edition (SAT-10) given at the end of Grade 1 and Grade 2. The Oregon Statewide Reading Assessment was administered at the end of Grade 3. The inclusion of findings pertaining to comparisons between ORF and standardized tests in Grades 1 and 2 is a particularly positive aspect of their investigation, as few studies have reported on the relationship of ORF to standardized achievement tests in these early grades.
The data analysis procedures used by Baker et al. (2008) were very sophisticated and consistent with current statistical techniques used in longitudinal data analysis. Growth curve analyses as well as the development of prediction models were used to show that growth in reading performance reflected in ORF does not follow a clear linear trend. Students gain the most in the early grades up through Grade 2, with lesser growth demonstrated by the middle to end of Grade 3.
As stated earlier when commenting about the Vanderwood et al. (2008) study, it is essential to go beyond the basics of correlation to understand important issues in predicting accuracy. The correlations of ORF with the SAT-10 through Grades 1 and 2 were strong and generally ranged from .72 to .82. Likewise, correlations of ORF across grade levels with the Oregon Statewide Reading Assessment were in the range of .58-.68.
The more complex findings examined longitudinal growth models across grades and ORF assessment measures. They examined the ability of the ORF intercept (where students started at the beginning of the year) and the ORF slope to predict SAT-10 outcomes at the end of Grade 2 and the Oregon Statewide Reading Assessment at Grade 3. Baker et al. (2008) found that slope added to the accuracy of predicting performance on the standardized achievement test in the second year over and above the contribution of performance (intercept). The amount of added variance beyond the level where the student began (intercept) was reported to be 10% for Grade 2 on the SAT-10 and 3% for the Oregon Statewide Reading Assessment at Grade 3. Baker et al. (2008) questioned the value of yearly administrations of a standardized measure such as the SAT-10, given that the ORF level and slope accounted for 70% of the variance in Grade 2 and given that standardized achievement testing accounted for only 6% additional variance. Baker et al. (2008) argued that the combination of ORF level and slope offers a strong package in predicting outcomes in subsequent grade levels. I echo their concern about yearly administration of standardized tests and strongly support their point about the predictive value of CBM assessment.
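The idea of "added variance" can be made concrete with a simplified sketch. It is not the growth curve modeling that Baker et al. (2008) used; it substitutes an ordinary two-predictor regression, using the textbook formula for R squared with two predictors, and every score and name in it is hypothetical. The sketch compares the variance in an outcome explained by starting level alone with the variance explained by level and slope together.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def incremental_r2(level, slope, outcome):
    """Return (R^2 for level alone, R^2 for level plus slope, the difference).

    Uses the two-predictor identity
    R^2 = (r_yl^2 + r_ys^2 - 2 r_yl r_ys r_ls) / (1 - r_ls^2),
    which requires that level and slope are not perfectly correlated.
    """
    r_yl = pearson_r(outcome, level)
    r_ys = pearson_r(outcome, slope)
    r_ls = pearson_r(level, slope)
    r2_level = r_yl ** 2
    r2_both = (r_yl ** 2 + r_ys ** 2 - 2 * r_yl * r_ys * r_ls) / (1 - r_ls ** 2)
    return r2_level, r2_both, r2_both - r2_level


# Hypothetical data for eight students (assumption, not study data):
# fall ORF level (words correct per minute), weekly ORF growth, spring test score.
orf_level = [20, 35, 42, 50, 61, 75, 88, 95]
orf_slope = [0.8, 1.5, 0.9, 1.8, 1.2, 1.9, 1.4, 2.0]
test_score = [540, 575, 570, 600, 598, 630, 628, 650]

level_only, level_and_slope, added = incremental_r2(orf_level, orf_slope, test_score)
print(f"R^2 level only      = {level_only:.3f}")
print(f"R^2 level and slope = {level_and_slope:.3f}")
print(f"added by slope      = {added:.3f}")
```

The last value plays the role of the 10% and 3% figures reported above: the portion of outcome variance that slope explains after the starting level has already been taken into account.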
Taken together, these three articles represent a very strong contribution to the literature. Each emphasizes, from a slightly different perspective, key points of concern related to promoting academic competence among underserved, at-risk students. One particular key point is the long-term predictability and importance of measures obtained early in a child’s school career. The articles reinforce the strongly held view that the earlier we identify and attack the learning trajectories of these students, the greater the likelihood of reducing future risk. Further, among students at risk for developing academic problems, those who are ELLs may struggle to overcome an additional risk factor, that is, the stigma associated with low teacher expectations of academic competence.
References
AIMSweb. (2004). AIMSweb growth tables. Retrieved December 24, 2007, from http://www.aimsweb.com
Azen, R., & Budescu, D. V. (2003). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 8, 142-151.
Baker, S. K., Smolkowski, K., Katz, R., Fien, H., Seeley, J. R., Kame’enui, E. K., et al. (2008). Reading fluency as a predictor of reading proficiency in low-performing, high-poverty schools. School Psychology Review, 37, 18-37.
Donovan, M. S., & Cross, C. T. (2002). Minority students in special and gifted education. Washington, DC: National Academy Press.
Edl, H. M., Jones, M. H., & Estell, D. B. (2008). Ethnicity and English proficiency: Teacher perceptions of academic and interpersonal competence in European American and Latino students. School Psychology Review, 37, 38-45.
Fuchs, L. S., Fuchs, D., & Compton, D. L. (2004). Monitoring early reading development in first grade: Word identification fluency versus nonsense word fluency. Exceptional Children, 71, 7-21.
Gersten, R., & Dimino, J. A. (2006). RTI (response to intervention): Rethinking special education for students with reading difficulties (yet again). Reading Research Quarterly, 41, 99-108.
Haager, D., Linan-Thompson, S., & Calhoon, M. B. (2007). English language learners and response to intervention: Introduction to special issue. Learning Disability Quarterly, 30, 151-152.
McCardle, P., Mele-McCarty, J., Cutting, L., Leos, K., & D’Emilio, T. (2005). Learning disabilities in English language learners: Identifying the issues. Learning Disabilities Research & Practice, 20(1), 1-5.
McGlinchey, M. T., & Hixson, M. D. (2004). Contemporary research on curriculum-based measurement: Using curriculum-based measurement to predict performance on state assessment in reading. School Psychology Review, 33(2), 193-204.
National Institute of Child Health and Human Development. (2000). Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (NIH Publication No. 00-4769). Washington, DC: U.S. Government Printing Office.
Ochoa, S. H. (2005). The effectiveness of bilingual education programs in the United States: A review of the empirical literature. In C. L. Frisby & C. R. Reynolds (Eds.), Comprehensive handbook of multicultural school psychology (pp. 329-356). New York: John Wiley & Sons.
Schatschneider, C., Francis, D. J., Fletcher, J. M., & Foorman, B. (2004). Kindergarten prediction of reading skills: A longitudinal comparison. Journal of Educational Psychology, 96, 265-282.
Schilling, S., Carlisle, J. F., Scott, S. E., & Zeng, J. (2007). Are fluency measures accurate predictors of reading achievement? The Elementary School Journal, 107(5), 429-448.
Shapiro, E. S. (2000). School psychology from an instructional perspective: Solving big, not little problems. School Psychology Review, 29(4), 560-572.
Shapiro, E. S., Keller, M. A., Lutz, J. G., Santoro, L. E., & Hintze, J. M. (2006). Curriculum-based measures and performance on state assessment and standardized tests: Reading and math performance in Pennsylvania. Journal of Psychoeducational Assessment, 24(1), 19-35.
Snow, C. E., Burns, M. S., & Griffin, P. (Eds.). (1998). Preventing reading difficulties in young children. Washington, DC: National Academy Press.
Torgesen, J. K., Wagner, R. K., & Rashotte, C. A. (1994). Longitudinal studies of phonological processing and reading. Journal of Learning Disabilities, 27, 276-286.
Vanderwood, M. L., Linklater, D., & Healy, K. (2008). Predictive accuracy of nonsense word fluency for English language learners. School Psychology Review, 37, 5-17.
Date Received: December 24, 2007
Date Accepted: January 2, 2008
Action Editor: Thomas Power
Edward S. Shapiro
Center for Promoting Research to Practice, Lehigh University
Correspondence regarding this article should be addressed to Edward S. Shapiro, Center for Promoting Research to Practice, Lehigh University, L-111 Iacocca Hall, 111 Research Drive, Bethlehem, PA 18015; E-mail: ed.shapiro@lehigh.edu
Edward S. Shapiro is the Director of the Center for Promoting Research to Practice and Professor of School Psychology at Lehigh University. His primary research interests are in the area of assessment and intervention of academic skills problems. Currently, he is serving as a consultant to the Pennsylvania Initiative for Response to Intervention.
Copyright National Association of School Psychologists Mar 2008
(c) 2008 School Psychology Review. Provided by ProQuest Information and Learning. All rights Reserved.
