April 19, 2007
How Do Different Types of High School Students Respond to Schoolwide Positive Behavior Support Programs? Characteristics and Responsiveness of Teacher-Identified Students
By Lane, Kathleen Lynne; Wehby, Joseph H; Robertson, E Jemma; Rogers, Leslie Ann
In this article, the authors examined (a) the accuracy of teacher nominations in identifying high school students (N = 178) with externalizing, internalizing, comorbid, and typical behavior patterns, as well as students who were receiving special education services for high-incidence disabilities, (b) the level of treatment fidelity and access to reinforcement for the different student groups, and (c) the degree to which these different types of students responded to a schoolwide positive behavior support (SW-PBS) intervention program. Results indicated that despite receiving equal access to reinforcement, there were subtle differences regarding how different types of high school students responded to the SW-PBS. It appears that students with internalizing behavior problems were the most responsive, whereas students with comorbid concerns were the least responsive. Limitations and directions for future research are offered.
In recent years, schools have shifted from a reactive approach involving strong consequent-based components (e.g., detentions, suspensions, expulsion for rule infractions) to a proactive approach containing strong antecedent-based components designed to (a) clarify expectations for faculty members, (b) teach these expectations to all students, (c) afford students opportunities to practice expectations, and (d) reinforce students whose performance meets or exceeds the stated expectations (Horner & Sugai, 2000; Lane, Robertson, & Graham-Bailey, 2006; Lewis & Sugai, 1999; Shapiro, Burgoon, Welker, & Clough, 2002). This shift in orientation has been manifested within the context of a three-tiered, data-driven model comprised of primary, secondary, and tertiary levels of prevention. This model provides a systematic approach to preventing the development of new behavioral problems, while providing the necessary level of support to manage existing behavioral concerns.
The universal, or primary, level of prevention includes schoolwide interventions, such as violence prevention, conflict resolution, and social skill programs. All students participate in this level of prevention just by virtue of attending school. The intent is to prevent problems from occurring by supporting a larger number of students who demonstrate minimal levels of risk. According to Horner and Sugai (2000), approximately 80% of the student body should respond to this level of prevention. Schoolwide data are used to monitor student progress and identify students in need of more intensive, secondary prevention efforts.
Secondary prevention efforts involve more focused intervention programs for students with common acquisition, fluency, or performance deficits (Elliott & Gresham, 1991). This level may include self-regulation skills, conflict-resolution skills, study skills, or supplemental academic supports. Students are identified through procedures used in response to intervention (RTI) models (Fuchs, Fuchs, & Compton, 2004). Namely, more global assessments, such as schoolwide behavioral screeners, office discipline referrals, and even attendance data, are used in methods similar to curriculum-based measures of academic performance to identify students for secondary, or even tertiary, levels of prevention. Experts in the field anticipate that 10% to 15% of the student body will require secondary supports (Horner & Sugai, 2000). If this level is insufficient, as evidenced by data-based outcomes, the final level of prevention (tertiary prevention) is enlisted.
In addition to being appropriate for students who are nonresponsive to primary and secondary efforts, tertiary prevention plans are also designed for students who have been exposed to multiple risk factors (Kern & Manz, 2004). Tertiary support involves idiographic, intensive interventions, such as functional assessment-based interventions (Lane, Umbreit, & Beebe-Frankenberger, 1999; Lane, Weisenbach, Phillips, & Wehby, 2006), mental health support services, and intensive curricular modifications. Approximately 5% to 7% of the student body may need this level of prevention.
Schoolwide behavior support "is not a new phenomenon ..., but is an approach that is well suited to our times" (Horner & Sugai, 2000, p. 231). To date, the majority of studies examining the efficacy of primary prevention plans, the foundation of the three-tiered model, have been conducted at the elementary level (Hunter, Elias, & Norris, 2001; Lane & Menzies, 2003; Netzel & Eber, 2003; White, Marr, Ellis, Audette, & Algozzine, 2001). Far less evidence has been acquired as to the efficacy of primary prevention plans in middle and high schools: Only a few studies have been conducted with students in Grades 9 through 12 (Lane, Robertson, & Graham-Bailey, 2006).
SCHOOLWIDE, PRIMARY PREVENTION IN MIDDLE AND HIGH SCHOOL SETTINGS
A recent review of the literature base on schoolwide interventions with primary-level efforts conducted in secondary schools identified 14 studies published between 1997 and 2005; these studies reported data on 63 schools (Lane et al., 2006). Only one article reported the outcomes of a primary plan implemented at the high school level (Skiba & Peterson, 2003), whereas the remaining articles focused on middle or junior high schools. Of the 14 articles reviewed, 6 reported outcomes for primary plans explicitly labeled as schoolwide positive behavior supports (SW-PBS) for the entire school as a setting (Lohrmann-O'Rourke, Knoster, Sabatine, Smith, Horvath, & Llewellyn, 2000; Metzler, Biglan, Rusby, & Sprague, 2001; Sprague, Walker, Golly, White, Myers, & Shannon, 2001; Taylor-Greene & Kartub, 2000) or focused on a specific setting within the school (e.g., hallways; Kartub, Taylor-Greene, March, & Horner, 2000). Other primary plans focused on instructional models of schoolwide discipline (Gottfredson, Gottfredson, & Hybl, 1993; Skiba & Peterson, 2003), bullying programs (Stevens, De Bourdeaudhuij, & Van Oost, 2000), and violence prevention in general (Mehas, Boling, Sobieniak, Burke, & Hagan, 1998; Shapiro et al., 2002; Sprague et al., 2001).
Although the number of studies identified was relatively small, outcomes were generally favorable, with associated decreases in office referrals over time (Colvin, Kameenui, & Sugai, 1993; Lohrmann-O'Rourke et al., 2000; Metzler et al., 2001; Sprague et al., 2001), lower levels of aggression (Metzler et al.; Shapiro et al., 2002), decreases in hallway noise (Kartub et al., 2000), and increases in school safety (Metzler et al., 2001). Yet, many studies were descriptive, nonexperimental investigations focused on preintervention-postintervention comparisons in one school (e.g., Kartub et al., 2000; Lohrmann-O'Rourke et al., 2000; Luiselli, Putnam, & Sunderland, 2002; Mehas et al., 1998; Taylor-Greene et al., 1997; Taylor-Greene & Kartub, 2000). A few studies involved comparison schools in the designs (e.g., Colvin et al., 1993; Gottfredson et al., 1993; Metzler et al., 2001; Sprague et al., 2001), whereas other studies employed more rigorous evaluations that included true experimental designs (e.g., Cook et al., 1999; Shapiro et al., 2002; Stevens et al., 2000). All studies in this review focused on the school as the unit of analysis, looking at the outcomes associated with the introduction of SW-PBS programs, with no attention to how different types of students (e.g., students at risk for behavioral problems, with special education needs, and with typical behavioral performance levels) responded to the program. Collectively, these investigations of SW-PBS have offered important information regarding how to design, implement, and evaluate schoolwide interventions. Yet, several studies included in this review had methodological features that threatened the internal and external validities of the findings.
LIMITATIONS OF PRIMARY PREVENTION EFFORTS IN MIDDLE AND HIGH SCHOOLS
Gersten, Fuchs, Compton, Coyne, Greenwood, and Innocenti (2005) delineated essential quality indicators for group experimental and quasi-experimental research articles. These indicators include guidelines for describing participants, implementing the intervention and describing comparison conditions, selecting outcome measures, and analyzing data. When the previously mentioned SW-PBS intervention studies conducted in middle and high schools are examined in light of these criteria, several quality indicators are lacking or absent. For example, in some instances, descriptions of the school populations are not precise enough to allow for replication or accurate conclusions regarding generalizability of outcomes. Similarly, in some studies, descriptions of intervention and comparison conditions lack sufficient detail (e.g., description, intervention dosage, fidelity), which also impedes replication and generalizability of findings. Outcome measures used in SW-PBS prevention studies have also been criticized for the following: (a) being too narrow in scope, with heavy reliance on office referral data that, in the absence of a validated system, may be more indicative of teacher behavior than of student behavior; (b) lacking sufficient sensitivity to detect changes in student behavior that may be occurring; (c) failing to obtain or report accuracy of entry, reliability, and validity data; and (d) not including measures of treatment integrity and social validity (Lane & Menzies, 2005; Lane, Robertson, et al., 2006).
Finally, although some studies of SW-PBS models have employed rigorous evaluation, moving toward true experimental designs (e.g., Cook et al., 1999; Shapiro et al., 2002; Stevens et al., 2000), the majority of studies conducted in middle and high school settings have been descriptive and nonexperimental, involving pre-post comparisons at one school. While data from these studies were analyzed correctly with careful attention not to draw causal inferences from a nonexperimental design, the primary questions thus far have focused on how the school as a whole responded to the SW-PBS prevention model. Only at the elementary level have researchers gleaned more specific information from these primarily descriptive studies by conducting analyses to determine the extent to which different types of students (such as students at risk for behavioral concerns, with typical behavioral profiles, or with high-incidence disabilities) respond to this global level of support (e.g., Cheney, Blum, & Walker, 2004; Lane & Menzies, 2005; B. Walker, Cheney, Stage, & Blum, 2005).
DIRECTIONS FOR FUTURE RESEARCH
Studies of SW-PBS models are highly complicated with respect to research methodology (e.g., experimental design, statistical analysis), practicality (e.g., feasibility, social validity, implementation logistics), and sustainability. The research conducted to date has paved a solid path as an entry into this important line of inquiry. It is imperative that investigations of SW-PBS continue to be refined to design studies that address the concerns previously mentioned so that (a) accurate conclusions can be drawn about the efficacy of SW-PBS efforts for the school as a whole and (b) data from schoolwide interventions can be used to identify and support students requiring secondary intervention efforts. We respectfully request input from experts in SW-PBS (e.g., Douglas Cheney, Michael Epstein, Robert Horner, Timothy Lewis, Ron Nelson, George Sugai) to continue to advance schoolwide intervention efforts by tackling these issues.
A second call is to determine how different types of students respond to SW-PBS efforts, given that it is likely that not all students react uniformly. As previously mentioned, a few studies have been conducted at the elementary level to determine how different types of students respond to the SW-PBS plan. For example, Cheney and colleagues (2004) reported school-level and student-level changes for a sample of 56 students at risk for emotional and behavioral problems (EBD) and 68 typically developing students to determine how different types of students responded to the SW-PBS program. Results indicated that (a) leadership teams perceived positive outcomes associated with the SW-PBS program; (b) students at risk for EBD demonstrated significant decreases in problem behaviors and increases in social skills, as measured by the Social Skills Rating System (SSRS; Gresham & Elliott, 1990), whereas students in the typical group remained stable over time; and (c) students with higher levels of social skills and lower levels of problem behavior earned fewer office discipline referrals (ODR).
Another descriptive study conducted by B. Walker et al. (2005) also examined the performance of 72 at-risk students attending three schools with SW-PBS programs. Students were identified using the Systematic Screening for Behavior Disorders (SSBD; H. M. Walker & Severson, 1992). Two of the research objectives were to determine (a) how teachers rated at-risk students' social skills and problem behaviors at year's end, and (b) how these outcomes related to the distribution of the student's ODRs. Results indicated that there were significant differences between students with internalizing and externalizing behavior patterns at year's end, with teachers viewing students in the externalizing group as having fewer social skills and more problem behaviors than students in the internalizing group. In addition, students with externalizing behavior patterns were more apt to earn ODRs.
Lane and Menzies (2005) conducted a study that incorporated academic performance by examining (a) the accuracy of teacher nominations in identifying students with and without academic and behavioral concerns (N = 86) and (b) the degree to which students with academic (n = 29), behavioral (n = 16), and combined (n = 15) concerns responded to the schoolwide program relative to students with typical profiles (n = 26). Findings revealed that teachers were highly accurate in discriminating between these four types of elementary students using academic variables at the state (e.g., standardized tests) and local (e.g., district multiple measures) levels, as well as behavioral variables (e.g., teacher report, school record data), accounting for 98% of the variance between the groups. Results also revealed differences in how these different types of students responded academically and behaviorally to the schoolwide model. Specifically, although there was not a significant difference in performance on the state academic measures, students in the single-risk groups made significantly more progress on local academic measures as compared to typical students. Students in the combined-risk group exhibited significant decreases in student risk status relative to the other three groups.
In sum, these studies reported differential performance patterns across different types of students. However, these findings have not been explored at the secondary level. Our goal in this study was to extend this line of inquiry by examining student-level effects of SW-PBS programs conducted at the high school level.
The purpose of the current investigation was to examine the extent to which SW-PBS prevention programs that were implemented with high fidelity and rated socially valid by constituents affected different types of high school students. Specifically, this study addressed the following four research questions:
1. To what extent were teacher nominations able to accurately identify high school students with externalizing, internalizing, or comorbid behavioral concerns?
2. To what degree were the SW-PBS prevention plans implemented as planned?
3. Did different types of students receive equal access to reinforcement?
4. To what degree did students with varying teacher-nominated profiles (i.e., externalizing, internalizing, comorbid), as well as students with typical behavior patterns and students with high-incidence disabilities, respond to the SW-PBS programs as measured by extant measures (i.e., discipline referrals, tardiness, grade point average, and referrals for additional supports)?
Participants were 178 students (117 [65.73%] males, 61 [34.27%] females) attending two high schools participating in a federally funded investigation of positive behavior support at the high school level conducted in middle Tennessee. During the first year of implementation, the grade-level breakdown was as follows: 10th grade, n = 52 (29.21%); 11th grade, n = 59 (33.15%); and 12th grade, n = 67 (37.64%). Ninth-grade students were not included in these analyses because individual outcome data for the year prior to intervention, which we used as baseline data, were not available for this group. The student population was predominantly Caucasian (n = 157, 88.20%), followed by African Americans (n = 12, 6.74%), Hispanics (n = 4, 2.25%), Asians (n = 3, 1.69%), and Native Americans (n = 2, 1.12%) (see Table 1).
Participants for this study were identified from two high schools (School C and School F) that were participating in a longitudinal study examining the design, implementation, and evaluation of three-tiered PBS plans at the high school level. Both urban fringe schools were located in an inclusive school district in middle Tennessee (National Center for Education Statistics, 2005). School C and School F enrolled 1,670 and 674 students, respectively. The majority of the student bodies were Caucasian, with few minorities represented (see Table 2).
There were 91 faculty members at School C and 45 at School F, all of whom agreed to implement the SW-PBS program as part of their regular school responsibilities. These same 136 faculty members were invited to participate in an evaluation of the SW-PBS model by self-evaluating treatment fidelity of the SW-PBS plan on a monthly basis using a component checklist and also allowing research assistants (RAs) to assess fidelity (description of plans and treatment fidelity procedures to follow). At School C, 60.44% (n = 55) consented to participate in the evaluation; at School F, 77.78% (n = 35) consented. Although confidentiality of treatment fidelity data was ensured, some teachers expressed concern that these data might be used in the performance reviews conducted by the principal and, consequently, declined to participate in the evaluation.
To address the third objective of this study, teachers from each school's English department, 12 at School C and 8 at School F, were invited to participate in an evaluation of how different types of students respond to the SW-PBS plan. The primary investigator (PI) met with all teachers from the English department during a regularly scheduled department meeting at each school. During these meetings, teachers were provided a consent letter explaining the purpose, procedures, and time commitments. The majority of English teachers consented to participate in this substudy at School C (91.67%, n = 11) and School F (87.5%, n = 7), resulting in a total of 18 participating teachers (3 males, 15 females).
Participating English teachers met with the PI during a planning period before school or after school to complete the nomination process. Each teacher brought a copy of his or her rosters from each English period. For each section of English, teachers nominated one student for each of the following four categories: externalizing, internalizing, comorbid, and typical, using a modified version of the SSBD. If no students in a given period had behavior patterns that paralleled the description, a student was not nominated for that category. The SSBD has met with demonstrated success in differentiating subgroups of elementary-age students (kindergarten through sixth grade; Walker, Ramsey, & Gresham, 2004; Walker & Severson, 1992), yet the SSBD has not been utilized at the high school level. For purposes of this study, the first and second authors modified the SSBD for use at the high school level by revising definitions of the externalizing and internalizing behaviors to (a) reflect characteristics and behavior patterns of high school-age students and (b) acknowledge the presence of comorbid concerns at the high school level (e.g., aggression and depression; Achenbach & Edelbrock, 1993). Examples and nonexamples in the revised definitions were reviewed and edited by the PBS teams during team meetings to ensure that they accurately described behavior patterns of high school-age students. We elected not to use subsequent rating scales as part of the screening procedures given the time demands this would place on the English teachers.
Students with externalizing behaviors exhibited inappropriate, aversive behaviors that impeded instruction, such as defiance, habitually being out of seat, noncompliance, and aggression. This category did not include students who also demonstrated internalizing behaviors nor did it include students receiving special education services. These exclusionary criteria were implemented to increase the homogeneity of the groups being compared.
Students with internalizing behaviors exhibited behavior problems that were directed inwardly. Examples included not talking with other students, avoiding or withdrawing from social situations, being unresponsive to social initiations by others, or being shy, timid, and/or unassertive. This category did not include students who also demonstrated externalizing behaviors nor did it include students receiving special education services.
Students with comorbid behaviors were students who displayed both internalizing and externalizing characteristics, as defined above, and who were not receiving special education services.
Students in the typical behavior category displayed average behavior patterns. They did not exhibit externalizing or internalizing behaviors and were not receiving special education services.
In addition, the project director obtained a list of approximately 50 randomly selected students who were receiving special education services for high-incidence disabilities to serve as the fifth category. Students in this group were classified as having specific learning disabilities, other health impaired, or speech/language impairments. None of the students in this category were nominated by teachers in the four categories just defined. This group was included at the schools' request to determine how special education students were responding to the SW-PBS program relative to typical and at-risk students.
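The category definitions above are mutually exclusive, and each of the four nominated categories excludes students receiving special education services. A minimal Python sketch of that decision logic follows. The function name and the boolean flags are hypothetical and stand in for a teacher's judgment; in the study the decisions were made by teachers using the modified SSBD, not by an automated rule.

```python
from typing import Optional


def nomination_category(externalizing: bool,
                        internalizing: bool,
                        special_education: bool) -> Optional[str]:
    """Return the nominated category implied by the written definitions.

    Hypothetical helper for illustration only. Students receiving special
    education services are excluded from all four nominated categories
    (they were eligible only for the randomly selected high-incidence
    group), so None is returned for them.
    """
    if special_education:
        return None
    if externalizing and internalizing:
        return "comorbid"
    if externalizing:
        return "externalizing"
    if internalizing:
        return "internalizing"
    return "typical"


# Example: both behavior patterns, no special education services.
print(nomination_category(True, True, False))
```

Checking the special education flag first mirrors the exclusionary criteria, which the authors describe as a way to increase the homogeneity of the groups being compared.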
One hundred seventy-eight students (117 males, 61 females) were nominated and placed into one of five categories: (a) externalizing (n = 25; 14.05%), (b) internalizing (n = 31; 17.42%), (c) comorbid (n = 25; 14.05%), (d) typical (n = 43; 24.16%), or (e) high incidence (n = 54; 30.34%). In this article, we provide information on how different types of high school students responded to the first year of a primary intervention plan. To reiterate, these 178 participants were (a) enrolled during the first year of program implementation, (b) enrolled during the year prior to implementation, and (c) either nominated by their English teacher as having externalizing, internalizing, comorbid, or typical behavior profiles or randomly selected from the pool of students with high-incidence disabilities. Students who were not enrolled during the prior year were excluded because baseline data were unavailable and therefore comparison between the two school years was not possible. Therefore, the sample included 10th-, 11th-, and 12th-grade students. A chi-square analysis contrasting Group (externalizing, internalizing, comorbid, typical, and high incidence) x Grade Level (10th, 11th, 12th) was not significant, χ²(8, N = 178) = 15.7015, p = 0.05. A chi-square analysis contrasting Group x Gender revealed statistically significant differences, χ²(4, N = 178) = 12.4147, p = 0.01, with the externalizing and comorbid groups having proportionately more males. A chi-square analysis contrasting Group x Ethnicity could not be interpreted, as the results may not be valid due to low cell sizes.
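The degrees of freedom and p values reported for the two chi-square analyses can be related to the reported statistics with a short calculation. The sketch below is illustrative only and is not the authors' analysis: it starts from the published statistics rather than the underlying contingency tables, and the helper names are hypothetical. It relies on the closed-form upper-tail probability of the chi-square distribution, which applies only when the degrees of freedom are even, as they are here.

```python
import math


def contingency_df(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a chi-square test on an r x c table."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_upper_tail_even_df(statistic: float, df: int) -> float:
    """Upper-tail probability P(X > statistic) for even df.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!
    Valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = statistic / 2.0
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series


# Statistics as reported in the text; table shapes follow the groupings.
df_grade = contingency_df(5, 3)    # 5 groups x 3 grade levels
df_gender = contingency_df(5, 2)   # 5 groups x 2 genders
p_grade = chi2_upper_tail_even_df(15.7015, df_grade)
p_gender = chi2_upper_tail_even_df(12.4147, df_gender)
print(df_grade, round(p_grade, 2))
print(df_gender, round(p_gender, 2))
```

Rounded to two decimal places, the computed tail probabilities match the p values given in the text for 8 and 4 degrees of freedom.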
Careful consideration was given as to which teachers should be involved in this subtypes study. We elected not to involve all teachers given the time commitment required in the nomination procedures and the variability in curricular demands across different courses. Our goal was to (a) control for variability in course content and task demands and (b) ensure, to the extent possible, that academic expectations were consistent among all teachers completing the rating scales. Therefore, we designed the nomination procedures to ensure that all students were evaluated by the same content area teacher at the onset and end of the school year. Given that all students in Grades 9 through 12 are required to take English classes for all 4 years of high school, we selected English teachers to serve as the informant. Participating English teachers received a modest stipend for completing the rating scales.
Schoolwide Positive Behavior Support Programs
Prior to the first year of implementation, School C and School F participated in a year-long training in which teams of five adults (an administrator, two general educators, a special educator, and a parent) representing each school attended six sessions (two full-day sessions and four 3-hr sessions held after school) to design, implement, and evaluate a SW-PBS plan. The training provided rationale for this systematic approach and explained the use of evidence-based practices to decrease problem behavior and increase academic achievement. Teams were taught to use data-driven methods to support all students and were responsible for sharing drafts of the plan with faculty members and collecting both formal and informal feedback through surveys. Following completion of the training, school faculties voted to implement their SW-PBS plans during the following academic year (see Lane, Wehby, Robertson, & Barton-Arwood, 2006, for the training procedures). The training sessions were held in successive school years, with School C participating during the first year, followed by School F during the next academic year. Implementation was also staggered, with both schools implementing the program during the academic year immediately following the training series.
The PBS teams from School C and School F designed SW-PBS prevention programs consisting of three components: five clearly stated behavioral expectations, procedures for teaching these expectations, and procedures for reinforcing the expectations. School F's team also incorporated the district-wide social skills component, Character Under Construction, into the primary plan. Procedures for teaching expectations included (a) posting the school mission statement and posters depicting expected behaviors in classrooms, common areas, and hallways; (b) providing daily reminders through morning and afternoon announcements; (c) holding monthly assemblies; and (d) having teachers model expectations during the school day. Both schools used PBS tickets to reinforce students for meeting the expectations. Students at School C could receive a PBS ticket for meeting any of the five behavioral expectations, which were introduced every other month over the course of the school year. Students at School F could receive a PBS ticket for (a) meeting any of the five behavioral expectations that were introduced in entirety at the onset of the school year or (b) demonstrating any of the nine character traits taught on a monthly basis as part of the district-wide program. Distribution of tickets was contingent upon students' demonstrating one of the specified expectations. These tickets were collected for weekly schoolwide drawings for items such as preferred parking spaces, sports passes, homecoming and prom packages, and food certificates.
Social validity was assessed prior to implementation to measure the level of perceived acceptability from the faculty. During a regularly scheduled faculty meeting, faculty members anonymously completed the Primary Intervention Rating Scale (PIRS; Lane, Wehby, & Robertson, 2002), an instrument adapted from the Intervention Rating Profile-15 (IRP-15; Martens, Witt, Elliott, & Darveaux, 1985). The PIRS is a 17-item survey containing statements rated on a 6-point Likert-type scale (1 = strongly disagree, 6 = strongly agree). High scores indicated high acceptability for the plan, with total scores ranging from 17 to 102. Results revealed that both Schools C and F rated the primary plan favorably, with respective total mean scores of 71.53 (SD = 17.93) and 80.62 (SD = 10.57). Summary data were shared with the PBS teams and faculty members to report acceptability and estimate levels of buy-in by school personnel.
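The stated total-score range follows directly from the number of items and the rating scale, and the reported total means can be put on the 1-to-6 item scale by dividing by the number of items. The Python sketch below illustrates that arithmetic. The per-item means it prints are a derived illustration, not values reported by the authors, and the variable and function names are hypothetical.

```python
N_ITEMS = 17                    # number of PIRS items, from the text
SCALE_MIN, SCALE_MAX = 1, 6     # Likert-type anchors, from the text

# Lowest and highest possible total scores.
min_total = N_ITEMS * SCALE_MIN
max_total = N_ITEMS * SCALE_MAX


def per_item_mean(total_mean: float) -> float:
    """Express a mean total score as an average rating per item."""
    return total_mean / N_ITEMS


school_c = per_item_mean(71.53)   # reported total mean, School C
school_f = per_item_mean(80.62)   # reported total mean, School F
print(min_total, max_total)
print(round(school_c, 2), round(school_f, 2))
```

On this reading, both schools' average item ratings fall above the scale midpoint of 3.5, which is consistent with the authors' description of the plan as favorably rated.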
Treatment integrity of the SW-PBS plan was assessed from two perspectives: teacher self-report and direct observations by RAs. Although all teachers agreed to implement the SW-PBS plans as part of their regular school responsibilities, treatment integrity data were collected only for consented teachers who elected to participate in the project evaluation. The first perspective was teacher self-report. Teachers who agreed to participate in the overall study of the efficacy of SW-PBS completed a brief treatment integrity scale, Positive Behavior Support Plan: Primary Level-Discipline Plan, on a monthly basis to evaluate the extent to which they implemented the discipline component of the primary intervention plan as intended. The component checklist contained six items that depicted procedures for teaching and reinforcing (e.g., giving a student a ticket paired with behavior-specific praise tied to one of the schoolwide expectations) schoolwide expectations across settings. At the end of each month, teachers rated each item on a 3-point Likert-type scale ranging from not at all (0), part of the time (1), to all of the time (2). Participating teachers at School F also completed a second treatment integrity scale, the Positive Behavior Support Plan: Primary Level-Social Skills, to determine the extent to which they implemented the social skills component of the SW-PBS plan. This rating scale also contained six items evaluated on the same scale. Composite scores for overall monthly teacher-reported integrity ratings were created each month by computing a percentage of implementation. Then an overall yearly session integrity score was computed for all 10 months (possible range 0%-100%). Mean scores were also computed for each component over the course of the academic year.
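The text does not spell out the formula behind the "percentage of implementation." One plausible reading, sketched below in Python, divides the points a teacher assigned across the checklist items by the maximum possible points, then averages the monthly percentages to obtain a yearly score. Both the formula and the function names are assumptions made for illustration, and the ratings shown are invented.

```python
from statistics import mean

MAX_RATING = 2  # 0 = not at all, 1 = part of the time, 2 = all of the time


def monthly_integrity_percent(item_ratings):
    """Percentage of implementation for one month's checklist.

    Assumption: percentage = points earned / points possible * 100.
    """
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    if any(r not in (0, 1, 2) for r in item_ratings):
        raise ValueError("ratings must be 0, 1, or 2")
    return 100.0 * sum(item_ratings) / (MAX_RATING * len(item_ratings))


def yearly_integrity_percent(monthly_ratings):
    """Mean of the monthly percentages (assumed aggregation rule)."""
    return mean(monthly_integrity_percent(m) for m in monthly_ratings)


# Invented ratings for a six-item checklist, for illustration only.
example_month = [2, 2, 1, 2, 0, 2]
print(monthly_integrity_percent(example_month))
```

Under this reading a score of 100% means every item was rated "all of the time," and 0% means every item was rated "not at all," which matches the stated possible range.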
Given that self-report data on treatment integrity suggest higher levels of integrity than direct observation techniques (Lane & Beebe-Frankenberger, 2004), the level of implementation was also assessed by an outside observer.
Moreover, RAs also assessed treatment integrity via direct observation. Five consented teachers were selected randomly each month to be observed. The RAs entered the classrooms or work area (e.g., gym, field) at an unscheduled time, observed instruction for approximately 30 min, and completed a parallel version of the treatment integrity component checklist for the discipline and social skills plans during the observation period. This direct observation checklist contained the same list of procedures for teaching and reinforcing the schoolwide expectations. Composite scores were computed as described above.
In sum, fidelity of the SW-PBS plans, which included discipline (School C and School F) and social skills (School F) components, was assessed using self-report and direct observation techniques to determine the degree to which teaching and reinforcement procedures of the schoolwide plan were in place.
A variety of extant measures were used to assess student performance. Measures included grade point average (GPA) to assess academic performance; unexcused tardies, discipline, and suspension data to assess behavioral performance; and referral data to assess the degree to which students required more intensive intervention efforts either on campus (counseling, prereferral intervention, and special education) or off campus (alternative learning center). These measures were selected for two reasons. First, these variables are those most often used by schools to monitor student progress and make judgments regarding the success or failure of student performance in general and in response to SW-PBS programs. Second, additional measures such as standardized behavior rating scales or achievement tests would have provided more precise information. However, placing additional task demands on teachers to complete rating scales and requesting students to miss instructional time to complete additional assessments likely would have decreased teacher, and possibly even school, participation. Thus, this initial study used extant data to assess student performance.
An RA collected data monthly and entered them into an Excel spreadsheet. Reliability of data entry was assessed by having a second RA check the accuracy of 25% of the student-level data. Errors were minimal (see a later section in this article for the reliability of data entry for each outcome measure); however, any errors identified were corrected. We used monthly data to compute annual scores for the baseline (Time 1) and intervention (Time 2) years (see Table 3).
Academic: Grade Point Average. We obtained quarterly GPAs for both schools. The district used a 4-point GPA scale, with the grades of A, B, C, and F (including pluses and minuses) as options. During Time Point 2 for School F, the district introduced a D letter grade (also with pluses and minuses). The scale changed to include a course percentage below 69.5 as a D and below 66.5 and above 64.5 as a D minus instead of as an F, as reported in prior years. Mean reliability of entry for quarterly GPA scores at both schools was 98.96%. Annual GPA scores were computed by averaging the quarterly GPAs.
Attendance: Unexcused Tardies. School tardiness reports were collected monthly for both schools. For purposes of this study, data were collected on unexcused tardies to class, defined as late entrance into any class without an excuse or, specifically, not entering the assigned classroom before the bell rang. Unexcused tardies were summed for the school year and divided by the number of instructional days to determine the rate of unexcused tardies per day for each school year. Mean reliability for entry was 98.88%.
Behavior: Discipline and Suspension. Discipline and suspension referrals were collected monthly from both schools. The discipline referral data reported the number of referrals obtained during the school year per student. Suspension referral data consisted of the total number of in-school suspensions obtained during the school year. Rates for discipline and suspension referral were computed by dividing the total number of referrals by the number of instructional days during each year. Mean reliabilities for entry of discipline and suspension referrals were 98.93% and 99.23%, respectively.
Referrals. Four different types of referrals were monitored during baseline and intervention years. Specifically, referrals included referrals to an alternative learning center (ALC), an existing counseling program that provided students with onsite counseling (Students Taking a Right Stand; STARS), the prereferral intervention team (General Education Intervention Team; GEIT), and special education eligibility determination (SPED). For special education referrals, data were also collected to record the outcome of the referral (e.g., whether the student was placed into special education; SPED-Placed) and the corresponding labels assigned. Reliability of data entry was high for all types of referrals: ALC (M = 92.23%), STARS (M = 98.68%), GEIT (M = 99.5%), SPED (M = 100%), and SPED-Placed (M = 100%).
Reinforcement Component: Tickets. Access to reinforcement was measured according to the number of tickets distributed and turned in at both schools. Specifically, tickets that were distributed to and turned in by students were entered into schoolwide drawings in which students could win a prize (e.g., parking pass, prom tickets, football tickets). As part of the SW-PBS plan, tickets were given to students contingent upon the student demonstrating one of the expectations specified in either the discipline or social skills components of each schoolwide plan. This component, as mentioned previously, was assessed on the treatment integrity forms, which were completed from both RA and teacher perspectives. It is difficult to determine the actual reinforcing value of the ticket, given that a reinforcer is only a reinforcer if its contingent introduction increases the probability of the behavior occurring in the future. However, we surmised that the ticket held some value to the student if the student took the time to enter the ticket into the lottery system for the possibility of earning one of the prizes. Each student's rate of access to reinforcement was determined by taking the total number of tickets given to the student by a teacher and then turned in to the lottery by the student, and dividing it by the number of instructional days. Mean reliability of ticket data entry was also high (98.68%). This measure served as an estimate of, or a proxy for, students' access to reinforcement.
The experimental design was a 5 × 2 (Group × Time) repeated-measures model. Group (externalizing, internalizing, comorbid, typical, or high-incidence disabilities) was the between-subjects factor, and time (Time 1: the year before the PBS model was implemented; Time 2: the first year of PBS implementation) was the repeated measure. This model produced a Group × Time interaction and main effects for group and time.
Characteristics of Student Groups
The five categories of students were compared on four outcome measures (GPA, unexcused tardies, suspensions, and ODRs) in a one-way, fixed-effects multivariate analysis of variance (MANOVA) using the general linear model. If the MANOVA was significant, univariate ANOVAs were interpreted. Significant ANOVAs were followed by the Tukey-Kramer modification of the honest significant difference (HSD; α = 0.05) simultaneous confidence interval technique. This multiple comparison technique substitutes the harmonic mean (M = 35.60; SD = 12.64) to control for (a) unequal group sizes and (b) experiment-wise Type I error.
For dichotomous variables (GEIT, STARS, ALC, SPED, SPED-P), the frequency and percentage of referrals were computed for each category. A series of chi-square tests was conducted to identify significant differences between groups prior to intervention if cell size permitted such analyses. The SPED variables were not examined for the high-incidence group given that those students were already receiving special education services.
Findings. A one-way MANOVA comparing the groups on GPA, unexcused tardies, suspensions, and disciplinary contacts produced a significant multivariate effect, with a Wilks's lambda (λ) value of 0.77, F(16, 520) = 2.87, p = .0002, accounting for 23% of the explained variance. Results of univariate ANOVAs revealed a group effect for only one variable: GPA, F(4, 173) = 7.85, p
Examination of referral data indicated that none of the students in any group were referred for ALC, GEIT, or SPED supports. However, at least 3% of students in all categories were referred for STARS services, with students in the comorbid category receiving the most referrals (n = 3, 12%). Chi-square analyses could not be interpreted given that 50% of the cells had expected counts less than five (see Table 6).
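The expected-count check behind this caution can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis: the 5 × 2 table of counts is hypothetical (it is not the study's data), and the function names are assumptions introduced for the example.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def share_of_low_cells(table, threshold=5):
    """Proportion of cells whose expected count falls below the threshold."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)


# Hypothetical counts (NOT the study's data): rows are five student groups,
# columns are [referred, not referred].
hypothetical_table = [[2, 28], [3, 22], [1, 39], [2, 48], [2, 33]]

low_share = share_of_low_cells(hypothetical_table)
print(f"{low_share:.0%} of cells have expected counts below 5")
```

When referrals are rare, every cell in the "referred" column tends to have a small expected count, so half of the cells in a 5 × 2 table can fall below the conventional threshold of five.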
Statistical Analysis. Treatment integrity data were analyzed using descriptive procedures (e.g., means and standard deviations). Specifically, annual session integrity scores were computed from the teacher and project staff perspectives. Effect sizes were computed using the pooled standard deviation in the denominator (Busk & Serlin, 1992) to determine the magnitude of implementation differences between schools and raters. Effect sizes were interpreted using guidelines specified by Cohen and Cohen (1975), with 0.2 indicating a low effect, 0.5 a moderate effect, and 0.8 a high effect (see Table 7).
Findings. Teachers at both schools reported similar levels of fidelity for the discipline plan as evidenced by an effect size of -0.25, with mean scores for School C and School F of 71.48 (SD = 16.34) and 75.30 (SD = 14.51), respectively. Teachers at School F reported a similar level of fidelity for the social skills component with a mean score of 74.44 (SD = 14.45). However, there were differences between the fidelity of implementation from the project staff perspective (ES = -0.70), with higher levels of fidelity of the discipline plan at School F. Project staff also reported lower mean scores for the discipline component than did teachers at School C and School F, as evidenced by respective effect sizes of 0.92 and 0.51. These effect sizes reveal moderate (School F) to high (School C) differences in the discipline fidelity levels reported from teacher and project staff perspectives. The magnitude of the discrepancy was somewhat lower for the social skills component at School F, with an effect size of 0.41.
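As a rough check on how such effect sizes arise, the following Python sketch computes a standardized mean difference with a pooled standard deviation in the denominator. It assumes equal group sizes, because the number of teachers per school is not given in this section; the function names and the "negligible" label for values below 0.2 are likewise assumptions made for the illustration.

```python
import math


def pooled_sd_equal_n(sd_a, sd_b):
    """Pooled SD for two groups, assuming equal group sizes (an assumption)."""
    return math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)


def effect_size(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference (a minus b) over the pooled SD."""
    return (mean_a - mean_b) / pooled_sd_equal_n(sd_a, sd_b)


def label(es):
    """Magnitude labels using the 0.2 / 0.5 / 0.8 guidelines cited in the text."""
    size = abs(es)
    if size >= 0.8:
        return "high"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "low"
    return "negligible"  # label for values under 0.2 is an assumption


# Teacher-reported discipline fidelity, School C vs. School F (values from the text).
es_teachers = effect_size(71.48, 16.34, 75.30, 14.51)
print(round(es_teachers, 2), label(es_teachers))
```

Under the equal-size assumption, the teacher-reported means and standard deviations yield a value that rounds to the reported -0.25, which falls in the low range.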
Rate of Access to Reinforcement
Statistical Analysis. A one-way ANOVA, between-groups design was employed to determine if students in each of the risk categories received similar access to reinforcement in the form of tickets. Significant univariate results were followed by Tukey multiple comparisons to identify specific differences between groups.
Findings. Results failed to produce a significant effect for rate of access to reinforcement, F(4, 173) = 0.95, p = 0.4416. The five groups of students received equal access to reinforcement (tickets) over the course of the first year of intervention, with mean rates ranging from 0.032 (SD = 0.04) for students in the internalizing group to 0.058 (SD = 0.09) for students in the high-incidence and typical groups.
Statistical Analysis. A series of repeated-measures ANOVAs, with time (Time 1: scores from the previous academic year; Time 2: scores from the current academic year with the PBS program) as the repeated-measures factor and group membership as the second factor, was conducted to examine how the different groups of students responded to the first year of the PBS program (Glass & Hopkins, 1996). If the Group × Time interaction was significant, a one-way ANOVA was computed using the difference scores (Time 2 - Time 1 scores for each group on each variable). Significant ANOVAs were followed by Tukey HSD multiple comparisons. The repeated-measures ANOVAs and the one-way ANOVAs using difference scores produced identical F values. Effect sizes were computed by using the pooled standard deviation in the denominator (Busk & Serlin, 1992) to examine the magnitude of differences in difference scores between groups and to determine the magnitude of difference between Time 1 and Time 2 scores for each group.
For the dichotomous variables (GEIT, STARS, ALC, SPED, and SPED-P), the frequency and percentage of referrals were computed for each group. A series of chi-square tests identified significant differences between groups postintervention if cell size permitted such analyses. SPED variables were not examined for the high-incidence group because those students were already receiving special education services.
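The difference-score step can be illustrated with a small Python sketch. It is a toy example under stated assumptions: the scores are invented, three groups stand in for the study's five, and `one_way_f` is a hypothetical helper written for the illustration rather than the software the authors used.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy (time1, time2) pairs per group; these numbers are invented for illustration.
toy_scores = {
    "group_a": [(2.0, 3.0), (2.5, 4.5), (3.0, 6.0)],
    "group_b": [(2.0, 4.0), (2.5, 5.5), (3.0, 7.0)],
    "group_c": [(2.0, 7.0), (2.5, 8.5), (3.0, 10.0)],
}

# Difference score = Time 2 minus Time 1 for each student.
differences = [[t2 - t1 for t1, t2 in pairs] for pairs in toy_scores.values()]
f_value = one_way_f(differences)
df_between = len(differences) - 1
df_within = sum(len(g) for g in differences) - len(differences)
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

With two time points, testing whether mean difference scores vary across groups addresses the same question as the Group × Time interaction, which is why the text reports identical F values for the two approaches.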
Students' Progress on GPA. The Group × Time interaction for GPA was not significant, F(1, 173) = 1.82, p = 0.13. Multiple comparisons were not examined given that the interaction was not significant. Effect sizes computed with difference scores revealed a low-to-moderate difference between most groups. The greatest effect was between the internalizing and high-incidence groups (ES = 0.53) and the internalizing and comorbid groups (ES = 0.56), with the internalizing group showing the greatest improvements in GPA. The comorbid and high-incidence groups actually showed low decreases in GPA, with respective effect sizes of -0.12 and -0.06. The externalizing and internalizing groups showed low-to-moderate improvements in GPA during the first year of program implementation.
Students' Progress on Unexcused Tardiness. The Group × Time interaction for unexcused tardies also was not significant, F(4, 173) = 2.25, p = 0.07. Multiple comparisons were not interpreted given that the interaction was not significant. Effect sizes computed with difference scores revealed low-to-moderate decreases in tardiness for most students (ES = -0.10 to 0.65), with the greatest difference between students in the comorbid and typical groups (0.65). Effect sizes also revealed moderate-to-high decreases in tardiness for students in the internalizing and typical groups, with respective effect sizes of -0.60 and -0.72. Students in the high-incidence and externalizing groups showed low-to-moderate effect sizes of -0.46 and -0.17. Students in the comorbid group did not respond, with low-to-moderate increases in tardiness.
Students' Progress on Suspensions. The Group × Time interaction for suspensions was not significant, F(4, 173) = 0.21, p = 0.9319. Multiple comparisons were not interpreted given that the interaction was not significant. Effect sizes computed with difference scores revealed negligible differences between most groups. There were low-to-moderate differences between the high-incidence and (a) internalizing, ES = -0.37, (b) comorbid, ES = -0.20, and (c) typical, ES = -0.38 groups, with the high-incidence groups having increases in suspensions. All groups showed low-to-moderate decreases in suspensions, with the greatest decreases for internalizing groups (ES = -0.27) and the least progress for the externalizing group (ES = -0.04).
Students' Progress on Disciplinary Contacts. The Group × Time interaction for disciplinary contacts was not significant, F(4, 173) = 1.07, p = 0.3708. Multiple comparisons were not interpreted given that the interaction was not significant. Effect sizes revealed a range of outcomes between groups with effect sizes ranging from 0.00 (comorbid vs. high incidence) to 0.43 (externalizing vs. typical). Only students in the typical group showed low decreases in disciplinary contacts (ES = -0.25). Students in the externalizing groups showed moderate increases in disciplinary contacts (ES = 0.52).
Students' Progress on Referrals. Examination of referral data during the second time point indicated that still none of the students in any group were referred for ALC or GEIT supports. However, there was an increase in the percentage of students receiving STARS supports for all groups save for the internalizing category. The percentage of students referred for STARS increased to 7% (n = 3) in the typical category, 12% (n = 3) in the externalizing category, 20% (n = 11) in the high-incidence category, and 28% (n = 7) in the comorbid category. Only one student, from the comorbid group, was referred for and placed in special education. A chi-square analysis contrasting Group × STARS revealed significant differences, χ²(4, N = 178) = 10.6944, p = 0.03. Yet, caution must be exercised when interpreting these findings as 30% of the cells had low counts (less than 5).
Schoolwide primary intervention efforts at the elementary level have met with demonstrated success. However, less attention has been devoted to studying the impact of primary interventions at the middle and high school levels, with very few studies (e.g., Skiba & Peterson, 2003) conducted at the high school level (Lane, Robertson, et al., 2006). Of the primary plans conducted at the middle and high schools, results were favorable (Colvin et al., 1993; Lohrmann-O'Rourke et al., 2000; Metzler et al., 2001; Shapiro et al., 2002; Sprague et al., 2001). These studies of SW-PBS at the middle and high school levels did not, however, examine how students with different behavior profiles responded to primary intervention efforts (Cheney et al., 2004; Lane & Menzies, 2005; B. Walker et al., 2005).
The present study extended this line of inquiry by exploring the degree to which schoolwide primary intervention programs rated as socially valid at program onset and implemented with integrity affected different types of high school students. Specifically, this study extended previous schoolwide primary interventions in secondary schools by (a) monitoring treatment integrity from teacher and research staff perspectives via component checklists, (b) assessing students on a range of extant schoolwide variables, (c) monitoring the degree to which different types of students received reinforcement specified in the schoolwide primary plan, and (d) monitoring academic and behavioral performance of different types of teacher-identified students.
Findings indicated that students with varying degrees of risk were most readily differentiated in terms of academic performance, as measured by GPA, with students with typical behavioral profiles having significantly higher GPAs than students in the externalizing, comorbid, and high-incidence groups. These findings are consistent with recent findings from the National Longitudinal Transition Study-2 that revealed lower GPAs for students with disabilities as compared to their typical peers (Wagner & Davis, 2006). Students with internalizing behavior patterns also had significantly higher GPAs than students with high-incidence disabilities. However, we found no significant differences between groups in terms of disciplinary contacts, suspensions, or unexcused tardies. Similarly, no clear differences between groups were found in terms of referrals to the prereferral intervention process (GEIT), special education (SPED), or alternative learning centers (ALC). Examination of referrals for counseling services (STARS) did reveal some differences between groups, with the comorbid group receiving the most referrals compared to the other risk groups. Based on the characteristics of students nominated in the various groups, results suggest that high school English teachers were able to differentiate the groups in terms of academic performance but were less able to distinguish the groups in terms of behavioral performance. Whereas elementary teachers appeared to be highly accurate "tests" when differentiating between (a) typical and at-risk students with academic and behavioral concerns (Lane & Menzies, 2005), (b) students needing special education services (Algozzine, Christenson, & Ysseldyke, 1981; Gresham, MacMillan, & Bocian, 1997), and (c) young students who were and were not at risk for antisocial behavior (Lane, 2003), high school teachers, as expected, attended more to academic performance than behavioral performance when distinguishing between risk groups.
This focus on academic performance rather than on decorum was not unexpected, given that the curricular content becomes more differentiated during high school years as compared to elementary and middle school years (National Middle School Association and National Association of Elementary School Principals, 2002). It may be that a student's ability to meet, or not meet, academic expectations is of greater concern to teachers than is a student's ability to meet behavioral expectations.
Furthermore, this is the first instance in which we, as a field, have asked high school teachers to differentiate between externalizing, internalizing, and comorbid behavior patterns relative to students with typical behavioral patterns and those receiving special education services. Although the literature has clearly indicated that elementary school teachers are able to differentiate these subgroups (Lane, 2003; H. M. Walker et al., 2004), limited attention has been devoted to conducting and validating screening efforts in middle and high schools (Goodman, 1997). The literature suggests that students with externalizing behavior patterns, relative to students with internalizing behavior patterns, are more apt to solicit teacher attention during the elementary years (Gresham, Lane, MacMillan, & Bocian, 1999; Lane, 1999, 2003; Lane & Menzies, 2005; Morris, Shah, & Morris, 2002; H. M. Walker et al., 2004). This may be even more salient at the high school level, where the consequences of externalizing behaviors (e.g., aggression, coercion) become extremely deleterious. Therefore, screening procedures are even more imperative at the secondary level to detect not only students with externalizing behaviors but also students with internalizing behaviors (e.g., depression, anxiety, eating disorders, somatic complaints). It is possible that students with internalizing behaviors are even more likely to go unnoticed at the high school level where students have multiple teachers throughout the day; teachers have upwards of 200 students during the school day; and both teachers and students are under pressure to meet increasingly challenging curricular demands. These circumstances may afford less time for personal interactions between teachers and students, making it less likely that students with internalizing behaviors will be recognized. 
Although these results must be interpreted with caution, given that screening procedures have not been systematically validated for use at the high school level, we contend that this study provides an important first step in the use of screening procedures at the high school level. Yet, one must consider that the students' labels were speculative given that the definitions had only face validity, at best, and were not confirmed by validated rating scales due to the additional time this would have required for English teachers to complete such scales.
Fidelity and Reinforcement
One of the key limitations of schoolwide primary intervention plans implemented in middle and high schools has been the lack of attention to treatment integrity, with few exceptions (e.g., Cook et al., 1999; Gottfredson et al., 1993; Metzler et al., 2001; Shapiro et al., 2002; Sprague et al., 2001). The task of monitoring treatment fidelity becomes more challenging as school size and plan complexity increase. However, just as accuracy of the dependent variable is essential, so is accuracy of the independent variable, the treatment (Lane & Beebe-Frankenberger, 2004). This study extends the literature by demonstrating that it is possible to assess implementation fidelity from different perspectives: teacher self-report and direct observations by an external observer. Findings from this study indicate a moderate to high level of fidelity, with mean scores in the mid-70s, and confirm the results of elementary investigations suggesting that outside observers report lower levels of implementation relative to teachers' self-reports (Lane, Bocian, MacMillan, & Gresham, 2004).
As mentioned previously, other investigations have examined the degree to which the reinforcement component was delivered. This was measured using a variety of methods, including percentage of students who qualified for lottery drawings (Luiselli et al., 2002); tickets, praise notes, and good news referrals (Metzler et al., 2001); and referrals for desired behaviors (Gottfredson et al., 1993). In this study, rate of access to reinforcement was measured using ticket data; however, the literature was extended by determining the degree to which different groups of students converged or diverged in the extent to which they received the reinforcement component (ticket). A primary goal of a strong SW-PBS program is to ensure that all students who exhibit the desired behaviors and/or meet the specified expectations are reinforced. One concern voiced by participants in this study who opposed the primary intervention plan was that most of the "tickets" would be awarded to students who typically exhibited problem behaviors, while students who were more withdrawn or who exhibited typical interaction patterns would continue to go unnoticed. Findings from this study revealed that differential access to reinforcement was not evident across the five groups compared. Instead, there were no significant differences in the rate of access to reinforcement during the first year of implementation, with mean rates of reinforcement ranging from 0.032 for students in the internalizing group to 0.058 for students in the high-incidence and typical groups. The absence of significant differences with respect to accessing reinforcement is encouraging, suggesting that this component was implemented as planned, reaching all different types of students. In practice, this means that students were accessing reinforcement at comparable rates.
For example, a rate of 0.032 for the internalizing groups indicates that in a 30-day period, students in that group received approximately 1 ticket per month (0.96), whereas students in either the typical or high-incidence groups received slightly over 1.5 (1.74) tickets per month. If there were differences in rates of access to reinforcement, it would have been important to examine changes in outcome measures, taking differential rates of access to reinforcement into consideration. When monitoring rates of access to reinforcement, it is important to note that reinforcement is but one part of a SW-PBS plan and should not be used as a sole measure of treatment fidelity (Fox, Lane, Blevins, Robertson, & Wehby, 2006). It is important that treatment integrity be assessed from a comprehensive perspective.
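The conversion from a daily rate to tickets per 30-day period is simple multiplication, as the short Python sketch below shows. The function and variable names are illustrative assumptions; the two rates are the mean values reported in the text.

```python
def tickets_per_period(daily_rate, days=30):
    """Convert a per-instructional-day ticket rate to a count over `days` days."""
    return daily_rate * days


# Mean daily rates of access to reinforcement reported in the text.
rates = {"internalizing": 0.032, "typical / high-incidence": 0.058}

monthly = {group: round(tickets_per_period(rate), 2) for group, rate in rates.items()}
print(monthly)
```

The results, 0.96 and 1.74 tickets per 30 days, match the figures given in the paragraph above.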
Responsiveness to Intervention
The next objective of this study was to determine the degree to which the five groups responded to a schoolwide primary intervention program. Although multivariate procedures did not reveal statistically significant differences in how different groups of students responded over time, effect sizes suggest that students responded differently to the program. Specifically, effect size scores suggested low to moderate differences in responding between most groups, with students in the internalizing group exhibiting the greatest improvements in GPA, which was markedly different from the results for the comorbid, high-incidence, and typical groups. Students with more severe concerns (the comorbid and high-incidence groups) actually demonstrated low decreases in GPA.
Effect sizes computed with difference scores also suggest that in addition to improved GPAs, all groups, except for the comorbid group, demonstrated decreases in unexcused tardiness. Students in the internalizing and typical groups made moderate to high decreases in tardiness, with respective difference scores (Time 2 - Time 1) of -0.15 and -0.30. Practically speaking, this means that students with internalizing behaviors moved from approximately 6 unexcused tardies in a 30-day period to fewer than 2 unexcused tardies (1.80). Similarly, students in the typical group moved from 10 to less than 1 (0.90) unexcused tardies in a 30-day period. In contrast, students in the high-incidence and externalizing groups were somewhat less responsive, with lower magnitude decreases in tardiness and respective difference scores of -0.11 and -0.04. The literature has suggested that students with comorbid concerns are often more resistant to intervention efforts (e.g., Achenbach & Edelbrock, 1993; Gresham, Lane, & Lambros, 2000; Lynam, 1996). Results from this study confirm this belief, with students in the comorbid group being the only nonresponsive group, as evidenced by an increased rate of unexcused tardies and a difference score of 0.05.
In terms of suspensions, all groups showed movement in the desired direction, as evidenced by negative effect sizes, which indicated a decrease in the rate of suspensions. There were moderate differences in responding, with the internalizing group showing the greatest magnitude of decreases (ES = -0.27). Yet, students in the externalizing and comorbid groups were least responsive on this measure, with respective effect sizes of -0.04 and -0.05.
The only group to show decreases in disciplinary contacts was the typical group, which demonstrated a low-magnitude decrease. Of all measures, disciplinary referral data appeared to be the least sensitive to change.
Inspection of referral data revealed no changes in referrals to the ALCs. However, there was a slight increase in referrals to counseling (STARS) services for students with externalizing, typical, high-incidence, and comorbid behavior patterns, with the comorbid group having the greatest increase in the percentage of students referred. In fact, the only student to be referred and placed into the special education services was a student from the comorbid group.
Collectively, these findings suggest that whereas students in the externalizing, internalizing, and typical groups all showed increases in GPA, decreases in unexcused tardies, and decreases in suspension, students with internalizing behaviors were perhaps the most responsive group, as evidenced by the magnitude of the effect sizes. Results also suggest that as expected, the comorbid group was perhaps the least responsive. In addition, these findings support the need to include a range of outcome measures other than office referral data if the goal is to obtain a more complete picture of the impact of SW-PBS plans on students' performance.
Limitations and Future Directions
Although this study offers an important first look into different patterns of responding to schoolwide primary intervention plans at the high school level, there are five key limitations that need to be acknowledged. First are limitations pertaining to the sample. Although the percentage of participating teachers was high, not all English teachers participated in this study. Therefore, not all students at both schools were included in the initial nominations. It is possible that other students might have been identified as meeting the criteria specified for the five groups compared. Conducting nominations inclusive of the entire student body would have been desirable so as to increase sample size and, consequently, statistical power. Due to the low sample size for some groups (e.g., externalizing, comorbid), it is possible that the small sample size resulted in a loss of statistical power to detect differences between groups. For example, results of the one-way ANOVA contrasting groups on difference scores for unexcused tardies approached statistical significance (p = 0.0659). It is possible that these differences would have been significantly different statistically if there were more students in each cell. Future research would be enhanced by ensuring that all students are eligible for the screening process and by having a larger sample size to afford more sophisticated statistical analyses.
The second limitation involves the use of the SSBD. As mentioned, the SSBD was modified for use at the high school level. The PIs developed the definitions and confirmed the appropriateness of these definitions, including the examples and nonexamples, with the PBS teams. The teacher nominations were not confirmed via use of validated behavior rating scales, such as the SSRS, due to the additional time commitments this would have placed on the English teachers which likely would have decreased teacher participation. Because the screening procedure was not validated for use at the high school level, the student labels must be considered speculative and the outcomes interpreted cautiously. Future research is needed to develop reliable, feasible screening tools for use in high schools and to find a method for addressing the tension between scientific rigorousness and practicality of administration.
The third limitation pertains to generalizability of the findings. The schools in this study were both urban fringe schools in a relatively high-performing, inclusive school district. In addition, all of the teachers employed at these two high schools were fully credentialed, with no teachers on emergency credentials. Although many of these district and school characteristics are desirable, they pose threats to the external validity or generalizability of the findings. It is possible, for example, that the teachers in these schools were more accustomed to accommodating individual differences within the context of the full-inclusion model. It may be that student performance would vary more extensively in schools with less qualified teachers who are not accustomed to inclusive practices. Replication of these findings in different geographic regions, in other settings (e.g., urban