Development of the OSTRC Conference Evaluation Toolkit

Research demonstrates that staff quality directly impacts student achievement in out-of-school time (OST) settings, and that effective staff development contributes to a skilled workforce. Evaluating OST professional development is therefore attracting increased attention from researchers, practitioners, and funding agencies. In the spring of 2004, the Out-of-School Time Resource Center (OSTRC) began searching for professional development evaluation instruments designed specifically for the OST field. Since the OSTRC could not locate research-based surveys for this genre, it implemented a pilot study to create and test such instruments. These surveys were designed to evaluate professional conferences, which are critical (but not exclusive) components of OST professional development opportunities. The overarching goal of this study was to operationalize the pathway between professional development conferences and increased student learning.


Introduction
Evaluating out-of-school time (OST) professional development is attracting increased attention from researchers, practitioners, and funding agencies. As Bouffard and Little (2004b) write, "Evaluation results [of professional development] can be used to understand and build on the strengths and weaknesses of these initiatives, to meet accountability requirements, and to advocate for more investment in professional development" (p. 10).
The Out-of-School Time Resource Center (OSTRC) defines professional development as workshops, conferences, technical assistance, resource centers, peer mentoring, electronic listservs, and other supports designed to promote improvement, enrichment, and achievement in OST staff, programs, and students. In the spring of 2004, the OSTRC began searching for professional development evaluation instruments designed specifically for the OST field. Since the OSTRC could not locate research-based surveys for this genre, it implemented a pilot study to create and test such instruments. These surveys were designed to evaluate professional conferences, which are critical (but not exclusive) components of OST professional development opportunities. The OSTRC initially chose to evaluate conferences, rather than other forms of professional development, because the number and diversity of participants ensured that the tools could be thoroughly tested over a short period of time. These tools could then be further modified to accommodate workshops, technical assistance, peer networking meetings, and other forms of staff support.

Theoretical Framework
The goal of this pilot study was to operationalize the pathway between professional development conferences and increased student learning. According to Guskey (2000), multiple conditions must be satisfied to accept a theoretical connection between these two elements. These conditions include:
1. Staff are aware of a professional development opportunity;
2. Staff have access to resources that will allow them to attend the event;
3. The presentation of the professional development event is effective;
4. Staff regard the new material as valuable;
5. Staff are given the opportunity to plan how they can apply new information to their program;
6. Organizations support staff members as they try to apply new information;
7. Staff successfully communicate new information to students;
8. Students successfully acquire new information;
9. Students are given the opportunity to plan how they can apply the new information in their lives; and
10. The program supports students in the application of new information.
If one of these conditions is not met, the pathway is attenuated and the final product (changes in student outcomes) cannot be maximized. Figure 1 presents a visual interpretation of the necessary conditions and includes the assumed direction of effect (Guskey, 2000).
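The "every condition must hold" logic of the pathway can be made concrete with a short sketch. It is an illustration only, assuming each condition can be recorded as simply met or unmet for a given case; the shortened condition labels and the function names `unmet_conditions` and `pathway_complete` are hypothetical and are not part of Guskey's model or the OSTRC instruments.

```python
# Shortened paraphrases of Guskey's (2000) ten conditions, in pathway order.
CONDITIONS = [
    "staff aware of opportunity",
    "staff have resources to attend",
    "presentation is effective",
    "staff value new material",
    "staff plan application",
    "organization supports staff",
    "staff communicate to students",
    "students acquire information",
    "students plan application",
    "program supports students",
]


def unmet_conditions(status: dict) -> list:
    """Return the unmet conditions in pathway order.

    A condition missing from `status` is treated as unmet (an assumption).
    """
    return [c for c in CONDITIONS if not status.get(c, False)]


def pathway_complete(status: dict) -> bool:
    """The pathway is fully realized only if every condition holds."""
    return not unmet_conditions(status)


if __name__ == "__main__":
    case = {c: True for c in CONDITIONS}
    case["organization supports staff"] = False   # hypothetical example case
    print(pathway_complete(case))   # False
    print(unmet_conditions(case))   # ['organization supports staff']
```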

Figure 1
Necessary Conditions to Achieve Student Outcomes (Guskey, 2000). [Diagram stages: Logistics, Resources; Attendance at event; Valued, Effective, Relevant; Successful Learning; Application; Increased Learner Outcomes (staff or students)]

This overall pathway represents the process that staff members undergo when they attend conferences, but it also resembles the process by which students learn new information in an out-of-school time program. It is also possible that a second process occurs after staff members complete this pathway: they cycle back through it when implementing the new information in a program, resulting in increased student outcomes.

Staff Outcomes and Student Outcomes
The OSTRC developed instruments to measure aspects of the first part of this process: how professional development conferences affect staff. While the OSTRC built in proximate measures of staff learning (those directly linked to the workshops), it was not feasible to build in proximate measures of student learning. This is primarily due to the difficulty of isolating the impact of professional development from the influence of program quality, school environments, home life, community contexts, and other factors. However, distal measures of student learning (those not directly linked to the conferences) were obtained by asking participants to rate the extent to which the children and youth in their programs benefited from the new information. Thus, the OSTRC developed and tested instruments to track changes in practitioners and in practitioner perceptions, laying the foundation for future work that directly addresses student outcomes.

Existing Standards and Evaluation Tools
After conceptualizing this process, the OSTRC reviewed existing performance standards for professional development. Only one set of standards was "research-based": the National Staff Development Council's Twelve Standards of Professional Development (2001). These standards refer primarily to providing rather than evaluating professional development, and to classroom teachers rather than OST practitioners. However, their outcomes were easily adapted to the evolving OST model.
The OSTRC then identified several models that included multiple levels of professional development evaluation and found commonalities among the models proposed by Donald Kirkpatrick (1998), Thomas Guskey (2000), and Joellen Killion (2002). Each of these models includes measures of participant reactions, learning, and application. Guskey (2000) and Killion (2002) also include assessments of organizational support, change or integration, and student learning.
After reviewing these models and standards, the OSTRC added a further level of evaluation to accommodate the particulars of OST professional development. This level, termed "extension," referred to the ability of OST staff to adapt new information to other youth audiences and to share it with other programs, staff, and/or students. Figure 3 shows which of the six domains each author/approach addresses.

Figure 3
Comparison Chart: Levels of Evaluation

Developing the Instruments
The OSTRC combined information from the NSDC standards and the evaluation models mentioned above to create the multiple data domains described below. Sample indicators were linked with each domain to reflect specific improvements in programs, teacher performance, and/or student outcomes. Principles of adult learning theory were also considered and incorporated. Finally, the OSTRC identified measurement tools that could assess each domain.
This process made clear that both qualitative and quantitative data were useful and necessary for evaluating professional development effectiveness. Using multiple methodologies allowed cross-comparison of specific data domains and helped answer questions of validity (Bouffard & Little, 2004a). The OSTRC recognized that its tools would need to integrate diverse methods, such as interviews with staff members, program observations, self-assessment surveys for administrative and direct-service staff, focus groups, needs assessments, and rubrics that external evaluators could use to assess the professional development offering. The OSTRC then diagrammed useful techniques and data domains associated with two types of evaluation: formative and summative.
Overall, the conference evaluations were designed to detect changes over time in every domain except participant reactions, which were measured only once. The surveys included Pre, Post, and Follow-Up Workshop Surveys, Presenter Self-Assessment Surveys, and Overall Conference Evaluations:
• Post Workshop Surveys and Presenter Self-Assessment Surveys compared participant reactions regarding workshop strengths and weaknesses with the presenters' own perceptions.
• Pre, Post, and Follow-Up Workshop Surveys assessed learning over time; at each interval, they measured participants' knowledge, skills, and attitudes/beliefs.
• Post and Follow-Up Workshop Surveys measured organizational support, application, extension, and student outcomes, and these measures were also compared over time.
• Overall Conference Evaluations focused on general feedback and satisfaction.
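The survey-to-domain design described above can be encoded as a small lookup table, which makes the claim about changes over time checkable: a domain can show change only if it is measured at two or more time points. This sketch covers only the three participant surveys of the first-pilot design (before the later switch to a post-only design); the dictionary layout and the function names are illustrative assumptions.

```python
from collections import Counter

# Domains covered by each participant survey, as described in the text
# (first-pilot design: Pre, Post, and Follow-Up Workshop Surveys).
SURVEY_DOMAINS = {
    "Pre Workshop Survey": {"learning"},
    "Post Workshop Survey": {
        "participant reactions", "learning", "organizational support",
        "application", "extension", "student outcomes",
    },
    "Follow-Up Workshop Survey": {
        "learning", "organizational support", "application",
        "extension", "student outcomes",
    },
}


def measurement_counts(survey_domains: dict) -> Counter:
    """Count how many survey administrations measure each domain."""
    counts = Counter()
    for domains in survey_domains.values():
        counts.update(domains)
    return counts


def domains_tracked_over_time(survey_domains: dict) -> set:
    """Domains measured at two or more time points can show change."""
    return {d for d, n in measurement_counts(survey_domains).items() if n >= 2}


if __name__ == "__main__":
    print(sorted(domains_tracked_over_time(SURVEY_DOMAINS)))
```

Running it lists every domain except participant reactions, consistent with the statement that reactions were measured only once.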
The comparative questions from the Post to the Follow-Up Surveys were intended to measure the changes that occurred between the time individuals left the professional development conferences (often excited about how to use the new information) and the time they re-entered their work environments and attempted to integrate it. Comparing these perceptions and actions helped separate wanting to use the information from actually having the time, support, and resources to incorporate it.
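One simple way to quantify the gap between expectation and reality is a paired difference: the Follow-Up rating minus the Post rating for each respondent who completed both surveys. The sketch below assumes both ratings use the same 1 to 5 scale and that responses are already matched by respondent identifier; the function names and the sample ratings are made up for illustration and are not pilot data.

```python
from statistics import mean
from typing import Dict, Optional


def expectation_gaps(post: Dict[str, int], follow_up: Dict[str, int]) -> Dict[str, int]:
    """Follow-Up rating minus Post rating for respondents who answered both.

    Keys are respondent identifiers; both ratings use the same 1-5 scale.
    A negative gap means the actual experience fell short of the expectation.
    """
    shared = sorted(post.keys() & follow_up.keys())
    return {rid: follow_up[rid] - post[rid] for rid in shared}


def mean_gap(post: Dict[str, int], follow_up: Dict[str, int]) -> Optional[float]:
    """Average gap across matched respondents, or None if nobody matched."""
    gaps = expectation_gaps(post, follow_up)
    return mean(gaps.values()) if gaps else None


if __name__ == "__main__":
    # Made-up ratings for illustration only; not pilot data.
    expected_support = {"A1234": 5, "B5678": 4, "C9012": 4}
    actual_support = {"A1234": 3, "B5678": 4}
    print(expectation_gaps(expected_support, actual_support))  # {'A1234': -2, 'B5678': 0}
    print(mean_gap(expected_support, actual_support))          # -1
```

Respondents who skipped the Follow-Up Survey simply drop out of the comparison, which is one reason the follow-up response rate matters.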
Lastly, to test the internal validity of the instruments, each domain included several similar questions so that the consistency of responses could be checked during analysis. This pilot did not attempt to address external validity; thus, the findings are not assumed to be generalizable to the larger population of out-of-school time staff members across the country. Future studies built on nationally representative, random samples of participants could serve this purpose.
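The text does not say which statistic was used to check the consistency of similar questions. Cronbach's alpha is one common choice for that purpose, so the sketch below uses it as a swapped-in example, not as the OSTRC's documented analysis. For k items it computes alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), using sample variances throughout.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k similar items answered by the same respondents.

    `responses` holds one row per respondent and one column per item.
    Needs at least two items and two respondents.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*responses))                        # one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Made-up answers from three respondents to two similar items.
    print(round(cronbach_alpha([[2, 3], [4, 4], [3, 5]]), 3))   # 0.667
```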

Measuring the Six Domains
The OSTRC used specific techniques to measure each of the six previously described domains. These were:

Domain #1: Participant Reactions
To measure participant reactions to and satisfaction with the workshops, the Post Workshop Surveys asked several questions about the extent to which participants found the content and/or materials interesting, useful, practical, and relevant. Specific questions were divided into three subcategories: satisfaction, content, and logistics. Using a Likert scale, participants rated their agreement with given statements in each subcategory.
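Because the reaction items were grouped into satisfaction, content, and logistics subcategories, a natural first summary is a per-subcategory mean for each respondent. The item identifiers and the item-to-subcategory mapping in this sketch are hypothetical, and averaging Likert ratings as numbers is an analytic assumption rather than something the text specifies.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict


def subcategory_means(item_subcategory: Dict[str, str],
                      answers: Dict[str, int]) -> Dict[str, float]:
    """Average one respondent's Likert ratings within each subcategory.

    Items the respondent skipped (absent from `answers`) are ignored.
    """
    grouped = defaultdict(list)
    for item, rating in answers.items():
        grouped[item_subcategory[item]].append(rating)
    return {sub: mean(vals) for sub, vals in grouped.items()}


if __name__ == "__main__":
    # Hypothetical item ids and mapping.
    mapping = {"q1": "satisfaction", "q2": "satisfaction",
               "q3": "content", "q4": "logistics"}
    answers = {"q1": 5, "q2": 4, "q3": 3, "q4": 2}
    print(subcategory_means(mapping, answers))
    # {'satisfaction': 4.5, 'content': 3, 'logistics': 2}
```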

Domain #2: Participant Learning
To measure learning, participants were asked to rate, on a scale of 1 to 5, their level of knowledge, skill, and belief in the importance of the workshop topics in terms of benefiting the youth in their programs. In the first draft of the Post Workshop Surveys, these three questions were tailored to each conference workshop using the objectives provided by each presenter. For example, if a workshop was designed to offer information about a new strategy to combat truancy, participants were asked to rate their level of knowledge, skill, and belief about this new strategy. From the results, however, it was difficult to ascertain whether this level of specification was necessary.
To determine whether including workshop-specific objectives was productive, the OSTRC created a second version of this survey. In this draft, a sample of participants in ten workshops was given a survey with two comparative sets of questions for each category of learning (knowledge, skill, and belief). For knowledge, for example, participants were first asked one question about their increase in knowledge using general language, and then three questions about their increase in knowledge using specific language based on the workshop objectives. The results indicated that both sets of questions measured almost identical increases in knowledge, skill, and belief. Therefore, the OSTRC concluded that workshop-specific language was unnecessary and included only generalized language in the final version of the Post Workshop Surveys.
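The comparison in this paragraph can be sketched as follows: for each respondent, average the three objective-specific ratings and compare that average with the single general rating. The tuple layout and the mean-absolute-difference summary are assumptions for illustration; the text reports only that the two sets of questions produced almost identical results.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def compare_general_vs_specific(rows: Sequence[Tuple[int, List[int]]]):
    """rows holds (general_rating, [specific_rating, ...]) per respondent.

    Returns the mean general rating, the mean of each respondent's averaged
    specific ratings, and the mean absolute per-respondent difference.
    """
    general = [g for g, _ in rows]
    specific = [mean(s) for _, s in rows]
    mad = mean(abs(g - s) for g, s in zip(general, specific))
    return mean(general), mean(specific), mad


if __name__ == "__main__":
    # Made-up ratings: (general, [three objective-specific ratings]).
    rows = [(4, [4, 4, 5]), (3, [3, 2, 3]), (5, [5, 5, 5])]
    g_mean, s_mean, mad = compare_general_vs_specific(rows)
    print(round(g_mean, 2), round(s_mean, 2), round(mad, 2))
```

A small mean absolute difference alongside similar means is the pattern that would support dropping the workshop-specific items.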
Several other changes were made within this domain. The first version of the Post Workshop Surveys used a 10-point scale to rate changes in learning. After analyzing the initial results, the OSTRC determined that this could be compacted to a 5-point scale while still capturing sufficient variation in survey responses. The wording for measuring "belief" evolved from "My level of belief in the importance of this topic" to the current wording noted above. In the Follow-Up Surveys, participants were asked detailed questions about changes in attitude regarding the workshop topics: "My attitude towards this new knowledge/skill has grown more positive" and "I saw a positive change in my behavior towards using this new knowledge/skill in my program." This domain was further strengthened by integrating various factors that reflect adult learning theory. Questions were added to gauge the extent to which respondents participated in physical, hands-on, or interactive activities during the workshop to enhance the learning process. Attendees were also asked several additional questions about the learning process overall (e.g., "Were new ideas or materials presented?" and "Was a new activity modeled?").

Domain #3: Organizational Support
Using a Likert scale in the Post Workshop Surveys, participants were asked to gauge the level of support they believed they would receive from their administrators and other staff members when implementing the new workshop information. In the Follow-Up Workshop Surveys, attendees were asked a comparative question regarding the actual level of support they received.

Domain #4: Application
Immediately after the workshop, the Post Workshop Surveys asked participants whether they planned to apply what they learned to their programs. Several other questions addressed the anticipated ease of application, such as whether more than one staff member from their organization attended the conference, what resources would be necessary to implement the new knowledge/skill successfully, and whether they would be held accountable to anyone for using the information. In the Follow-Up Surveys, participants were then asked comparative questions about how they were able to apply the information. Questions included:
• "I had time to implement the new knowledge/skill I learned;"
• "The materials from this workshop sat untouched in my office (on a shelf, in my 'To Do' box, etc.);"
• "I was held accountable to apply this new knowledge/skill;" and
• "The information changed the way I deliver my program."
If participants responded affirmatively to this last question, they were asked to identify whether a change in knowledge, skill, or belief contributed most to the change. If they responded negatively, they were asked to "check all that apply" from a list of factors that may have impeded full application of the new material.
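The branching at the end of the Follow-Up application questions amounts to simple skip logic. In this sketch the three "contributed most" options come from the text, while the prompt wording, the dictionary format, and the function names are hypothetical; the list of impeding factors is passed in by the caller because the surveys' actual list is not given here.

```python
from typing import Dict, List

# Options for the "which contributed most" item, as listed in the text.
CONTRIBUTORS = ["knowledge", "skill", "belief"]


def follow_up_branch(changed_delivery: bool, barrier_options: List[str]) -> Dict:
    """Return the item shown after 'The information changed the way I deliver my program.'

    Prompt wording here is hypothetical; `barrier_options` is caller-supplied.
    """
    if changed_delivery:
        return {"type": "single_choice",
                "prompt": "Which contributed most to the change?",
                "options": list(CONTRIBUTORS)}
    return {"type": "check_all",
            "prompt": "Check all that apply: factors that impeded full application",
            "options": list(barrier_options)}


def is_valid_answer(question: Dict, answer: List[str]) -> bool:
    """Single choice needs exactly one listed option; check-all accepts any listed subset."""
    chosen = set(answer)
    if not chosen <= set(question["options"]):
        return False
    if question["type"] == "single_choice":
        return len(chosen) == 1
    return True


if __name__ == "__main__":
    q = follow_up_branch(False, ["hypothetical barrier A", "hypothetical barrier B"])
    print(q["type"], is_valid_answer(q, ["hypothetical barrier A"]))  # check_all True
```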
The analysis within this domain addressed two other factors: the type of job held by the participants and the type of workshop. For example, administrators attending content-based sessions might have applied the information by passing it along to teachers who instructed children daily. In this situation, their answers reflected indirect knowledge of application. On the other hand, teachers attending similar trainings could have offered direct knowledge regarding application. Since the goal of this initial pilot was to develop a set of surveys that could be applied in numerous settings, tailoring questions by job position or workshop type was beyond the scope of this study. However, future research should consider both factors when administering and analyzing these instruments.

Domain #5: Extension
The Post Workshop Surveys asked participants a set of questions regarding "extension." This domain was originally defined as knowledge or skills that can be "applied to other staff, programs, students, curricula, situations, colleagues." This definition was later revised into two separate questions about the opportunity to 1) share new information with colleagues, and 2) adapt new information to other programs, staff, and students. The OSTRC made this revision when it became apparent that the term "extension" was not clear to respondents. The revised questions used terminology that OST staff were more likely to interpret accurately. The OSTRC asked these questions in both the Post and Follow-Up Surveys to provide a basis for comparison between expectations and actual occurrences.

Domain #6: Student Outcomes
As stated previously, student outcomes were not directly measured in this study. However, to lay the groundwork for future work in this area, the Post Workshop Surveys asked participants to assess the level of impact they felt the new information would have on their students. Attendees were asked to rate the following statement on a scale of 1 to 5: "I think the youth in my program will benefit from this new knowledge/skill." In the Follow-Up Survey, they were asked a comparative question regarding the extent to which they felt the new information actually benefited the youth in their programs.

Additional Data: Demographics
Initially, the OSTRC surveys collected a modest amount of demographic information, including gender, age, education, race/ethnicity, and state of residence. Participants were asked to provide their birthdays on each survey as a means of matching their multiple surveys (Post Workshop, Overall Conference, etc.) while maintaining their confidentiality. After the results of the first pilot were analyzed, it appeared that some participants felt uncomfortable providing this information. To increase the compliance rate, a new identifier was requested: the first letter of the participant's first name plus the last four digits of his or her home telephone number. Participants' birthdays remained on the surveys as secondary identifiers, and the combination of these two identifiers increased the compliance rate from 77.0% to 91.4%.
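A minimal sketch of the identifier scheme follows, assuming phone numbers may be entered with spaces, dashes, periods, or parentheses. The matching rule in `same_person` (every identifier present on both surveys must agree, and at least one must be present on both) is an assumption; the text says only that birthdays served as secondary identifiers. The sample name and 555 phone number are fictitious.

```python
import re
from datetime import date
from typing import Optional, Tuple


def primary_id(first_name: str, home_phone: str) -> Optional[str]:
    """First letter of the first name plus the last four digits of the home phone."""
    digits = re.sub(r"\D", "", home_phone)    # drop spaces, dashes, parentheses
    name = first_name.strip()
    if not name or len(digits) < 4:
        return None                            # identifier not provided
    return name[0].upper() + digits[-4:]


def same_person(a: Tuple[Optional[str], Optional[date]],
                b: Tuple[Optional[str], Optional[date]]) -> bool:
    """Compare two (primary_id, birthday) pairs; None means not provided.

    Assumed rule: at least one identifier must appear on both surveys, and
    every identifier that appears on both must agree.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return bool(shared) and all(x == y for x, y in shared)


if __name__ == "__main__":
    pre = (primary_id("Maria", "(215) 555-0123"), date(1980, 1, 2))
    post = (primary_id("maria", "215.555.0123"), None)   # birthday skipped
    print(pre[0], same_person(pre, post))   # M0123 True
```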
As the OSTRC worked with conference planning committees on data collection and evaluation processes, some staff were concerned that participants would tire of completing long surveys or multiple surveys within a given conference. To avoid this frustration and to increase the response rate, some demographic questions were asked only on the pre-surveys, while others were asked only on the overall evaluations. After analyzing the first pilot, the OSTRC discovered that only a small percentage of individuals completed both a pre-survey and an overall evaluation. Although participants may have completed multiple surveys, their complete demographic information was unknown unless they had submitted at least one pre-workshop survey and one overall evaluation. For this reason, a full set of demographic questions was added to each survey in the second pilot. The second pilot also included additional demographic questions regarding program position, status of position (part time vs. full time), youth populations served by programs, and months/years of experience.
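The problem described here, demographic fields split across survey types, can be illustrated by merging each respondent's fields by identifier and flagging anyone whose merged record still lacks a required field. The field names, the "first value wins" rule, and the function names are hypothetical choices for this sketch.

```python
from typing import Dict, Iterable, List


def merge_demographics(surveys: Iterable[dict]) -> Dict[str, dict]:
    """Combine demographic fields from all surveys sharing an identifier.

    Each survey dict has an "id" key plus whatever demographic fields that
    survey collected. If a field appears more than once, the first non-missing
    value is kept (a simplifying assumption).
    """
    merged: Dict[str, dict] = {}
    for survey in surveys:
        fields = merged.setdefault(survey["id"], {})
        for key, value in survey.items():
            if key != "id" and value is not None:
                fields.setdefault(key, value)
    return merged


def incomplete_ids(merged: Dict[str, dict], required: set) -> List[str]:
    """Identifiers whose merged record still lacks a required field."""
    return sorted(rid for rid, fields in merged.items()
                  if not required.issubset(fields))


if __name__ == "__main__":
    # Hypothetical split: age on the pre-survey, education on the overall evaluation.
    surveys = [
        {"id": "M0123", "age": 34},             # pre-survey
        {"id": "M0123", "education": "BA"},     # overall evaluation
        {"id": "J4567", "age": 27},             # pre-survey only
    ]
    merged = merge_demographics(surveys)
    print(incomplete_ids(merged, {"age", "education"}))  # ['J4567']
```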

Methodology and Data
As explained previously, while several types of professional development opportunities are available to out-of-school time staff, the OSTRC focused on assessing conference workshops. Three conferences were chosen to participate in this pilot study. The first was a local conference geared mainly toward "frontline staff," or those whose primary responsibilities included working directly with youth on a regular basis. The second was a statewide conference designed for "administrative staff," individuals who primarily provided program administrative services on a regular basis. The third was a regional conference, also designed for administrative staff. By administering the first drafts of the instruments in these diverse settings, the OSTRC gathered a correspondingly diverse and rich set of information. A total of 4,444 surveys were collected from 1,079 participants over a period of seven months (Buher-Kane, Peter, & Kinnevy, 2005/2006).

Figure 5
Number of Surveys Collected by Conference

Additional Survey Revisions
The OSTRC incorporated several survey revisions in addition to those listed under each specific domain. After each application of the instruments, the OSTRC used both data analysis and contextual information to continually revise the instruments and prepare them for the next applications. To connect research and practice, practitioner needs were given substantial consideration during the instruments' development. As organizations were recruited to participate in the pilot study, their staff members were encouraged to become involved as well. These staff members assessed the utility and practicality of the surveys and evaluation process, which contributed to several changes to the materials and processes.

Survey Revisions
The OSTRC learned a great deal about planning and implementing successful conference evaluations during this pilot. For example, converting from a pre/post test design to a post-test-only design reduced the number of surveys each participant was expected to complete. This new strategy was more practical for conference organizers and less frustrating for participants, who may have attended several workshops within one conference. It also allowed attendees to report both their pre- and post-workshop levels of knowledge, skills, and attitudes/beliefs after the workshop. Since participants frequently did not have accurate expectations of workshops beforehand, the OSTRC found this to be a more precise method than a traditional pre/post design. For example, many participants chose workshops based on the titles and brief summaries listed in conference brochures, which may not have accurately represented the workshop topics or the level of expertise expected of participants. Without a common frame of reference, pre-survey measures were difficult to compare across individuals. In addition, before attending workshops, some individuals believed they had much to learn but left with little new information; at times, these situations resulted in negative change scores. Therefore, asking participants to rate their pre- and post-workshop comprehension and comfort levels after the workshops allowed them to respond within a common frame of reference, so that the OSTRC could calculate more accurate change scores for individuals and make more accurate comparisons across participants.
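The post-only approach described here is sometimes called a retrospective pretest (or "post-then-pre") design. A change score is simply the post rating minus the retrospective pre rating, both collected after the workshop. The nested-dictionary layout and function names below are assumptions for illustration, and the sample ratings are made up.

```python
from statistics import mean
from typing import Dict, Tuple

# ratings[respondent_id][dimension] = (retrospective_pre, post), both on a 1-5 scale
Ratings = Dict[str, Dict[str, Tuple[int, int]]]


def change_scores(ratings: Ratings) -> Dict[str, Dict[str, int]]:
    """Post minus retrospective pre, per respondent and per dimension."""
    return {rid: {dim: post - pre for dim, (pre, post) in dims.items()}
            for rid, dims in ratings.items()}


def mean_change(ratings: Ratings, dimension: str) -> float:
    """Average change for one dimension across respondents who rated it.

    Raises statistics.StatisticsError if nobody rated the dimension.
    """
    return mean(post - pre
                for dims in ratings.values()
                for dim, (pre, post) in dims.items()
                if dim == dimension)


if __name__ == "__main__":
    # Made-up ratings for illustration only.
    sample = {
        "M0123": {"knowledge": (2, 4), "skill": (2, 3), "belief": (4, 5)},
        "J4567": {"knowledge": (3, 4), "skill": (3, 3), "belief": (5, 5)},
    }
    print(change_scores(sample)["M0123"])    # {'knowledge': 2, 'skill': 1, 'belief': 1}
    print(mean_change(sample, "knowledge"))  # 1.5
```

Because both ratings are given at the same sitting, the respondent judges the starting level with the same understanding of the workshop content as the ending level, which is the common frame of reference the text describes.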

Process Revisions
The OSTRC quickly altered how it implemented the Follow-Up Surveys. Initially, these surveys were distributed to participants electronically three months after the conferences, using email addresses gathered from workshop sign-in sheets. However, the response rate for the first pilot's follow-up surveys was extremely low (7.3%). The OSTRC proposed three explanations for this problem:
• The first theory was that administering the surveys three months after the conferences allowed too much time to pass. Some researchers recommend this three-month interval to give staff sufficient time to implement new knowledge, skills, and attitudes (Guskey, 2000). However, this time frame may be more pertinent after intensive professional development offerings, such as a series of topic-specific workshops. Since a conference is less intensive and less directed, less time may be needed to apply the new information, or to forget or neglect it.
• The second theory pertained to how the follow-up surveys were distributed. Initially, the follow-up emails were sent directly from the OSTRC, which was not always familiar to the participants. The conference planning organizations were visible and connected to the respondents, whereas the OSTRC's role had not been emphasized.
• Finally, the follow-up email contained the surveys as attached Word documents. Such attachments are often difficult for respondents to complete because of downloading, spacing, and formatting issues.
To address these three issues in the second pilot, the conference planning organizations distributed the follow-up surveys to participants only one month after the events, using pre-drafted letters and email distribution lists created by the OSTRC. In addition, attendees received a link to a web-based survey rather than an attached document. These adaptations kept the data collection methods standard across conferences and reduced the time and effort required of the conference planning organizations. Ultimately, these revisions increased the completion rate of the Follow-Up Surveys to 41.1%.
The OSTRC also revised how it categorized workshops. Initially, it identified two general types of workshops: those that presented content that could be transferred directly to students (e.g., a new curriculum) and those that presented information that could be transferred to students only indirectly (e.g., the status of childhood obesity in the U.S.). The latter often provided theoretical, contextual, or reference information, which was more difficult to track in terms of its impact on students. These two categories allowed the OSTRC to separate the different processes that may have occurred when measuring application, extension, and benefits to program youth. Although this differentiation proved useful, the analysis showed that multiple subcategories were needed within each type. The two-way workshop differentiation was therefore eliminated, and the OSTRC is currently testing a larger set of workshop typologies that may address some of the variation in reported outcomes.
Lastly, the OSTRC recognized the need to organize the evaluation process efficiently to maximize data collection efforts. This type of evaluation requires commitment from conference sponsors, organizers, and participants. To ensure this level of commitment, the OSTRC developed two separate manuals to accompany the release of the instruments: a User's Guide and a Reference Guide. These materials summarize the lessons learned through this process and suggest "promising practices" for evaluating OST professional development.

Conclusion and Future Work
After analyzing the second pilot, the OSTRC revised the instruments and, in the spring of 2006, released them for public use as a "Conference Evaluation Toolkit." Out-of-school time conference planning organizations can now use the Toolkit free of charge in exchange for sharing their data with the OSTRC. These additional data contribute to a national databank that informs future revisions to the instruments as well as analyses of OST professional development.
Currently, the OSTRC is implementing a "Non-Conference Workshop Evaluation" pilot study. This effort tests similar instruments in non-conference workshops: those that are provided to out-of-school time staff members but are not associated with conferences. The current study compares measurements of each data domain across various professional development settings, including non-conference workshops, conference workshops, networking meetings, and program observations. These comparisons will allow organizations to compare the characteristics, considerations, and benefits associated with different types of professional development offerings.
Note: The authors would like to thank Dr. Susan Kinnevy and Dr. Stacy Olitsky for their help in developing the OSTRC Conference Evaluation Toolkit. For more information about the Toolkit or other OSTRC research projects, please contact Nancy Peter, OSTRC Director, at (215) 898-0640 or npeter@sp2.upenn.edu.
Figure 4
Diagram of Formative and Summative Techniques and Domains