Informing Youth Health and Well-Being Programs: A National 4-H Healthy Living Evaluation Study

4-H Healthy Living programs address healthy eating; physical activity; social-emotional health and wellbeing; alcohol, tobacco, and other drug use prevention; and injury prevention. Using the Systematic Screening and Assessment Method, this paper identified 32 4-H Healthy Living programs across the nation ready for comprehensive outcome evaluation and/or national replication based on 6 criteria. Weaknesses in an additional 78 programs that did not meet the criteria were also identified. Programs that failed to meet the criteria did so primarily because they lacked a clearly delineated theory of change or appropriate evaluation. Implications for practice include ways to strengthen program planning and use of a comprehensive evaluation framework. Specific attention is given to professional development for 4-H professionals.


Introduction
4-H is the largest youth development organization in the United States. In 2016, nearly 6 million youth ages 5 to 18 participated in programming, and approximately 600,000 adults and youth served as volunteers (National 4-H Council, 2016). Although health has been an integral part of 4-H since the early 20th century, the national 4-H Healthy Living initiative began in 2008.
4-H Healthy Living programs use a holistic approach that includes eating a healthy diet, engaging in safe physical activity, avoiding risky behavior, recognizing and directing emotions, and developing and maintaining positive social interactions and relationships (National 4-H Healthy Living Task Force, 2009). This holistic approach helps youth and their families increase their awareness, knowledge, skills, and competencies in physical, social, and emotional well-being. Thus, 4-H Healthy Living programs address five domains: healthy eating; physical activity; social-emotional health and well-being; alcohol, tobacco, and other drug (ATOD) use prevention; and injury prevention. A guiding principle of 4-H Healthy Living programs is that programs and curricula are based on best practices in healthy living research (National 4-H Healthy Living Task Force, 2009).
Despite the noted success and longevity of 4-H, Arnold (2015) argued that 4-H does not consistently adhere to a practical program model "that articulates the 4-H program theory of change or chain of action to guide program development and implementation" (p. 55). Program theory is critical because it can identify steps needed to achieve desired outcomes and result in a blueprint for implementation and evaluation (Rennekamp & Jacobs, n.d.). Kettner, Moroney, and Martin (2017) suggest that a logical approach to programming is more likely to produce desired outcomes, as compared to simply hoping that a program achieves some outcome.
In an effort to build logical connections between 4-H programs and outcomes, National 4-H Council (2010) developed a logic model for its healthy living programs. However, leaders in 4-H were unsure of the degree to which programs adhered to that logic model, as well as whether the programs were achieving intended outcomes. This uncertainty prompted National 4-H Council to fund a project in 2013 to document quality 4-H Healthy Living programs (National 4-H Council, 2012).
Evaluation specialists with Mississippi State University Extension Service (MSU Extension) used the Systematic Screening and Assessment method (SSA) to identify promising 4-H Healthy Living programs that adhered to the National 4-H Healthy Living mission and logic model; met specific, minimal quality standards and criteria; and were deemed ready for comprehensive outcome evaluation and/or national replication. For this project, a program was defined as an organized, purposeful set of activities (National 4-H Council, 2010). Information was also gathered on the quality of those programs that did not meet the minimal standards.
We reviewed the responses as they were submitted, and when additional information about a program was needed (e.g., information in the questionnaire was missing or unclear), we conducted interviews with the primary contact.
Simultaneously, we conducted a content analysis of 4-H Healthy Living documents to identify programs not reported through the questionnaire process. Documents reviewed included grantee reports of 4-H Healthy Living projects funded by Walmart, United Healthcare, and Coca Cola; the 4-H Programs of Distinction database; and the 4-H Healthy Living Literature Review (Hill, McGuire, Parker, & Sage, 2009). The reports were reviewed for information similar to that sought through the online questionnaire (e.g., program name, objectives, audience, evaluation).
We used the following criteria during this scanning process (Step 2) to assess each program submitted through a national survey and/or identified in a published 4-H Healthy Living programmatic report:
Criterion 1: 4-H Healthy Living categories/domain(s) were identified or identifiable.
Criterion 2: Target population identified and is in 9 to 19 year age range.
Criterion 3: Specified objectives that are clear, realistic, and measurable.
The scanning resulted in an enumeration of criteria unmet and identification of common problems among those programs that failed to meet all six criteria (see Findings section).
Programs that met all criteria were included in the evaluability assessment (Step 3), and we used a checklist for program evaluability (developed by the United Nations Development Fund for Women, 2009) to assess the following evaluability parameters: program design, availability of information, and conduciveness of context for evaluation. Once again, we obtained more information about a program (if needed) through interviews or email communications with a program representative.
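For readers who wish to tabulate a similar screening, the following is a minimal Python sketch of the Step 2 logic: each program is checked against the six criteria, the unmet criteria are enumerated, and only programs meeting all six advance to the evaluability assessment. The `Program` class, the function names, and the example programs are hypothetical and were not part of the SSA materials; the criterion labels paraphrase the criteria used in this project.

```python
from dataclasses import dataclass, field

# Labels paraphrase the six screening criteria described in this paper.
CRITERIA = {
    1: "4-H Healthy Living domain(s) identified or identifiable",
    2: "Target population identified and in the 9 to 19 year age range",
    3: "Objectives are clear, realistic, and measurable",
    4: "Objectives, activities, and outcomes are logically connected",
    5: "Outcomes are clear, measurable, realistic, and tied to the logic model",
    6: "At least a pretest/posttest is used and results are available",
}


@dataclass
class Program:
    """Hypothetical record of one screened program (illustration only)."""
    name: str
    met: set = field(default_factory=set)  # numbers of the criteria judged met


def unmet_criteria(program):
    """Return the criterion numbers the program failed to meet, in order."""
    return sorted(set(CRITERIA) - program.met)


def advances_to_evaluability(program):
    """A program moves on to Step 3 only if all six criteria are met."""
    return not unmet_criteria(program)


# Invented examples; these are not programs from the study.
programs = [
    Program("Example Program A", {1, 2, 3, 4, 5, 6}),
    Program("Example Program B", {1, 2, 3, 5}),
]

for p in programs:
    print(p.name, unmet_criteria(p), advances_to_evaluability(p))
```

Recording the unmet criteria rather than a single pass/fail flag mirrors the enumeration of unmet criteria described above, because one program can fail several criteria at once.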
In Step 4,

Findings
Through SSA, we identified 110 unduplicated 4-H Healthy Living programs. Figure 1 displays the flow scheme for the SSA. The domains with the largest number of identified programs were physical activity and healthy eating (co-classified) (n = 43), followed by social-emotional health and well-being (n = 15), and injury prevention (n = 14).
Thirty-two of the identified programs showed readiness for further outcome evaluation or had evidence of potential for replicability; the majority were in the physical activity and healthy eating domains (n = 14), followed by social-emotional health and well-being (n = 7). The appendix lists these programs and their references, domain, level of evidence, and URL for additional information. Each of these programs met several indicators within the evaluability parameters (program design, availability of information, and conduciveness of context). For example, responses about the program identified a clearly defined problem, an appropriate target population, a clear and accurate logic model or description of the program's theory of change, clear and measurable objectives, results from at least a pre/post evaluation, and resources to undertake a more rigorous evaluation. Figure 1 also identifies the domains and evidence of replicability classifications for these programs.
Most programs with evidence of replicability had preliminary evidence (n = 25). For example, in the physical activity and healthy eating domain, one program with preliminary evidence had a clearly defined problem (unhealthy eating habits and lack of physical activity by youth), specific target population (elementary and middle school youth), complete logic model, published
We classified 78 of the 110 programs as "not ready for replication" because they failed to meet the six criteria (described previously). In numerous cases, multiple criteria were not met by a program. Table 1 documents the unmet screening criteria within each of the five domains.
All programs met criteria 1 and 2 (identifiable domain and target population within the 9 to 19 age range). Across each domain, programs most often lacked program logic or theory (criterion 4). For example, some programs' objectives were health related, but the only outcomes assessed were life skills. Although life skills can be an appropriate outcome, assessing only life skills will not provide evidence as to whether the specific health-related content changed health knowledge or behaviors. In other cases, programs described a desired outcome of increasing the amount of time spent doing physical activity, but opportunities for participants to practice physical activity were not included in the program.

Discussion
The nationwide reach of 4-H and a focus on healthy living provided an excellent place to start with a systematic process of identifying programs that promote the health of youth. The extensiveness of programs also provided an opportunity to identify overall strengths and weaknesses in programming across 4-H for program improvement. Although 32 programs met the criteria for quality programming and evaluation, 78 programs did not. Two primary weaknesses were observed among those programs that were not identified as being ready for replication or more intensive outcome evaluation: lack of fully delineated program theory and program evaluation approaches that could not adequately assess change resulting from the program.
A lack of logical connections among objectives, planned activities, and/or outcomes was a major weakness, indicating that a theory of change was likely not present. As mentioned, a theory of change can increase the chance that programmatic strategies chosen will accomplish desired outcomes. However, practitioners fail to develop this theory of change for many reasons. For example, in response to pressing needs or interests (e.g., an emerging community crisis or condition such as the opioid epidemic), 4-H campus or county professionals may rush to implementation, rather than thinking through whether program goals and objectives are well-defined and feasible, the change process is plausible, program procedures and activities are clear and adequate, and evaluation methods can document outcomes. The need to be responsive to needs while still addressing program theory has been discussed by Knowlton and Phillips (2013):
Ideally, theories of change are grounded in literature, experience, or other evidence that promotes plausibility. If the theory of change is supported by a body of evidence, there is a stronger chance that the strategies chosen will secure the desired results. Frequently, however, this "standard" is overlooked. In the urgent fever to get to implementation, the design and plan quality can be shortchanged and rely, instead, on faulty assumptions, old practice, or little or no evidence (p. 17).
Creating a theory of change gives attention to the connections among programmatic components (Arnold, 2015). Developing a theory of change can occur through delineating a series of if-then relationships. In working out the if-then sequences, connections can be made, assumptions can be clarified, and an understanding of how investments are likely to lead to results is enhanced. Figure 2 gives an example of how one might make these connections.

Figure 2. Example of Developing a Theory of Change through a Series of If-Then Relationships
The second leading weakness was related to program evaluation. As mentioned earlier, programs that were not identified as being ready for replication or more intensive outcome evaluation often used a posttest only or focused exclusively on process or implementation evaluation. At least a pretest/posttest design that collects data from participants before and after a program is needed to estimate the program's effect by comparing data from these two measurement points (Rossi, Lipsey, & Freeman, 2004). Additionally, while process or implementation evaluation is important for helping identify reasons that a program failed to achieve its objectives, it does not provide evidence that the desired outcomes were met.
Ideally, thorough evaluation would include both process or implementation evaluation and outcome evaluation using at least a pretest/posttest design.
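To illustrate why two measurement points matter, the sketch below computes the average within-participant change between a pretest and a posttest. It is a simplified illustration under stated assumptions: the scores are invented, the function name `mean_change` is hypothetical, and a mean difference alone does not establish that the program caused the change.

```python
from statistics import mean


def mean_change(pretest, posttest):
    """Average within-participant change (posttest minus pretest).

    Both lists must hold scores for the same participants in the same order.
    """
    if len(pretest) != len(posttest):
        raise ValueError("pretest and posttest must have the same length")
    if not pretest:
        raise ValueError("at least one participant is required")
    return mean(post - pre for pre, post in zip(pretest, posttest))


# Invented knowledge scores for four participants (illustration only).
pre_scores = [4, 5, 6, 5]
post_scores = [7, 6, 8, 7]

print(mean_change(pre_scores, post_scores))
```

With a posttest only, `pre_scores` would not exist, so no change score could be computed; this is the limitation noted above for programs that lacked a baseline assessment.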
If the 5th graders receive the nutrition-related content and complete hands-on activities, then they will learn about nutrition and improve their healthy eating skills.
If they learn about nutrition and improve their skills, then they will eat healthier foods and act in healthier ways.
If they eat in healthier ways, then as teens and adults, they will experience fewer nutrition-related illnesses.
The two weaknesses that were observed (i.e., lack of fully delineated program theory and weak outcome evaluation design) could indicate broader challenges in Extension, and specifically 4-H. While we have highlighted these weaknesses in this paper, approximately one-third of the programs reviewed for evaluability were classified as having preliminary, moderate, or strong evidence of readiness for replication at a national level or more rigorous outcome evaluation. In light of these challenges and strengths, implications for practice and recommendations follow.

Implications for Practice
This paper identified 32 4-H Healthy Living programs that are ready for further outcome evaluation or have evidence of potential for replicability on a national level. Implementation of programs with evidence of achieving desired outcomes can contribute to efficient use of programmatic resources. Thus, the next logical approach is to promote their use across 4-H systems or even other youth development organizations. 4-H programs are often easily accessible (e.g., low cost, collaborative nature of 4-H professionals). Additionally, those programs identified in this paper already have clear lessons and activities, identified evaluation tools and approaches, and supportive materials for educators. This would make adoption across 4-H, and even by other youth development organizations, attractive.
As an organization, 4-H is dedicated to ensuring that its programs are high quality, with specific objectives, activities, and outcomes that are clear, realistic, measurable, and logically connected, as well as having appropriate evaluation approaches to document outcomes achieved. As indicated by the National 4-H Healthy Living Task Force (2009), 4-H is dedicated to:
Increasing the knowledge and commitment of Extension staff to design effective process evaluation strategies that enable newly developed learning experiences and curriculum to be improved, establish an ongoing monitoring process to ensure quality implementation, and create processes to eliminate and redirect resources away from ineffective programs.
Designing effective evaluation strategies that enable 4-H professionals to develop healthy living curriculum to advance to the highest level of evidence possible.
This dedication includes attention to program theory and program evaluation, the two weaknesses reported in the current paper. This desire serves as a great strength and can be the driving force to promote professional development for
theory, and research to create a program that will be likely to achieve its desired outcomes.
Without such integration, a program's content and/or activities may be highly innovative, but intended outcomes may be less likely to result (Knowlton & Phillips, 2013).
Without expertise in program evaluation, 4-H professionals would benefit from the use of a comprehensive evaluation framework that delineates components needed to thoroughly assess both a program's process and outcomes. The RE-AIM framework is one option that has been promoted for use in 4-H Healthy Living programs (National 4-H Healthy Living Task Force, 2009). RE-AIM (re-aim.org) identifies five evaluation dimensions (Reach, Effectiveness, Adoption, Implementation, and Maintenance) and has been successfully used to inform the selection of evidence-based health promotion programs (Glasgow, Vogt, & Boles, 1999). RE-AIM is a straightforward and consistent approach that could be beneficial in several ways to (a) assess a 4-H program's health outcomes, (b) compare program processes and health outcomes across multiple sites or over time, and (c) inform decisions about resource distribution for effective programs.
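As one possible way to put the framework to work in planning, the sketch below checks a draft evaluation plan against the five RE-AIM dimensions. The grouping of Reach, Adoption, and Implementation as process evaluation and of Effectiveness and Maintenance as outcome evaluation follows the description used elsewhere in this paper; the function names and the example plan are hypothetical and are not part of RE-AIM itself.

```python
# Grouping follows this paper's description: Reach, Adoption, and
# Implementation as process evaluation; Effectiveness and Maintenance
# as outcome evaluation.
REAIM_DIMENSIONS = {
    "Reach": "process",
    "Effectiveness": "outcome",
    "Adoption": "process",
    "Implementation": "process",
    "Maintenance": "outcome",
}


def missing_dimensions(plan):
    """List the RE-AIM dimensions that a draft evaluation plan omits."""
    return [d for d in REAIM_DIMENSIONS if d not in plan]


def covers_process_and_outcome(plan):
    """True if the plan includes at least one process and one outcome dimension."""
    kinds = {REAIM_DIMENSIONS[d] for d in plan if d in REAIM_DIMENSIONS}
    return kinds == {"process", "outcome"}


# Hypothetical draft plan that tracks only who attended and how delivery went.
draft_plan = {"Reach", "Implementation"}

print(missing_dimensions(draft_plan))
print(covers_process_and_outcome(draft_plan))
```

A plan like `draft_plan` corresponds to the process-only evaluations observed among programs that did not meet criterion 6; adding Effectiveness or Maintenance would require data collection at more than one time point.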
Given these benefits, Downey, Peterson, Donaldson, and Hardman (2017)
Additionally, the environmental scan survey was shared with other groups interested in using the process in other health-related areas such as chronic disease management programs.
Finally, we collaborated with Extension colleagues at the University of Tennessee on a Rural Health and Safety Education grant to implement one of the programs identified through this SSA project. Such work will continue as opportunities arise.

Conclusions
The project reported here provided a snapshot of 4-H Healthy Living programs across the nation at one point in time and led to the identification of 32 promising health-related positive youth development programs. The review process allowed for the identification of programmatic weaknesses and resulted in recommendations for program improvements.
Weaknesses identified in programs provide direction for professional development. This inventory is primarily useful to 4-H as the need for health-related youth programs increases and the pressure to implement programs that have demonstrated positive health outcomes mounts.
However, these programs could be replicated by any positive youth development organization.
Additionally, other organizations could use the SSA method to conduct an environmental scan and evaluability assessment of their own program offerings. The promising 32 programs have the potential to foster a healthy lifestyle that influences immediate and long-term health outcomes.

Criterion 4: Objectives, activities, and outcomes are logically connected.
Criterion 5: Clearly specified, measurable, and realistic outcomes that are tied to the 4-H Healthy Living logic model outcomes.
Criterion 6: At least a pretest/posttest is used to assess outcomes, and evaluation results are available and/or reported.

Figure 1. Overall Flow Scheme for the Systematic Screening and Assessment
Programs also often failed to use at least a pretest/posttest design (or retrospective pretest design) to evaluate program outcomes (criterion 6). Most programs that failed to meet criterion 6 used a posttest only or focused efforts on evaluating implementation of program activities. To document change in an outcome, at least two measurement points are needed. For example, if a program only administered a posttest, it was difficult to know if youth increased their health-related content knowledge because there was not a baseline assessment of where their health-related knowledge started. Less frequently, program objectives and/or program outcomes were not clear, realistic, and measurable (criteria 3 and 5). Some program objectives were written as process objectives (what program staff would do) instead of outcome objectives (what participants would know or do as a result of the program). In one case, a program had the stated objective of "teach youth about healthy eating." In other cases, program outcomes were not realistic; for example, a program of only 3 hours duration indicated a desired outcome of a change in body mass index, which cannot occur in that short time frame.
we classified programs as having preliminary, moderate, or strong evidence of replicability as defined by the Corporation for National and Community Service (CNCS, 2012):

Table 1. Programs with Unmet Screening Criteria within Each 4-H Healthy Living Program Domain
Column headings: Healthy eating; Physical activity; ATOD prevention; Social-emotional health and well-being; Injury prevention
Note. Some programs failed to meet multiple criteria, so a single program may contribute to counts in more than one cell in Table 1. For example, a program classified in the combined healthy eating and physical activity domain may have failed to specify clear, realistic, and measurable objectives (criterion 3) with respect to physical activity, and may also have failed to logically connect objectives (criterion 4) with respect to both healthy eating and physical activity; thus, that one program contributes to the counts of 7, 22, and 19 in those cells in Table 1.
4-H educators and specialists. As the following examples will suggest, such professional development can help ensure that resources dedicated to program development and implementation are not wasted due to a lack of program theory or weak program evaluation. While Extension professionals may have strengths and expertise in certain content areas, working with youth, and/or communication, they may not have strengths or expertise in program development or program evaluation. Individuals who lack formal training in program development may be unable to clearly delineate a program's theory of change. However, even without expertise in those areas, beginning with a series of if-then statements to describe the components required to move from a program need to a desired outcome can be an initial step in developing a program's theory of change. Professional development related to theory of change models can be provided to help them learn how to integrate practice, experience,
program evaluation were included, and thus helped avoid the weaknesses related to evaluation identified through the evaluability assessment reported here. For example, both process evaluation (Reach, Adoption, and Implementation) and outcome evaluation (Effectiveness and Maintenance) were conducted, with measurement of the effectiveness and maintenance components requiring data collection at multiple time points to show true change.
Identification of these two main weaknesses and suggestions for addressing them are of great importance to administrators, educators, specialists, and volunteers in 4-H. Individuals who focus on professional development and capacity building can use this information to enhance the quality of the programs offered. Professional development could take the form of recorded online presentations, interactive webinars, face-to-face trainings, one-on-one technical assistance, or even printed materials, based on the resources available in the state 4-H system. Administrator support and encouragement for such training and technical assistance on program theory of change and/or RE-AIM (or another evaluation framework) would show that such knowledge and its use are valued. Additionally, administrators could use evaluation results from a consistent framework to compare programs in order to inform decisions about which programs to continue, expand, modify, or eliminate.
The MSU Extension evaluation specialists who led this project have begun such professional development efforts at a national level with different target audiences. For example, a competency building workshop for 4-H agents was conducted at the 2014 Annual Conference of the National Association of Extension 4-H Agents in Minneapolis, Minnesota. Similar workshops with Extension professionals, including those professionals who work primarily in program and staff development, were delivered at the 2013 Annual Conference of the American Evaluation Association in Washington, DC; at the 2016 Annual Conference of the National Association of Extension Program and Staff Development Professionals in Ridgedale, Missouri; and during the 2015 Evaluation Virtual Summer School hosted by the National Association of Extension Program and Staff Development Professionals and the Extension Southern Region Program Leadership Network. Journal articles have been published and are cited throughout this paper.