Beyond the Trifold in Civics Presentations: The Measure of Youth Policy Arguments

Youth are increasingly engaging in civic action to address social injustices. Many adult educators are also looking for instructional resources that support youth voice as a way to promote adolescent civic development and community change. Alas, assessment tools to support youth voice and policy argumentation are lacking. Existing tools overemphasize public speaking skills and rely on dated artifacts such as cardboard trifold posters. In this article we introduce the Measure of Youth Policy Arguments (MYPA), a tool designed to aid in the development and assessment of high-quality youth policy presentations. We also describe how to use the MYPA in formative and summative contexts. Additionally, we provide initial evidence for the validity and reliability of the MYPA. Furthermore, we argue that the MYPA has applications in preparing youth for policy presentation and in assessing learning outcomes associated with youth voice projects.

assessment tool. In our own experience, the summative potential of the MYPA has opened opportunities for partnerships with in-school and out-of-school organizations, ultimately leading to the MYPA being used as both a formative and summative tool in these organizations.
In this article we describe the rise of youth voice and the need for authentic formative and summative civics presentation tools. We then discuss the development, intended uses, and validation of the MYPA. Finally, we share potential applications of the MYPA in preparing youth for policy presentation and in assessing learning outcomes associated with youth voice projects.
This article advances the youth development literature base by describing a tool that will aid youth and their adult partners in preparing for and authentically assessing the quality of youth policy presentations.

The Rise of Youth Voice
Internationally and nationally, youth activists have organized large-scale demonstrations, including marches and speeches. Greta Thunberg, perhaps the most visible youth climate activist, was named Time magazine's youngest ever Person of the Year in 2019 (Felsenthal, 2019). Thunberg's address to world leaders at a climate action summit in New York in September 2019 was a master class in impassioned speech. Nationally, youth from Parkland, Florida organized the March for Our Lives in response to a shooting at their high school and lax national gun laws (Witt, 2018). The Parkland youth have been innovators: Emma Gonzalez's moment of silence at the March 24, 2018, March for Our Lives is a powerful example of this (NBC News, 2018). More locally, youth from Denver took to social media to critique district leadership and garner support for striking teachers. Though the Denver youth faced backlash on social media platforms and from district personnel, they succeeded in mobilizing their peers and increasing public awareness. None of these youth were confined to the typical tools of civics courses (a trifold, a prompt, and 3 minutes to present), and they seized the opportunity.
These forms of voice and activism did not emerge from education spaces, but if we embrace the idea that a strong and just democracy requires education, then we want to think about the kinds of curriculum and assessments that support youth's civic development.
With regard to school-based youth agency and activism, the most promising approach is action civics. In action civics, youth "do civics and behave as citizens by engaging in a cycle of research, action, and reflection about problems they care about personally while learning about deeper principles of effective . . . political action" (Levinson, 2012, p. 224). Action civics has roots in the educational theories of John Dewey but has recently received renewed attention as educators have sought approaches to civic education that bring democratic practice to life through real-world project-based learning, discussion of controversial issues, and sharing of policy ideas with public audiences (Blevins et al., 2016; Gould et al., 2011). Evidencing the impact of action civics curricula, Andolina and Conklin (2018) found that youth who partook in a weeklong action civics curriculum, in which they chose topics relevant to their daily lives and interacted with peers and community members to share their stories, demonstrated higher levels of civic and rhetorical skill. Cabrera et al. (2014), using a logistic regression analysis, found a significant direct relationship between youth participation in an ethnic-studies action civics program and graduation rates and standardized test scores.
The out-of-school time (OST) field, too, has seen a resurgence of programs that facilitate youths' opportunities to critically reflect on their social world and develop projects that make political demands of elected leaders. OST programs use different models, including youth organizing, youth participatory action research (YPAR), and youth-adult partnerships (Torre & Fine, 2006; Wu et al., 2016). Many OST programs share with action civics an emphasis on opportunities for young people to speak up about issues they care about but vary in the extent to which they prioritize research, critical social analysis, or political action (Akiva & Petrokubi, 2016; Conner & Rosen, 2016). Despite this renewed interest in civics programming, a challenge remains in how to assess youth learning, especially as related to civics policy presentations.

The Challenge of Assessing Youth Policy Presentations
Although located in different institutional contexts, with different pressures related to accountability, schools and OST programs both rely on assessment systems to evaluate youth learning, improve programs, and report to external stakeholders. In this sense, youth voice programs share a common interest in the availability of tools for assessing youths' capacity to construct and deliver high-quality policy arguments. Alas, existing resources are inadequate.
In the school context, assessments of action civics are limited in two ways. The first pertains to a mismatch between typical federal- and state-mandated assessment systems, which are standardized and based on bodies of pre-established knowledge (codified, for example, in the Common Core State Standards), and the kinds of open-ended inquiry characteristic of action civics. Action civics projects are not designed to teach information that can be found in textbooks (Levinson, 2012); correspondingly, tests based on textbooks will not be sensitive to the kinds of learning youth can achieve. Second are the limitations of existing rubrics for summative and formative assessment of action civics. In a review, conducted by the authors, of 13 youth presentation rubrics from civics organizations and school districts across the country, seven closely resembled public-speaking or debate rubrics, including items to assess voice projection and posture. One instrument dedicated 18 of 26 questions to the design of presentation slides. Those rubrics that did assess the content of the presentation tended to utilize checklists ("was it there or not?"), with little or no reflection of differences in quality. Only four of the rubrics provided a Likert-type scale to assess quality, but none included a description for individual score categories, leading to a lack of differentiation of what makes a score a 3 versus a 2 or a 4. None of these tools provided information on their reliability and validity, leaving in question whether these rubrics can produce accurate and consistent scores and also posing a limitation on their use in research and assessment of youth learning. Given the limitations of existing rubrics, it appears that youth are not receiving constructive feedback on their policy presentations and educators are limited in their ability to assess the quality of youth learning.
With regard to OST programs, although they have more flexibility in how they assess youth learning, they often face pressure to provide outcome data. One relatively simple and inexpensive way to do this is by administering self-report survey instruments designed to measure civic development constructs, such as civic agency or intentions to vote in the future (Flanagan et al., 2007). However, self-report surveys have a number of known limitations that make them less than ideal for the purpose of estimating youth learning (see Howard, 1980).
Such instruments are subject to a range of construct-irrelevant influences, the most serious of which are social desirability bias (Krumpal, 2013) and idiosyncrasies in how individuals respond to survey items in general (see Messick, 1991). As a practical matter, scores on these instruments often have a high floor at pretest, limiting their capacity to detect change. Further compounding matters, traditional methods of validation of self-report instruments may be largely incapable of detecting serious flaws in their design (Maul, 2017). Although assessment should never dictate the boundaries or imagination of youth voice projects, it can be helpful in giving feedback to youth or reporting learning to educators and stakeholders.
For youth voice programming, the concept of "authentic assessment" offers a useful framework, as it mirrors mature civics practices. According to Wiggins (1990), authentic assessment aims to evaluate youth performances or products as they "rehearse the complex ambiguities of the 'game' of adult and professional life" (p. 1). What is more, when youth master a real-world skill, they are more likely to "retain what they have learned and apply it to other contexts than when they merely try to memorize facts or the steps of a skill for a school-based test" (p. 69). Mastery is key, but, in the case of youth voice presentations, it is complicated to assess. We developed the MYPA to be an authentic assessment of youths' ability to perform policy arguments; the MYPA provides a more complete assessment picture by examining youths' problem identification, research, policy development, and presentation abilities across a range of topics (Kirshner et al., 2020).

The Measure of Youth Policy Arguments
The MYPA is an observation protocol designed to assess the quality of youth policy presentations to external stakeholders, which often are the culminating activity for action civics and YPAR projects. We were motivated to develop the MYPA in part by our review of youth policy presentations, in which we found that youth presentations did not address the root causes of their focal problem, employed questionable or misaligned research methods, proposed incomplete or off-base policy solutions, and did not take the opportunity to enlist the help of their audience in enacting social change. Additionally, the MYPA has potential applications as a summative assessment for adults looking to evaluate youth presentations and for programs hoping to assess learning outcomes.

Intended Uses
The MYPA is intended to be used by educators in youth voice programs in and out of schools.
The tool was designed to be used primarily for formative purposes, in order to help middle and high school youth prepare high-quality policy arguments. It can be used as a guide to help youth construct a policy presentation or to provide youth with feedback as they practice that presentation. As a summative assessment tool, the MYPA might be used to examine youth learning outcomes at either the classroom or the program level. At the classroom level, the MYPA can be used for grading group policy arguments. At the program level, the MYPA might be used to assess youths' critical thinking skills or ability to engage in civic action; this may be useful for programs looking to provide data on youths' learning for administrators or funders. Finally, the MYPA might be used for scoring youth voice and action civics competitions, where judges can use it to compare the quality of teams' presentations.
There are certain conditions under which the use of the MYPA is not recommended: It was not designed for use in high-stakes testing. We argue that the goals of high-stakes testing are incompatible with the MYPA. The MYPA is an authentic assessment of youths' ability to engage in policy argumentation, not a generalizable assessment of knowledge.
Moreover, we caution potential users against the schoolification of the MYPA, that is, treating it as a checklist for a grade rather than as a support for quality social action (Rubin et al., 2017). Classrooms often focus on products like tests, essays, and speeches; in these contexts, youth voice can become an exercise instead of a tool to enact change. Recently, for example, we observed a civics education day where young folks used a trifold to identify issues, root causes, and tactics for change in one of four assigned topic areas. Though this structure provided a helpful one-stop visual that made it easy to locate information across presentations, this generic product was devoid of passion and persuasion. Any assessment tool, the MYPA included, risks becoming empty of meaning when schoolified; our aim is for the MYPA to be a tool that supports youth passion and engagement in leadership and social change.

Scoring
The MYPA aims to assess the quality of youth presentations organized according to six constructs: presentation and delivery, problem identification, research methods, policy proposal, collaboration, and responsiveness to questions. The MYPA consists of 25 total items. Two open-ended items ask the rater to identify the focal problem and proposed policy discussed in the youths' presentation; these are intended to guide the rater in evaluating other elements of the presentation and are not scored. The other 23 items require the evaluator to rate presentation quality on specific criteria (e.g., relevance to speakers or extent to which evidence is convincing) and have two, three, or four scoring categories (depending on the item) ranging from lowest to highest quality.
When used for summative purposes, the scores on individual items can be weighted to prioritize higher-order constructs, such as critical analysis and quality of response to questions. These items require youth to demonstrate critical thinking skills. This is particularly true of responses to questions, where youth cannot rely on a script but must demonstrate their knowledge and potentially defend against counterarguments. A copy of the MYPA can be found at https://transformativestudentvoice.net/curriculum/.
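To make the weighting concrete, the following is a minimal Python sketch of a weighted summative total. The item names and weight values here are hypothetical illustrations; the MYPA itself does not prescribe specific weights.

```python
# Minimal sketch of weighted summative scoring. The item names and
# weights below are hypothetical; the MYPA does not prescribe weights.

def weighted_total(item_scores, weights):
    """Sum item ratings, multiplying each by its weight (default 1.0)."""
    return sum(score * weights.get(item, 1.0)
               for item, score in item_scores.items())

# Hypothetical ratings for three items (higher = stronger quality).
scores = {"critical_analysis": 3, "response_to_questions": 4, "voice_projection": 2}

# Hypothetical weights prioritizing the higher-order constructs.
weights = {"critical_analysis": 2.0, "response_to_questions": 2.0}

print(weighted_total(scores, weights))  # 3*2.0 + 4*2.0 + 2*1.0 = 16.0
```

A weighting scheme like this lets a program emphasize critical analysis and response to questions without changing the underlying ratings.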
It should be noted that minimal training is required of raters using the MYPA. The MYPA scorecard is organized so that raters can work through categories of quality from left to right. This allows the rater to understand all that should be in a presentation for it to receive the strongest possible score. Additionally, the constructs are ordered in the manner in which they are most likely to appear in a presentation. Raters are also asked to attend to the language of the rating categories: in particular, youth should receive ratings based on what was actually demonstrated and verbalized, not what could be inferred. Furthermore, a rater should be attentive to their biases; in piloting we found that raters often scored youth higher on topics that spoke to the rater's interests. As such, raters should be attentive to the presentation itself and provide a score based on what youth discuss. What is more, we encourage raters to be critical of youth presentations. We have found that youth appreciate receiving critical feedback, as they are looking to improve their argumentation skills. Finally, we encourage adaptations of the MYPA based on the reader's specific needs. The reader might consider alternative scoring structures or the addition of items that speak to specific youth-learning outcomes. For example, we developed an alternate version of the MYPA with more items related to youths' responses to questions, which might be appropriate for situations in which more time is provided for Q&A with adult decision makers or judges.

Development of the MYPA
In the development of the MYPA we drew on construct-centered assessment (Wilson, 2005) and research-practice partnerships (RPPs; Coburn & Penuel, 2016). We began with the specification of constructs, each with different levels of quality, represented in construct maps.
The initial versions of the constructs were informed by a thorough review of scholarly literature on policy arguments and a review of youth policy presentations. For this latter process we invited seven research team members, eight local educators, two community organizers, and three local politicians to view videos of youth policy arguments in order to identify potential constructs. We also utilized the principles of co-design (Penuel et al., 2007) with experienced educators, frontline users, and youth, with iterative cycles of development, piloting, and feedback. Feedback during this time led to several iterations of changes to item language and efforts to shorten the protocol to make it more user friendly. Changes were also made to clarify the gradations of quality in calls to action and in how youth drew on evidence. For more detail on how the MYPA was developed in the context of an RPP, see Kirshner et al. (2020).

Evidence of Reliability
A first piece of evidence in support of the validity argument (see Bell et al., 2012) for the adequacy and appropriateness of the MYPA for its intended uses is that trained raters exhibit sufficient convergence in their ratings to give confidence that the large majority of variance in scores is not due to the idiosyncrasies of the individual rater; that is, that interrater reliability (IRR) is sufficiently high. To help establish this, researchers attended and scored youth policy presentations at two action civics events. The pseudonyms Western Contest and Midwestern Capstone are used to maintain the anonymity of these events and their organizers. The Western Contest is organized by the Student Civic Leadership (SCL) program, which is housed within the school district of an urban city in the western United States.
Eighteen high school teams participated in the Contest, which was held in May and reflected work completed throughout the academic year. During the Contest, high-school-aged youth described a community or local issue, the research they undertook to better understand the issue, and a proposed policy meant to address the problem. Each team received 5 minutes to present and 3 minutes to answer questions from a panel of judges. At this event, two raters scored four youth presentations (these ratings were distinct from the scores of the judges at this event).
The Midwestern Capstone is a year-end showcase of YPAR. The Capstone is organized by a collaborative that includes a state university, a local education agency, and multiple local school districts of an urban Midwestern city. A total of 15 high schools participated in this collaborative, and 16 teams (one school had two youth teams) participated in the Capstone. During the Capstone, youth had 5 minutes to share the results of their YPAR projects with school and community leaders. Our involvement with the Capstone came about when their project manager reached out to us with an interest in using the MYPA to assess the quality of their youths' presentations and gain insights on learning outcomes. We came to a mutual agreement that allowed our researchers to attend the event, so that we might evaluate IRR using the MYPA, and to provide the collaborative with feedback on their youths' presentations. Two raters scored all 16 presentations. This study was approved by the institutional review boards of the home institutions of the first, second, and third authors.

Raters
Two raters viewed and scored all presentations associated with this study. Both raters were doctoral students and graduate research assistants associated with this research project. Rater 1 is a White male and a former teacher with experience leading action civics projects. Rater 2 is an African American female, a former school administrator, and the executive director of a nonprofit. Each rater completed norming activities with the first author, though at separate points in time. As part of norming, the first author and the raters discussed the constructs associated with the MYPA and viewed multiple youth policy presentation videos from a collection of prior youth voice events. First, they would view a video and collectively score it on the MYPA, allowing time to discuss why certain ratings were most appropriate. Eventually, they would watch and score videos separately and later come together to compare ratings and discuss disparate scores. Norming continued until the first author and the rater could achieve an overall 80% agreement on the MYPA. These norming activities took approximately 3 hours. The two raters did not participate in norming activities with each other.

Procedure
The raters attended both the Western Contest and the Midwestern Capstone, at which they watched presentations live and scored in the moment. This required the raters to score during the presentation and to use the limited transition times to complete their assessment, which approximates typical conditions for raters at a youth voice event. Raters scored presentations independently using an online version of the MYPA. Given the rapid pace of the environment and the fact that scores were stored online, the raters did not debrief or otherwise discuss their scores during these events. Microsoft Excel was used to assess the percentage of agreement between raters.

Results
Percentage agreement was used as a measure of IRR, as it provided a method to assess agreement by item, by construct, and for the MYPA overall. Percent agreement is the number of instances in which both raters provided identical ratings divided by the total number of ratings (McHugh, 2012). In the case of the MYPA, the overall percent agreement for a single observation would be the number of times the raters agreed on their assessment divided by 23, for the 23 scored items on the MYPA. At the Western Contest, IRR ranged from 73.91% to 78.26%, with an average percent agreement of 75% across all MYPA items and all four presentations. For the Midwestern Capstone, IRR ranged from 69.57% to 95.65%, with an average agreement of 83.24% across all MYPA items and all 16 presentations. Across all 20 presentations, the raters achieved an average IRR of 80.87%. Across all presentations, raters had the highest percentage of agreement on the constructs of Response to Questions (100%), Collaboration (86.67%), and Presentation and Delivery. Agreement was lower on Research Methods (78.75%), Policy Proposal (76%), and Problem Identification (75%). Lower IRR on these three constructs might be related to their associated items having three or four categories of quality, requiring greater attention to differentiation, and to these constructs containing two of the three items that are most difficult to score. In terms of individual items, the raters had perfect agreement (100%) on items 2, 22, and 25. The raters had a percentage of agreement of 80% or higher on 15 items and of 90% or higher on 10 items. Raters had the lowest percentage of agreement on items 11 (60%), 4 (55%), and 15 (50%).
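The percent agreement statistic described above can be sketched in a few lines of Python. The rating vectors here are hypothetical stand-ins for two raters' scores on the 23 scored MYPA items, not actual study data.

```python
# Minimal sketch of interrater percent agreement: identical ratings
# divided by total ratings (McHugh, 2012). Data below are hypothetical.

def percent_agreement(rater1, rater2):
    """Return the percentage of items on which two raters agreed exactly."""
    assert len(rater1) == len(rater2)
    matches = sum(a == b for a, b in zip(rater1, rater2))
    return 100 * matches / len(rater1)

# Hypothetical ratings from two raters on the 23 scored MYPA items.
r1 = [3, 2, 4, 1, 3] * 4 + [2, 3, 4]
r2 = [3, 2, 3, 1, 3] * 4 + [2, 3, 4]  # disagrees on 4 of the 23 items

print(round(percent_agreement(r1, r2), 2))  # 19/23 agreement = 82.61
```

Because this statistic counts only exact matches, items with three or four quality categories will tend to show lower agreement than two-category items, consistent with the pattern reported above.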

Evidence for Use
In order to argue for the validity of the MYPA for its intended purposes, we draw on data from user feedback, program implementation, and a mini-case study of a young person trained in policy argumentation using the MYPA. Our hope in providing this preliminary evidence is to show that the MYPA produces appropriate results in authentic action civics and YPAR contexts.

User Feedback
As part of the judging process for the Western Contest, a group of eight judges used the MYPA to assess the quality of youth policy presentations. These judges were educators, community activists, and elected officials. Prior to the contest, the judges received a 2-hour training on how to use the MYPA. At the conclusion of the contest, we solicited feedback from the judges, specifically asking if their scores on the MYPA aligned with their perception of presentation quality. As one team member facilitated the conversation, a second team member took notes on the judges' comments. Generally, the judges expressed that the MYPA accurately captured their assessment of a presentation's quality and that the MYPA items represented elements of high-quality policy presentations.

Programs
Following their experience with the MYPA, both programs described previously adopted the MYPA for formative and summative purposes. The SCL adopted the MYPA for scoring youth presentations in the Contest and redeveloped their curriculum to align with MYPA constructs, in a form of backwards mapping. Similarly, the collaborative that organizes the Midwestern Capstone adopted the MYPA, but for the purposes of assessing youth learning outcomes. The collaborative used the MYPA to assess whether school teams reached desired competencies and to attend to areas where teams scored lower on the MYPA. Based on this initial evidence, we contend that the MYPA might be appropriate both for assessing the quality of youth presentations and for assessing learning outcomes associated with action civics and YPAR programs.

Mini-Case Study
In addition to evidence that judges felt scores on the MYPA aligned with the quality of youth presentations and that programs adopted the MYPA to assess youth learning, we share a third type of evidence from our ongoing research with SCL. Specifically, this case study presents how a youth versed in the MYPA applied her learning to an authentic issue unrelated to any school class or program: defunding the police in her school district.
The political context of the Black Lives Matter movement during the spring of 2020 provided a stage for youth to demonstrate their civic analysis and action skills, especially in policy argumentation. One young person in particular, Nina (a youth leader in SCL who had been trained in policy argumentation using the MYPA), voluntarily took on a leadership role in advocating for the removal of police officers from local schools. In the late spring of 2020, on the steps of a large local high school, Nina delivered a powerful speech demanding that the school district end its contract with the police department. Her delivery was closely aligned with the MYPA, as she shared a clear message of "counselors not cops" and supported her argument with powerful evidence, emotional lived experience, and shrewd critiques of unjust social systems. She ended with a clear call to action for the school district and the city as a whole: "Defund the police and finally start to put actions behind your words. Students of color deserve more than broken promises, handcuffs, and juvenile records." Though Nina entered the SCL program with high levels of interest in issues of equity and justice, her ability to organize an argument and her belief in herself to publicly deliver it grew markedly through her experiences with the MYPA.

Implications
Based on these preliminary findings, we submit that the MYPA advances the youth development field by supporting the planning, delivery, and assessment of authentic and dynamic youth voice presentations. What is more, the MYPA foregrounds higher-order constructs not present in existing presentation rubrics, better equipping youth to develop high-quality policy presentations grounded in logic, personal testimony, and research in ways that engage and persuade adult decision makers.