### ENDNOTES

^{1} All students’ names have been
changed throughout.

^{2} McFarland, J., Hussar, B., Wang, X., Zhang, J., Wang, K., Rathbun, A., Barmer, A., Forrest Cataldi, E., & Bullock Mann, F. (2018). *The
condition of education 2018* (NCES
2018-144). U.S. Department
of Education. Washington, DC:
National Center for Education
Statistics. Retrieved from
https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2018144

^{3} Latinx is a gender-neutral term
that refers to individuals of Latin
American origin or descent.

^{4} College remediation rates are based
on students starting at a four-year
college. These rates are 60% among
all students, 78% among Black
students and 75% among Latinx
students beginning at a two-year
college. See Table 2 in Chen, X.
(2016). *Remedial coursetaking at
US public 2- and 4-year institutions:
Scope, experiences, and outcomes.*
Statistical Analysis Report (NCES
2016-405). Washington, DC:
National Center for Education
Statistics. Retrieved from
https://nces.ed.gov/pubs2016/2016405.pdf

^{5} Barry, M.N. & Dannenberg, M.
(2016). *Out of pocket: The high cost
of inadequate high schools and
high school student achievement
on college affordability.*
Washington, DC: Education Reform Now and Education Post. Retrieved from
https://edreformnow.org/policy-briefs/out-of-pocket-the-high-cost-of-inadequate-high-schools-and-high-school-student-achievement-on-college-affordability/

^{6} Achieve. (2015). *Rising to the challenge: Are high school
graduates prepared for college and work?* Washington, DC: Achieve.
Retrieved from https://www.achieve.org/rising-challenge

^{7} Drake, G., Pomerance, L.,
Rickenbrode, R., & Walsh, K. (2018).
*Teacher prep review*. Washington,
DC: National Council on Teacher
Quality. Retrieved from
https://www.nctq.org/publications/2018-Teacher-Prep-Review

^{8} TNTP. (2015). *The Mirage:
Confronting the hard truth about
our quest for teacher development.*
Brooklyn, NY: TNTP. Retrieved from
https://tntp.org/publications/...the-mirage-confronting-the-truth-about-our-quest-for-teacher-development

^{9} Herold, B. & Molnar, M. (2014).
Research Questions Common-Core
Claims by Publishers. *Education
Week* (March 3, 2014).

^{10} 1,200 hours every year is based
on a typical school year of 180 days,
with 6.64 hours per school day. See National Center for Education
Statistics. (2008). *Number of hours
in the school day and average
number of days in the school year for public schools, by state:
2007–08*. Schools and Staffing
Survey (SASS). U.S. Department of
Education. Washington, DC: National
Center for Education Statistics.
Retrieved from https://nces.ed.gov/surveys/sass/tables_list.asp. See
also McFarland, J., Hussar, B., Wang,
X., Zhang, J., Wang, K., Rathbun, A., Barmer, A., Forrest Cataldi, E., & Bullock Mann, F. (2018). *The
condition of education 2018* (NCES
2018-144). U.S. Department of Education. Washington, DC:
National Center for Education
Statistics. Retrieved from
https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2018144
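
The arithmetic behind the 1,200-hour figure can be checked with a short sketch. The two inputs come from the endnote above; the variable names are illustrative only.

```python
# Inputs quoted in the endnote; the names are hypothetical.
DAYS_PER_SCHOOL_YEAR = 180
HOURS_PER_SCHOOL_DAY = 6.64

hours_per_year = DAYS_PER_SCHOOL_YEAR * HOURS_PER_SCHOOL_DAY

# Roughly 1,195 hours, which rounds to the 1,200 hours cited.
print(f"{hours_per_year:.1f} hours per school year")
```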

^{11} The adjusted cohort graduation
rate in 2015-2016 was 84%, the
highest it’s ever been. McFarland,
J., Hussar, B., Wang, X., Zhang, J.,
Wang, K., Rathbun, A., Barmer, A., Forrest Cataldi, E., & Bullock
Mann, F. (2018). *The condition
of education 2018* (NCES
2018-144). U.S. Department of Education. Washington, DC:
National Center for Education
Statistics. Retrieved from
https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2018144

^{12} National Center for Education
Statistics. (2018). *Digest of
education statistics, 2016*
(NCES 2017-094), Chapter 3.
U.S. Department of Education.
Washington, DC: National
Center for Education Statistics.
Retrieved from https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2017094

^{13} NAEP defines four categories of
trajectories based on the type and
amount of credits earned in each
core subject: rigorous, mid-level,
standard, and below-standard.
Across all our partner systems,
17% of students were in a rigorous
trajectory, 45% were in a mid-level
trajectory, 20% were in a standard
trajectory and 17% were in a
below-standard trajectory. See
the Technical Appendix for more
details about how we applied these
definitions to our participating
districts’ data. For the NAEP study,
see: Nord, C., Roey, S., Perkins, R.,
Lyons, M., Lemanski, N., Brown,
J., & Schuknecht, J. (2011). *The
nation’s report card: America’s high school graduates* (NCES
2011-462). U.S. Department of
Education. Washington, DC: National
Center for Education Statistics.
Retrieved from https://nces.ed.gov/nationsreportcard/pdf/studies/2011462.pdf

^{14} We estimated the amount of learning in a classroom by
comparing its students’ actual state
standardized test scores to the
state standardized test scores that
were expected of them given how
they had scored historically, as well
as other characteristics like their
race/ethnicity and family income.
See the Technical Appendix for
more details about this approach
and additional analysis results.
Throughout the report, we make the commonly used assumption
that a difference of 0.25 standard
deviations represents 9 months of
learning. See Kane, T. J., & Staiger, D.
O. (2012). *Gathering feedback for
teaching: Combining high-quality
observations with student surveys
and achievement gains*. Research
Paper. MET Project. Seattle, WA:
Bill & Melinda Gates Foundation.
Retrieved from https://eric.ed.gov/?id=ED5409...
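
The conversion described above (0.25 standard deviations treated as 9 months of learning) can be sketched as follows. This is a simplified illustration, not the report's model: the function names are hypothetical, and the averaging of actual-minus-expected scores stands in for the fuller approach documented in the Technical Appendix. Scores are assumed to be expressed in standard deviation units already.

```python
from statistics import mean

# Assumption stated in the endnote: 0.25 SD corresponds to 9 months of learning.
MONTHS_PER_SD = 9 / 0.25  # 36 months per standard deviation


def sd_to_months(effect_in_sd: float) -> float:
    """Convert a difference in standard deviations to months of learning."""
    return effect_in_sd * MONTHS_PER_SD


def classroom_gain_in_months(actual_z, expected_z):
    """Hypothetical helper: average actual-minus-expected score, in months.

    Both inputs are per-student scores in standard deviation units.
    """
    residuals = [a - e for a, e in zip(actual_z, expected_z)]
    return sd_to_months(mean(residuals))


print(sd_to_months(0.25))  # 9.0
print(sd_to_months(0.5))   # 18.0
```

The second call reproduces the 18-month figure that the next endnote attaches to a gap of 0.5 standard deviations.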

^{15} We defined classrooms where
students started the year behind as
those classrooms where students’
average state standardized test
score in the previous school year
was at least 0.5 standard deviations
(or 18 months) below the average
score among all students in the
state. For each key resource, we
split this subset of classrooms in
half so that one group represented
the 50% of these classrooms with
the highest-rated
assignments, lessons, engagement,
or expectations, and the other
represented the 50% of classes
with the lowest scores on these
resources. See the Technical
Appendix for more details about
this approach.
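
One way to picture this selection and split is sketched below. The field names (`prior_z`, `rating`), the sample values, and the handling of an odd number of classrooms are all assumptions made for illustration; the Technical Appendix describes the actual procedure.

```python
# Threshold from the endnote: an average prior-year score at least
# 0.5 standard deviations below the state average.
BEHIND_THRESHOLD_SD = -0.5


def split_behind_classrooms(classrooms):
    """Return (top_half, bottom_half) of classrooms that started behind.

    Each classroom is a dict with hypothetical keys:
      'prior_z' - average prior-year score in standard deviation units
      'rating'  - rating on one key resource (e.g. assignments)
    With an odd count, the extra classroom falls in the bottom half
    (an arbitrary choice for this sketch).
    """
    behind = [c for c in classrooms if c["prior_z"] <= BEHIND_THRESHOLD_SD]
    ranked = sorted(behind, key=lambda c: c["rating"], reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


# Made-up example data, not results from the study.
example = [
    {"id": "A", "prior_z": -0.8, "rating": 1.2},
    {"id": "B", "prior_z": -0.6, "rating": 2.5},
    {"id": "C", "prior_z": 0.1, "rating": 3.0},
    {"id": "D", "prior_z": -0.5, "rating": 0.4},
    {"id": "E", "prior_z": -1.1, "rating": 1.9},
]
top_half, bottom_half = split_behind_classrooms(example)
```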

^{16} Because only grade 3-12 students
completed student surveys, these
percentages exclude K-2 students.

^{17} Only classrooms that had a
minimum number of submitted
assignments, observed lessons, or
student surveys were included. See
the Technical Appendix for how we
set these minima.

^{18} Assuming a single class contains 180 instructional hours in a school year, the average ELA,
math, science, and social studies
classroom in our study spent,
respectively, 122, 127, 164, and
166 hours on assignments that were
not appropriate for the grade. See
the Technical Appendix for details
on how we estimated the amount of class time spent with grade-
appropriate assignments, with
strong instruction, or engaged.
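
To put these hour counts in proportion, the sketch below divides each by the assumed 180 instructional hours. The hour figures are those quoted in the endnote; the dictionary and variable names are illustrative.

```python
# Assumed instructional hours for a single class in a school year.
HOURS_PER_CLASS = 180

# Hours spent on assignments that were not grade-appropriate, per the endnote.
hours_not_grade_appropriate = {
    "ELA": 122,
    "math": 127,
    "science": 164,
    "social studies": 166,
}

share_not_grade_appropriate = {
    subject: hours / HOURS_PER_CLASS
    for subject, hours in hours_not_grade_appropriate.items()
}

for subject, share in share_not_grade_appropriate.items():
    print(f"{subject}: {share:.0%} of class time")
```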

^{19} Though the amount of time in
school varies state to state, in all
analyses, we assume a single class
requires 180 hours in a school year, or 9 months. See National
Center for Education Statistics.
(2008). *Number of hours in the
school day and average number of days in the school year for public schools, by state: 2007–08*.
Schools and Staffing Survey (SASS).
U.S. Department of Education.
Washington, DC: National Center
for Education Statistics. Retrieved
from https://nces.ed.gov/surveys/
sass/tables_list.asp

^{20} See the Technical Appendix for
more details on how we reviewed
and analyzed districts’ curricular
and assessment policies, as well
as how we rated materials and
assessments themselves.

^{21} Our definition of strong
instruction did not require teachers
to earn perfect scores on the four
domains we observed—classroom
culture, content, instructional
practices, and student ownership—
so it was possible for a lesson to be
classified as “strong” but not have
high ratings on every domain. In
this example, many lessons (295)
had the highest possible ratings on content, but lower ratings on
instructional practices and student
ownership. See the Technical
Appendix for how we defined
“strong instruction.”

^{22} Scherer, M. (2008). Learning:
Whose job is it? *Educational
Leadership, 66*(3), p. 7. Retrieved
from http://www.ascd.org/publications/educational-leadership/nov08/vol66/num03/Learning@-Whose-Job-Is-It%C2%A2.aspx.

^{23} The interest, enjoyment, and
concentration approach to
measuring engagement is based on Shernoff, D., Csikszentmihalyi,
M., Schneider, B., & Shernoff, E.
(2003). Student engagement in
high school classrooms from the
perspective of flow theory. *School
Psychology Quarterly*, 18(2), pp.
158–176. https://doi.org/10.1521/scpq.1...

^{24} The survey questions we used to represent worth were partly
adapted from Uekawa, K., Borman,
K. & Lee, R. (2007). Student
engagement in U.S. urban high
school mathematics and science
classrooms: Findings on social
organization, race, and ethnicity.
*The Urban Review*, 39(1), pp. 1–43.
https://doi.org/10.1007/s11256-006-0039-1. See the Technical
Appendix for more details on how we used survey questions to
categorize students’ perceptions of
engagement and worth.

^{25} Romero, C. (2015). *What we know
about belonging from scientific
research*. Palo Alto, CA: Mindset
Scholars Network. Retrieved from
http://mindsetscholarsnetwork.org/wp-content/uploads/2015/09/What-We-Know-About-Belonging.pdf

^{26} “Rarely” is defined as having no
more than one experience perceived
as engaging and worthwhile. 28% never had an engaging and
worthwhile experience and 13%
rarely did (N = 2,427 students).

^{27} Rate based on 2012 “event
dropouts,” which represent the
“percentage of high school students
who left high school between the
beginning of one school year and
the beginning of the next without
earning a high school diploma or an
alternative credential (e.g., a GED).”
See Table 1 in Stark, P., & Noel, A.M. (2015). *Trends in high school
dropout and completion rates in the
United States: 1972–2012* (NCES
2015-015). U.S. Department of
Education. Washington, DC: National
Center for Education Statistics.
Retrieved from https://nces.ed.gov/pubs2015/2...

^{28} On the other hand, the same
student tended to be less engaged
on days when they received
higher-quality assignments. See
the Technical Appendix for results
on how engagement varied on
different days based on the quality
of assignments and lessons.

^{29} Teachers’ perceptions of the
extent to which they talked to
students about their interests and
goals were based on four survey items:
To what extent do you engage in the
following practices: (1) meet with
students to discuss their learning
progress; (2) meet with students to discuss their strengths and
interests; (3) set learning goals with students; (4) communicate
with individual students and their
families about the aspirations they
have for a student’s future. Teachers
had four choices for each item:
Never, Sometimes, Often, Daily or
Almost Daily. We integer-coded
teachers’ responses so that Never
was a 1 and Daily or Almost Daily
was a 4, took the average across
all four items, and then classified
this composite value into quartiles.
Classrooms in the top quartile had
an average engagement rate of 62%
while classrooms in the bottom
quartile had an average engagement
rate of 46%.
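
The coding steps described above can be sketched as follows. The response labels and the 1 to 4 coding come from the endnote; the function names, the use of `statistics.quantiles` with its default method, and the treatment of ties at the cut points are assumptions, since the endnote does not specify them.

```python
from statistics import mean, quantiles

# Integer coding described in the endnote.
RESPONSE_CODES = {
    "Never": 1,
    "Sometimes": 2,
    "Often": 3,
    "Daily or Almost Daily": 4,
}


def composite_score(responses):
    """Average the integer-coded responses to a teacher's four survey items."""
    return mean(RESPONSE_CODES[r] for r in responses)


def assign_quartiles(scores):
    """Label each composite score 1 (bottom quartile) to 4 (top quartile).

    Cut points use Python's default quantile method; a score equal to a
    cut point is placed in the lower group (an assumption of this sketch).
    """
    cuts = quantiles(scores, n=4)
    return [1 + sum(score > cut for cut in cuts) for score in scores]


# One teacher answering each option once has a composite of 2.5.
example_composite = composite_score(
    ["Never", "Sometimes", "Often", "Daily or Almost Daily"]
)
```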

^{30} All analyses using students’ letter
grades were based on all grade 3-12
students in the participating district
or participating CMO school, not
just the subset of classrooms we visited. See the Technical
Appendix for more details on how
we used data on course grades for
all students in the participating
districts.

^{31} We ran a series of linear
regression models predicting the
typical quality of assignments and
lessons provided to classrooms
based on their demographic
characteristics as well as a host of other controls, including prior
achievement. Notably, there was
still a statistically significant
negative relationship between the percent of students from low-
income families in a class and the
average quality of assignments,
even after controlling for prior
achievement (p<0.01). See the
Technical Appendix for our model
specifications and Table A.13 in the
Appendix for full model results.

^{32} For classes where at least 50%
of the students were students of
color, the typical percentages of time
spent with grade-appropriate
assignments, with strong lessons,
and engaged were, respectively,
23%, 9%, and 50%, while for
classes with mostly white students,
these values were 34%, 33%, and
62%. For classes where at least
75% of students were from low-
income families, these values were
respectively 20%, 8%, and 52%,
compared to 44%, 41%, and 63%
for classes where at least 75% of
students were not from low-income
families. Only classrooms that
contained enough data to meet our inclusion rules were included;
see the Technical Appendix for
more details on these rules and
further analysis comparing access
to these key resources by student
characteristics.

^{33} Some of the racial/ethnic
disparities in test outcomes
between students are likely due to
“stereotype threat.” Stereotype
threat is an experimentally
established phenomenon that
represents the negative effect on
performance when students feel
like they must perform well or risk
confirming negative intellectual
stereotypes. For example, female
students have been stereotyped to be less intellectually strong in
math, and thus female students’
math test performance likely
underestimates their true abilities
because the anxiety of having to
disprove this negative stereotype
lowers their performance on tests.
This is particularly true when the
student knows the test will be used
for comparative purposes, as is the
case in state standardized tests,
ACT and SAT tests, and AP tests.
Research has shown that stereotype
threat can cause Black and Latinx
students’ total SAT math and reading
scores to underestimate their
abilities by about 40
points. Though this is a large effect,
across our participating districts
the difference between students of
color and white students with the
same course grade was about 100
points on both the SAT math and
reading components. Thus, while
stereotype threat plays a role in our
findings, it likely does not explain
them entirely. For a thorough
understanding of stereotype threat,
see Steele, C. (2010). *Whistling
Vivaldi: And other clues to how
stereotypes affect us*. New York,
NY: W.W. Norton & Company. See
also Logel, C. R., Walton, G. M.,
Spencer, S. J., Peach, J., & Mark, Z.
P. (2012). Unleashing latent ability:
Implications of stereotype threat
for college admissions. *Educational
Psychologist*, 47(1), 42-50.
https://doi.org/10.1080/00461520.2011.611368

^{34} Classrooms with the most grade-
appropriate assignments were
defined as those classrooms whose
average assignment score ranked
in the top quartile; classrooms
with the least grade-appropriate
assignments were those whose average
assignment score ranked in the bottom quartile.

^{35} Wilson, T. (2011). *Redirect:
The surprising new science of
psychological change*. New York,
NY: Little, Brown.

^{36} Wenzlaff, R. & Wegner, D. (2000).
Thought suppression. *Annual
Review of Psychology, 51*(1), pp.
59-91. https://doi.org/10.1146/annure...

^{37} Murphy, M.C., Kroeper, K., & Ozier, E. (2018). Prejudiced
places: How contexts shape
inequality and how policy can change
them. *Policy Insights from the
Behavioral and Brain Sciences,
5*(1), pp. 66-74. https://doi.org/10.1177/2372732217748671

^{38} Cherng, H. & Halpin, P. (2016).
The importance of minority
teachers: Student perceptions of
minority versus white teachers.
*Educational Researcher, 45*(7), pp. 407–420. https://doi.org/10.3102/0013189X16671718