Assessment of State Education Spending by Year

I usually don't get involved in social media comment threads, with the exception of some posts about Dungeons & Dragons or other hobbies. However, for whatever reason, I felt compelled to respond to this comment from a post on Reddit about the state of education spending in the US. It contained data that ran contrary to my opinion, but when I went looking for a good counterpoint (to satisfy my own curiosity), a brief Google search didn't turn up any graphics or tables showing exactly the data I wanted to see. Unsure whether I should reconsider my own stance on the matter, I decided to try my hand at visualizing some of the data myself, using sources that are freely available from the government and elsewhere online. My bias would probably be described as liberal by my family in the Midwest, and moderate-to-conservative by my friends on the coasts and elsewhere.

Data Sources

Education Spending

Source data for Education spending (public-school-per-pupil-expenditures.xlsx) downloaded here. See notes about adjusting dollar values for purchasing power by year and state below. I also adjusted enrollment counts by state based on the total number of families counted in the 2010-2014 median estimates used for the median per capita state-wise income adjustment.

(Source data ... source): National Center for Education Statistics, National Public Education Financial Survey (various years), data as of January 2019; U.S. Department of Education, National Center for Education Statistics, State Nonfiscal Survey of Public Elementary/Secondary Education (various years), data as of October 2019.

Recommended Citation:

National Science Board. “Expenditures per Pupil for Elementary and Secondary Public Schools.” Science and Engineering Indicators: State Indicators. Alexandria, VA: National Science Foundation. https://ncses.nsf.gov/indicators/states/indicator/public-school-per-pupil-expenditures. Accessed on 2020-07-22.

US Executive Party (By Year)

Source data is included in us-executive-branch.xlsx; it was constructed using data from Wikipedia as a reference.

US Legislative Party (By Year)

Source data is included in us-legislative-branch.xlsx; it was constructed based on table available here.

US Judicial Party (By Year)

Source data is included in us-judicial-branch.xlsx; it was copied from justices.csv available here.

Recommended Citation:

Andrew D. Martin and Kevin M. Quinn. 2002. "Dynamic Ideal Point Estimation via Markov Chain Monte Carlo for the U.S. Supreme Court, 1953-1999." Political Analysis. 10:134-153. [PDF]

US State Metadata

Source data is included in us-state-metadata.xlsx; it contains metadata about states parsed from a variety of sources such as General Election Results in 2016, etc.

Adjustment for Purchasing Power (By Year)

Source data is included in us-cpi.xlsx; the corresponding years were copied from data available here.

Adjustment for Purchasing Power (By State)

I adjusted for cost-of-living differences using data from Wikipedia that I copied from here; my rationale was that if you live somewhere with a higher per capita income, then effectively your cost of living is higher, and accordingly the same expenditures on education will not go as far. This method could probably be improved with a little more research. Currently, I use a method similar to the CPI adjustment: it's basically a ratio that takes the median per capita income of the state of Kansas (2010-2014) as the reference and scales each state by its own median per capita income over that same time period. This adjustment is done after the "By Year" adjustment. So, ultimately, any dollar values are in "Adjusted 2016-Kansas Dollars" (even if sometimes I forget to call them by that long name).
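To make the order of the two adjustments concrete, here is a minimal sketch of the arithmetic. Every number and variable name below is made up for illustration and does not come from us-cpi.xlsx or the Wikipedia table. The direction of the state ratio (Kansas income divided by state income) is my reading of the rationale above, not something confirmed against the repository code.

```matlab
% Hypothetical inputs; none of these numbers come from the post's spreadsheets
nominalSpend = 6000;   % per-pupil dollars reported for some state in some year
cpiYear      = 150;    % CPI for the year of the observation (made up)
cpi2016      = 240;    % CPI for the 2016 reference year (made up)
incomeState  = 36000;  % median per capita income of the state (made up)
incomeKansas = 27000;  % median per capita income of Kansas (made up)

% Step 1 ("By Year"): express the value in 2016 dollars
spend2016 = nominalSpend * (cpi2016 / cpiYear);

% Step 2 ("By State"): scale by Kansas income relative to the state's income
% (assumed direction), so a higher-income state gets less credit per dollar
spendAdjusted = spend2016 * (incomeKansas / incomeState);

fprintf('2016 dollars: %.2f\n', spend2016);
fprintf('Adjusted 2016-Kansas dollars: %.2f\n', spendAdjusted);
```

Under this assumed form, Kansas itself has a state ratio of exactly 1, so its values are changed only by the CPI step.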

A note about reproducing graphics & results

This script was written as a live-script (*.mlx) for Matlab 2020b (Prerelease), then exported directly to .html format.

If running in Matlab, for more information type help Education_Spending_EDA into the Command Window.

All source-code is freely and openly available on GitHub, under the GNU General Public License v3.0.

All data is included in the repository as Excel spreadsheets.

Annual Trends in Education Spending

We'll create trend lines by year with shading indicating the range of observations (individual States' values) over that same time.

Years are on the x-axis,
log-transformed Total Enrollment (counts) is on the left y-axis (the log-transform is useful for Counts of Things, which should be Poisson-distributed),
and Expenditures Per-Pupil (dollars) are on the right y-axis.

The graphic will show the observed 90% confidence-bounds on the per-year trends so we can get an idea of how variable the data is across states, through time.
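As a rough sketch of how such a shaded band can be computed, the snippet below takes a years-by-states matrix and finds the median and the 5th and 95th percentiles across states for each year. The data are synthetic, the variable names are hypothetical, and I am assuming the 90% band runs from the 5th to the 95th percentile; the actual gfx__.plotTrendsByYear implementation may differ. The percentiles use simple linear interpolation between order statistics, which needs no toolbox.

```matlab
% Synthetic stand-in: rows are years, columns are states (values are made up)
rng(0);
years = (1993:2016)';
spend = 8000 + 150 * (years - 1993) + 1500 * randn(numel(years), 50);

% Sort each year's values across states
sorted = sort(spend, 2);
n = size(sorted, 2);

% Fractional column positions of the 5th, 50th and 95th percentiles
pos = 1 + [0.05 0.50 0.95] * (n - 1);

% Interpolate between neighbouring order statistics; result is nYears-by-3
band = interp1(1:n, sorted', pos)';

lower  = band(:, 1);   % bottom edge of the shaded region
middle = band(:, 2);   % trend line
upper  = band(:, 3);   % top edge of the shaded region
```

The three columns can then be passed to fill (for the shading) and plot (for the trend line).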

addpath('overloaded'); % Any Matlab built-ins that are overloaded go here

[G,E,T] = p__.getEducationData('public-school-per-pupil-expenditures.xlsx');

gfx__.plotTrendsByYear(E,T,'CB',0.95); % Do it as a separate function because *.mlx and git don't always play nicely

Figure 1: 1993-2016 median per-year enrollment and expenditures. I had assumed that Enrollment Count would increase over this period, but the median value indicates that this is not the case. Looking at it on a per-state basis, some states saw tremendous growth in enrollment (for example, larger states like California and Texas), whereas others actually saw a decline over the same timespan, so the two effects partly cancel out in the median. Per-student spending increases over the same period, even when values are adjusted to 2016 dollars (I used the CPI method described here, see Example #2). Notably, the only time adjusted per-student spending declines significantly is during the previous administration, although this period coincides with the economic crisis that followed the bursting of the housing bubble. I also noticed that during that administration, the gap between the median and the upper extreme of the distribution becomes larger: this indicates that a growing disparity in per-state spending on education was already emerging at this time.

gfx__.plotTrendsByYearAndState({G,E,T});

Figure 2: 1993-2016 per-year trends by state. I split the same data into trend lines by state and plotted them as separate subplots for each of the three types of data contained in the expenditures table (from left to right: Gross Annual Spending; Enrollment; and Per-Student Annual Spending). We can see that in general, Per-Student Annual Spending increased regardless of state, although the total amount of spending was dependent upon the state, presumably due to covarying changes in enrollment that diverge at a state level. I found it interesting that enrollment counts, even when normalized to total number of families, were extremely heterogeneous on a per-state basis.

Generally speaking, this visualization isn't the nicest because the lines are pretty cluttered and overlap in ways that obscure the underlying trends in areas of the graph that are "compact." A better way to look at this is to decompose the trends qualitatively into the most prevalent "features" of the data, group together the geospatial entities that are most strongly associated with those "features", then interpret what the features indicate on the basis of their temporal and geospatial qualities; not everyone will agree that this approach is sound for certain philosophical reasons, but I don't care. Per-state adjustments account for differences in cost of living; I standardized everything using matched Median Per Capita Income from this table on Wikipedia, which is based on the median values from 2010 to 2014.

Geospatial Trends in Educational Spending

I decided to base the mapping of "Red" and "Blue" states on the results of the 2016 Elections, since that result is the only time I've cared about the outcome of a Presidential Election. With the removal of Sanders from the Democratic primary for the upcoming 2020 Elections, I've returned to political apathy and won't be voting in the upcoming election.

geoData = gfx__.showPartyMapping();

Figure 3: Classification of states as Republican or Democratic. This should match the results of the 2016 Election, unless I made a mistake putting in the spreadsheet by hand.

We can create an interactive slider uicontrol object in the Matlab live-script to allow us to look at spending by State for any particular fixed year (1993-2016).

fixedYear = 1995; % Dragging this re-runs the section

geoDataSpending = gfx__.showStateData(geoData,T,fixedYear);

Figure 4: Tool for visualizing state expenditures per student for a selected year. To get a feel for if there are geospatial groupings or trends by Year, one approach is to just use the Slider and look at the exported map for one year at a time. I don't think the Slider will work for the html exported version, unfortunately. In general, this shows us that during the 90's and early 2000's, the east coast and Great Lakes regions spent a lot more per-student on education, while other areas of the country, particularly in the Midwest, South, or sparsely populated mountain states, did not spend as much per student. One interesting thing about this map is that prior to adjusting for cost-of-living in different parts of the country, California looked like it was spending a lot more per-student than after the adjustment. After the housing bubble crash, per-student spending is predictably worse more or less across the country. By 2016, the spending patterns have still not recovered to what they were during the pre-crash era, with the exception of states like New York.

The following series plots the first five principal components of the adjusted per-state, per-year spending trends for per-student expenditures. These 5 components explain roughly 95% of the variance in the data and capture the majority of the grouped patterns we would expect to see (e.g. some states increase or decrease with respect to some component over time). I chose to use Principal Components Analysis (PCA) because it is linear and therefore fast and easy to understand. Because we already know that in general, there is a tendency to increase per-student education spending for all states over this time period (Figures 1 & 2), I removed the annual mean trend in adjusted per-student expenditures. This means that the components we are observing reflect groups of per-state "sustained deviations from the norm."
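For readers who want to see the mechanics, here is a minimal sketch of this kind of decomposition using a plain singular value decomposition. It is not the code behind gfx__.showStateSpendingClusters. The matrix X is synthetic, all variable names are hypothetical, and I am assuming that "removing the annual mean trend" means subtracting each year's across-state mean.

```matlab
% Synthetic stand-in: rows are states, columns are years (values are made up)
rng(1);
nStates = 50;
years   = 1993:2016;
nYears  = numel(years);
X = cumsum(randn(nStates, nYears), 2) + 0.1 * (years - 1993);

% Remove the annual mean trend: subtract each year's mean across states
Xc = X - mean(X, 1);

% Economy-size SVD of the centered matrix: Xc = U * S * V'
[U, S, V] = svd(Xc, 'econ');

% Percent of variance captured by each component, and the running total
sv           = diag(S);
explained    = 100 * sv.^2 / sum(sv.^2);
cumExplained = cumsum(explained);

% First component: a temporal shape (one value per year) and a
% weighting for each state that says how strongly it follows that shape
trend   = V(:, 1);
weights = U(:, 1) * S(1, 1);
```

In this layout, trend is what would be drawn against year, and weights is what would be colored onto the map. The sign of a component is arbitrary, so flipping both trend and weights leaves the decomposition unchanged.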

gfx__.showStateSpendingClusters(geoDataSpending,'PC_Index',1,'RemoveMean',true);

Figure 5a: Primary principal component in annual state per-student expenditure trends between 1993 and 2016. This trend is easily the most prevalent in the data (a little under 80% of the variance in the data can be explained by this simple trend alone).

Top-left panel: each principal component "explains" some portion of the variance in the data. Skip ahead to the TL;DR at the end of this paragraph if you already understand principal components. If you are unfamiliar with principal components, the easiest way to understand them is to think of a bunch of points drawn out in space of whatever dimension you'd like. The main principal component finds the straight line that passes near those points in an "optimal" way (minimizing the sum of squared distances from the data points to the line) so that the line is oriented in the direction along which the data are "most spread out." If the data in space are grouped in a perfect circle or sphere, then there is no such preferred line, every component explains an equal share, and our curve of cumulative variance explained will match y = x perfectly (once both axes are scaled from 0 to 1). On the other hand, if there is some linear structure to the data, then the first component will explain a large proportion of the variance (as is the case here). In the event that the data are organized completely along a single line, then the data are linearly dependent with respect to the space we observe them in, which in reality is quite rare when observing random samples from a population (but does happen when experimental design is suboptimal, as often happens due to practical constraints). TL;DR about principal components: the first component explains a majority of our data; that's the main takeaway of the top-left panel.
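The contrast between the "sphere" case and the "line" case can be checked numerically with a small sketch like the one below. Both point clouds are synthetic, and the names blob and lineCloud are hypothetical, chosen only for this illustration.

```matlab
rng(2);
n = 2000;

% Case 1: a roughly spherical cloud with no preferred direction
blob = randn(n, 3);

% Case 2: points spread along one direction, plus a little noise
direction = [1 2 3] / norm([1 2 3]);
lineCloud = randn(n, 1) * direction + 0.05 * randn(n, 3);

% Percent of variance per component, from singular values of centered data
pct = @(s) 100 * s.^2 / sum(s.^2);
blobExplained = pct(svd(blob - mean(blob, 1)));
lineExplained = pct(svd(lineCloud - mean(lineCloud, 1)));

% Running totals: near-diagonal for the blob, saturating at once for the line
blobCumulative = cumsum(blobExplained);
lineCumulative = cumsum(lineExplained);

disp([blobExplained, lineExplained]);
```

For the spherical cloud the three components come out close to a third each, while for the line-like cloud nearly all of the variance lands in the first component.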

Bottom-left panel: we also should say what the components are when we are discussing them. On the x-axis is year and y-axis is the principal component score, which essentially gives us an intuition of what trend the principal component is capturing. The selected principal component (which is shown in the geospatial plot) is highlighted in salmon. Qualitatively, this trend reflects the generic tendency toward an increase in per-student spending seen in the US between 1993 and 2016. Interestingly, between 1993 and 2000, this same trend indicates that states with the largest spending gains during the Bush administration did not see improvements in education spending under the Clinton administration, but despite the economic downturn in 2009, continued to see increased per-student spending until 2016.

Right panel: geospatial weightings corresponding to PC-1, which is highlighted in salmon in the bottom-left panel. Blue indicates states that most strongly followed the trend indicated by the salmon line, while red indicates states that most strongly went "against the grain." In fact, the negative weighting indicates that (relative to the overall average improvements seen during this time, in terms of increased spending from year to year) some states tended to have a diminished increase in spending by year during this same period of growth.

gfx__.showStateSpendingClusters(geoDataSpending,'PC_Index',2,'RemoveMean',true);

Figure 5b: Second principal component in annual state per-student expenditure trends between 1993 and 2016. This trend also gives a non-trivial bump in the amount of variance explained, bringing us to approximately 90% explained. Without as detailed a breakdown as in the first figure, we see that the salmon line indicates a trend toward reduced per-student spending during the 90's and early 2000's, which then turns around and starts increasing right around the housing bubble crisis. The mostly-negative weightings throughout the country on the geospatial map indicate that the opposite is probably true: this is indicative of a subtle trend toward increased per-student expenditures, which went away when the housing bubble burst.

gfx__.showStateSpendingClusters(geoDataSpending,'PC_Index',3,'RemoveMean',true);

Figure 5c: Third principal component in annual state per-student expenditure trends between 1993 and 2016. This trend and the next one (Figure 5d) don't explain too much of the data, but they might serve as interesting comparisons against one another. We can see that the salmon line indicates this one is likely a dip that precedes and relates to the housing market crash. However, this component sees a sustained downturn for several years after the housing bubble, with slow recovery after. Spatially, this may reflect a response in traditionally "Blue" states, so it's interesting to note that there is an increase in the time it takes them to recover in terms of getting back to spending on per-student education.

gfx__.showStateSpendingClusters(geoDataSpending,'PC_Index',4,'RemoveMean',true);

Figure 5d: Fourth principal component in annual state per-student expenditure trends between 1993 and 2016. In contrast to Figure 5c, the trend, which also explains much less of the data relative to the first two components, shows a much faster recovery after the housing crisis, which then turns into a downturn beyond 2011. This trend is more spatially homogeneous, but does seem to roughly line up with areas of the country that voted red in the 2016 election. I don't really have any expertise to comment on these differences, but just wanted to point them out.

Geospatial Trends in Enrollment

One other point I wanted to check was using the same sort of analysis to look at enrollment trends. We can look at it in the same way as with the per-student expenditures, first creating a uicontrol to look at fixed values during different years.

fixedYear = 2003; % Dragging this re-runs the section

[geoDataEnrollment,~,states] = gfx__.showStateData(geoData,E,fixedYear);

Figure 6: Geographically mapped enrolled students per family. I don't think having the static snapshot quite gives us the picture we're looking for so again, we will look at this by decomposing to principal components and plotting the component trends as well as the geospatial activations with respect to those trends.

We'll look at the first two principal components for enrollment in this way.

gfx__.showStateSpendingClusters(geoDataEnrollment,'PC_Index',1,'RemoveMean',true);

Figure 7a: Primary principal component of annual per-family student enrollment rate is a tendency toward increased enrollment. This is what I really expected to see across most of the states: a tendency towards an increase in student enrollment, as the population grows. The first principal component explains a little over 80% of these data. The states that it is most-accurate for are the large states that you might expect, particularly Texas and California. Note that orange-to-red indicates a negative weighting for this trend.

gfx__.showStateSpendingClusters(geoDataEnrollment,'PC_Index',2,'RemoveMean',true);

Figure 7b: Second principal component of annual per-family student enrollment rate is a dip followed by recovery. I don't have much of an idea why this is the case.