Adventures in Projecting September Pitching Performances
Should you still trust pitching projections in the final month of the season, or is it enough to just look at how a player has performed this year? This article explores some strategies for better projecting September pitching performances.
A baseball fan's dilemma: let's say one of your veteran pitchers has a 2.5 ERA, a 4.5 xFIP, and a 4.5 FIP across 100 innings this year, plus a 3.75 projected ERA for the rest of the season. Assume this in-season ERA projection weights the last three to four years of performance data, including the current year, and incorporates historical data on strikeouts, walks, and home run prevention, as is typical (e.g., MARCEL). This hypothetical pitcher is by all accounts healthy, with similar stuff and velocity compared to past years. Which figure should you trust most the rest of the way? Does your answer change if his stuff is trending way down this year?
To help answer these sorts of questions, I looked at how best to project "last month of season" pitching performances, those coming after August 31st, from 2004 to 2023 (a total sample of 10,542 such performances). For each one, I created a full projection based on the past four years of data using an in-season variation of MARCEL weights. To take an example: to project the last month of pitching in 2019, I created a projection based on performance data from the first five months of 2019 (pre-August 31st), along with all of 2018, 2017, and 2016. Recency is weighted more heavily, so the weights for 2019/2018/2017/2016 are 3.2/2.2/1.2/0.2, respectively; 2016 is barely weighted at all. I created three projections: one based on ERA Minus (ERA-), one based on xFIP Minus (xFIP-), and one based on FIP Minus (FIP-). For each of these metrics, 100 is league average and lower is better.
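To make the weighting concrete, here is a minimal sketch of how recency weights like these might combine season-level "minus" stats into a single rate. The season inputs are hypothetical, and weighting each season's stat by its total batters faced (as well as by its recency weight) is an assumption in the spirit of MARCEL, not necessarily an exact match to my implementation:

```python
# In-season MARCEL-style recency weights (current year back three years),
# as described above for projecting the last month of 2019.
WEIGHTS = {2019: 3.2, 2018: 2.2, 2017: 1.2, 2016: 0.2}

# Hypothetical pitcher: {season: (ERA-, total batters faced)},
# with 2019 covering only pre-August 31st performance.
seasons = {2019: (85, 400), 2018: (110, 650), 2017: (95, 700), 2016: (120, 300)}

def weighted_rate(seasons, weights):
    """Recency- and TBF-weighted average of a 'minus' stat (100 = league average)."""
    num = sum(weights[yr] * tbf * stat for yr, (stat, tbf) in seasons.items())
    den = sum(weights[yr] * tbf for yr, (stat, tbf) in seasons.items())
    return num / den

print(round(weighted_rate(seasons, WEIGHTS), 1))  # -> 97.8 for this toy pitcher
```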
I used these three metrics for convenience: they are all park- and league-adjusted, they are on the same scale, and they are easy to download from FanGraphs (whereas hipster-favorite SIERA is on a different scale, and kwERA- is not easily downloadable). xFIP- and FIP- also capture three of the key components that more complex projections rely on: K%, BB%, and some measure of home run prevention. I also added in a regression amount, aligned with past research, that minimized forecast error in the sample: 1,590 total batters faced (TBF) for the ERA- forecast, 160 TBF for xFIP-, and 570 TBF for FIP-. For simplicity, I did not account for aging or minor league performance (MARCEL does not account for the latter either, assuming everyone without MLB data is league average).
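Here is one way those regression amounts could be applied, continuing the hypothetical sketch above: mix in the stated number of batters faced of exactly league-average (100) performance, so thinner samples get pulled harder toward the mean. Adding the regression TBF directly to the weighted denominator is my assumption, mirroring how MARCEL regresses to the mean:

```python
WEIGHTS = {2019: 3.2, 2018: 2.2, 2017: 1.2, 2016: 0.2}
seasons = {2019: (85, 400), 2018: (110, 650), 2017: (95, 700), 2016: (120, 300)}

def regressed_projection(seasons, weights, regression_tbf, league_avg=100):
    """Blend the weighted rate toward league average by mixing in
    `regression_tbf` batters faced of exactly league-average performance."""
    num = sum(weights[yr] * tbf * stat for yr, (stat, tbf) in seasons.items())
    den = sum(weights[yr] * tbf for yr, (stat, tbf) in seasons.items())
    return (num + regression_tbf * league_avg) / (den + regression_tbf)

# The regression amounts that minimized forecast error in the sample:
print(regressed_projection(seasons, WEIGHTS, regression_tbf=160))   # xFIP- forecast
print(regressed_projection(seasons, WEIGHTS, regression_tbf=570))   # FIP- forecast
print(regressed_projection(seasons, WEIGHTS, regression_tbf=1590))  # ERA- forecast
```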
The weighted root mean square error (RMSE, a measure of forecast accuracy, weighted here by total batters faced in the last month of a given season) for each of the three forecasts when projecting "last month of season" (post-August 31st) ERA- is shown in the table below. Note that a lower RMSE signifies better forecast accuracy. For example, the full projection based on four years of historical xFIP- data has an RMSE of 72.38 when projecting "last month of season" ERA- for the full sample (first RMSE column); this is more accurate than the full projection based on four years of historical ERA- data, which has an RMSE of 72.87. Also note that differences in RMSE across projection methods may appear small; as a rule, it is difficult to improve RMSE much beyond a naive model such as MARCEL.
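For clarity, a TBF-weighted RMSE can be computed along these lines; the inputs below are toy numbers, and weighting each pitcher's squared error by his last-month TBF is my reading of the weighting described above:

```python
import numpy as np

def weighted_rmse(actual, projected, tbf):
    """RMSE of projected vs. actual last-month ERA-, with each pitcher's
    squared error weighted by his September total batters faced."""
    actual, projected, tbf = map(np.asarray, (actual, projected, tbf))
    return np.sqrt(np.average((actual - projected) ** 2, weights=tbf))

# Toy example: three pitchers' actual last-month ERA-, their projections,
# and their last-month TBF (all hypothetical).
print(weighted_rmse([80, 130, 95], [100, 105, 98], [120, 60, 90]))
```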
What are the takeaways from this exercise? First, focusing on the full sample of 10,542 post-August pitching performances (the first RMSE column), a projection based on xFIP- is more accurate than a projection based on FIP-, and both are more accurate than a projection based on ERA-. The superiority of xFIP- is no surprise, as it captures the three stickiest indicators of pitching success: K%, BB%, and FB% (fly-ball rate). This is also reflected in xFIP-'s much smaller regression amount relative to the other measures. Second, even in September, the full forecast that captures four years of historical data is more accurate than a forecast based on data from the current year alone (see the "Projection with 1 year of data" rows). This is true whether you're looking at forecasts based on ERA-, xFIP-, or FIP-.
For instance, a projection that captures just the current year (pre-August 31st) of historical ERA- data has an RMSE of 73.02 when projecting last-month ERA-, whereas the "full" projection based on four years of ERA- data has an RMSE of 72.87. For the projections based on FIP- and ERA-, more historical data is generally better: forecast accuracy improves with each additional year of historical data added. However, for the projection based on xFIP-, RMSE is minimized with two years of historical data; adding a third and fourth year slightly hurts forecast accuracy.
What if, instead of looking at the full sample of pitchers, we focus on arms whose velocity changed year over year? Perhaps weighting historical data is less important for pitchers who have seen their stuff change significantly in a given year. The third RMSE column focuses on a sample of 2,442 pitchers who experienced at least a one-mile-per-hour absolute change in average velocity compared to the previous season. The last column focuses on a 485-pitcher sample with at least a two-mile-per-hour absolute change in average velocity. For these sub-samples, the projections based on one year of data perform much more strongly. For the two-mile-per-hour changers, the projections that capture only one year of historical data perform best, outperforming their "full," "two year," and "three year" counterparts, regardless of whether they are based on ERA-, xFIP-, or FIP-.
Put differently, it is less helpful to look at historical data beyond the current season for guys who have experienced a big change in their stuff; we should probably have a shorter memory for this sub-group. Similarly, for one-mile-per-hour changers, the projection based on one year of xFIP- performs better than any of the other xFIP- forecasts, while the two-year versions of FIP- and ERA- perform best among their counterparts. This evidence suggests that the more a pitcher's stuff changes relative to the past, the less we should weight his performance prior to the current season, a takeaway aligned with common sense.
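For reference, selecting the velocity-change sub-samples described above might look something like the sketch below, assuming a pitcher-season table with (hypothetical) columns for this year's and last year's average velocity:

```python
import pandas as pd

# Hypothetical pitcher-season table; column names are illustrative.
df = pd.DataFrame({
    "pitcher": ["A", "B", "C", "D"],
    "avg_velo": [94.8, 91.0, 96.2, 92.5],
    "avg_velo_prev": [93.5, 93.4, 96.1, 92.4],
})

# Absolute year-over-year change captures gainers and losers alike.
df["velo_change"] = (df["avg_velo"] - df["avg_velo_prev"]).abs()

changers_1mph = df[df["velo_change"] >= 1.0]  # 2,442 pitchers in the real sample
changers_2mph = df[df["velo_change"] >= 2.0]  # 485 pitchers in the real sample
```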
So, back to that hypothetical pitcher with a 2.5 ERA, a 4.5 xFIP, and a 4.5 FIP in 2024, plus a 3.75 ERA projection the rest of the way: what should we expect for his performance this September? If his velocity and stuff are similar to past years', the ERA projection is still our best bet.
However, if his stuff is trending way down, baseball fans would be wise to put more weight on that underwhelming 2024 xFIP (or FIP), or at least find a projection that weights the current year more heavily. And what about that breakout arm who is currently destroying projection models, with scouts concurrently raving about an uptick in his stuff? If there were ever a time to put projections aside and weight current-year performance more heavily, this would be it.
Who are the biggest standout underperformers this year?