StatCast-Only Based Projections: Hitters
With the season delayed, I’ve had time to dive into some shelved projects, including creating some unique projections. Today, I’m going to introduce my StatCast hitter projections.
I created the projections with inspiration from “The Model Thinker” by Scott Page.* The author states, “do not put too much faith in one model”. To further explain this stance, he states:
“The lesson should be clear: if we can construct multiple diverse, accurate models, then we can make very accurate predictions and valuations and choose good actions.
…
Keep in mind, these second and third models need not be better than the first model. They could be worse. If they are a little less accurate, but categorically (in the literal sense) different, they should be added to the mix.”
Several projection systems already exist, and others combine many of those projections into one. The issue is that these projections are all based on previous seasons’ results (e.g., stolen bases, home runs), with varying levels of regression, aging factors, and yearly weightings applied. My goal is to create projections that don’t follow this standard cookie-cutter formula. I don’t expect them to be the most accurate because “all models are wrong.” I want a unique perspective on a hitter’s talent.
Besides wanting a different view, I focused on the StatCast data because I was aiming to kill two birds with one stone. Many analysts cite StatCast results as if they were predictive. Some may be; some may not. No one knows for sure. I wanted to know which of the various factors matter. There are average and max values for exit velocity and distance, along with Barrel and Hard Hit rates. Most of these stats try to measure whether a batter hit the ball hard, in the air, or both. It was time to focus.
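One common way to test whether a StatCast metric “matters” is to check how well a hitter’s value in one season predicts an outcome in the next. The sketch below is not necessarily how these projections were built; it simply ranks candidate metrics by the Pearson correlation between a year-N metric and a year-N+1 outcome. The metric names, the choice of ISO as the outcome, and every number are hypothetical toy values, not real StatCast data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_predictors(metrics_year1, outcome_year2):
    """Rank metrics by |r| between their year-N values and the year-N+1 outcome."""
    scores = {name: pearson(values, outcome_year2) for name, values in metrics_year1.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data for five hitters (NOT real StatCast values).
metrics_2018 = {
    "barrel_pct": [4.0, 6.0, 8.0, 10.0, 12.0],
    "avg_distance": [170.0, 190.0, 160.0, 185.0, 175.0],
}
iso_2019 = [0.120, 0.150, 0.180, 0.210, 0.240]

ranked = rank_predictors(metrics_2018, iso_2019)
for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

With real data you would want many more hitters, a minimum-PA cutoff, and several season pairs before trusting a ranking like this.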
I narrowed the list down to the metrics that matter (Barrel%, Launch Angle, Max Exit Velocity, and Sprint Speed). Additionally, I used our plate discipline stats (O-Swing, Z-Swing, O-Contact, Z-Contact) to project each hitter’s walk and strikeout rates. Finally, for stolen bases, I used Sprint Speed to estimate how often a hitter will attempt a steal once on base.
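The plate discipline step amounts to finding weights on O-Swing, Z-Swing, O-Contact, and Z-Contact that best predict BB% (or K%). The exact form isn’t specified in the post, so here is a minimal sketch assuming an ordinary least-squares fit with an intercept, written in plain Python via the normal equations. The hitter rows and the “true” weights are made up purely so the example can confirm the fit recovers them.

```python
def solve(a, b):
    """Solve a square linear system with Gaussian elimination (partial pivoting)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, target):
    """Least-squares weights (intercept first) from the normal equations X^T X b = X^T y."""
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, row):
    """Apply fitted weights to one hitter's (O-Swing, Z-Swing, O-Contact, Z-Contact)."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


# Hypothetical hitters: O-Swing, Z-Swing, O-Contact, Z-Contact (as fractions).
discipline = [
    [0.22, 0.61, 0.58, 0.88], [0.35, 0.72, 0.66, 0.84],
    [0.28, 0.68, 0.49, 0.79], [0.31, 0.59, 0.71, 0.91],
    [0.25, 0.75, 0.62, 0.86], [0.38, 0.64, 0.55, 0.93],
    [0.27, 0.70, 0.68, 0.81], [0.33, 0.66, 0.52, 0.89],
]
# Toy "true" weights used only to generate the example target (not real estimates).
true_coef = [0.20, -0.30, -0.05, 0.02, 0.01]
bb_pct = [predict(true_coef, row) for row in discipline]

coef = fit_ols(discipline, bb_pct)
print([round(c, 3) for c in coef])
print(round(predict(coef, [0.25, 0.65, 0.60, 0.85]), 3))
```

The same function fits K% by swapping in a strikeout-rate target. Real data will be noisy, so the fitted weights will only approximate whatever relationship exists.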
I didn’t completely abandon the normal projection creation system. I created yearly weighted, regressed, and age-adjusted projections for the various StatCast stats (batted ball, plate discipline, and stolen bases). Then, I converted these values (e.g., Barrel%) to standard baseball stats (e.g., ISO). I didn’t account for shifts, home parks, or league differences.
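As a rough illustration of “yearly weighted, regressed, and age-adjusted,” here is a Marcel-style sketch for a single rate stat such as Barrel%. The 5/4/3 season weights, the 1,200 phantom PA of league-average regression, and the age-29 peak with small per-year adjustments are all illustrative assumptions, not the values used for these projections.

```python
def project_rate(seasons, league_rate, age, weights=(5, 4, 3), regress_pa=1200,
                 peak_age=29, young_gain=0.006, old_decline=0.003):
    """Weighted, regressed, age-adjusted projection of a rate stat where higher is better.

    seasons: (rate, pa) tuples, most recent season first. Seasons beyond
    len(weights) are ignored by zip(). All default parameters are hypothetical.
    """
    weighted_events = sum(w * rate * pa for (rate, pa), w in zip(seasons, weights))
    weighted_pa = sum(w * pa for (_, pa), w in zip(seasons, weights))
    # Regression: add regress_pa "phantom" PA at the league-average rate.
    regressed = (weighted_events + regress_pa * league_rate) / (weighted_pa + regress_pa)
    # Aging: small multiplicative bump before the peak age, decline after it.
    if age < peak_age:
        factor = 1 + young_gain * (peak_age - age)
    else:
        factor = 1 - old_decline * (age - peak_age)
    return regressed * factor


# Hypothetical Barrel% history (per PA, for simplicity), most recent first.
history = [(0.10, 600), (0.08, 550), (0.09, 500)]
proj = project_rate(history, league_rate=0.065, age=27)
print(f"projected Barrel%: {proj:.2%}")
```

For a stat where lower is better, such as K%, the aging direction would need to flip.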
With the nerdy stuff out of the way, here are four years’ worth of projections. I could go back a couple more years, but since hitter StatCast data first became available in 2015, those earlier projections would incorporate only a year or two of data, and regression would be a huge factor for everyone.
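Why a short StatCast history leans so hard on regression shows up in the common shrinkage form n / (n + k): the player’s share of the projection grows with his sample size n, where k is the sample at which the stat is treated as half signal (roughly what people mean when they say a stat has stabilized). The k = 1,200 PA below is a hypothetical value for illustration, and the sketch ignores yearly weighting.

```python
def player_weight(n_pa, k_pa):
    """Share of a regressed rate that comes from the player's own data: n / (n + k)."""
    return n_pa / (n_pa + k_pa)


K_PA = 1200  # hypothetical regression constant, for illustration only

for seasons in (1, 2, 4):
    pa = 600 * seasons
    share = player_weight(pa, K_PA)
    print(f"{seasons} season(s) ({pa} PA): {share:.0%} player data, "
          f"{1 - share:.0%} league average")
```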
To start analyzing the projections, here are the best and worst projected 2019 hitters, ranked by OPS.
Name | Season | PA | AB | H | HR | AVG | OBP | SLG | OPS | SB |
---|---|---|---|---|---|---|---|---|---|---|
Mike Trout | 2019 | 600 | 534 | 170 | 45 | .318 | .402 | .590 | .992 | 22 |
Aaron Judge | 2019 | 600 | 534 | 148 | 41 | .277 | .365 | .560 | .925 | 12 |
Mookie Betts | 2019 | 600 | 542 | 167 | 35 | .309 | .383 | .531 | .914 | 14 |
Francisco Lindor | 2019 | 600 | 559 | 175 | 33 | .313 | .368 | .540 | .908 | 15 |
Giancarlo Stanton | 2019 | 600 | 545 | 153 | 40 | .280 | .355 | .552 | .907 | 10 |
Matt Carpenter | 2019 | 600 | 533 | 153 | 33 | .287 | .374 | .530 | .904 | 6 |
Anthony Rendon | 2019 | 600 | 546 | 167 | 32 | .305 | .376 | .519 | .895 | 10 |
Manny Machado | 2019 | 600 | 555 | 164 | 36 | .296 | .357 | .532 | .889 | 6 |
Joey Gallo | 2019 | 600 | 540 | 137 | 43 | .255 | .337 | .549 | .886 | 10 |
J.D. Martinez | 2019 | 600 | 551 | 160 | 38 | .290 | .356 | .529 | .885 | 7 |
Andrew Romine | 2019 | 600 | 559 | 136 | 9 | .242 | .296 | .355 | .651 | 8 |
Orlando Arcia | 2019 | 600 | 562 | 137 | 11 | .244 | .295 | .351 | .646 | 8 |
Rajai Davis | 2019 | 600 | 555 | 131 | 10 | .236 | .296 | .348 | .644 | 16 |
Brandon Phillips | 2019 | 600 | 578 | 146 | 12 | .253 | .283 | .361 | .644 | 4 |
Adam Rosales | 2019 | 600 | 540 | 112 | 14 | .207 | .289 | .352 | .641 | 7 |
Billy Hamilton | 2019 | 600 | 550 | 130 | 8 | .237 | .303 | .331 | .634 | 27 |
Ichiro Suzuki | 2019 | 600 | 552 | 139 | 4 | .252 | .313 | .318 | .631 | 6 |
Jon Jay | 2019 | 600 | 562 | 140 | 5 | .249 | .298 | .322 | .620 | 5 |
Ronald Torreyes | 2019 | 600 | 573 | 145 | 7 | .254 | .289 | .324 | .613 | 12 |
Dee Gordon | 2019 | 600 | 576 | 150 | 1 | .261 | .292 | .314 | .606 | 17 |
A couple of the projections’ weaknesses are obvious. The first is Mike Trout’s .402 OBP. Over the past four seasons, his lowest OBP was .438, last season. While I tried to incorporate overall talent to bump up his walks, the projections don’t properly account for his 66 intentional walks (and 44 hit-by-pitches) over the past four seasons.
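To get a feel for the size of the Trout problem, here is a back-of-the-envelope sketch using the standard OBP formula. It assumes the projection effectively treated his intentional walks and hit-by-pitches as outs, which makes the result an upper bound, and it uses a hypothetical 600-PA line rather than his actual projection.

```python
def obp(h, bb, hbp, ab, sf=0):
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF). BB includes IBB."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


# From the text: 66 IBB and 44 HBP over four seasons.
ibb_per_season = 66 / 4  # 16.5
hbp_per_season = 44 / 4  # 11.0

# Hypothetical 600-PA line (NOT Trout's actual projection): 540 AB, 150 H, 60 BB.
baseline = obp(h=150, bb=60, hbp=0, ab=540)
# Same 600 PA, but 27.5 at-bat outs become intentional walks or HBP instead.
adjusted = obp(h=150, bb=60 + ibb_per_season, hbp=hbp_per_season,
               ab=540 - ibb_per_season - hbp_per_season)

print(f"baseline OBP {baseline:.3f}, adjusted OBP {adjusted:.3f}, "
      f"gap {adjusted - baseline:.3f}")
```

The gap works out to 27.5 / 600, or roughly 46 points of OBP at most. For comparison, the gap between the .402 projection and his .438 four-year low is 36 points. Trout’s actual PA totals vary by season, so the true per-PA effect would differ somewhat.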
The second weakness is not accounting for the shift, which eats into Joey Gallo’s and Matt Carpenter’s value. Carpenter faced a shift in all but eight plate appearances last season, with defenses taking advantage of his nearly 50% Pull%. Otherwise, none of the projections seem too out of whack compared to each player’s expected talent. The good hitters are great and the bad hitters stay bad.
Now for this year’s best and worst projections.
Name | Season | PA | AB | H | HR | AVG | OBP | SLG | OPS | SB |
---|---|---|---|---|---|---|---|---|---|---|
Mike Trout | 2020 | 600 | 532 | 173 | 49 | .326 | .413 | .618 | 1.031 | 21 |
Yordan Alvarez | 2020 | 600 | 550 | 167 | 40 | .304 | .370 | .555 | .925 | 8 |
Mookie Betts | 2020 | 600 | 538 | 165 | 35 | .308 | .388 | .529 | .917 | 13 |
Austin Meadows | 2020 | 600 | 549 | 163 | 36 | .297 | .365 | .538 | .903 | 14 |
Gary Sanchez | 2020 | 600 | 555 | 161 | 45 | .289 | .350 | .553 | .903 | 4 |
Aaron Judge | 2020 | 600 | 532 | 143 | 37 | .270 | .361 | .541 | .902 | 13 |
Giancarlo Stanton | 2020 | 600 | 544 | 152 | 39 | .279 | .355 | .545 | .900 | 9 |
Ronald Acuna Jr. | 2020 | 600 | 533 | 152 | 36 | .284 | .372 | .526 | .898 | 25 |
Peter Alonso | 2020 | 600 | 552 | 160 | 41 | .289 | .354 | .542 | .896 | 6 |
Kyle Schwarber | 2020 | 600 | 537 | 150 | 36 | .279 | .362 | .533 | .895 | 9 |
Mallex Smith | 2020 | 600 | 542 | 126 | 7 | .232 | .308 | .330 | .638 | 23 |
Jeff Mathis | 2020 | 600 | 544 | 114 | 13 | .209 | .285 | .353 | .638 | 4 |
Rajai Davis | 2020 | 600 | 555 | 131 | 11 | .235 | .294 | .341 | .635 | 10 |
Drew Butera | 2020 | 600 | 542 | 117 | 14 | .216 | .293 | .339 | .632 | 3 |
Richard Urena | 2020 | 600 | 562 | 124 | 13 | .221 | .273 | .358 | .631 | 3 |
Jon Jay | 2020 | 600 | 564 | 144 | 5 | .255 | .302 | .328 | .630 | 4 |
Bobby Wilson | 2020 | 600 | 556 | 128 | 13 | .230 | .288 | .335 | .623 | 1 |
Ronald Torreyes | 2020 | 600 | 573 | 141 | 13 | .246 | .282 | .336 | .618 | 7 |
Billy Hamilton | 2020 | 600 | 548 | 125 | 7 | .227 | .295 | .310 | .605 | 22 |
Dee Gordon | 2020 | 600 | 575 | 148 | 2 | .257 | .289 | .316 | .605 | 14 |
Again, the bad hitters are bad and nothing seems out of place with them. Among the top hitters, the ultimate idiot check passes muster with Trout as the top player. A few unexpected names do stick out: Yordan Alvarez, Austin Meadows, Peter Alonso, and Kyle Schwarber. Besides Schwarber, the other three were being drafted way ahead of my values for them. I’m guessing other drafters dove into the StatCast values first and came away with the same great hitters.
Even with last year’s stats regressed quite a bit (see Alvarez’s and Alonso’s higher-than-expected stolen base totals), the two rookies made the list. The key to both ranking so high is that they hit the tar out of the ball: Alonso was second in Max Exit Velocity and Alvarez seventh.
These projections are throwing me for a loop. I don’t really trust them (even though I created them) because I’m probably anchoring to previous expectations. I’ve drafted most of my teams using my old values; are those picks now wrong? Going forward, how much should I weight these new ones? Will I be able to adjust next season and graciously accept the differences? More on these questions later as I digest the information and backtest it against my previous valuation method. I’ll find the answers soon, but I’m not taking the plunge yet because I want to make sure these projections don’t have an obvious error.
Overall, I’ve met my goal of creating a unique projection system, and it immediately answered some valuation questions. Going forward, I just need to determine how much to weight these values in my current player projection system (and deal with the lack of Runs and RBI). It’s not just this projection set to be added; I’m finishing up at least two more projection/valuation systems. Until they are done, please let me know what you think of these and how they can be improved.
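The “how much should I weight these” question is a blending problem, and one simple backtest in the spirit of combining diverse models is to pick the weight w that minimizes squared error of w × (new) + (1 − w) × (old) against actual results. This is a minimal sketch of that idea, not the author’s method. Because the error is a convex quadratic in w, the closed-form minimizer clipped to [0, 1] is the best allowed weight. The OPS values are hypothetical and built so the best weight is exactly 0.3.

```python
def best_blend_weight(new, old, actual):
    """Weight w in [0, 1] minimizing sum((w*new + (1-w)*old - actual)^2).

    Setting the derivative to zero gives w = sum(d*e) / sum(d*d),
    where d = new - old and e = actual - old.
    """
    d = [n - o for n, o in zip(new, old)]
    e = [a - o for a, o in zip(actual, old)]
    denom = sum(x * x for x in d)
    if denom == 0:
        return 0.5  # projections agree everywhere, so any weight is equally good
    w = sum(x * y for x, y in zip(d, e)) / denom
    return min(1.0, max(0.0, w))


def rmse(pred, actual):
    """Root-mean-square error between predictions and actual values."""
    return (sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)) ** 0.5


# Hypothetical backtest: OPS from a StatCast-style projection, an existing one, and actuals.
statcast_proj = [0.800, 0.900, 0.700, 0.750]
old_proj = [0.780, 0.860, 0.740, 0.720]
actual_ops = [0.3 * s + 0.7 * o for s, o in zip(statcast_proj, old_proj)]

w = best_blend_weight(statcast_proj, old_proj, actual_ops)
blended = [w * s + (1 - w) * o for s, o in zip(statcast_proj, old_proj)]
print(f"weight on StatCast projection: {w:.2f}")
print(f"RMSE old {rmse(old_proj, actual_ops):.4f}, blended {rmse(blended, actual_ops):.4f}")
```

Real actuals are noisy, so a weight fit on one season should be checked on another before trusting it.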
* I’m a huge fan of this book. I think it’s a must-read for anyone creating, and especially anyone using, projection systems. It dives into the weaknesses of various models (projections) and shows how unique projections can be combined for better results. My biggest problem with the book is that it’s creating more work for me.
Jeff, one of the authors of the fantasy baseball guide The Process, writes for RotoGraphs, The Hardball Times, Rotowire, Baseball America, and BaseballHQ. He has been nominated twice for the SABR Analytics Research Award for Contemporary Analysis and won it in 2013 in tandem with Bill Petti. He has won four FSWA Awards, including one for his Mining the News series. He's won Tout Wars three times, LABR twice, and got his first NFBC Main Event win in 2021. Follow him on Twitter @jeffwzimmerman.
This is the exact content that I want to see now. I’m very excited for this series. Personally, I would be interested in a deep dive here: how much regression to the mean for each stat, how everything gets translated into traditional slash values, what the error bars are for each projection, etc.
This style of a new look at old data is the right way to generate new content during these times.
I’m not going to give the exact formula for each value; you’re just going to have to trust me. They match up with the other instances where people mention the value stabilizing.
I think what I don’t understand is the process of converting the StatCast values into traditional values. I don’t care what the final formulas are; I just want to understand the process of doing that conversion.
I found the StatCast projections using the previous season’s data. Then, I used those values to find comparable traditional stats. For example, what weighting of O-Swing, O-Contact, Z-Swing, and Z-Contact best projects BB% and K%? Once I have those two, I know how often the ball is in play and can then get BABIP and HR/BIP using the most predictive StatCast batted ball values.
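Here is a minimal sketch of that last step, turning projected rates into a traditional line. It assumes simplifications the post doesn’t state: every non-walk plate appearance is an at-bat (no HBP, sacrifices, or sacrifice flies), HR/BIP counts home runs among all batted balls, and ISO comes from its own batted-ball model (the post mentions mapping Barrel% to ISO). The function name and input rates are hypothetical.

```python
def slash_line(pa, bb_pct, k_pct, hr_per_bip, babip, iso):
    """Convert projected rates into a traditional line (illustrative assumptions only).

    Assumes no HBP, SF, or sacrifices, so AB = PA - BB, and that "balls in
    play" here includes home runs.
    """
    bb = pa * bb_pct
    k = pa * k_pct
    ab = pa - bb
    bip = ab - k                       # batted balls, including HR
    hr = bip * hr_per_bip
    non_hr_hits = babip * (bip - hr)   # BABIP excludes HR (SF ignored here)
    h = hr + non_hr_hits
    avg = h / ab
    obp = (h + bb) / pa
    slg = avg + iso                    # ISO = SLG - AVG
    return {"AB": ab, "H": h, "HR": hr, "AVG": avg,
            "OBP": obp, "SLG": slg, "OPS": obp + slg}


line = slash_line(pa=600, bb_pct=0.10, k_pct=0.20, hr_per_bip=0.06, babip=0.300, iso=0.200)
print({stat: round(value, 3) for stat, value in line.items()})
```

Keeping each step as a separate rate makes it easy to swap in a different StatCast-based estimate for any one piece without touching the rest.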
Thanks for the response.