StatCast-Only Based Projections: Hitters

With the season delayed, I’ve had time to dive into some shelved projects including creating some unique projections. Today, I’m going to introduce my StatCast hitter projections.

I created the projections with inspiration from “The Model Thinker” by Scott Page.* The author states, “do not put too much faith in one model”. To further explain this stance, he states:

“The lesson should be clear: if we can construct multiple diverse, accurate models, then we can make very accurate predictions and valuations and choose good actions.

Keep in mind, these second and third models need not be better than the first model. They could be worse. If they are a little less accurate, but categorically (in the literal sense) different, they should be added to the mix.”
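A quick way to see why a worse-but-different model can still help is the identity usually called the diversity prediction theorem: the squared error of the averaged prediction equals the average squared error of the individual models minus the spread (diversity) of their predictions. Below is a minimal sketch of that arithmetic. The function name and the OPS numbers are made up for illustration and are not taken from any real projection system.

```python
def ensemble_breakdown(predictions, truth):
    """Return (ensemble_error, avg_model_error, diversity) for one target.

    ensemble_error  = squared error of the simple average of the models
    avg_model_error = mean squared error of the individual models
    diversity       = mean squared distance of each model from the average
    """
    n = len(predictions)
    mean_pred = sum(predictions) / n
    ensemble_error = (mean_pred - truth) ** 2
    avg_model_error = sum((p - truth) ** 2 for p in predictions) / n
    diversity = sum((p - mean_pred) ** 2 for p in predictions) / n
    return ensemble_error, avg_model_error, diversity


# Hypothetical OPS projections from three different systems for one hitter,
# plus a hypothetical actual OPS. None of these are real values.
ens, avg, div = ensemble_breakdown([0.880, 0.920, 0.960], truth=0.900)
print(f"ensemble error: {ens:.6f}")
print(f"average model error - diversity: {avg - div:.6f}")
```

The two printed values match, and the averaged projection can never have a larger squared error than the typical individual model. The more the models disagree with each other, the bigger that gap gets.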

Several projection systems already exist, and others combine many of those projections. The issue is that these projections are exclusively based on previous seasons’ results (e.g. stolen bases, home runs) while incorporating various levels of regression, aging factors, and yearly weightings. My goal is to create projections that don’t follow this standard cookie-cutter formula. I don’t expect the projections to be the most accurate because “all models are wrong.” I want a unique perspective on a hitter’s talent.

Besides wanting a different view, I focused on the StatCast data because I was aiming to kill two birds with one stone. Many analysts mention StatCast results as if they were predictive. Some may or may not be. No one knows for sure. I just wanted to know which of the various factors matter. There are average and max values for exit velocity and distance along with Barrel and Hard Hit Rates. Most of the stats try to tell if a batter has hit the ball hard and/or in the air. It was time to focus.

I found just the metrics that matter (Barrel%, Launch Angle, Max Velocity, and Sprint Speed). Additionally, I used our plate discipline stats (O-swing, Z-swing, O-contact, Z-contact) to project the hitter’s walk and strikeout rate. Finally, for stolen bases, I used Sprint Speed to estimate how often the hitter will likely run once on base.

I didn’t completely abandon the normal projection creation system. I created yearly weighted, regressed, and age-adjusted projections for the various StatCast stats (batted ball, plate discipline, and stolen bases). Then, I converted these values (e.g. Barrel%) over to standard baseball stats (e.g. ISO). I didn’t take into account shifts, home parks, and league differences.
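For readers who want to picture the weighting, regression, and aging steps, here is a minimal sketch of how a single rate stat such as Barrel% could be projected. It is not the actual formula behind these projections. The 5/4/3 season weights, the regression amount, the league-average rate, the peak age, and the aging penalty are all placeholder assumptions, and the function names are hypothetical.

```python
def project_rate(seasons, weights, league_rate, regression_n):
    """Weighted, regressed projection for one rate stat.

    seasons      : list of (rate, sample_size), most recent season first
    weights      : one weight per season (assumed values, e.g. 5/4/3)
    league_rate  : league-average rate to regress toward (assumed)
    regression_n : amount of league-average "phantom" sample to add (assumed)
    """
    weighted_events = 0.0
    weighted_n = 0.0
    for (rate, n), w in zip(seasons, weights):
        weighted_events += w * rate * n   # weighted count of events
        weighted_n += w * n               # weighted count of opportunities
    # Blending in league-average sample pulls small samples toward the mean.
    return (weighted_events + league_rate * regression_n) / (weighted_n + regression_n)


def age_adjust(rate, age, peak_age=27, per_year=0.005):
    """Placeholder aging curve: shave a small fraction per year past the peak."""
    years_past_peak = max(0, age - peak_age)
    return rate * (1 - per_year * years_past_peak)


# Hypothetical hitter: Barrel% and batted-ball events for the last three seasons.
seasons = [(0.12, 400), (0.10, 350), (0.08, 300)]
raw = project_rate(seasons, weights=(5, 4, 3), league_rate=0.07, regression_n=1000)
print(f"projected Barrel%: {age_adjust(raw, age=30):.3f}")
```

A hitter with no StatCast history gets exactly the league rate, and a hitter with several full seasons ends up close to his own weighted average. Converting the projected Barrel% into something like ISO would be a separate step, for example a regression fit on past seasons.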

With the nerdy stuff out of the way, here are four years’ worth of projections. I could go back a couple more years, but I’d only incorporate a year or two of data, and regression would be a huge factor for everyone since the first hitter StatCast data was available in 2015.

To start analyzing the projections, here are the best and worst projected 2019 hitters ranked by OPS.

Ten Best & Worst Projected 2019 Hitters by OPS
Name Season PA AB H HR AVG OBP SLG OPS SB
Mike Trout 2019 600 534 170 45 .318 .402 .590 .992 22
Aaron Judge 2019 600 534 148 41 .277 .365 .560 .925 12
Mookie Betts 2019 600 542 167 35 .309 .383 .531 .914 14
Francisco Lindor 2019 600 559 175 33 .313 .368 .540 .908 15
Giancarlo Stanton 2019 600 545 153 40 .280 .355 .552 .907 10
Matt Carpenter 2019 600 533 153 33 .287 .374 .530 .904 6
Anthony Rendon 2019 600 546 167 32 .305 .376 .519 .895 10
Manny Machado 2019 600 555 164 36 .296 .357 .532 .889 6
Joey Gallo 2019 600 540 137 43 .255 .337 .549 .886 10
J.D. Martinez 2019 600 551 160 38 .290 .356 .529 .885 7
Andrew Romine 2019 600 559 136 9 .242 .296 .355 .651 8
Orlando Arcia 2019 600 562 137 11 .244 .295 .351 .646 8
Rajai Davis 2019 600 555 131 10 .236 .296 .348 .644 16
Brandon Phillips 2019 600 578 146 12 .253 .283 .361 .644 4
Adam Rosales 2019 600 540 112 14 .207 .289 .352 .641 7
Billy Hamilton 2019 600 550 130 8 .237 .303 .331 .634 27
Ichiro Suzuki 2019 600 552 139 4 .252 .313 .318 .631 6
Jon Jay 2019 600 562 140 5 .249 .298 .322 .620 5
Ronald Torreyes 2019 600 573 145 7 .254 .289 .324 .613 12
Dee Gordon 2019 600 576 150 1 .261 .292 .314 .606 17

A couple of the projections’ weaknesses are obvious. The first is Mike Trout’s .402 OBP. Over the past four seasons, his lowest OBP was .438, last season. While I tried to incorporate overall talent to bump up his walks, his 66 intentional walks (and 44 hit-by-pitches) from the past four seasons are not properly incorporated.

The second weakness is not accounting for the shift. The shift eats up Joey Gallo’s and Matt Carpenter’s value. Carpenter faced a shift in all but eight plate appearances last season. In the rest, defenses were taking advantage of his near-50% Pull%. Otherwise, none of the projections seem too out of whack compared to the player’s expected talent. The good hitters are great and the bad hitters stayed bad.

Now for this year’s best and worst projections.

Ten Best & Worst Projected 2020 Hitters by OPS
Name Season PA AB H HR AVG OBP SLG OPS SB
Mike Trout 2020 600 532 173 49 .326 .413 .618 1.031 21
Yordan Alvarez 2020 600 550 167 40 .304 .370 .555 .925 8
Mookie Betts 2020 600 538 165 35 .308 .388 .529 .917 13
Austin Meadows 2020 600 549 163 36 .297 .365 .538 .903 14
Gary Sanchez 2020 600 555 161 45 .289 .350 .553 .903 4
Aaron Judge 2020 600 532 143 37 .270 .361 .541 .902 13
Giancarlo Stanton 2020 600 544 152 39 .279 .355 .545 .900 9
Ronald Acuna Jr. 2020 600 533 152 36 .284 .372 .526 .898 25
Peter Alonso 2020 600 552 160 41 .289 .354 .542 .896 6
Kyle Schwarber 2020 600 537 150 36 .279 .362 .533 .895 9
Mallex Smith 2020 600 542 126 7 .232 .308 .330 .638 23
Jeff Mathis 2020 600 544 114 13 .209 .285 .353 .638 4
Rajai Davis 2020 600 555 131 11 .235 .294 .341 .635 10
Drew Butera 2020 600 542 117 14 .216 .293 .339 .632 3
Richard Urena 2020 600 562 124 13 .221 .273 .358 .631 3
Jon Jay 2020 600 564 144 5 .255 .302 .328 .630 4
Bobby Wilson 2020 600 556 128 13 .230 .288 .335 .623 1
Ronald Torreyes 2020 600 573 141 13 .246 .282 .336 .618 7
Billy Hamilton 2020 600 548 125 7 .227 .295 .310 .605 22
Dee Gordon 2020 600 575 148 2 .257 .289 .316 .605 14

Again, the bad hitters are bad and nothing seems out of place with them. With the top hitters, the projections pass the ultimate idiot check with Trout as the top player. A few unexpected names do stick out: Yordan Alvarez, Austin Meadows, Peter Alonso, and Kyle Schwarber. Besides Schwarber, the other three were being drafted way ahead of my values for them. I’m guessing other drafters dived into the StatCast values first and came away with the same great hitters.

Even while regressing last year’s stats quite a bit (see Alvarez’s and Alonso’s higher-than-expected stolen base totals), the two rookies made the list. The key for both being ranked so high is that they hit the tar out of the ball. Alonso was second in Max Exit Velocity and Alvarez seventh.

These projections are throwing me for a loop. I don’t really trust them (even though I created them) because I’m probably anchoring to previous expectations. I’ve drafted most of my teams using my old values. Are those picks now wrong? Going forward, how much should I weight these new ones? Will I be able to adjust next season and graciously accept the differences? More on these questions later as I digest and backtest the information with my previous valuation method. I’ll find the answers soon, but I’m not taking the plunge yet because I want to make sure these projections don’t have an obvious error.

Overall, I’ve met my goal of creating a unique projection system and it immediately answered some valuation questions. Going forward, I just need to determine how much I should weight these values in my current player projection system (and deal with the lack of Runs and RBI). It’s not just this projection set that will be added, but also at least two more projection/valuation systems I’m finishing up. Until they are done, please let me know what you think of these and how they can be improved.

 

* I’m a huge fan of this book. I think it’s a must-read for anyone creating and especially using projection systems. It dives into the weaknesses of various models (projections) and shows how unique projections can be combined for better results. My biggest problem with the book is that it’s creating more work for me.





Jeff, one of the authors of the fantasy baseball guide, The Process, writes for RotoGraphs, The Hardball Times, Rotowire, Baseball America, and BaseballHQ. He has been nominated for two SABR Analytics Research Awards for Contemporary Analysis and won it in 2013 in tandem with Bill Petti. He has won four FSWA Awards, including one for his Mining the News series. He's won Tout Wars three times, LABR twice, and got his first NFBC Main Event win in 2021. Follow him on Twitter @jeffwzimmerman.

15 Comments
Joseph Meyer
4 years ago

This is the exact content that I want to see now. I’m very excited for this series. Personally, I would be interested in a deep dive here. How much regression to the mean for each stat, how everything gets translated into traditional slash values, what are the error bars for each projection, etc.

This style of a new look at old data is the right way to generate new content during these times.

Joseph Meyer
4 years ago
Reply to Jeff Zimmerman

I think what I don’t understand is the process of converting the StatCast values into traditional values. I certainly don’t care what the final formulas are. I just want to understand the process of doing that conversion.

Joseph Meyer
4 years ago
Reply to Jeff Zimmerman

Thanks for the response.