StatCast-Only Based Projections: Hitters

April 10, 2020

With the season delayed, I’ve had time to dive into some shelved projects including creating some unique projections. Today, I’m going to introduce my StatCast hitter projections.

I created the projections with inspiration from “The Model Thinker” by Scott Page.* The author states, “Do not put too much faith in one model.” To explain this stance further, he writes:

“The lesson should be clear: if we can construct multiple diverse, accurate models, then we can make very accurate predictions and valuations and choose good actions.

…

Keep in mind, these second and third models need not be better than the first model. They could be worse. If they are a little less accurate, but categorically (in the literal sense) different, they should be added to the mix.”

Several projection systems already exist, and others combine many of those projections into one. The issue is that they are built exclusively on previous seasons’ results (e.g., stolen bases, home runs), with varying levels of regression, aging factors, and yearly weightings. My goal is to create projections that don’t follow this standard cookie-cutter formula. I don’t expect them to be the most accurate because “all models are wrong.” I want a unique perspective on a hitter’s talent.

Besides wanting a different view, I focused on the StatCast data because I was aiming to kill two birds with one stone. Many analysts cite StatCast results as if they were predictive. Some may be; some may not. No one knows for sure, and I wanted to know which of the various factors matter. There are average and max values for exit velocity and distance, along with Barrel and Hard Hit rates. Most of these stats try to tell whether a batter has hit the ball hard and/or in the air. It was time to focus.
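The post doesn’t spell out how it decided which StatCast metrics matter. One simple, common check is the year-to-year correlation between a metric in one season and the outcome you care about in the next. The sketch below shows that check under stated assumptions: the `pearson` helper, the two metrics, and every number are hypothetical, made up for illustration rather than taken from real StatCast data.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Made-up numbers for five hypothetical hitters: a metric in season N
# paired with the same hitter's ISO in season N+1.
next_season_iso = [0.120, 0.150, 0.180, 0.210, 0.250]
metrics_this_season = {
    "Barrel%": [0.04, 0.06, 0.08, 0.10, 0.13],
    "Avg Distance (ft)": [170, 165, 180, 172, 168],
}

# Rank metrics by how strongly they track next season's result.
ranked = sorted(
    metrics_this_season.items(),
    key=lambda kv: -abs(pearson(kv[1], next_season_iso)),
)
for name, values in ranked:
    print(f"{name}: r = {pearson(values, next_season_iso):.2f}")
```

With only five invented hitters this is just a demonstration; a real check would use every qualified hitter across several season pairs, and year-to-year correlation is only one of several ways to judge whether a stat is predictive.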

I narrowed the list down to the metrics that matter: Barrel%, Launch Angle, Max Exit Velocity, and Sprint Speed. Additionally, I used our plate discipline stats (O-Swing%, Z-Swing%, O-Contact%, Z-Contact%) to project each hitter’s walk and strikeout rates. Finally, for stolen bases, I used Sprint Speed to estimate how often a hitter will run once on base.
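The post doesn’t publish how the plate discipline stats are weighted. One standard way to find the weighting that best projects BB% (or K%) is ordinary least squares. The sketch below assumes a plain linear model with an intercept, solves it with the normal equations in pure Python, and checks itself on synthetic, noise-free data; the name `fit_linear`, the sampling ranges, and `HYPOTHETICAL_WEIGHTS` are all invented for illustration and are not real FanGraphs numbers or the author’s coefficients.

```python
import random


def fit_linear(features, target):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    `features` is a list of rows; an intercept column is prepended, so the
    result is [intercept, w1, w2, ...].
    """
    rows = [[1.0, *row] for row in features]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, target))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


# Synthetic, noise-free demo data (NOT real hitters): each row is
# O-Swing%, Z-Swing%, O-Contact%, Z-Contact% as fractions.
rng = random.Random(2020)
HYPOTHETICAL_WEIGHTS = [0.30, -0.45, 0.10, -0.05, -0.08]  # intercept first
hitters = [
    [rng.uniform(0.20, 0.40), rng.uniform(0.55, 0.80),
     rng.uniform(0.50, 0.80), rng.uniform(0.75, 0.95)]
    for _ in range(200)
]
bb_rate = [
    HYPOTHETICAL_WEIGHTS[0]
    + sum(w * x for w, x in zip(HYPOTHETICAL_WEIGHTS[1:], row))
    for row in hitters
]

weights = fit_linear(hitters, bb_rate)
print([round(w, 4) for w in weights])
```

In practice the target would be a later season’s BB% so the weights capture predictive value rather than description, and the same fit would be run separately for K%.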

I didn’t completely abandon the normal projection process. I created yearly weighted, regressed, and age-adjusted projections for the various StatCast stats (batted ball, plate discipline, and stolen bases). Then, I converted these values (e.g., Barrel%) to standard baseball stats (e.g., ISO). I didn’t account for shifts, home parks, or league differences.
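The post doesn’t publish its season weights, regression amounts, or aging curve. The sketch below shows the general shape of a yearly weighted, regressed, age-adjusted projection for a single rate stat such as Barrel%, borrowing the 5/4/3 season weighting popularized by the Marcel system. The function name and every parameter value (`weights`, `regression_pa`, `peak_age`, `age_step`) are hypothetical placeholders, not the author’s numbers.

```python
def project_rate(seasons, league_rate, age, *,
                 weights=(5, 4, 3),    # hypothetical: most recent season counts most
                 regression_pa=1000,   # hypothetical: weighted PA of league-average "ballast"
                 peak_age=27,          # hypothetical aging-curve peak
                 age_step=0.003):      # hypothetical change in the rate per year from peak
    """Project a rate stat (e.g. Barrel%) from up to three prior seasons.

    seasons: list of (rate, pa) tuples, most recent season first.
    """
    # Yearly weighting: each season counts by its weight times its PA.
    num = sum(w * rate * pa for w, (rate, pa) in zip(weights, seasons))
    den = sum(w * pa for w, (_, pa) in zip(weights, seasons))
    # Regression: blend in regression_pa of league-average performance, so
    # hitters with little StatCast history land near the league rate.
    regressed = (num + regression_pa * league_rate) / (den + regression_pa)
    # Aging: linear bump below peak_age, linear decline above it.
    return regressed + age_step * (peak_age - age)


# A 25-year-old with two seasons of data (made-up numbers).
print(round(project_rate([(0.10, 500), (0.08, 400)], league_rate=0.06, age=25), 4))
```

With no seasons at all, the function returns the league rate plus the aging term, the extreme case of the heavy regression the post describes for hitters with only a year or two of StatCast data.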

With the nerdy stuff out of the way, here are four years’ worth of projections. I could go back a couple more years, but those seasons would incorporate only a year or two of data, and regression would be a huge factor for everyone, since hitter StatCast data first became available in 2015.

To start analyzing the projections, here are the best and worst projected 2019 hitters, ranked by OPS.

Ten Best & Worst Projected 2019 Hitters by OPS

Name	Season	PA	AB	H	HR	AVG	OBP	SLG	OPS	SB
Mike Trout	2019	600	534	170	45	.318	.402	.590	.992	22
Aaron Judge	2019	600	534	148	41	.277	.365	.560	.925	12
Mookie Betts	2019	600	542	167	35	.309	.383	.531	.914	14
Francisco Lindor	2019	600	559	175	33	.313	.368	.540	.908	15
Giancarlo Stanton	2019	600	545	153	40	.280	.355	.552	.907	10
Matt Carpenter	2019	600	533	153	33	.287	.374	.530	.904	6
Anthony Rendon	2019	600	546	167	32	.305	.376	.519	.895	10
Manny Machado	2019	600	555	164	36	.296	.357	.532	.889	6
Joey Gallo	2019	600	540	137	43	.255	.337	.549	.886	10
J.D. Martinez	2019	600	551	160	38	.290	.356	.529	.885	7

Andrew Romine	2019	600	559	136	9	.242	.296	.355	.651	8
Orlando Arcia	2019	600	562	137	11	.244	.295	.351	.646	8
Rajai Davis	2019	600	555	131	10	.236	.296	.348	.644	16
Brandon Phillips	2019	600	578	146	12	.253	.283	.361	.644	4
Adam Rosales	2019	600	540	112	14	.207	.289	.352	.641	7
Billy Hamilton	2019	600	550	130	8	.237	.303	.331	.634	27
Ichiro Suzuki	2019	600	552	139	4	.252	.313	.318	.631	6
Jon Jay	2019	600	562	140	5	.249	.298	.322	.620	5
Ronald Torreyes	2019	600	573	145	7	.254	.289	.324	.613	12
Dee Gordon	2019	600	576	150	1	.261	.292	.314	.606	17

A couple of the projections’ weaknesses are obvious. The first is Mike Trout’s .402 OBP. Over the past four seasons, his lowest OBP was .438, last season. While I tried to incorporate overall talent to bump up his walks, his 66 intentional walks (and 44 hit-by-pitches) from the past four seasons are not properly incorporated.
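A quick back-of-the-envelope check on the size of that miss, using only the numbers above. It assumes each missing intentional walk or hit-by-pitch would replace an out within the projection’s fixed 600 PA; if the model already credits some of those free passes as ordinary walks, the true shortfall is smaller, so treat the result as a ceiling.

```python
# Trout's intentional walks and hit-by-pitches over the past four seasons (from the text).
ibb, hbp, seasons = 66, 44, 4
pa = 600                # every hitter is projected at 600 PA
projected_obp = 0.402   # Trout's projected 2019 OBP

free_passes_per_season = (ibb + hbp) / seasons
# Assumption: each free pass replaces an out, adding roughly 1/PA to OBP,
# so this is an upper bound on the missing OBP.
max_obp_gain = free_passes_per_season / pa

print(f"Up to {free_passes_per_season:.1f} extra times on base -> "
      f"+{max_obp_gain:.3f} OBP ({projected_obp:.3f} -> {projected_obp + max_obp_gain:.3f})")
```

That ceiling of about .046 exceeds the .036 gap between the projected .402 and Trout’s lowest actual OBP of .438, so the unmodeled free passes are large enough to account for the whole miss.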

The second weakness is the shift. Because the projections ignore it, they miss how much it eats into Joey Gallo’s and Matt Carpenter’s value. Carpenter faced a shift in all but eight plate appearances last season, as defenses took advantage of his near-50% Pull%. Otherwise, none of the projections seem too out of whack compared to the players’ expected talent. The good hitters are great and the bad hitters stay bad.

Now for this year’s best and worst projections.

Ten Best & Worst Projected 2020 Hitters by OPS

Name	Season	PA	AB	H	HR	AVG	OBP	SLG	OPS	SB
Mike Trout	2020	600	532	173	49	.326	.413	.618	1.031	21
Yordan Alvarez	2020	600	550	167	40	.304	.370	.555	.925	8
Mookie Betts	2020	600	538	165	35	.308	.388	.529	.917	13
Austin Meadows	2020	600	549	163	36	.297	.365	.538	.903	14
Gary Sanchez	2020	600	555	161	45	.289	.350	.553	.903	4
Aaron Judge	2020	600	532	143	37	.270	.361	.541	.902	13
Giancarlo Stanton	2020	600	544	152	39	.279	.355	.545	.900	9
Ronald Acuna Jr.	2020	600	533	152	36	.284	.372	.526	.898	25
Peter Alonso	2020	600	552	160	41	.289	.354	.542	.896	6
Kyle Schwarber	2020	600	537	150	36	.279	.362	.533	.895	9

Mallex Smith	2020	600	542	126	7	.232	.308	.330	.638	23
Jeff Mathis	2020	600	544	114	13	.209	.285	.353	.638	4
Rajai Davis	2020	600	555	131	11	.235	.294	.341	.635	10
Drew Butera	2020	600	542	117	14	.216	.293	.339	.632	3
Richard Urena	2020	600	562	124	13	.221	.273	.358	.631	3
Jon Jay	2020	600	564	144	5	.255	.302	.328	.630	4
Bobby Wilson	2020	600	556	128	13	.230	.288	.335	.623	1
Ronald Torreyes	2020	600	573	141	13	.246	.282	.336	.618	7
Billy Hamilton	2020	600	548	125	7	.227	.295	.310	.605	22
Dee Gordon	2020	600	575	148	2	.257	.289	.316	.605	14

Again, the bad hitters are bad and nothing seems out of place with them. With the top hitters, the ultimate idiot check passes muster with Trout as the top player. A few unexpected names do stick out: Yordan Alvarez, Austin Meadows, Peter Alonso, and Kyle Schwarber. Besides Schwarber, the other three were being drafted way ahead of my values for them. I’m guessing other drafters dived into the StatCast values first and came away with the same great hitters.

Even with last year’s stats regressed quite a bit (see Alvarez’s and Alonso’s higher-than-expected stolen base totals), the two rookies made the list. The key to both ranking so high is that they hit the tar out of the ball: Alonso was second in Max Exit Velocity and Alvarez seventh.

These projections are throwing me for a loop. I don’t really trust them (even though I created them), probably because I’m anchoring to previous expectations. I’ve drafted most of my teams using my old values; are those picks now wrong? Going forward, how much should I weigh these new ones? Will I be able to adjust next season and graciously accept the differences? More on these questions later as I digest the information and backtest it against my previous valuation method. I’ll find the answers soon, but I’m not taking the plunge yet because I want to make sure these projections don’t have an obvious error.

Overall, I’ve met my goal of creating a unique projection system, and it immediately answered some valuation questions. Going forward, I just need to determine how much I should weigh these values in my current player projection system (and deal with the lack of Runs and RBI). This isn’t the only set to add, either: I’m finishing up at least two more projection/valuation systems. Until they are done, please let me know what you think of these and how they can be improved.

* I’m a huge fan of this book. I think it’s a must-read for anyone creating and especially using projection systems. It dives into the weaknesses of various models (projections) and shows how unique projections can be combined for better results. My biggest problem with the book is that it’s creating more work for me.

15 Comments


Joseph Meyer (member since 2016)


This is the exact content that I want to see now. I’m very excited for this series. Personally, I would be interested in a deep dive: how much regression to the mean for each stat, how everything gets translated into traditional slash values, what the error bars are for each projection, etc.

This style of a new look at old data is the right way to generate new content during these times.

Jeff Zimmerman (FanGraphs Staff)

Reply to Joseph Meyer

I’m not going to give the exact formula for each value; you’re just going to have to trust me. They match up with the stabilization points other people have mentioned for each value.

Reply to Jeff Zimmerman

I think what I don’t understand is the process of converting the StatCast values into traditional values. I certainly don’t care what the final formulas are. I just want to understand the process of doing that conversion.

I created the StatCast projections using the previous season’s data. Then, I used those values to find comparable traditional stats. For example, I found the weighting of O-Swing, O-Contact, Z-Swing, and Z-Contact that best projects BB% and K%. Once I have those two, I know how often the ball is in play and can then get BABIP and HR/BIP using the most predictive StatCast batted ball values.
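The reply above describes the conversion chain: BB% and K% set how often the ball is in play, and BABIP and HR/BIP then fill in the hits. A minimal sketch of that arithmetic, assuming every plate appearance ends in a walk, strikeout, or ball in play (no HBP, sacrifices, or interference), that HR/BIP is measured over all balls in play including home runs, and that ISO is projected separately and added to AVG to get SLG. The function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlashLine:
    pa: float
    ab: float
    h: float
    hr: float
    avg: float
    obp: float
    slg: float

    @property
    def ops(self) -> float:
        return self.obp + self.slg


def rates_to_slash_line(bb_pct, k_pct, babip, hr_per_bip, iso, pa=600):
    """Convert projected rates into counting stats and a slash line.

    Hypothetical simplifications: every PA is a walk, strikeout, or ball
    in play (no HBP, sacrifices, or interference), and HR/BIP counts home
    runs against all balls in play.
    """
    bb = bb_pct * pa
    k = k_pct * pa
    bip = pa - bb - k                 # balls in play, home runs included
    hr = hr_per_bip * bip
    hits = hr + babip * (bip - hr)    # BABIP applies only to non-HR balls in play
    ab = pa - bb                      # with no HBP or sacrifices, non-walk PA = AB
    avg = hits / ab
    obp = (hits + bb) / pa
    slg = avg + iso                   # ISO = SLG - AVG, projected separately
    return SlashLine(pa, ab, hits, hr, avg, obp, slg)


line = rates_to_slash_line(bb_pct=0.10, k_pct=0.20, babip=0.300,
                           hr_per_bip=0.05, iso=0.200)
print(f"{line.hr:.0f} HR, {line.avg:.3f}/{line.obp:.3f}/{line.slg:.3f}, OPS {line.ops:.3f}")
```

Because HBP is left out entirely, this simplified OBP understates hitters who get plunked often, the same kind of gap the post flags for Mike Trout.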

Thanks for the response.
