A Pitch Mechanics Consistency Data Experiment

The second word in the “music to many people’s ears” term, Spring Training, is an important one to consider. Pro ball players are training. They are preparing for the season. What types of things are they working on? Beat writers report every year that players are tinkering with new grips, different release points, varying arm slots, diets, cleats; the list is endless. This is assumed to be even more evident in pitchers. As they ramp up to game-ready status, what exactly are they ramping up, and can it be quantified by a writer with only so much publicly available data at his fingertips? Away we go in answering that question together.

With Statcast data available in spring training ballparks, we can access pitch-level data from the good folks at Baseball Savant. God bless them. There are a few metrics that measure what I would consider pitcher mechanics, and here they are:

['release_speed', 'release_pos_x', 'release_pos_z', 'effective_speed', 'release_spin_rate', 'release_extension', 'spin_axis']
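For anyone following along at home, here is a minimal sketch of pulling those seven columns out of a Baseball Savant CSV export using only Python's standard library. The function name `load_mechanics`, the four-seam filter (`pitch_type == "FF"`), and the tiny inline sample are assumptions for illustration. The two numeric sample rows reuse Scherzer values from the table below, and the pitch-type labels on them are placeholders, not something the data in this post confirms.

```python
import csv
import io

MECHANICS_COLUMNS = [
    "release_speed", "release_pos_x", "release_pos_z", "effective_speed",
    "release_spin_rate", "release_extension", "spin_axis",
]

def load_mechanics(csv_file, pitch_type="FF"):
    """Return (game_date, [seven metrics]) for each pitch of one pitch type.

    Hypothetical helper: assumes the export has `game_date` and `pitch_type`
    columns alongside the seven metrics listed above.
    """
    rows = []
    for record in csv.DictReader(csv_file):
        if record.get("pitch_type") != pitch_type:
            continue
        try:
            values = [float(record[col]) for col in MECHANICS_COLUMNS]
        except (KeyError, TypeError, ValueError):
            continue  # skip pitches with missing or blank tracking data
        rows.append((record.get("game_date"), values))
    return rows

# Illustrative sample only: pitch_type labels are placeholders, and the last
# row is deliberately blank to show how untracked pitches are skipped.
sample_csv = io.StringIO(
    "game_date,pitch_type," + ",".join(MECHANICS_COLUMNS) + "\n"
    "2023-03-03,FF,94.1,-3.28,5.38,94.1,2269,6.3,229\n"
    "2023-03-03,FF,93.2,-3.09,5.56,92.9,2213,6.2,221\n"
    "2023-03-03,FF,,,,,,,\n"
)

fastballs = load_mechanics(sample_csv)
print(len(fastballs), "usable pitches")
```

Filtering to one pitch type first keeps a slider's spin and release from being mistaken for an inconsistent fastball.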

These seven variables are incredibly manageable from a data perspective when compared to some of the more advanced biomechanical data that teams and private-company analysts are working with today. However, it can be really difficult to notice patterns from game to game just by looking at a spreadsheet:

Max Scherzer Statcast Data – 3/3/23
release_  release_  release_  effective_  release_   release_   spin_
speed     pos_x     pos_z     speed       spin_rate  extension  axis
94.1      -3.28     5.38      94.1        2269       6.3        229
93.2      -3.09     5.56      92.9        2213       6.2        221
93.7      -3.19     5.41      93.2        2384       6.0        226
92.9      -3.11     5.41      92.8        2317       6.2        224
93.2      -3.07     5.6       93.1        2223       6.2        219
*The header row was separated into two for viewing purposes.

Yes, you could look at this and make general assumptions. But what if we want to visualize this? What if we wanted to hyper-analyze this so that the only people who really know what the heck is going on are the ones who are too busy playing the game in hyperspace? Bring in principal component analysis!

I’ve used this technique for a few articles here on FanGraphs. In this case, a principal component is a new column built from multi-dimensional data, like the spreadsheet above with its numerous columns. It cuts through the data and builds new “axes of variation” that capture as much of the pitch-to-pitch differences as possible. A simpler way of explaining this is that it’s taking multiple columns in the spreadsheet and condensing them into one. If we then create two of those new, condensed data columns, or principal components, then we can create a visualization. If this is too much data talk for you, hopefully, it gets better as I bring in the baseball.
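To make the "condensing columns" idea concrete, here is a minimal sketch of PCA on the five Scherzer pitches from the table above, using NumPy's singular value decomposition. It assumes each column is standardized (z-scored) before the decomposition, which is a common choice but an assumption here, so the numbers will not necessarily match the principal component values shown in this post. The function name `pca_scores` is hypothetical.

```python
import numpy as np

# Five Scherzer pitches from the 3/3/23 table, in the column order listed earlier.
pitches = np.array([
    [94.1, -3.28, 5.38, 94.1, 2269, 6.3, 229],
    [93.2, -3.09, 5.56, 92.9, 2213, 6.2, 221],
    [93.7, -3.19, 5.41, 93.2, 2384, 6.0, 226],
    [92.9, -3.11, 5.41, 92.8, 2317, 6.2, 224],
    [93.2, -3.07, 5.60, 93.1, 2223, 6.2, 219],
])

def pca_scores(X, n_components=2):
    """Project each row of X onto its first n_components principal components."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    Z = (X - X.mean(axis=0)) / std  # assumption: z-score every column
    # Rows of Vt are the principal axes; squared singular values give the variance split.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

scores, explained = pca_scores(pitches)
print(scores.shape)  # one (PC1, PC2) pair per pitch
print(explained)     # share of total variance captured by each component
```

Each row of `scores` is one dot on the scatter plot: the first column is the x coordinate and the second is the y coordinate. Standardizing matters because spin rate is measured in the thousands while release position is measured in feet, and without it spin rate would dominate the components.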

Let’s start with the young, yet-to-debut major league pitcher, Grayson Rodriguez. How do the metrics above look, game by game, as he ramps up for a season in which he expects to debut? I’ll create two principal components to help summarize a dataset similar to the table above and I’ll plot them on the x- and y-axes of a scatter plot, like this:

Gray-Rod Game 1 PCA Scatter

What we see is two variables, principal components one and two, summarizing all the variables listed at the top of this article for one Gray-Rod Spring Training game’s worth of fastballs. It’s not very exciting. But bring in a second game’s worth of fastballs to the visual and the excitement levels go through the roof!

Gray-Rod Game 1 and Game 2 PCA Scatter

…OK, maybe it’s not that much more exciting. But at least we can now see a little more of a story starting to develop. Ideally, since these variables are mostly repeatable, we should see the blue and red dots sit closer together. What’s up with that game 2 outlier at the top of the second plot? We can compare that pitch with the averages of the other pitches in that game to analyze it further:

Data Point Evaluation
                        release_  release_  release_  effective_  release_   release_   spin_  PC     PC
                        speed     pos_x     pos_z     speed       spin_rate  extension  axis   1      2
Data Point in Question  98.2      -2.27     6.14      100.3       2077.0     7.3        207.0  -0.00  0.02
Averages of Outing      97.9      -2.14     6.11      99.8        2021.1     7.4        208.4  -0.00  -0.00
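The gaps in that table are easier to read as differences. Here is a small sketch that subtracts the outing averages from the pitch in question, using only the numbers in the table. The variable names are just for illustration, and raw differences say nothing about significance on their own: judging that would take the pitch-to-pitch standard deviation for the outing, which the table does not include.

```python
# Values copied from the Data Point Evaluation table.
pitch = {
    "release_speed": 98.2, "release_pos_x": -2.27, "release_pos_z": 6.14,
    "effective_speed": 100.3, "release_spin_rate": 2077.0,
    "release_extension": 7.3, "spin_axis": 207.0,
}
averages = {
    "release_speed": 97.9, "release_pos_x": -2.14, "release_pos_z": 6.11,
    "effective_speed": 99.8, "release_spin_rate": 2021.1,
    "release_extension": 7.4, "spin_axis": 208.4,
}

# Positive means the pitch was above the outing average for that metric.
diffs = {name: round(pitch[name] - averages[name], 2) for name in pitch}

for name, diff in diffs.items():
    print(f"{name:18s} {diff:+.2f}")
```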

It seems that this pitch had a higher release point, effective speed, and spin rate than the outing’s averages. Is this significant? I really have no idea. It could just be noise. I would love to know if Rodriguez would have noticed any difference after that pitch was thrown. Would he have admitted that he really wanted to get that guy out? Let’s go to the video to see what the situation was:

…Oh, wait. We can’t because MASN doesn’t want to film in sunny Florida. Luckily, we can still look at the Baseball Savant video-less page here. On a 2-0 count against Spencer Torkelson, maybe Gray-Rod reared back and put a little extra mustard into making sure he didn’t get to 3-0. We’ll likely never know.

How might this compare with a pitcher who is more established? Let’s conduct this same analysis on two of Max Scherzer’s spring outings this season and compare:

Scherzer Game 1 and Game 2 PCA Scatter

Scherzer shows a little tighter spread between all of his pitches and lacks the clear outliers showcased by Rodriguez. The more interesting part to me is that the pitches get closer together from game one to game two. Could that mean anything? Could he be getting ramped up and more consistent, more repeatable?

Now the ultimate question in baseball analytics: how can we actually use this to win? I believe checking in on pitcher components throughout the season may be able to help us identify fatigued players who need rest in order to get their components back into a form that is more in line with areas of success. This would require measuring the game-by-game spread or variation of the points. If that number is larger, is that a measurement of inconsistency? If it is lower, does it correlate with success? This analysis really brings up more questions than it answers, as per usual:

What if we changed the colors of the data points in the visualization to reflect individual start game scores?

Are tighter pitches (less spread among a single game’s point locations) better?

What could be done with more data? Can this analysis be applied to biomechanical data?

How does it apply to non-fastballs? Do certain pitchers struggle with repeated motion on certain pitches and not others?
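One possible way to put a number on the game-by-game spread mentioned above is the average distance of each pitch from its game's center point in principal component space. This is only a sketch: the function names `game_spread` and `spread_by_game` are hypothetical, the coordinates below are toy numbers and not real Statcast output, and it assumes the principal components were fit once on all games pooled together so that every game shares the same axes.

```python
import numpy as np

def game_spread(scores):
    """Mean Euclidean distance of each pitch from its game's centroid."""
    scores = np.asarray(scores, dtype=float)
    centroid = scores.mean(axis=0)
    return float(np.linalg.norm(scores - centroid, axis=1).mean())

def spread_by_game(scores, game_ids):
    """Map each game label to the spread of that game's pitches."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(game_ids)
    return {g: game_spread(scores[labels == g]) for g in sorted(set(game_ids))}

# Toy (PC1, PC2) coordinates, made up for illustration: game 2 is the tighter cluster.
toy_scores = [
    [0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0],  # game 1
    [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],  # game 2
]
toy_games = ["game 1"] * 4 + ["game 2"] * 4

spreads = spread_by_game(toy_scores, toy_games)
for game, value in spreads.items():
    print(game, round(value, 3))
```

A smaller value means the dots for that game sit closer together. Tracking this number start by start is one way to test whether a widening spread lines up with fatigue or poor results.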

If this post is a thread in that old spring training baseball jersey you pulled out of the closet for your trip to Florida, then let’s start pulling until there’s nothing left and you’ve gotta borrow sunscreen from the shirtless guy next to you. My hope is that with a little more time and research, I’ll be able to utilize this analysis to detect in-season struggles by starting pitchers.

1 Comment
7 days ago

Nice, looking forward to what comes next; can you label each pitch with number (ie, first pitch is labelled “1”)? If that’s too messy, maybe change the point-character shape based on inning?

Last edited 7 days ago by couthcommander