Decision Trees Finding Top Pitching Talent

Machine learning allows human eyes to take a break from the mundane scrolling and sorting through spreadsheets while still gathering useful insights. What made a top-five pitcher (from a fantasy perspective) so great in 2021? You could answer that easily by sorting through our leaderboards. Robbie Ray? Well, he had an excellent K-BB% (25.2%, league average 14.6%). Corbin Burnes? His K/9 was a huge 12.61 while the league was only at 8.9. These underlying metrics are important to take note of but can be difficult to analyze all at once. Don’t get me wrong, it can be done. Just look at Michael Simione’s latest piece where he compares pitchers’ underlying metrics. In fact, that’s a lot of fun to do! But as you come out of your fantasy hibernation and get ready to make your draft rankings, you’ll want a quicker way to analyze large data sets all at once. Enter the decision tree.

The question I wanted to answer was: what made a top fantasy pitcher in 2021? I looked at starting pitchers who threw at least 140 innings in 2021 and created a top-five list for each league. Here they are:

al_list = Robbie Ray, Gerrit Cole, Lance Lynn, José Berríos, Lucas Giolito

nl_list = Zack Wheeler, Walker Buehler, Corbin Burnes, Max Scherzer, Kevin Gausman
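Here is a minimal sketch of how these two lists could be turned into the 0/1 label a decision tree targets. The helper name `label_pitchers` and the placeholder "Pitcher X" are my own inventions for illustration; they are not from the original analysis.

```python
# Top-five lists per league, copied from the article.
al_list = ["Robbie Ray", "Gerrit Cole", "Lance Lynn", "José Berríos", "Lucas Giolito"]
nl_list = ["Zack Wheeler", "Walker Buehler", "Corbin Burnes", "Max Scherzer", "Kevin Gausman"]

# Combine both leagues into one set of "top" names.
top_pitchers = set(al_list) | set(nl_list)


def label_pitchers(names):
    """Return {name: 1 if the pitcher is on either list, else 0}."""
    return {name: int(name in top_pitchers) for name in names}


# "Pitcher X" is a hypothetical stand-in for any qualifier not on the lists.
labels = label_pitchers(["Robbie Ray", "Corbin Burnes", "Pitcher X"])
print(labels)  # {'Robbie Ray': 1, 'Corbin Burnes': 1, 'Pitcher X': 0}
```

Applied to every qualifying starter, the same lookup yields one binary column to target.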

Each of these pitchers had a great 2021. Whether each one truly was a top-five finisher in his league is open to valid debate, but that is not the purpose of this article. I just needed a pool of players that could be considered top fantasy talent so that I could mark them as such in my data. Typically, you would run a decision tree on pre-labeled data, target a binary variable, and later pass in a totally new and unseen set of data points. The decision tree would take what it learned from the first pass and make predictions on the second. I’m not going to ask this model to make any predictions; rather, I wanted to see how it sorted my labeled data. Here’s a summary of what was passed through the model and a visualization of the decision tree:

68 rows of data (one row per pitcher)

10 out of 68 marked as top 10 pitchers (0: not top 10, 1: top 10)

19 variables (basically all the statistics from the advanced tab on our leaderboards)

That’s fun, isn’t it? Yeah…so…what are we looking at here? Follow along with me, starting at the first node at the top, the root node, or the initial split. Our decision tree is asking, which statistic allows us to best sort this data into two groups, right off the bat? The answer, in this case, was ERA. Notice that if the answer to each question is “false”, you move to the right and if the answer is “true”, you move to the left. 
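To make "which statistic best sorts this data into two groups" concrete, here is a rough sketch of how a root split can be chosen. It assumes Gini impurity as the scoring rule, which is a common default but is my assumption; the article does not say which criterion was used. The stat lines are made up for illustration and are not real 2021 numbers.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, target):
    """Try every feature and threshold; return (weighted_gini, feature, threshold)."""
    best = None
    features = [k for k in rows[0] if k != target]
    for feat in features:
        values = sorted({r[feat] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r[target] for r in rows if r[feat] <= thr]   # "true" branch
            right = [r[target] for r in rows if r[feat] > thr]   # "false" branch
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, feat, thr)
    return best


# Hypothetical pitchers: made-up values, NOT real 2021 stats.
rows = [
    {"ERA": 2.50, "K/BB": 4.0, "top": 1},
    {"ERA": 2.80, "K/BB": 3.0, "top": 1},
    {"ERA": 3.60, "K/BB": 5.0, "top": 0},
    {"ERA": 4.10, "K/BB": 3.5, "top": 0},
    {"ERA": 4.50, "K/BB": 2.5, "top": 0},
]
score, feature, threshold = best_split(rows, "top")
print(feature, round(threshold, 2), round(score, 3))  # ERA 3.2 0.0
```

In this toy data, ERA separates the two groups perfectly, so it wins the root node. The tree then repeats the same search inside each branch.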

Let’s follow the first decision to the right and look at who finished with an ERA above 2.87. As we move to the right, we get to another question. K/BB <= 5.83? No? Move to the right and stop! We’ve reached our first terminal node, a stopping point where no more questions are asked. Who was included in my list of top ‘elite’ fantasy talent, had an ERA greater than 2.87 and a K/BB greater than 5.83? Answer: Gerrit Cole. Notice that in our visualization, he’s the only pitcher in that bucket.

Let’s do another, let’s do another! This time, we’ll start with our root node and move to the left. By doing so, we’re looking at pitchers with an ERA of 2.87 or lower. Then, let’s look at the next questions: is your K/BB greater than 3.70? Is your FIP greater than 2.97? Yes? Then who are you? Answer: Kevin Gausman, Walker Buehler, Lance Lynn, Robbie Ray, and Max Scherzer. Notice this time that we have five pitchers in this bucket, or terminal node, and none of them were excluded from our top pitcher list.
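The two walks above can be written out as plain if/else checks. This is only a sketch of the two branches described in the text; the rest of the tree is not spelled out here, so every other path returns `None`. The function name and the sample stat lines are hypothetical, apart from the 3.69 FIP quoted for Robbie Ray.

```python
def describe_leaf(era, k_bb, fip):
    """Follow the two branches spelled out in the article.

    Returns a short leaf description, or None for any path whose
    outcome is not given in the text.
    """
    if era <= 2.87:                # "true" at the root: move left
        if k_bb > 3.70:
            if fip > 2.97:
                return "top leaf: five pitchers"
        return None                # remaining left-side branches not described
    if k_bb > 5.83:                # "false" at the root, then "false" again
        return "top leaf: Gerrit Cole"
    return None                    # remaining right-side branches not described


# Hypothetical stat lines for illustration (only the 3.69 FIP is from the article).
print(describe_leaf(era=2.80, k_bb=4.50, fip=3.69))  # top leaf: five pitchers
print(describe_leaf(era=3.20, k_bb=6.00, fip=3.00))  # top leaf: Gerrit Cole
print(describe_leaf(era=4.00, k_bb=3.00, fip=4.20))  # None
```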

Have some fun with it. Try to figure out who the two pitchers are in the terminal node all the way in the bottom left corner. It’s a fun exercise, and it may give you a clearer view of 2021 starting pitchers and how they performed. Want to draft a top 10 fantasy pitcher in 2022? Find a good projection system and study this decision tree. Find pitchers who are projected to, most importantly, throw at least 140 innings, have a sub-2.87 ERA, and post a K/BB greater than 3.70, and put a little less stock in low FIPs, because Robbie Ray did it with a 3.69 FIP in 2021.
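If you keep projections in a spreadsheet or script, those three cutoffs are easy to apply as a filter. The sketch below assumes a hypothetical `Projection` record and made-up projection lines; the thresholds themselves are the ones read off the tree.

```python
from dataclasses import dataclass


@dataclass
class Projection:
    name: str
    ip: float    # projected innings pitched
    era: float   # projected ERA
    k_bb: float  # projected strikeout-to-walk ratio


def draft_targets(projections):
    """Keep pitchers who clear all three cutoffs taken from the tree."""
    return [
        p.name
        for p in projections
        if p.ip >= 140 and p.era < 2.87 and p.k_bb > 3.70
    ]


# Hypothetical projections, invented purely to exercise the filter.
pool = [
    Projection("Pitcher A", ip=180, era=2.75, k_bb=4.8),
    Projection("Pitcher B", ip=120, era=2.60, k_bb=5.1),  # too few innings
    Projection("Pitcher C", ip=190, era=3.40, k_bb=4.2),  # ERA too high
    Projection("Pitcher D", ip=170, era=2.80, k_bb=3.1),  # K/BB too low
]
print(draft_targets(pool))  # ['Pitcher A']
```

FIP is left out of the filter on purpose, in line with the advice to put less stock in it.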

Obviously, each year is different, and if we really wanted to draw firmer conclusions, we would have to analyze larger data sets. However predictive one season of top-pitcher performance turns out to be for the next, a decision tree can give fantasy managers a high-level overview of what made good pitchers good and what we should look to target in 2022.




