Decision Trees Finding Top Pitching Talent

by Lucas Kelly

November 4, 2021

Machine learning allows human eyes to take a break from the mundane scrolling and sorting through spreadsheets while still gathering useful insights. What made a top-five pitcher (from a fantasy perspective) so great in 2021? You could answer that easily by sorting through our leaderboards. Robbie Ray? Well, he had an excellent K-BB% (25.2%, league average 14.6%). Corbin Burnes? His K/9 was a huge 12.61 while the league was only at 8.9. These underlying metrics are important to take note of but can be difficult to analyze all at once. Don’t get me wrong, it can be done. Just look at Michael Simione’s latest piece where he compares pitchers’ underlying metrics. In fact, that’s a lot of fun to do! But as you come out of your fantasy hibernation and begin making your draft rankings, you’ll want a quicker way to analyze large data sets all at once. Enter the decision tree.

The question I wanted to answer was: what made a top fantasy pitcher in 2021? I looked at starting pitchers who threw at least 140 IP in 2021 and created a top-five list for each league. Here they are:

al_list = Robbie Ray, Gerrit Cole, Lance Lynn, José Berríos, Lucas Giolito

nl_list = Zack Wheeler, Walker Buehler, Corbin Burnes, Max Scherzer, Kevin Gausman

Each of these pitchers had a great 2021. Whether each was truly a top-five (per league) finisher in 2021 is open to valid debate, but that is not the purpose of this article. I just needed a pool of players that could be considered top fantasy talent so that I could mark them as such in my data. Typically, you would run a decision tree on pre-labeled data, target a binary variable, and later pass in a totally new and unseen set of data points. The decision tree would take what it learned from the first pass and make predictions on the second. I’m not going to ask this model to make any predictions; rather, I wanted to see how it sorted my labeled data. Here’s a summary of what was passed through the model and a visualization of the decision tree:


68 rows of data (one row per pitcher)

10 out of 68 marked as top 10 pitchers (0: not top 10, 1: top 10)

19 variables (basically all the statistics from the advanced tab on our leaderboards)

That’s fun, isn’t it? Yeah…so…what are we looking at here? Follow along with me, starting at the first node at the top, the root node, or the initial split. Our decision tree is asking, which statistic allows us to best sort this data into two groups, right off the bat? The answer, in this case, was ERA. Notice that if the answer to each question is “false”, you move to the right and if the answer is “true”, you move to the left.
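How might a tree decide which statistic sorts the data best? The article does not say which splitting criterion or software was used, so treat the following as an assumption: one common approach is to try every candidate threshold on every statistic and keep the split that leaves the two groups with the lowest weighted Gini impurity. Here is a minimal sketch of that idea on six invented pitcher rows (not real 2021 data); `gini` and `best_split` are illustrative names.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(rows, labels, feature_names):
    """Return (weighted impurity, feature, threshold) for the best single split."""
    best = None
    n = len(rows)
    for j, name in enumerate(feature_names):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds are midpoints between neighboring values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, name, threshold)
    return best


# Invented toy data: (ERA, K/BB) per pitcher, 1 = marked as top talent.
features = ["ERA", "K/BB"]
rows = [(2.50, 4.0), (2.80, 5.0), (3.10, 6.0), (3.40, 3.0), (3.90, 2.5), (4.20, 4.5)]
labels = [1, 1, 0, 0, 0, 0]

score, feature, threshold = best_split(rows, labels, features)
print(f"root split: {feature} <= {threshold:.2f} (weighted Gini {score:.2f})")
```

In this toy table ERA separates the two groups cleanly while K/BB does not, so ERA wins the root node, which mirrors what happened with the real data.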

Let’s follow the first decision to the right and look at who finished with an ERA above 2.87. As we move to the right, we get to another question. K/BB <= 5.83? No? Move to the right and stop! We’ve reached our first terminal node, a stopping point for one specific data point. Who was included in my list of top ‘elite’ fantasy talent, had an ERA greater than 2.87, and had a K/BB rate greater than 5.83? Answer: Gerrit Cole. Notice that in our visualization, he’s the only pitcher in that bucket.

Let’s do another, let’s do another! This time, we’ll start with our root node and move to the left. By doing so we’re looking at pitchers with a sub-2.87 ERA. Then, let’s look at the next questions: is your K/BB greater than 3.70? Is your FIP greater than 2.97? Yes? Then who are you? Answer: Kevin Gausman, Walker Buehler, Lance Lynn, Robbie Ray, and Max Scherzer. Notice this time that we have five pitchers in this bucket, or terminal node, and no pitchers who were excluded from our top pitcher list.
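The two walks above can be written out as plain if/else checks. This is only a sketch: it encodes the two paths described in the text and nothing else, the function name `trace_pitcher` is made up for illustration, and the stat line passed in at the bottom is hypothetical rather than any pitcher's real 2021 numbers.

```python
def trace_pitcher(era, k_bb, fip):
    """Follow only the two tree paths walked through in the article."""
    steps = []
    if era <= 2.87:
        steps.append("ERA <= 2.87: true, move left")
        if k_bb > 3.70 and fip > 2.97:
            steps.append("K/BB > 3.70 and FIP > 2.97: yes")
            return steps, "five-pitcher top node"
        # The rest of the left side of the tree is not spelled out in the text.
        return steps, "branch not described in the text"
    steps.append("ERA <= 2.87: false, move right")
    if k_bb <= 5.83:
        steps.append("K/BB <= 5.83: true, move left")
        return steps, "branch not described in the text"
    steps.append("K/BB <= 5.83: false, move right")
    return steps, "Gerrit Cole's terminal node"


# Hypothetical stat line: ERA above 2.87 with a K/BB above 5.83.
steps, leaf = trace_pitcher(era=3.20, k_bb=6.00, fip=3.00)
for step in steps:
    print(step)
print("landed in:", leaf)
```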

Have some fun with it: try to figure out who the two pitchers are in the terminal node all the way in the bottom left corner. It’s a fun exercise, and it may give you a clearer view of 2021 starting pitchers and how they performed. Want to draft a top-10 fantasy pitcher in 2022? Find a good projection system and study this decision tree. Find pitchers who are projected to, most importantly, throw at least 140 innings, post a sub-2.87 ERA and a K/BB rate greater than 3.70, and put a little less stock in low FIPs, because Robbie Ray did it with a 3.69 FIP in 2021.
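If your projections live in a spreadsheet export, that screen is a few lines of code. The sketch below assumes a made-up list of projection rows (the pitchers, numbers, and the field names `ip`, `era`, and `k_bb` are all invented for illustration) and applies the three cutoffs from the tree.

```python
# Invented projection rows; swap in your projection system's real export.
projections = [
    {"name": "Pitcher A", "ip": 180.0, "era": 2.75, "k_bb": 4.50},
    {"name": "Pitcher B", "ip": 120.0, "era": 2.60, "k_bb": 5.10},  # too few innings
    {"name": "Pitcher C", "ip": 175.0, "era": 3.40, "k_bb": 4.20},  # ERA too high
    {"name": "Pitcher D", "ip": 150.0, "era": 2.80, "k_bb": 3.10},  # K/BB too low
]


def meets_tree_profile(p):
    """At least 140 IP, a sub-2.87 ERA, and a K/BB above 3.70."""
    return p["ip"] >= 140 and p["era"] < 2.87 and p["k_bb"] > 3.70


targets = [p["name"] for p in projections if meets_tree_profile(p)]
print(targets)
```

FIP is left out of the filter on purpose, in keeping with the advice to put less stock in it.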

Obviously, each year is different, and if we really wanted to draw firmer conclusions, we would have to analyze larger data sets. Regardless of how well one season of good pitching predicts the next, a decision tree can give fantasy managers a higher-level overview of what made good pitchers good and what we should look to target in 2022.

4 Comments


dukewinslow (Member since 2020)

4 years ago

I did this last year with…. Rough results. Why? /):$;&(@:):) INJURIES

Green Mountain Boy

I’m just going to explain my personal way of doing things, which has proven to work pretty well over the years:

1. Consistency and trend. We’ve all been blown up by that guy who had a wonderful year, except for his two starts of 7 ER in 2 IP. A record of QS >50% for at least 2 consecutive years is a great place to start.
2. K/BB >4 and K/9 >10. I want guys who are confident in their stuff and don’t nibble or throw non-competitive pitches. Watch a guy sometimes and count his NCPs. Might as well just keep it in the glove.
3. Velocity >= 95. Nuf ced
4. Age. Guys in their primes who fall under 1) above, or kids <25 who have shown steady, consistent improvement. No, they all don't have success right away, but over 3 years, I'd bet 80% do. The guy who seems inevitable to me now? Tarik Skubal. Guys like DeGrom and Scherzer are still great, but at some point everyone loses it. DeGrom showed the first warning signs this year. Who's next?
5. Stay away from guys who all of a sudden have "career years" out of nowhere. It's Medusa. Don't look into her eyes.
6. Health history. TJS? Stay away till they show they're still what they were, usually in year 2 post-procedure. Staying healthy is a skill.
7. The ability and willingness to pitch inside, be effective in there, and command the ball there. This is HUGE. All the greats said it. You can have the outside half of the plate, but the inside is MINE.
8. The ability to locate your entire repertoire, which brings me to…
9. The old-school eye test. Stats are great, but they're not everything. Example: 2007. I was watching a Red Sox game and this guy seemed to be not throwing hard enough to break a pane of glass, yet he got batter after batter out easily. Lucky day, I thought. But after watching 8-10 outings, I realized Hideki Okajima (I know, a RP, but bear with me) had something, what I still don't know, probably hid the ball really, really well. But in any event, there's still a place for and value in "eyes on" when evaluating players, especially pitchers.
10. If a guy has all the good stuff above but is a head case? Stay away. Let him be your opponents' problem.

airforce21one (Member since 2026)

Reply to Green Mountain Boy

How many pitchers meet all this criteria?

Reply to airforce21one

Granted, only future HOFers meet ALL of these criteria year after year, but if those guys were a dime a dozen, why would we play FBB? The trick is identifying which guys meet 8-7-6-5 of them and who will do so in 2022. THAT’S where FBB leagues are won and lost – in the decisions made in the middle and on the margins.
