Last week I laid out my plans for combining prospect grades and “scoutable” traits to help project major league performance. Finally, I’m able to output projections with encouraging results. Just by using traits people can scout with their eyes, I created a set of projections which competes with Steamer projections. Additionally, it helps point to the traits people should look for in prospects.
Previously, I tried to use just the five traits prospects get graded on (Hit, Power, Speed, Field, and Arm) to come up with a player’s value. I found the Speed and Power grades useful but came to the following conclusion on the Hit grade:
Basically, the Hit tool is a useless component to determine hitter value as it’s currently being distributed.
The more I thought about the Hit tool, the more I concluded that it’s trying to evaluate too much information (examples of different Hit tool definitions).
For these projections, I matched up the traits hitters display with common stats. To start with, here are the core traits I decided to utilize:
Prospect Grades: I used just future grades since most evaluations only have these grades. For now, I concentrated just on the offensive stats and ignored Field and Arm (FYI: I’ve done a bit of research and found a strong arm may be a hidden indicator of untapped power like with Francisco Lindor).
- Power (Raw): Raw power is easy for scouts to see. It’s how far the ball travels and I previously found its value to be predictive.
- Speed: While normally associated with just stolen bases, speed can help hitters get on base and contribute to their power stats.
- Hit: As previously stated, I’m not sure it has a precise industry definition.
“Scoutable” Traits: These are traits people can see with their eyes. The stats may be detailed or a best-guess proxy.
- Ground ball%: While launch angle is finally available for major league hitters, the data is not publicly available for minor leaguers. Ground ball rate is available and correlates strongly with launch angle. A high ground ball rate for a high-power batter indicates that the hitter may need to rework his swing to harness more power.
- Swing%: How often does the hitter swing at pitches? Fairly simple.
- Contact%: Does the hitter make contact when he swings? At FanGraphs, we only track balls in play and swinging strikes for hitters. No fouls. I calculated both the true Contact% and this tracked-balls Contact% at the major league level and found an r-squared of .80 between them. I went ahead and used this non-foul ball contact rate, but if the real value becomes available, I could switch to it.
- HR/FB: With the Power grade, I debated needing another power metric. I used both for two reasons. First, to see how the Power grade holds up. Second, I wanted a stat which could be a placeholder if minor league exit velocity becomes available.
- Stolen Bases/Opportunity (SB/(1B*0.8+(BB+HBP)*0.6)): I wanted to use Speed Score but found it was not predictive in any way. This limitation with Speed Score has sent me off on another tangent to create a better version with the help of the new Statcast data. Instead of Speed Score, I decided to use the stolen base rate equation (which Tom Tango determined to be an approximate steal rate at his blog, but I can’t seem to find the link). I didn’t want total stolen bases, as the hitter needs to be on base first to steal.
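The two rate stats above can be computed from basic counting stats. Here is a minimal sketch in Python. The function and argument names are mine, and the non-foul contact rate formula (balls in play divided by balls in play plus swinging strikes) is my reading of the description, not a confirmed definition.

```python
def sb_per_opportunity(sb, singles, bb, hbp):
    """Stolen bases per estimated opportunity: SB / (1B*0.8 + (BB+HBP)*0.6)."""
    opportunities = singles * 0.8 + (bb + hbp) * 0.6
    if opportunities == 0:
        return 0.0  # no times on base, so no chance to steal
    return sb / opportunities


def non_foul_contact_rate(balls_in_play, swinging_strikes):
    """Contact rate using only tracked events (assumed formula, fouls ignored)."""
    tracked_swings = balls_in_play + swinging_strikes
    if tracked_swings == 0:
        return 0.0
    return balls_in_play / tracked_swings
```

For example, a hitter with 20 steals, 100 singles, 40 walks, and 10 hit-by-pitches has 110 estimated opportunities, for a rate of about 0.18.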
Just eight total inputs, with a few measuring similar values. With these eight values, I created three-year, stat-specific weights and adjusted for league level. For now, none of my outputs are regressed or age-adjusted. I need the projections to be raw for each additional stat. Once I have found all the outputs I want, I can age and regress the values. Because there is no regression, some projections may be too high or too low.
For the Tool grades, I used the 2013 and 2014 prospect grades from MLB.com (thanks to @thisisgump for help collecting the grades). While I wanted to use more prospects, I needed a varied group who had reached the majors and started an MLB career, not just a few ready stars. Of the roughly 300 hitting prospects listed in 2013, only 72 have debuted in the majors. For these players, I used the inputs to calculate the common stats.
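To make the weighting step concrete, here is a rough Python sketch of a three-year weighted average followed by a level adjustment. Everything specific in it is an assumption for illustration: the weights, the multiplicative level factor, and the choice to skip missing seasons and renormalize. The actual weights differ by stat and are not listed here.

```python
def weighted_three_year(values, weights):
    """Weighted average of up to three seasons, most recent first.

    Seasons given as None are skipped and the remaining weights
    are renormalized (an assumption, not a documented rule).
    """
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None  # no usable seasons
    return sum(v * w for v, w in pairs) / total_weight


def adjust_for_level(value, level_factor):
    """Scale a minor league rate by a hypothetical per-level factor."""
    return value * level_factor


# Hypothetical weights and level factor, for illustration only.
example_weights = (0.5, 0.3, 0.2)
raw_rate = weighted_three_year([0.20, 0.10, None], example_weights)
adjusted_rate = adjust_for_level(raw_rate, 0.9)
```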
Here is the original flow chart showing the expected inputs for each stat.
In many cases, not all the information was useful (mainly speed) and in other cases, I added unexpected metrics (power). Here are the traits I projected with the inputs I found important.
Stat (inputs): Notes
- ISO (Power and HR/FB): Best stat to project.
- BB% (Swing% and HR/FB): I didn’t expect power to be an input, but adding it doubled the accuracy. It makes sense for power hitters to be walked more, as pitchers want to limit their production. Hitters with no power see the opposite effect, with pitchers attacking them since they have limited extra-base potential.
- K% (Hit, Contact%, and HR/FB): Again, HR/FB shows up. Some work I’ve done (not published yet) shows that higher launch angles and exit velocities increase a hitter’s strikeout rate. With a higher launch angle, the bat is not in the hitting zone as long. Also, a faster swing spends less time in the hitting zone. While both help with home runs, they lead to more missed balls and therefore strikeouts.
- BABIP (GB%, SB/Opp, and HR/FB): What a disaster, but I expected one since BABIP is nearly impossible to project, even at the major league level. The Hit tool was useless here, showing up as a negative factor: the higher the Hit tool, the lower the BABIP. And while the three factors used were significant, they barely moved the final value away from the league average.
- AVG (GB%, Hit, and K%): While a person would think BABIP should be an input to AVG, the heavily centered value was useless. While K% was a major input, the rest of the values aren’t that significant, again, because of the inability to project BABIP.
Special note: While the Hit tool was better than Contact% for projecting AVG, both had similar effects. I did a bit of checking and the Hit tool correlates best, though not well, with Contact%. Nothing else is close. A person could consider the Hit tool a Contact% proxy, but that’s it.
- OBP (BB% and AVG): The projections cleaned up again.
- wOBA (ISO and OBP): A nice projection.
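Because some projected stats feed into others (K% into AVG, AVG and BB% into OBP, ISO and OBP into wOBA), the stats have to be projected in a particular order. The sketch below records the inputs from the list above and works out one valid order. It assumes Python 3.9+ for `graphlib`, and the dictionary and function names are mine.

```python
from graphlib import TopologicalSorter

# Inputs for each projected stat, copied from the list above.
STAT_INPUTS = {
    "ISO": ["Power", "HR/FB"],
    "BB%": ["Swing%", "HR/FB"],
    "K%": ["Hit", "Contact%", "HR/FB"],
    "BABIP": ["GB%", "SB/Opp", "HR/FB"],
    "AVG": ["GB%", "Hit", "K%"],
    "OBP": ["BB%", "AVG"],
    "wOBA": ["ISO", "OBP"],
}


def projection_order(stat_inputs):
    """Order the projected stats so each one comes after any projected inputs."""
    projected = set(stat_inputs)
    # Keep only inputs that are themselves projected; grades and
    # scoutable traits are raw inputs and need no ordering.
    graph = {
        stat: [name for name in inputs if name in projected]
        for stat, inputs in stat_inputs.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = projection_order(STAT_INPUTS)
```

Any order it returns puts K% before AVG, AVG and BB% before OBP, and wOBA last among its own dependencies. BABIP sits off to the side since nothing downstream uses it.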
With these projections, I ran a few tests against historic Steamer projections. First, I ran the values using the grades the projections are based on against Steamer. Since the grades created the projections, the values should have held up (mine aren’t aged or regressed yet) and they did. With the values not adjusted to a league average, the prospect projections (min 200 PA) for wOBA had an RMSE of .299 and Steamer was at .271. When I adjusted for the league average, I was at .295 and Steamer at .299.
Then, I decided to see how the system held up against another dataset. I used Eric Longenhagen’s prospect grades from this last season. Only 35 hitters qualified. Again, these prospect projections performed better than Steamer without adjusting to league changes, with an RMSE of .299 compared to .335. Now, when the values were adjusted to the league average, Steamer performed better with a .0396 RMSE while the prospect projections jumped to .0464.
I was mainly off with the walk projections. Aaron Judge and Rhys Hoskins and their +17% BB% threw everything out of whack. They had historically high rookie walk rates. I will miss for now on them but in a few years, their values will be included in the output.
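For anyone wanting to run the same kind of comparison, here is a small Python sketch of the RMSE calculation. How the league-average adjustment was done isn't spelled out above, so the constant shift shown here is only one simple possibility, and both function names are mine.

```python
import math


def rmse(projected, actual):
    """Root mean squared error between projected and actual values."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(projected, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def shift_to_league_average(projected, league_average):
    """Shift every projection by a constant so the group mean matches
    the league average (an assumed method, not the one used above)."""
    offset = league_average - sum(projected) / len(projected)
    return [p + offset for p in projected]
```

A lower RMSE means the projections sat closer to what the hitters actually did. A couple of large misses, like the walk rates mentioned above, can raise it noticeably in a 35-hitter sample because the errors are squared.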
For reference, here is how the prospect projection system performed compared to the actual values for these 35 hitters ordered by projected wOBA.
|Albert Almora Jr.||14%||2%||24%||16%||6%||6%||0.310||0.338||0.234||0.298||0.305||0.338||0.161||0.147||0.308||0.334|
With the known results, here is a list of limitations for the projections:
- The system is made to project hitters in their first few seasons. For the few hitters with four-plus seasons of major league data, these projections become useless as the scouting reports become dated.
- I didn’t adjust players for park or league (I did for their level). I plan on adding them but I just wanted a beginning-to-end process which can eventually be tweaked.
- The sample size is small with so few players making it to the majors. When each season ends, more players can be included. Additionally, I will be able to add grades from Baseball America and 2080 as the prospects they grade graduate to the majors.
- It’s going to be a fluid system, as I could see just adding one year’s worth of grades throwing many of the factors off. At some point in the future, I could also adjust the final values depending on the grade’s source.
- Instead of just the eight measures, maybe actual minor league stats like K% and BB% could be incorporated into generating the projections for more accuracy. Not for now though.
Overall, I combined eight physical hitter traits which a person can generally observe and created a projection system which holds up against one of the industry’s best. It needed to hold up to help show which traits scouts, fans, and even fantasy owners should investigate. While Power and Speed grades are useful, the Hit grade still isn’t. Instead, several factors determine a position player’s hitting potential and therefore overall value. Not just his ability to make contact. The system helps to determine which prospect traits to focus on when setting their value.
That’s it for now. Truthfully, I’m a little tired of examining the data and I need to tighten up the process a bit. Overall, happy with the first run. Please let me know if you have any questions. I’m sure I’ve missed an improvement or error along the way.
Jeff, one of the authors of the fantasy baseball guide, The Process, writes for RotoGraphs, The Hardball Times, Rotowire, Baseball America, and BaseballHQ. He has been nominated for two SABR Analytics Research Awards for Contemporary Analysis and won it in 2013 in tandem with Bill Petti. He has won three FSWA Awards, including one for his MASH series. In his first two seasons in Tout Wars, he's won the H2H league and mixed auction league. Follow him on Twitter @jeffwzimmerman.