Estimating Talent Level With a Small Sample Size

When a hitter comes back from the DL, our natural inclination is to compare their current performance to their previous performance to see if their talent level has changed. That comparison relies on a small sample of data, which makes it tough to figure out how much weight to give the new numbers. To help with this problem, I have created a spreadsheet that takes a small sample of a hitter’s stats and estimates their ability.

Last week, I looked at how Chase Utley has done since returning from the DL. Some discussion took place in the comments about the small sample size and how much weight to give it. I decided to create a quick calculator/tool/spreadsheet that estimates a hitter’s talent from just a small amount of data. This estimate uses none of the hitter’s known past ability. Instead, it assumes the hitter has an unknown talent level and regresses the values significantly toward the league average. Finally, the tool estimates xBABIP (0.154*OFFB%+0.235*GB%+0.004*IFFB%+0.727*LD%), HR total, AVG (which uses the HR total, xBABIP, K% and BB%), OBP, SLG and ISO from the given data. Basically, this estimator treats the player like a new player. It helps us sort out luck and small-sample issues in order to compare the ‘new’ version to the player before the injury.
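For readers who want to try the xBABIP formula outside the spreadsheet, here is a minimal Python sketch. The function names are mine, not the worksheet's. It assumes each input is a fraction of batted balls, and that the listed IFFB% is infield flies per fly ball, so FB% has to be split into outfield and infield flies first. Both of those are assumptions on my part.

```python
# Coefficients are the ones quoted in the article's xBABIP formula.
def xbabip(offb, gb, iffb, ld):
    """Expected BABIP from batted-ball rates given as fractions (0.143, not 14.3)."""
    return 0.154 * offb + 0.235 * gb + 0.004 * iffb + 0.727 * ld


def split_fly_balls(fb, iffb_per_fb):
    """ASSUMPTION: IFFB% is infield flies per fly ball, so split FB% accordingly."""
    iffb = fb * iffb_per_fb   # infield flies as a share of all batted balls
    offb = fb - iffb          # outfield flies as a share of all batted balls
    return offb, iffb


# Utley's raw batted-ball rates from the article, before any regression.
offb, iffb = split_fly_balls(fb=0.529, iffb_per_fb=0.054)
raw_xbabip = xbabip(offb=offb, gb=0.329, iffb=iffb, ld=0.143)
print(f"raw xBABIP: {raw_xbabip:.3f}")
```

The raw number need not match the tool's BABIP estimate, since the tool presumably regresses the batted-ball rates toward league average before applying the formula.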

I used Russell Carleton’s (Pizza Cutter) previous work, which takes the hitter’s known data and estimates their talent level. I am not going to go through all the background information on the calculations and values, but it can be found here and here and here and here.

As an example, I will go back and look at Utley’s data to see what his talent level is since coming back from the DL. First, here are the stats the tool needs (the 2011 league values are already in the worksheet).

Stat Value
PA 94
K% 12.5%
BB% 12.8%
LD% 14.3%
GB% 32.9%
FB% 52.9%
IFFB% 5.4%
HR/PA 0.032
OBP 0.370
SLG 0.420
ISO 0.159

The worksheet then calculates what Utley’s estimated talent level is now given the number of plate appearances so far this season. Here are the estimated and actual season stats:

Stat     Estimated Talent    Season Stats
BABIP    0.268               0.271
HR       2.4                 3
AVG      0.237               0.275
BB%      9.9%                12.8%
K%       17.4%               12.5%
OBP      0.328               0.370
SLG      0.396               0.420
ISO      0.141               0.159
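
As a rough check on the AVG row, the estimated AVG can be rebuilt from the other estimated values in the table. This is my reconstruction, not necessarily the worksheet's exact formula: it assumes K% and BB% are per plate appearance and ignores hit-by-pitches and sacrifices.

```python
def estimate_avg(pa, bb_rate, k_rate, hr, xbabip):
    """Rebuild AVG from its components (ASSUMPTION: no HBP, SF or SH)."""
    at_bats = pa * (1 - bb_rate)              # walks are not at-bats
    strikeouts = pa * k_rate                  # K% taken as strikeouts per PA
    balls_in_play = at_bats - strikeouts - hr
    hits = hr + xbabip * balls_in_play        # home runs plus hits on balls in play
    return hits / at_bats


# Inputs are the Estimated Talent values from the table above.
avg = estimate_avg(pa=94, bb_rate=0.099, k_rate=0.174, hr=2.4, xbabip=0.268)
print(f"{avg:.3f}")
```

With those inputs the result rounds to 0.237, in line with the table's estimated AVG.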

Some values in both columns line up fairly nicely, such as home runs and BABIP. The similarities end there. The main cause of the differences in ISO, SLG, OBP and AVG goes back to BB% and K%. Both of Utley’s values are significantly better than the league average and need to be regressed back toward the league average quite a bit.
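
The regression step can be sketched as a weighted average of the observed rate and the league rate, with the weight on the observed rate growing as plate appearances pile up. The league K% and the regression constant below are hypothetical placeholders picked for illustration; they are not the values in the worksheet or in Carleton's work.

```python
def regress_to_mean(observed, n, league_avg, k):
    """Blend an observed rate with the league rate.

    n is the sample size (PA); k is the number of league-average PA mixed in.
    At n == k the observed and league rates get equal weight.
    """
    return (observed * n + league_avg * k) / (n + k)


LEAGUE_K_RATE = 0.185   # HYPOTHETICAL league strikeout rate
K_CONSTANT = 100        # HYPOTHETICAL regression constant, in PA

# Utley's observed 12.5% K rate at his actual 94 PA and at larger samples.
for pa in (94, 300, 600):
    est = regress_to_mean(0.125, pa, LEAGUE_K_RATE, K_CONSTANT)
    print(f"{pa} PA -> estimated K%: {est:.1%}")
```

Each stat would need its own constant, since some rates settle down much faster than others.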

Using just the small amount of data so far this season, he seems to be overperforming his estimated talent level in a few stats. As he gets more plate appearances, his true talent level will become more and more apparent. The tool has several uses when looking at small samples, giving a person an idea of a player’s ability without having to guesstimate the amount of regression for different stats.





Jeff, one of the authors of the fantasy baseball guide, The Process, writes for RotoGraphs, The Hardball Times, Rotowire, Baseball America, and BaseballHQ. He has been nominated for two SABR Analytics Research Awards for Contemporary Analysis and won it in 2013 in tandem with Bill Petti. He has won four FSWA Awards, including one for his Mining the News series. He's won Tout Wars three times, LABR twice, and got his first NFBC Main Event win in 2021. Follow him on Twitter @jeffwzimmerman.

11 Comments
troy
14 years ago

Don't you think a talent level expectation should be a range based upon the accuracy of the small sample size? So if the sample size is accurate +/- 15%, then that would give you an expected range, and as the sample size decreases the range should tighten?

mcbrown
14 years ago
Reply to  troy

Yes. Put differently, the question seems to call for an error band, while the model simply calculates a weighted average of actual performance with league average performance.