Estimating Talent Level With a Small Sample Size

When a hitter comes back from the DL, our natural inclination is to compare their current performance to their previous performance to see whether their talent level has changed. That comparison rests on a small sample of data, which makes it tough to figure out how much weight to give the new data. To help with this problem, I have created a spreadsheet that takes a small sample of a hitter’s stats and estimates their ability.

Last week, I looked at how Chase Utley has done since returning from the DL. Some discussion in the comments centered on how small the sample was and how to weight it. I decided to create a quick calculator/tool/spreadsheet that estimates a hitter’s talent from just a small amount of data. This estimate uses none of the hitter’s previously known ability. Instead, it assumes the hitter has an unknown talent level and regresses the values significantly toward the league average. Finally, the tool estimates the xBABIP (0.154*OFFB%+0.235*GB%+0.004*IFFB%+0.727*LD%), HR total, AVG (estimated from the HR total, xBABIP, K% and BB%), OBP, SLG and ISO from the given data. Basically, this estimator treats the player like a new player. It helps us sort out luck and small sample issues in order to compare the ‘new’ version to the player before the injury.

I used Russell Carleton’s (Pizza Cutter) previous work, which takes the hitter’s known data and estimates their talent level. I am not going to go through all the background information on the calculations and values, but it can be found here and here and here and here.

As an example, I will go back and look at Utley’s data to see what his talent level is since coming back from the DL. First, here are the stats the tool needs (the 2011 league values are already added to the worksheet).

Stat     Value
PA       94
K%       12.5%
BB%      12.8%
LD%      14.3%
GB%      32.9%
FB%      52.9%
IFFB%    5.4%
HR/PA    0.032
OBP      0.370
SLG      0.420
ISO      0.159
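
To make the xBABIP formula concrete, here is a minimal Python sketch that applies it to the raw batted-ball rates above. It assumes IFFB% is expressed as a share of fly balls, so it is converted to a share of all batted balls before use, and that OFFB% is whatever fly balls remain. Both are assumptions about the definitions, and the function name is only illustrative. Because it works on the unregressed rates, it need not match the worksheet's BABIP estimate.

```python
def raw_xbabip(ld, gb, fb, iffb_of_fb):
    """Apply the article's xBABIP weights to batted-ball rates (as fractions).

    Assumption: iffb_of_fb is infield flies as a share of fly balls,
    so it is rescaled to a share of all batted balls here.
    """
    iffb = fb * iffb_of_fb   # infield flies per batted ball
    offb = fb - iffb         # outfield flies per batted ball
    return 0.154 * offb + 0.235 * gb + 0.004 * iffb + 0.727 * ld


# Utley's raw (unregressed) rates from the table above
xbabip = raw_xbabip(ld=0.143, gb=0.329, fb=0.529, iffb_of_fb=0.054)
print(f"raw xBABIP: {xbabip:.3f}")
```

The line drive term dominates the result, which is why a small-sample LD% can swing the number so much before any regression is applied.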

The worksheet then calculates what Utley’s estimated talent level is now given the number of plate appearances so far this season. Here are the estimated and actual season stats:

Stat     Estimated Talent    Season Stats
BABIP    0.268               0.271
HR       2.4                 3
AVG      0.237               0.275
BB%      9.9%                12.8%
K%       17.4%               12.5%
OBP      0.328               0.370
SLG      0.396               0.420
ISO      0.141               0.159
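
The AVG estimate can be sketched from the other estimates in the table. This is a simplified reconstruction, not necessarily the worksheet's actual formula: it assumes K% and BB% are per plate appearance, ignores hit-by-pitches and sacrifices, and uses a function name made up for illustration.

```python
def estimate_avg(pa, bb_rate, k_rate, hr, babip):
    """Rough AVG from estimated BB%, K%, HR total and BABIP.

    Assumptions: rates are per plate appearance; HBP and sacrifices
    are ignored, so at-bats are simply PA minus walks.
    """
    walks = pa * bb_rate
    strikeouts = pa * k_rate
    at_bats = pa - walks
    balls_in_play = at_bats - strikeouts - hr   # AB that end with a ball in the field
    hits = hr + babip * balls_in_play
    return hits / at_bats


# Estimated talent values from the table above
avg = estimate_avg(pa=94, bb_rate=0.099, k_rate=0.174, hr=2.4, babip=0.268)
print(f"estimated AVG: {avg:.3f}")
```

With the rounded estimates from the table, this comes out to about .237, in line with the AVG shown there.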

Some values in both columns line up fairly nicely, such as home runs and BABIP. The similarities end there. The differences in ISO, SLG, OBP and AVG all trace back to BB% and K%. Both of Utley’s values are significantly better than the league average and need to be regressed back toward the league average quite a bit.
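
The regression itself can be pictured as a weighted average of the observed rate and the league rate, with the weight on the observed rate growing as plate appearances pile up. The sketch below is a generic version of that idea, not the worksheet's exact calculation: the league BB% and the stabilization constant are placeholder values chosen for illustration, and the function name is hypothetical.

```python
def regress_to_league(observed_rate, pa, league_rate, stabilization_pa):
    """Blend an observed rate with the league rate.

    stabilization_pa acts like that many plate appearances of
    league-average performance added to the hitter's sample.
    """
    weight = pa / (pa + stabilization_pa)   # share of weight on observed data
    estimate = weight * observed_rate + (1 - weight) * league_rate
    return estimate, weight


league_bb = 0.085        # placeholder, not the worksheet's 2011 league value
stabilization_pa = 200   # placeholder constant, assumed for illustration

# Utley's observed BB% of 12.8% in 94 PA
estimate, weight = regress_to_league(0.128, 94, league_bb, stabilization_pa)
print(f"weight on observed BB%: {weight:.2f}")
print(f"regressed BB%: {estimate:.3f}")
```

With these placeholder inputs, the observed rate gets only about a third of the weight after 94 plate appearances, which is why a walk rate well above league average gets pulled back so far.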

Using just the small amount of data so far this season, he seems to be overperforming his estimated talent level in a few stats. As he gets more plate appearances, his true talent level will become more and more apparent. The tool has several uses when looking at a small sample: it gives a person an idea of a player’s ability without having to guesstimate the amount of regression for different stats.

Jeff, one of the authors of the fantasy baseball guide The Process, writes for RotoGraphs, The Hardball Times, Rotowire, Baseball America, and BaseballHQ. He has been nominated for two SABR Analytics Research Awards for Contemporary Analysis and won it in 2013 in tandem with Bill Petti. He has won four FSWA Awards, including one for his Mining the News series. He's won Tout Wars three times, LABR twice, and got his first NFBC Main Event win in 2021. Follow him on Twitter @jeffwzimmerman.

11 Comments
troy
12 years ago

Don't you think a talent level expectation should be a range based upon the accuracy of the small sample size? So if the sample size is accurate +/- 15%, then that would give you an expected range, and as the sample size increases the range should tighten?

mcbrown
12 years ago
Reply to  troy

Yes. Put differently, the question seems to call for an error band, while the model simply calculates a weighted average of actual performance with league average performance.