Saturday, August 23, 2014




Talent in baseball is not normally distributed. It is a pyramid. For every player who is 10 percent above the average player, there are probably twenty players who are 10 percent below average. - Bill James

Note: RPG is Runs Per Game, and is based on my formula discussed in a previous blog post.




I love the work of Bill James, but I believe that this is one of the dumbest things that he has ever said. I believe that he is correct about the distribution of talent of players in the major leagues, and yet I believe that baseball talent is distributed normally FOR HUMANS ON EARTH. In fact, if you pay players millions of dollars to play a game that is considered fun, so that almost any person on earth who could play the game would want to, what you will end up with is a group of people who were +4.25 standard deviations or better on the normal curve. What, if this were true, would the talent distribution in major league baseball (or any other sport where the players are paid millions of dollars, for that matter) look like? It would look like a pyramid. For every player who was 10 percent above average, there would be 20 players who were 10 percent below average. The fact that this is true does not dispute that baseball talent is normally distributed (among humans on earth); it helps to prove it.
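
To make the pyramid argument concrete, here is a minimal Python sketch. It assumes talent is a standard normal variable and that only people above a cutoff make the majors. The +4.25 cutoff comes from the paragraph above; the 0.1 standard deviation slice width and the function name are illustrative assumptions, not part of the RPG formula.

```python
from statistics import NormalDist


def tail_slice_shares(cutoff_sd=4.25, slice_width_sd=0.1, n_slices=5):
    """Share of everyone above cutoff_sd who falls in each successive slice.

    Slice 0 runs from the cutoff to cutoff + slice_width_sd, slice 1 is the
    next slice up, and so on.
    """
    normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    tail_mass = 1.0 - normal.cdf(cutoff_sd)  # fraction of humans above the cutoff
    shares = []
    for i in range(n_slices):
        low = cutoff_sd + i * slice_width_sd
        high = low + slice_width_sd
        shares.append((normal.cdf(high) - normal.cdf(low)) / tail_mass)
    return shares
```

Calling `tail_slice_shares()` returns shares that shrink with every step away from the cutoff: most of the survivors sit just above the bar and very few sit far above it, which is the wide-base, narrow-top shape described above.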

Here is the distribution of batting RPG using my formula. It covers batters from 1981-2013 with more than 250 at bats who are not pitchers.



Here is the same distribution with a trend-line-like curve running from 4.5 standard deviations to 5.29 standard deviations, starting at the highest point on the chart.



There is not the hard cutoff one might expect and hope for, but players play hurt, managers and GMs are not perfect judges of talent, and players don’t always play up to their talent level due to chance. Also, this is just batting that the players are being rated on.  Some of the players below the .5 RPG mark are Mark Belanger and Ozzie Smith.  The peak of the distribution is at .5 RPG, which covers players with .50 and .51 runs per game. There are 271 player seasons at that level, 5099 above it and 3155 below it. It looks a little “normal-y” but I stand by Bill James’s analysis and my own.




If you want to see something that looks a bit more like a normal distribution look at RPG for pitchers all time:


Friday, August 22, 2014

Adjusted Range Factor

Who was the most valuable defensive player in 2013 during the regular season?

Andrelton Simmons, with Pedro Florimon a close 2nd; next question.  But can we quantify that?  I think that we can, and I think it can be done in an objective manner.  I think it can be done relying entirely on the statistics, without resorting to anybody’s judgment.  Further, I think that we can go into history and find out who was the best fielder of all time (Nap Lajoie, according to the stats I have, Bill James be damned) and what kind of a fielder Babe Ruth really was.  (Much, much, much better than you have been led to believe.)
I have read that there is good statistical proof that it is good defense, and not pitching, that keeps balls in play (base hits that are not home runs and outs that are not strikeouts) from being hits.  That is, there is proof that preventing hits on balls in play is not a skill that appears constant from year to year for pitchers.  (The exception is apparently knuckleballers.)  I have also read that a good stat for total team defense is outs per ball in play:
Defensive Outs (DO) = IP*3-K
Balls in Play (BIP) = H-HR+IP*3-K +E
Team Defensive Rating = (DO)/(BIP)
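
The three formulas above translate directly into code. This is a minimal Python sketch; the function names are illustrative, and it assumes innings pitched is passed as a true decimal (a third of an inning is 1/3, not the box-score ".1" notation).

```python
def defensive_outs(ip, k):
    """Defensive Outs (DO) = IP*3 - K: outs recorded by the fielders."""
    return ip * 3 - k


def balls_in_play(h, hr, ip, k, e):
    """Balls in Play (BIP) = H - HR + IP*3 - K + E."""
    return h - hr + defensive_outs(ip, k) + e


def team_defensive_rating(h, hr, ip, k, e):
    """Team Defensive Rating = DO / BIP: share of balls in play turned into outs."""
    return defensive_outs(ip, k) / balls_in_play(h, hr, ip, k, e)
```

With made-up season totals of 1450 innings, 1200 strikeouts, 1400 hits, 150 home runs and 90 errors, `team_defensive_rating(1400, 150, 1450, 1200, 90)` works out to 3150 defensive outs over 4490 balls in play.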

The team that led the league in that statistic in 2013 was the Cincinnati Reds.  They converted .723 of balls in play into outs.  The worst team was the Colorado Rockies, who converted .684 of balls in play into outs.  The league average was .701.  There are, however, 27 outs in a game, and Cincinnati and Colorado both managed to get 27 batters out in most of their games.  One would expect, then, that if Colorado’s shortstop and Cincinnati’s shortstop each got to 4.16 balls a game (the league average), Cincinnati had the better shortstop, but by how much?  Cincinnati’s shortstop’s range factor should be multiplied by 1.031 (=.723/.701) and Colorado’s shortstop’s range factor should be multiplied by .976 (=.684/.701), giving Colorado’s shortstop a range rating of 4.06 and Cincinnati’s shortstop 4.29.  If each shortstop played in an otherwise average defense, each would be expected to reach that many balls per game, where balls reached means Assists + Putouts – Errors.
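
The shortstop arithmetic above can be checked with a few lines of Python. This is only a sketch of the first adjustment; the function name is an assumption, and the inputs are the rounded figures quoted in the paragraph.

```python
def fielding_adjusted_range(range_factor, team_rating, league_rating):
    """Scale a fielder's range factor by team outs-per-ball-in-play vs. the league."""
    return range_factor * team_rating / league_rating


# Both shortstops reach the league-average 4.16 balls per game.
cincinnati = fielding_adjusted_range(4.16, 0.723, 0.701)
colorado = fielding_adjusted_range(4.16, 0.684, 0.701)
print(round(cincinnati, 2), round(colorado, 2))
```

The same raw range factor is worth more behind a defense that leaves fewer balls in play per 27 outs, and less behind one that leaves more.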

Strikeouts are a 2nd factor that must be adjusted for.  This season Detroit struck out the most batters, .234 of the batters they faced, and so did not strike out .766 of the batters that they faced.  Minnesota (last place) struck out .158 of the batters that they faced and failed to strike out .842 of the batters that they faced.  The league average was .800.  Minnesota fielders had to make more plays than they would have if they were behind a pitching staff that struck out more batters; their stats are a bit padded (.800/.842 = .950), and Detroit did not get as many chances as they might have otherwise (.800/.766 = 1.044).  The number I actually go with, however, is 1-(K)/(H+IP*3), because I have a complete set of that data and do not have complete data for batters faced by pitchers.
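
Here is a minimal Python sketch of the strikeout adjustment, covering both the batters-faced version worked through above and the proxy 1 - K/(H + IP*3) that the post settles on, with IP*3 taken as outs recorded. The function names are assumptions, and innings pitched is assumed to be a true decimal.

```python
def strikeout_adjustment(team_non_k_rate, league_non_k_rate):
    """League share of batters not struck out divided by the team's share.

    Below 1.0 means the fielders saw extra chances (padded stats);
    above 1.0 means the pitchers took chances away from them.
    """
    return league_non_k_rate / team_non_k_rate


def non_k_rate_proxy(strikeouts, hits, innings_pitched):
    """Stand-in for the non-strikeout rate when batters faced is unavailable."""
    return 1 - strikeouts / (hits + innings_pitched * 3)
```

For the Minnesota figures quoted above, `strikeout_adjustment(0.842, 0.800)` rounds to .950. To use the proxy, compute `non_k_rate_proxy` for the team and for the league and pass both results to `strikeout_adjustment`.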
A third factor is the ball park that a team plays in.  When I conceived of the idea that a ball park needs to be adjusted for, I conceived of it as the Colorado adjustment.  The field at Colorado is the largest in square footage in the league, and in spite of that, more home runs are hit there than in any other park due to the low air density, blah blah blah; if you are reading this you are likely familiar with the problem.  It occurred to me that if the fielders have more square footage to cover then it must be harder to stop hits in the park as well, and the statistics bore this out.  In an average year, teams playing at Coors Field convert balls in play into outs at about .96 times the rate of the (approximately) same teams on the road.  What did not occur to me, however, is that there might be stadiums that had a higher outs per ball in play than others.  Stadiums with large foul territories do, in fact, show such an effect.  I feel confident that, in order, the stadiums with the largest foul territories in baseball history are “The Polo Grounds”, “Braves Field” in Boston, “Qualcomm Stadium” and the old “Yankee Stadium”, and I feel so confident of that that I will not fact check it.
A fourth and final factor to adjust for is total infield vs. outfield chances.  If a pitching staff consisted of Justin Masterson, A.J. Burnett, Doug Fister and Rick Porcello, all with GB/FB ratios of 1.28 or above, then the outfield is going to look pretty bad and the infield is going to have their stats padded.  It could, however, be that the infield is good and the outfield is bad.  I go with a middle-of-the-road adjustment for this.  I divide outfield putouts by putouts minus strikeouts, OFPO/(PO-K), for teams and for the league, to gauge the number of balls that go to the infield vs. balls that go to the outfield, and compare it to the league average.  By this rating Oakland had the highest rating (meaning more balls went to the outfield) of .382 and Pittsburgh had the lowest rating of .277.  As I said, I go halfway with this adjustment, so Oakland’s infielders have an adjustment of (.382/.3285+1)/2 = 1.081 and their outfielders have an adjustment of ((1-.382)/(1-.3285)+1)/2 = .960.

Here are the defenses, listed by team, with the final adjustments that I make:

yearID  teamID  OIP     Fadj    Kadj    Park    OFO/O   IFA     OFA
2013    ARI     0.7062  1.0072  0.9910  1.0072  0.3064  0.9576  1.0073
2013    ATL     0.7084  1.0103  1.0063  0.9990  0.3161  0.9985  1.0271
2013    BAL     0.7121  1.0155  0.9870  1.0021  0.3210  0.9887  1.0058
2013    BOS     0.7040  1.0040  1.0179  0.9947  0.3413  1.0474  1.0178
2013    CHA     0.6961  0.9928  1.0049  1.0107  0.3331  0.9940  0.9837
2013    CHN     0.7130  1.0169  0.9956  0.9907  0.3430  1.0445  1.0109
2013    CIN     0.7227  1.0307  1.0191  1.0022  0.3315  1.0528  1.0457
2013    CLE     0.6926  0.9877  1.0406  1.0020  0.3232  1.0175  1.0298
2013    COL     0.6840  0.9755  0.9630  0.9879  0.2904  0.8957  0.9779
2013    DET     0.6921  0.9870  1.0479  0.9928  0.3355  1.0528  1.0364
2013    HOU     0.6901  0.9842  0.9663  1.0049  0.3263  0.9432  0.9479
2013    KCA     0.7076  1.0093  0.9993  1.0005  0.3573  1.0522  0.9864
2013    LAA     0.6908  0.9853  0.9914  1.0007  0.3464  1.0025  0.9631
2013    LAN     0.7007  0.9994  1.0194  1.0073  0.2907  0.9531  1.0399
2013    MIA     0.7046  1.0049  0.9905  0.9978  0.3400  1.0149  0.9890
2013    MIL     0.7050  1.0055  0.9805  0.9993  0.3356  0.9972  0.9814
2013    MIN     0.6917  0.9866  0.9442  0.9961  0.3241  0.9290  0.9383
2013    NYA     0.6976  0.9949  1.0007  0.9893  0.3493  1.0382  0.9908
2013    NYN     0.6998  0.9981  0.9921  1.0220  0.3285  0.9687  0.9688
2013    OAK     0.7137  1.0179  0.9945  1.0096  0.3820  1.0842  0.9627
2013    PHI     0.6859  0.9782  0.9941  1.0108  0.3255  0.9576  0.9643
2013    PIT     0.7076  1.0092  1.0110  1.0074  0.2771  0.9334  1.0516
2013    SDN     0.7054  1.0060  0.9885  1.0181  0.3212  0.9659  0.9821
2013    SEA     0.6917  0.9865  1.0118  0.9963  0.3318  1.0068  0.9993
2013    SFN     0.6969  0.9940  1.0095  1.0148  0.3461  1.0152  0.9759
2013    SLN     0.7016  1.0007  1.0081  1.0090  0.3127  0.9757  1.0115
2013    TBA     0.7164  1.0218  1.0225  1.0111  0.3228  1.0244  1.0378
2013    TEX     0.7034  1.0032  1.0197  0.9950  0.3430  1.0507  1.0171
2013    TOR     0.6972  0.9944  0.9949  0.9977  0.3335  0.9991  0.9879
2013    WAS     0.6995  0.9977  1.0059  0.9863  0.3222  1.0077  1.0224

        Avg     0.7012  1.0000  1.0000  1.0000  0.3285  1.0000  1.0000

OIP – Outs per Ball in Play
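Fadj – Fielding adjustment (team OIP divided by league OIP)
Kadj – Strikeout adjustment
Park – Park adjustment
OFO/O – Outfield putouts divided by (putouts minus strikeouts)
IFA – Infield adjustment = Fadj*Kadj/Park*(OFO/AvgOFO+1)/2
OFA – Outfield adjustment = Fadj*Kadj/Park*((1-OFO)/(1-AvgOFO)+1)/2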
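
The IFA and OFA columns can be recomputed from the other columns. In this Python sketch the outfield term uses the complementary shares, (1 - OFO)/(1 - AvgOFO), because that is the form that reproduces the OFA column in the table above (checked against the OAK and PIT rows). Treat that as an inference from the published numbers, and the function names as illustrative.

```python
def halfway(ratio):
    """Go only half way from 1.0 toward the raw ratio."""
    return (ratio + 1) / 2


def infield_adjustment(fadj, kadj, park, ofo, avg_ofo):
    """IFA: combined team adjustment times the half-way outfield-share ratio."""
    return fadj * kadj / park * halfway(ofo / avg_ofo)


def outfield_adjustment(fadj, kadj, park, ofo, avg_ofo):
    """OFA: combined team adjustment times the half-way infield-share ratio."""
    return fadj * kadj / park * halfway((1 - ofo) / (1 - avg_ofo))


# Oakland 2013, using the rounded values from the table.
oak_ifa = infield_adjustment(1.0179, 0.9945, 1.0096, 0.3820, 0.3285)
oak_ofa = outfield_adjustment(1.0179, 0.9945, 1.0096, 0.3820, 0.3285)
print(f"{oak_ifa:.3f} {oak_ofa:.3f}")
```

Because the inputs are already rounded to four places, the recomputed values can differ from the table in the last digit.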

And here are your rightful 2013 Gold Glove Winners (Voter Results may Vary)

Pos  American          National
C    Yan Gomes         Russell Martin
1B   Adam Rosales      Joey Votto
2B   Eric Sogard       Brandon Phillips
3B   Manny Machado     Nolan Arenado
SS   Stephen Drew      Andrelton Simmons
LF   Andy Dirks        Denard Span
CF   Chris Young       Carlos Gomez
RF   Shane Victorino   Juan Lagares

ARF – Adjusted Range Factor
Notes: I did not necessarily give my Gold Glove vote to the player with the highest Adjusted Range Factor, but instead to the player with the highest defensive runs saved.  How I calculate this will be another blog post.

One of the values of using this method is that it is almost completely objective.  It still relies on the opinion of the official scorer as to what is an error.  The other value is that there are enough stats to make such evaluations back to 1955, and enough stats to make some pretty good guesses back to 1871.  Estimating the number of innings played is slightly fraught and prone to some error, but the errors for full-time players are apt to be fairly small.  I have made such guesses back to 1871.  Since I mentioned Babe Ruth earlier:

1923 Gold Glove Winners: (According to me)

Pos  American             National
C    Muddy Ruel           Zack Taylor
1B   Joe Judge            Charlie Grimm
2B   Aaron Ward           Jimmy Johnston
3B   Rube Lutzke          Pie Traynor
SS   Dave Bancroft        Rabbit Maranville
LF   Baby Doll Jacobson   Jack Smith
CF   Johnny Mostil        Jigger Statz
RF   Babe Ruth            Max Carey


Note: These totals are based on estimated Innings played.  This will also require a Blog Post.