Thursday, August 21, 2014

Here are the beginnings of my sports statistics website.  I have a couple of things that I want to share.  Some of this I wish to God I had done 12 years ago; I had the ideas that long ago, but wishing will not make it so.  Here is what I have now.

I would sincerely like to thank retrosheet.com for all of the info they have made available, and of course Sean Lahman for his invaluable baseball database.  All of the raw data I have comes from those two sources.  I would also like to thank baseball-reference.com, which I plan to link to extensively.  And, of course, I would like to thank Bill James, who came up with the original Runs Created formula.
And here are my 3 runs created formulas.  Why 3, you ask?  Because, in my opinion, 3 are needed.  I created the first formula by looking at individual games played in the 2010, 2011 and 2012 seasons, and the 2nd by looking at major league batting seasons from 1950 to 2013.  Both were derived by setting coefficients for a runs created formula and comparing its output with the actual runs scored by each team by way of a correlation coefficient.  All of this work was done on a spreadsheet with the help of a macro.  My 3rd formula was derived by examining the pitching statistics for each team from 1950 to 2013, for reasons I will try to clarify in a moment.

The basic form of my formulas is:
(H + [2Bc]*2B + [3Bc]*3B + [HRca]*HR + [IBBc]*IBB + [Kc]*K + [SBc]*SB + [SHc]*SH + [SFc]*SF + [WPc]*WP + [Bc]*B + [BrEc]*BrE + [PBc]*PB + [Ac]*A) *
([Hc]*H + [BBc]*BB + [HBPc]*HBP + [CSc]*CS + [CIc]*CI + [DPcb]*(DP-GIDP) + [TPc]*TP + [TOc]*TO + [EOBc]*EOB + [GIDPc]*GIDP + [OFAc]*OFA)/(PA+DP+2*TP+CS+CI) + [HRcb]*HR


H – Hits
2B – Doubles
3B – Triples
HR – Home Runs
BB – Walks aka Base on Balls
IBB – Intentional Base on Balls
K – Strikeout
SB – Stolen Base
CS – Caught Stealing
SH – Sacrifice Hit AKA Bunt
SF – Sacrifice Fly
WP – Wild Pitch
B – Balk
PB – Passed Ball
A – Assist
HBP – Hit by Pitch
DP – Total Double Plays by Defense
TP – Triple Plays
GIDP – Batter Ground into Double Play
OFA – Outfield Assist
PA – Plate Appearances



TO – (Estimated) Runners Thrown out on the Basepaths
BrE – (Estimated) Error that allowed a baserunner to advance on the basepaths
EOB – (Estimated) Error that allowed a runner to reach 1st Base (and perhaps subsequent bases)
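
For readers who prefer code to a long line of brackets, here is a minimal Python sketch of the formula's structure as written above: a first factor, times a second factor, divided by an opportunities term, plus a separate home run term. The function name, the dictionary layout, and the coefficient keys ("HRa", "HRb", "DP" and so on) are my own naming for illustration, and the numbers in the usage lines are made up, not real coefficients or real team totals.

```python
def runs_created(stats, coef):
    """Evaluate the two-factor runs created form for one team.

    stats: counting stats keyed by abbreviation (missing keys count as 0).
    coef:  coefficients keyed by the same abbreviations, with "HRa" for the
           home run term inside the first factor and "HRb" for the one
           added at the end (hypothetical key names).
    """
    s = lambda key: stats.get(key, 0.0)
    c = lambda key: coef.get(key, 0.0)

    # First factor: hits carry no coefficient here, every other term does.
    first_terms = ["2B", "3B", "IBB", "K", "SB", "SH", "SF",
                   "WP", "B", "BrE", "PB", "A"]
    first = s("H") + c("HRa") * s("HR") + sum(c(k) * s(k) for k in first_terms)

    # Second factor: the DP coefficient applies to double plays that were
    # not grounded into, i.e. (DP - GIDP); GIDP has its own coefficient.
    second_terms = ["H", "BB", "HBP", "CS", "CI", "TP", "TO",
                    "EOB", "GIDP", "OFA"]
    second = sum(c(k) * s(k) for k in second_terms)
    second += c("DP") * (s("DP") - s("GIDP"))

    # Denominator: PA + DP + 2*TP + CS + CI.
    denom = s("PA") + s("DP") + 2 * s("TP") + s("CS") + s("CI")
    if denom == 0:
        return 0.0

    return first * second / denom + c("HRb") * s("HR")


# Made-up numbers, only to show the call shape.
example_stats = {"H": 10, "BB": 4, "HR": 1, "PA": 40}
example_coef = {"H": 1.0, "BB": 1.0, "HRa": 2.0, "HRb": 1.0}
example_rc = runs_created(example_stats, example_coef)
```

Keeping the stats and the coefficients in two dictionaries means the same function can be reused for each of the three coefficient sets without touching the formula itself.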

Each of the 3 formulas has its own set of coefficients.  They are listed here:






           RC1       RC2       RC3
H        0.999     1.026     1.018
D        0.721     0.723     0.749
T        1.345     1.675     2.169
HRa     -0.066     0.780     1.861
SH      -0.013     0.074    -0.251
SF       1.493     0.530     0.463
HBP      1.769     1.804     1.586
BB       1.633     1.729     1.631
IBB     -0.720    -0.865    -0.802
K       -0.083    -0.080    -0.087
SB       0.181     0.351     0.477
CS      -0.938    -0.330    -0.779
GIDP    -1.198    -1.017    -0.830
CI      -0.918    -2.241    -2.307
WP       0.463     0.701     0.893
B        0.308     0.811     0.354
A        0.020    -0.005     0.002
E        0.754     0.981     0.811
PB       0.873     1.155     0.959
DP      -1.247    -0.620    -0.059
TP      -1.011    -3.300    -9.655
EOB      1.115     1.099     1.397
TO       1.160     0.660     0.000
HRb      1.045     0.703     0.305
OFA      0.000     0.000    -1.076
CC       0.893     0.982     0.980


The last row, CC, lists the correlation coefficient for each formula between the actual runs per game and the expected runs per game from the formula.  The first formula is for individual games, as opposed to a 108 to 162 game season.  When the values listed for Formula 1 are applied to full seasons, the correlation coefficient is .979.
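
To make the comparison concrete, here is a small Python sketch of a correlation coefficient between actual and expected runs per game. It assumes the ordinary Pearson correlation, which I take to be what a spreadsheet reports by default; the function name and the sample numbers are hypothetical and are not taken from my data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")

    return sxy / math.sqrt(sxx * syy)


# Made-up runs per game for five teams, only to show the call shape.
actual_rpg = [4.1, 4.6, 3.9, 5.0, 4.4]
expected_rpg = [4.0, 4.7, 4.0, 4.9, 4.5]
cc = pearson(actual_rpg, expected_rpg)
```

A value near 1 means the formula's expected runs rise and fall with the actual runs; it does not by itself say the two are on the same scale.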

Further observations: According to the first chart, a stolen base in a single game is very clearly not worth it, adding about .075 runs if successful and costing about .325 runs if unsuccessful.  On the batting chart I estimate that for an average team a SB is worth .128 runs and a CS costs .192 runs, a much better apparent ratio for the base-stealer.  It is my working hypothesis that the first chart reflects the real value of a stolen base in a game, but, or should I say BUT, stolen bases and to some extent caught-stealings reflect a team's likelihood of taking an extra base during a season, an effect too subtle to show up in the game-by-game totals.  The players who steal 2nd, and even the players who get caught stealing, are more likely to take 3rd on a single to right field, and thus are more likely to create more runs for their teams.  Because of this, I have elected to use the 2nd chart in my runs per game formula for individual players, counting on the idea that it is the faster players who both attempt to steal bases and take extra bases during a game, and thus score more runs.  I stand by the idea, however, that a player has to steal about 4.5 bases for every caught stealing to break even.
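
The break-even arithmetic can be sketched in a few lines of Python. The function name is made up for illustration, and the inputs are the rounded per-event values quoted in the paragraph above, so the results are only as precise as those roundings.

```python
def break_even_steals(sb_gain, cs_cost):
    """Successful steals needed per caught stealing to net zero runs.

    sb_gain: runs added by one successful steal (must be positive).
    cs_cost: runs lost by one caught stealing (sign is ignored).
    """
    if sb_gain <= 0:
        raise ValueError("a steal must add runs for a break-even point to exist")
    return abs(cs_cost) / sb_gain


# Single-game chart: +.075 runs per SB, -.325 runs per CS.
single_game_ratio = break_even_steals(0.075, 0.325)

# Batting chart: +.128 runs per SB, -.192 runs per CS.
batting_ratio = break_even_steals(0.128, 0.192)
```

With these rounded inputs the single-game ratio comes out a little above 4.3 steals per caught stealing, in the neighborhood of the 4.5 figure, while the batting chart gives about 1.5.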

Of course, the highest SB coefficient is found in the Pitching RC formula.  By the Pitching RC formula, I mean the one for which the against-stats are summed for each team using the Retrosheet game logs.  The against-stats are fairly complete.  The fact that the Pitching RC formula has the highest number should mean that catchers play a large role in keeping players from taking extra bases, but that does not necessarily make sense.  Perhaps if I used stats from other, pre-sabermetrics seasons, the single-game coefficients would more closely match the pitching coefficients.  I will run this experiment soon.
   
The reason that I made a pitching formula is this: since it is my working hypothesis that stolen bases don’t matter in a game but show a tendency for a team to take an extra base during the season, and since a pitcher faces different teams during the season, I figured the speed of the teams facing them would be about league average, and hence wouldn't matter.  Further, I was using a kind of fudge factor to try to determine how many runners were thrown out on the bases during a game.  Any discrepancy between putouts and outs made by a team I counted as runners who were otherwise thrown out.  This, for me, was a proxy for outfield assists, but if fast runners taking more bases during a season leads to more runs, then good outfield arms should lead to fewer bases taken and fewer runs over the course of a season.  Also, for full seasons I had access to outfield assist data, so I included it and removed the runners thrown out category.
