Here are the beginnings of my sports statistics website.
I have a couple of things that I want to share. Some of this, I wish to god I had done 12
years ago; I had the ideas that long ago, but wishing will not make it so. Here is what I have now.
I would sincerely like to thank retrosheet.com for all of the info that they have made available, and of course Sean Lahman for his invaluable baseball database. All of the raw data I have comes from those sources. I would also like to thank baseball-reference.com, which I plan to link to extensively. And, of course, I would like to thank Bill James, who came up with the original Runs Created formula.
And here are my 3 runs created formulas. Why 3, you ask? Because, in my opinion, 3 are needed. I created the first formula by looking at individual games played in the 2010, 2011 and 2012 seasons, and the 2nd by looking at major league batting seasons from 1950 to 2013. Both formulas were derived by setting coefficients for a runs created formula and comparing its output with the actual runs scored by each team by way of a correlation coefficient. All of this work was done on a spreadsheet with the help of a macro. My 3rd formula was derived by examining the pitching statistics for each team from 1950 to 2013, for reasons I will try to clarify in a moment.
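The fitting loop described above can be sketched in a few lines of Python. This is only an illustration and not the spreadsheet macro itself: the game rows, the run totals, the one-coefficient `toy_predict` formula and the `tune` helper are all made up for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def tune(rows, actual, predict, coef, name, step=0.01, max_rounds=500):
    """Nudge one coefficient up or down for as long as the correlation improves."""
    best = pearson([predict(r, coef) for r in rows], actual)
    for _ in range(max_rounds):
        improved = False
        for delta in (step, -step):
            trial = dict(coef)
            trial[name] += delta
            score = pearson([predict(r, trial) for r in rows], actual)
            if score > best:
                coef, best, improved = trial, score, True
                break
        if not improved:
            break
    return coef, best

# Hypothetical data: four made-up team-games and the runs actually scored.
rows = [
    {"H": 8, "HR": 0, "PA": 36},
    {"H": 10, "HR": 2, "PA": 40},
    {"H": 6, "HR": 1, "PA": 33},
    {"H": 12, "HR": 1, "PA": 42},
]
actual = [3, 7, 3, 6]

def toy_predict(row, coef):
    # Toy one-coefficient formula, far simpler than the real one.
    return (row["H"] + coef["HRc"] * row["HR"]) * row["H"] / row["PA"]

start = {"HRc": 1.0}
start_score = pearson([toy_predict(r, start) for r in rows], actual)
tuned, tuned_score = tune(rows, actual, toy_predict, start, "HRc")
print(tuned, round(tuned_score, 3))
```

A real fit would cycle through every coefficient in turn and use thousands of games or team-seasons rather than four rows.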
The basic form of my formulas is:
RC = (H + [2Bc]*2B + [3Bc]*3B + [HRca]*HR + [IBBc]*IBB + [Kc]*K + [SBc]*SB + [SHc]*SH + [SFc]*SF + [WPc]*WP + [Bc]*B + [BrEc]*BrE + [PBc]*PB + [Ac]*A)
  * ([Hc]*H + [BBc]*BB + [HBPc]*HBP + [CSc]*CS + [CIc]*CI + [DPcb]*(DP-GIDP) + [TPc]*TP + [TOc]*TO + [EOBc]*EOB + [GIDPc]*GIDP + [OFAc]*OFA)
  / (PA + DP + 2*TP + CS + CI)
  + [HRcb]*HR
H – Hits
2B – Doubles
3B – Triples
HR – Home Runs
BB – Walks aka Base on Balls
IBB – Intentional Base on Balls
K – Strikeout
SB – Stolen Base
CS – Caught Stealing
SH – Sacrifice Hit AKA Bunt
SF – Sacrifice Fly
WP – Wild Pitch
B – Balk
PB – Passed Ball
A – Assist
HBP – Hit by Pitch
DP – Total Double Plays by Defense
TP – Triple Plays
GIDP – Batter Ground into Double Play
OFA – Outfield Assist
CI – Catcher's Interference
PA – Plate Appearances
TO – (Estimated) Runners Thrown out on the Basepaths
BrE – (Estimated) Error that allowed a baserunner to advance on the
basepaths
EOB – (Estimated) Error that allowed a runner to reach 1st
Base (and perhaps subsequent bases)
Each of the 3 formulas has its own set of coefficients. They are listed here:
Stat |    RC1 |    RC2 |    RC3
H    |  0.999 |  1.026 |  1.018
D    |  0.721 |  0.723 |  0.749
T    |  1.345 |  1.675 |  2.169
HRa  | -0.066 |  0.780 |  1.861
SH   | -0.013 |  0.074 | -0.251
SF   |  1.493 |  0.530 |  0.463
HBP  |  1.769 |  1.804 |  1.586
BB   |  1.633 |  1.729 |  1.631
IBB  | -0.720 | -0.865 | -0.802
K    | -0.083 | -0.080 | -0.087
SB   |  0.181 |  0.351 |  0.477
CS   | -0.938 | -0.330 | -0.779
GIDP | -1.198 | -1.017 | -0.830
CI   | -0.918 | -2.241 | -2.307
WP   |  0.463 |  0.701 |  0.893
B    |  0.308 |  0.811 |  0.354
A    |  0.020 | -0.005 |  0.002
E    |  0.754 |  0.981 |  0.811
PB   |  0.873 |  1.155 |  0.959
DP   | -1.247 | -0.620 | -0.059
TP   | -1.011 | -3.300 | -9.655
EOB  |  1.115 |  1.099 |  1.397
TO   |  1.160 |  0.660 |  0.000
HRb  |  1.045 |  0.703 |  0.305
OFA  |  0.000 |  0.000 | -1.076
CC   |  0.893 |  0.982 |  0.980
The last row (CC) lists, for each formula, the correlation coefficient between the actual runs per game and the expected runs per game from the formula. The first formula is for individual games, as opposed to a 108 to 162 game season. When the coefficients listed for Formula 1 are applied to full seasons, the correlation coefficient is .979.
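For readers who would rather see the formula as code, here is a minimal Python sketch that plugs the RC2 column into the general form. The mapping from table labels to the bracketed coefficient names (H to Hc, D to 2Bc, T to 3Bc, HRa to HRca, HRb to HRcb, E to BrEc, DP to DPcb) is my reading of the table and should be treated as an assumption. The function name `runs_created` and the sample stat line are made up for illustration.

```python
# RC2 column of the coefficient table. Mapping the table labels onto the
# bracketed names in the formula is an assumption:
# H -> Hc, D -> 2Bc, T -> 3Bc, HRa -> HRca, HRb -> HRcb, E -> BrEc, DP -> DPcb.
RC2 = {
    "Hc": 1.026, "2Bc": 0.723, "3Bc": 1.675, "HRca": 0.780, "SHc": 0.074,
    "SFc": 0.530, "HBPc": 1.804, "BBc": 1.729, "IBBc": -0.865, "Kc": -0.080,
    "SBc": 0.351, "CSc": -0.330, "GIDPc": -1.017, "CIc": -2.241, "WPc": 0.701,
    "Bc": 0.811, "Ac": -0.005, "BrEc": 0.981, "PBc": 1.155, "DPcb": -0.620,
    "TPc": -3.300, "EOBc": 1.099, "TOc": 0.660, "HRcb": 0.703, "OFAc": 0.000,
}

# Stats in the first factor, besides the bare H term and the HR term.
ADVANCE = ["2B", "3B", "IBB", "K", "SB", "SH", "SF", "WP", "B", "BrE", "PB", "A"]
# Stats in the second factor, besides the (DP - GIDP) term.
ON_BASE = ["H", "BB", "HBP", "CS", "CI", "TP", "TO", "EOB", "GIDP", "OFA"]

def runs_created(stats, coef):
    """Evaluate the general formula; stats missing from the dict count as 0."""
    s = lambda key: stats.get(key, 0.0)
    advance = (s("H") + coef["HRca"] * s("HR")
               + sum(coef[k + "c"] * s(k) for k in ADVANCE))
    on_base = (sum(coef[k + "c"] * s(k) for k in ON_BASE)
               + coef["DPcb"] * (s("DP") - s("GIDP")))
    chances = s("PA") + s("DP") + 2 * s("TP") + s("CS") + s("CI")
    if chances == 0:
        return 0.0
    return advance * on_base / chances + coef["HRcb"] * s("HR")

# Made-up stat line: 100 singles in 400 plate appearances and nothing else.
print(round(runs_created({"H": 100, "PA": 400}, RC2), 2))  # 25.65
```

Swapping in the RC1 or RC3 column only means building another dictionary with the same keys.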
Further observations: According to the first chart, a stolen base in a single game is very clearly not worth it, adding about .075 runs if successful and costing about .325 runs if unsuccessful. On the batting chart, I estimate that for an average team a SB is worth .128 runs and a CS costs a team .192 runs, a much better apparent ratio for the base-stealer.
It is my working hypothesis that the first chart reflects the real value of a stolen base in a game, but, or should I say BUT, stolen bases and to some extent caught-stealings reflect a likelihood for a team to take an extra base during a season, an effect too subtle to show up in the game-by-game totals. The players who steal 2nd, and even the players who get caught stealing, are more likely to take 3rd on a single to right field, and thus are more likely to create runs for their teams. Because of this, I have elected to use the 2nd chart in my runs per game formula for individual players, counting on the idea that it is the faster players who both attempt to steal bases and take extra bases during a game, and thus score more runs. I stand by the idea, however, that a player has to steal about 4.5 bases for every caught stealing to break even.
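The break-even arithmetic behind those numbers is easy to check. The short Python sketch below uses only the run values quoted above; the helper names are made up for the example. The single-game values work out to a little over 4 steals per caught stealing, in the neighborhood of the "about 4.5" figure, while the season values give 1.5.

```python
def break_even_ratio(sb_gain, cs_cost):
    """Successful steals needed per caught stealing for the run values to cancel."""
    return cs_cost / sb_gain

def break_even_success_rate(sb_gain, cs_cost):
    """The same break-even point expressed as a success rate."""
    ratio = break_even_ratio(sb_gain, cs_cost)
    return ratio / (1 + ratio)

# Single-game chart: +.075 runs per SB, -.325 runs per CS.
single_game = break_even_ratio(0.075, 0.325)   # about 4.33
# Batting-season chart: +.128 runs per SB, -.192 runs per CS.
season = break_even_ratio(0.128, 0.192)        # about 1.5

print(round(single_game, 2), round(season, 2))
print(round(break_even_success_rate(0.075, 0.325), 4),
      round(break_even_success_rate(0.128, 0.192), 2))
```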
Of course, the highest SB coefficient is found in the pitching RC formula. By the pitching RC formula, I mean the formula fit to the stats against each team, summed using the Retrosheet game logs. The against-stats are fairly complete. The fact that the pitching RC formula has the highest number should mean that catchers play a large role in keeping players from taking extra bases, but that does not necessarily make sense. Perhaps if I used stats from other, pre-sabermetrics seasons, the single-game coefficients would more closely match the pitching coefficients. I will run this experiment soon.
The reason that I made a pitching formula is this: since it is my working hypothesis that stolen bases don't matter in a game but show a tendency for a team to take an extra base during the season, and since a pitcher faces different teams during the season, I felt that the speed of the teams facing them would come out to a league average, and hence wouldn't matter. Further, I was using a kind of fudge factor to try to determine how many runners were thrown out on the bases during a game. Any discrepancy between putouts and the outs made by a team I counted as batters who were otherwise thrown out on the bases. This, for me, was a proxy for outfield assists. But if, during a season, fast runners took more bases and that led to more runs, then good outfield arms should lead to fewer bases taken and fewer runs over the course of a season. Also, over the course of a season, I had access to the outfield assist data, so I included it and removed the runners thrown out category.
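The fudge factor for runners thrown out can be sketched as follows. The post does not say exactly which outs were counted on the batting side, so this Python example assumes the common "outs made" tally (AB - H + SH + SF + CS + GIDP). AB (at bats) is not in the stat list above, and the function names and sample numbers are made up for illustration.

```python
def outs_made(batting):
    """Outs accounted for in the batting line (assumed definition):
    AB - H + SH + SF + CS + GIDP."""
    return (batting["AB"] - batting["H"]
            + batting.get("SH", 0) + batting.get("SF", 0)
            + batting.get("CS", 0) + batting.get("GIDP", 0))

def estimated_thrown_out(opponent_putouts, batting):
    """Putouts the defense recorded that the batting line does not explain.
    The remainder is treated as runners thrown out on the bases (TO)."""
    return max(0, opponent_putouts - outs_made(batting))

# Made-up nine-inning game: the defense recorded 27 putouts, but the batting
# line only explains 26 outs, so 1 runner is estimated as thrown out.
game = {"AB": 33, "H": 8, "SH": 0, "SF": 0, "CS": 1, "GIDP": 0}
print(estimated_thrown_out(27, game))  # 1
```

Under this assumed tally the remainder also picks up things like pickoffs and the extra out on non-ground-ball double plays, which is one reason it is only a rough proxy.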