Wednesday, September 3, 2014

Writing this blog, I intend to continue to explain exactly how and where I get my stats from, but I also want to get started on one of the more fun things I have in mind, which is to blog every single season of Major League Baseball from 1871 to the present.  The rules in previous years were, of course, not the same as they are today, but the statistical record is complete enough for my purposes.  In 1871 players did not use fielding gloves and made many more errors than players are charged with today, but the scorers kept track of who made the errors.  The pitchers threw underhand and pitched every game unless they were injured, but the records still show who was pitching and what the stats were against them.  The statistical rules for deciding who the best players were seem to apply to that time almost as well as they do today.

As appears on my website (but not this blog), all stats are adjusted for quality of play, and my way to judge quality of play is by pitchers' batting average.  I assume that pitchers, in terms of batting, represent average batters.  That is, if you or I (at the ages of 20-40, and never having been paid to hit a baseball) were to train every day and attempt to hit a baseball 1 day in 5, we would perform about the same as major league pitchers.  This assumption bears out pretty well.  Pitchers' batting does not show the "dotted i" graph that we see when we look at batters' statistics: a mode sitting high above an otherwise normal-looking curve, with a steeper slope before the dot of the i and a less steep right portion.  An example of a dotted i selection graph appears in the previous post.  This chart is also in the previous post.

Another measure of quality of play is the standard deviation of the players in the league.  As the quality of play increases, the difference between the average and the elite player decreases and the standard deviation becomes smaller.  Here is unadjusted pitching RPO (for the numbers I use, pitcher RPO is set to 0) graphed together with the standard deviation.  The correlation coefficient of the 2 scores (player RPG for players with 250+ AB and all pitcher hitting) is .8601.

The blue line is pitchers' Runs Per Game and the orange line is batters' standard deviation for players with 250+ At Bats.
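For readers who want to try this kind of comparison themselves, here is a minimal sketch of the two calculations involved: a standard deviation for one season's qualifying batters, and a correlation coefficient between two season-by-season series. The function names are hypothetical and the numbers in the usage lines are made up for illustration, not taken from my data. I am also assuming a population standard deviation and an ordinary Pearson correlation here.

```python
from math import sqrt


def population_std(values):
    """Population standard deviation of one season's batter scores (assumed variant)."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points each")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up illustration values, one entry per season (NOT real data).
pitcher_runs_per_game = [2.9, 2.4, 2.1, 1.6, 1.2]
batter_std_250_ab = [1.30, 1.22, 1.10, 1.05, 0.98]

print(pearson(pitcher_runs_per_game, batter_std_250_ab))
```

A value near 1 means the two lines rise and fall together across seasons, which is what the chart is meant to show.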

As I said parenthetically, the way that I adjust for this is to set pitchers' RPO to 0: for each season, I subtract the pitchers' average RPO from each player's RPO.  The result is that Babe Ruth's 1923 is still one of the top seasons of all time (behind a couple of pre-1900 seasons and Barry Bonds's 2002), and the top seasons are a good mix of old-time and new seasons, as one might expect if one is picking evenly between top players who are 5+ standard deviations from normal (normal being pitchers, not other batters).  If one selects the 2196 best batting seasons of all time (giving one 15.43 batters per season), those seasons are clustered near the present.  There are more people in the United States today and more people in the world who want to play professional baseball, so this is also what one would expect.  In the past there were fewer players and the game was relatively less popular, so there are far fewer top seasons from those years.  When I select my all-star teams, this is reflected in the numbers: the number of players I select will be a rolling average of the number of top seasons.  (The exception is the years 1943-45, when the quality of play did go down.  I will take an average for these seasons.)
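The adjustment and the rolling average described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the row layout, the function names, the unweighted mean of pitchers' RPO, and the window size are all hypothetical choices of mine for this example, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import mean


def adjust_rpo(rows):
    """Subtract each season's pitcher-average RPO from every player's RPO.

    rows: list of (season, player, is_pitcher, rpo) tuples (hypothetical layout).
    Uses a simple unweighted mean of pitchers' RPO per season (an assumption).
    """
    pitcher_rpo = defaultdict(list)
    for season, _player, is_pitcher, rpo in rows:
        if is_pitcher:
            pitcher_rpo[season].append(rpo)
    baseline = {season: mean(vals) for season, vals in pitcher_rpo.items()}
    # After this step, pitchers as a group average 0 in every season.
    return [(season, player, rpo - baseline[season])
            for season, player, _is_pitcher, rpo in rows]


def rolling_average(counts, window=5):
    """Centered rolling mean of per-season counts; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(counts)):
        chunk = counts[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Made-up rows for one season (NOT real data).
sample = [
    (1923, "Batter A", False, 0.30),
    (1923, "Pitcher 1", True, 0.05),
    (1923, "Pitcher 2", True, 0.07),
]
for row in adjust_rpo(sample):
    print(row)

# Made-up counts of top seasons per year, smoothed with a 3-year window.
print(rolling_average([10, 12, 9, 15, 14], window=3))
```

The smoothed count for a year is what would set the size of that year's all-star team, so one unusually rich or thin year does not swing the roster size too much.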

I do not at this time adjust pitchers' records because I have no evidence that pitchers have become better over time.  I guess that is a subject for another blog post.
