Where Can I Find L/R Splits in Bulk?

djbayko

Member
SoSH Member
Jul 18, 2005
25,895
Los Angeles, CA
Mods: Please feel free to move this to a more appropriate forum. This question doesn't feel right in this sub, but I honestly don't know where else to put it. I know the answer to my question must be here at SoSH.

For a side project I'm working on, I'm looking to download:
  • batter and pitcher L/R splits
  • for the past 3 seasons (negotiable)
  • preferably advanced stats over traditional (negotiable)
I know splits are available at several sites, including baseball-reference.com, but as far as I can tell they are only listed on individual player pages. I'm looking to compile this data from a list so that I don't have to click around and copy/paste many hundreds of times.

Any ideas?
 

djbayko

Quote:
http://www.baseball-reference.com/leagues/split.cgi?t=b&lg=MLB&year=2016

That is just MLB-wide batting for 2016, but you can also find a link to pitching on that page. Just change year=2016 in the URL to the year(s) you want. Be sure to credit baseball-reference.com if you publish.
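A minimal Python sketch of the year-swapping tip: it builds the league-level split URLs for several seasons. It assumes the pitching page uses t=p in place of t=b (not confirmed in this thread, so check it against the pitching link on the batting page), and it treats 2014-2016 as the "past 3 seasons." It only builds URLs; it does not download or parse anything.

```python
from urllib.parse import urlencode

BASE = "http://www.baseball-reference.com/leagues/split.cgi"


def split_urls(years, kinds=("b", "p")):
    """Build one league-level split URL per (year, kind).

    "b" (batting) comes from the link in the thread; "p" (pitching)
    is an assumption and should be checked on the site.
    """
    urls = []
    for year in years:
        for kind in kinds:
            # dicts keep insertion order, so the query reads t=...&lg=MLB&year=...
            query = urlencode({"t": kind, "lg": "MLB", "year": year})
            urls.append(f"{BASE}?{query}")
    return urls


for url in split_urls(range(2014, 2017)):
    print(url)
```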

I actually found that link, but I didn't notice until now that you can drill down from the league stats to team stats and then to individual player-level stats (which is what I'm after).

Doing 90 separate copy/paste actions is still a bit cumbersome, but it's much more manageable than copying each player's stats individually.

Thanks!
 

djbayko

Shit, I just realized it's 180 copies, not 90: 3 years * 30 teams * 2 (pitchers + batters).
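The count is just the product of three lists, so a short Python sketch can enumerate every page to grab and double as a checklist. The team names are hypothetical placeholders for the 30 clubs, and 2014-2016 is an assumed reading of "past 3 seasons."

```python
from itertools import product

years = [2014, 2015, 2016]                        # assumed "past 3 seasons"
teams = [f"team_{i:02d}" for i in range(1, 31)]   # hypothetical placeholders for the 30 clubs
sides = ["batting", "pitching"]

# One (year, team, side) tuple per page to copy: 3 * 30 * 2 of them.
pages = list(product(years, teams, sides))
print(len(pages))
```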

Better than nothing, I guess :)

I'm still interested in split data in a more denormalized/flat format, if anyone knows of a source.
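To illustrate what "denormalized/flat" means here, a minimal Python sketch that pivots long-format splits (one row per player per split) into one row per player, with a column for each split/stat pair. The player names, split labels, and numbers are dummy placeholders, not real data.

```python
from collections import defaultdict

# Long format: one row per player per split (dummy values for illustration).
long_rows = [
    {"player": "player_1", "split": "vs LHP", "PA": 150, "wOBA": 0.300},
    {"player": "player_1", "split": "vs RHP", "PA": 450, "wOBA": 0.355},
    {"player": "player_2", "split": "vs LHP", "PA": 200, "wOBA": 0.330},
]


def flatten(rows, stats=("PA", "wOBA")):
    """Pivot to one dict per player with keys like 'wOBA vs RHP'."""
    flat = defaultdict(dict)
    for row in rows:
        for stat in stats:
            flat[row["player"]][f"{stat} {row['split']}"] = row[stat]
    return dict(flat)


for player, columns in flatten(long_rows).items():
    print(player, columns)
```

Splits a player has no rows for (player_2 vs RHP above) simply have no column, so a spreadsheet export would show them as blanks.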
 

charlieoscar

Member
Sep 28, 2014
1,339
Which seasons are you seeking? If they are after 1973, you can download play-by-play files from Retrosheet (before that, there are games missing), run them through BEVENT.exe, which converts them to CSV format, and then use them in a database (the more SQL you know, the better). You can also work with them in a programming language, but you really have to understand their file structure. The files and some information about using them can be found here: http://www.retrosheet.org/game.htm. Chadwick has a fuller version, but I haven't used that. I have been using Retrosheet data for a long time with MS Access and its Query Builder. Among the fields of interest to you would be:

  • Responsible Batter
  • Responsible Batter Hand
  • Responsible Pitcher
  • Responsible Pitcher Hand
  • Batter Event Flag (set it to "T")
  • Event Type (a number from 0 to 24 that tells whether the PA resulted in a generic out, strikeout, ..., walk, intentional walk (separate), ..., error, ..., home run, and so forth)
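A minimal sketch of the database approach, using Python's built-in sqlite3 in place of MS Access. The column names are hypothetical labels for the fields listed above (BEVENT selects fields by number, so you would map your own selection onto these names), the rows are dummy data, and the event codes used (3 = strikeout, 14/15 = walk/intentional walk, 20-23 = single through home run) are as I understand Retrosheet's documentation, so double-check them there.

```python
import csv
import io
import sqlite3

# Dummy rows standing in for BEVENT CSV output. Column names are
# hypothetical labels for the fields listed above, not BEVENT's own.
SAMPLE_CSV = """resp_bat,resp_bat_hand,resp_pit,resp_pit_hand,bat_event_fl,event_type
bat01,R,pit01,L,T,20
bat01,R,pit02,R,T,3
bat01,R,pit02,R,T,14
bat01,R,pit01,L,F,4
bat01,R,pit02,R,T,23
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (resp_bat TEXT, resp_bat_hand TEXT, resp_pit TEXT, "
    "resp_pit_hand TEXT, bat_event_fl TEXT, event_type INTEGER)"
)
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
# INTEGER affinity converts the CSV text '20' etc. to integers on insert.
conn.executemany(
    "INSERT INTO events VALUES (:resp_bat, :resp_bat_hand, :resp_pit, "
    ":resp_pit_hand, :bat_event_fl, :event_type)",
    rows,
)

# Batter splits by pitcher hand; only batter events ('T') count as PAs.
# Event codes assumed from Retrosheet's docs: 3 = K, 14/15 = BB/IBB, 20-23 = hits.
query = """
SELECT resp_bat, resp_pit_hand,
       COUNT(*) AS pa,
       SUM(event_type BETWEEN 20 AND 23) AS hits,
       SUM(event_type IN (14, 15)) AS walks,
       SUM(event_type = 3) AS strikeouts
FROM events
WHERE bat_event_fl = 'T'
GROUP BY resp_bat, resp_pit_hand
ORDER BY resp_bat, resp_pit_hand
"""
for row in conn.execute(query):
    print(row)
```

Grouping by the responsible pitcher and the responsible batter's hand instead gives the pitcher side of the splits.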

There are also other fields (97 in all) that tell you things like the runners on 1st, 2nd, etc.; the destinations of the batter, the runner on 1st, etc.; SB, CS, assists, putouts, and the like.

The main problem when using these files with a database is that it is not easy to connect plays. Say that you want to look at whether runners who steal a base later score. If the lead-off batter gets to first and then steals second, two more batters could reach base without his scoring, and two more batters could make outs. Or he could go to third on the throw to second and score on a wild pitch to the next batter. You could figure this out with SQL, but it would be extremely time-consuming.
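For the stolen-base example, a procedural loop is easier than SQL: walk a half-inning's plays in order and, for each steal, scan forward for that runner scoring. A minimal Python sketch, assuming the plays are already grouped by half-inning and using a simplified, hypothetical layout (a "stole" list and a "scored" list per play) rather than Retrosheet's actual runner and destination columns.

```python
# One half-inning, plays in order. The "stole"/"scored" fields are a
# hypothetical simplification, not Retrosheet's real column layout.
half_inning = [
    {"batter": "b1", "stole": [], "scored": []},      # b1 singles
    {"batter": "b2", "stole": ["b1"], "scored": []},  # b1 steals 2nd during b2's PA
    {"batter": "b2", "stole": [], "scored": []},      # b2 strikes out
    {"batter": "b3", "stole": [], "scored": ["b1"]},  # wild pitch; b1 scores
]


def steals_that_scored(events):
    """Map each runner who stole a base to whether he scored later in the inning."""
    result = {}
    for i, event in enumerate(events):
        for runner in event["stole"]:
            # Scan from the steal itself onward, since a throwing error on
            # the steal could bring him home on the same play.
            scored = any(runner in later["scored"] for later in events[i:])
            result[runner] = result.get(runner, False) or scored
    return result


print(steals_that_scored(half_inning))
```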
 

djbayko

Thanks for the info!

I actually found the stats search engine at FanGraphs, which suits my needs perfectly. I can download player splits in bulk without having to convert from play-by-play data and without a million clicks. I was familiar with the site but never came across that search engine before. I'll keep your method in mind if I come up with new data requirements tho.