
NewbieNerd

macrumors 6502a
Original poster
Sep 22, 2005
512
0
Chicago, IL
I'm thinking about setting up some machine learning algorithms, particularly for NBA and college basketball prediction. I did a couple of them last year for a class, using the men's NCAA tournament, and they worked quite well, so I'm interested in keeping it up.

Problem is, data collection was a pain because I did it all by hand, copying and pasting. Setting up a script to do it seems annoyingly complex, as the nba.com and espn.com boxscore pages have tons of crap in them. Does anyone have any better ideas? Know of any sites that already do this data collection? Thanks.
 

OutThere

macrumors 603
Dec 19, 2002
5,730
3
NYC
You could use regular expressions to parse the websites and it wouldn't be too difficult.
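For what it's worth, here is a rough Python sketch of the regex idea. The HTML fragment, the column order, and the names `SAMPLE_HTML`, `ROW_RE`, and `parse_boxscore` are all made up for illustration; the real nba.com and espn.com markup will look different, so the pattern would need adjusting against the actual page source.

```python
import re

# Hypothetical boxscore fragment, NOT real nba.com / espn.com markup.
SAMPLE_HTML = """
<table class="boxscore">
<tr><td class="player">J. Smith</td><td>34</td><td>8-15</td><td>21</td></tr>
<tr><td class="player">A. Jones</td><td>28</td><td>5-11</td><td>12</td></tr>
</table>
"""

# One pattern per table row: name, minutes, field goals made-attempted, points.
# The column order is an assumption of this sketch.
ROW_RE = re.compile(
    r'<tr><td class="player">(?P<name>[^<]+)</td>'
    r'<td>(?P<minutes>\d+)</td>'
    r'<td>(?P<fgm>\d+)-(?P<fga>\d+)</td>'
    r'<td>(?P<points>\d+)</td></tr>'
)


def parse_boxscore(html):
    """Return one dict per player row matched in the HTML text."""
    rows = []
    for m in ROW_RE.finditer(html):
        rows.append({
            "name": m.group("name").strip(),
            "minutes": int(m.group("minutes")),
            "fgm": int(m.group("fgm")),
            "fga": int(m.group("fga")),
            "points": int(m.group("points")),
        })
    return rows


if __name__ == "__main__":
    for row in parse_boxscore(SAMPLE_HTML):
        print(row)
```

The pattern ignores everything on the page that doesn't look like a player row, which is the point: the rest of the page clutter just never matches.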
 

NewbieNerd

macrumors 6502a
Original poster
Sep 22, 2005
512
0
Chicago, IL
zimv20 said:
do those sites offer XML feeds?

I'm not quite sure what XML feeds are. The sites (espn.com and nba.com, for instance) do offer RSS feeds, but these are just recaps. Some boxscores don't even appear in the HTML file itself when you look at the source, and copying the text from the webpage into a text file doesn't seem to work either. I think my best bet might be to just use the play-by-plays, which offer even more data than the boxscores and are easier to get and parse.
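To give an idea of what parsing play-by-plays could look like, here is a minimal Python sketch. The line format ("clock, team abbreviation, player, makes/misses, shot type") is an assumed, simplified one, not the actual format either site uses, and `SAMPLE_PBP`, `EVENT_RE`, and `score_by_team` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical play-by-play text, one event per line.
# The real espn.com / nba.com format is an unknown here and will differ.
SAMPLE_PBP = """\
11:42 CHI J. Smith makes 2-pt jump shot
11:20 NYK A. Jones misses 3-pt jump shot
10:58 CHI B. Lee makes 3-pt jump shot
10:31 NYK A. Jones makes free throw
"""

# Assumed layout: clock, team abbreviation, player, result, shot type.
EVENT_RE = re.compile(
    r'^(?P<clock>\d{1,2}:\d{2}) (?P<team>[A-Z]{2,3}) (?P<player>.+?) '
    r'(?P<result>makes|misses) (?P<shot>2-pt|3-pt|free throw)'
)

POINTS = {"2-pt": 2, "3-pt": 3, "free throw": 1}


def score_by_team(text):
    """Add up points per team from made shots; unmatched lines are skipped."""
    totals = Counter()
    for line in text.splitlines():
        m = EVENT_RE.match(line)
        if m is None:
            continue
        if m.group("result") == "makes":
            totals[m.group("team")] += POINTS[m.group("shot")]
    return totals


if __name__ == "__main__":
    print(score_by_team(SAMPLE_PBP))
```

The same loop could collect per-player shot attempts or any other feature instead of team totals; it only depends on the events being one per line in a predictable shape.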
 