How can we do data mining and analysis on data from sites other than Facebook and Twitter?




Sites like Facebook and Twitter provide APIs for getting data for the purpose of data mining. How can we get data from sites other than these social networking sites, for example IMDB (I don't know whether they provide an API too)?




There are two ways to extract data from a website:

Option 1: Through APIs - many sites besides Facebook and Twitter provide an API. These include Stack Overflow, travel comparison sites, and job sites like Indeed. This is usually the preferred approach, because the data arrives in a structured form and your code and application are more likely to be robust.
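As a rough illustration of the API route, here is a minimal sketch that builds a request URL for a questions endpoint and pulls fields out of the JSON that comes back. The endpoint root, the version number, the parameter names and the `items`/`title`/`link` fields are my assumptions modelled on the Stack Exchange API, so check the site's API documentation before relying on them. The helper names and the sample payload are made up for the example, and nothing here touches the network.

```python
import json
from urllib.parse import urlencode

# Assumed API root and version; verify against the site's API documentation.
API_ROOT = "https://api.stackexchange.com/2.3"


def build_questions_url(tag, site="stackoverflow", page_size=5):
    """Build a query URL for recent questions with the given tag (hypothetical helper)."""
    params = {
        "order": "desc",
        "sort": "activity",
        "tagged": tag,
        "site": site,
        "pagesize": page_size,
    }
    return f"{API_ROOT}/questions?{urlencode(params)}"


def extract_titles(payload_text):
    """Return (title, link) pairs from a JSON response body."""
    payload = json.loads(payload_text)
    # Assumes the response wraps results in an "items" list.
    return [(item["title"], item["link"]) for item in payload.get("items", [])]


# Made-up response body standing in for what the server would return.
sample_response = (
    '{"items": [{"title": "How to parse JSON?", '
    '"link": "https://example.com/q/1"}]}'
)

if __name__ == "__main__":
    print(build_questions_url("data-mining"))
    print(extract_titles(sample_response))
```

The point is that the structure is given to you: you ask for fields by name instead of guessing where they sit on a page.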

Option 2: Web scraping the HTML - here you extract the information directly from the pages the site serves. However, there is a good chance of this breaking in any serious work, because the page structure and the way information is presented can change over time.
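For comparison, here is a minimal scraping sketch using only Python's standard `html.parser` module. The page snippet, the `h2` tag and the `movie-title` class are all invented for the example; a real site will use different markup, and the site's terms of use may restrict scraping.

```python
from html.parser import HTMLParser


class TitleScraper(HTMLParser):
    """Collect the text of every <h2 class="movie-title"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "h2" and ("class", "movie-title") in attrs:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False


def scrape_titles(html_text):
    """Parse an HTML string and return the titles found in it."""
    scraper = TitleScraper()
    scraper.feed(html_text)
    return scraper.titles


# Made-up page fragment standing in for a downloaded page.
sample_page = (
    '<div><h2 class="movie-title">Movie A</h2><p>1999</p>'
    '<h2 class="movie-title">Movie B</h2></div>'
)

if __name__ == "__main__":
    print(scrape_titles(sample_page))
```

Notice how much the code depends on the tag and class names. If the site renames the class or moves the title into a different element, the scraper silently returns nothing, which is exactly the fragility described above.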

You can also look at