Web scraping in Python



I have to scrape the website http://mciindia.org/InformationDesk/CollegesCoursesSearch.aspx.

When I select the "All" option in the selection box, I get a list of all the colleges. I am using Beautiful Soup in Python to scrape the data.

I need to scrape all the data shown there, which spans several pages.

I am not able to get the data when I pass the URL to Beautiful Soup, since the webpage is an ASPX page.
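A likely reason, assuming this is a standard ASP.NET WebForms page (the .aspx extension suggests this but does not guarantee it): the college list is produced by a form postback, not by the URL alone, so fetching the URL probably returns only the empty form. The browser sends the page's hidden state fields (such as __VIEWSTATE and __EVENTVALIDATION) back in a POST together with the selected option. The sketch below collects those hidden fields with Python's built-in html.parser and builds the POST body. The dropdown name ddlCollegeState and its value All are hypothetical placeholders; the real names have to be read from the page source.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            # ASP.NET state fields such as __VIEWSTATE live in hidden inputs
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_postback_body(page_html, extra_fields):
    """Return a URL-encoded POST body: the page's hidden fields plus the user's choices."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    form = dict(parser.fields)
    form.update(extra_fields)  # e.g. the dropdown selection (field names are hypothetical)
    return urlencode(form)


# Tiny stand-in page; a real run would first GET the .aspx page and use its HTML
sample = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz" />
  <select name="ddlCollegeState"><option value="All">All</option></select>
</form>
"""
body = build_postback_body(sample, {"ddlCollegeState": "All"})
print(body)
```

The body would then be POSTed to the same .aspx URL, for example with urllib.request.urlopen(url, data=body.encode()). Depending on the page, the name/value of the search button and the session cookie from the initial GET may also be required.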

I am not familiar with handling this situation.

Kindly help me with this!

Thanks in advance.



It would be great if you could share your approach (code) to this challenge. That would also help in solving the issue.

To read data from a URL, I would suggest using urllib2. Look at the syntax below:

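try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3: urllib2 became urllib.request
from bs4 import BeautifulSoup

link = "https://…"  # URL of the page to scrape
page = urlopen(link)
# Name the parser explicitly; otherwise bs4 warns and guesses one per machine
soup = BeautifulSoup(page, "html.parser")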
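Once the page is in soup, a list like the college results can usually be read row by row from its HTML table. A minimal sketch, assuming the results are rendered as a plain table; the id GridView1 is a hypothetical placeholder to be checked against the real page source:

```python
from bs4 import BeautifulSoup


def table_rows(soup, table_id):
    """Return the text of every cell, row by row, for the table with the given id."""
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header rows use <th>, data rows use <td>; collect both
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


# Stand-in HTML; with a live page, use the soup built from urlopen() instead
html = """
<table id="GridView1">
  <tr><th>College</th><th>State</th></tr>
  <tr><td>College A</td><td>State X</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
for row in table_rows(soup, "GridView1"):
    print(row)
```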

Hope this helps!




Sorry for the late reply!

I have been able to crack the problem!

I used the urllib2 and BeautifulSoup libraries to solve it!

I have another issue, this time with a different webpage.

This webpage has a form that makes an AJAX call on every interaction!

I need to scrape the data for all the products in this form. I have managed to scrape it using Python and Selenium, which handles the AJAX calls, but this approach is very slow: it takes around one and a half hours to crawl all the products.

I did some research on making requests to the webpage with the Requests library, so that the server would respond directly in JSON or XML format.
But I am unable to implement this for my problem. The difficulty is in building the request, i.e. working out which parameters I need to include after inspecting the network requests of the webpage in the browser's developer tools!

My basic point of confusion is how to send a request to the server with all the required parameters.
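The general recipe is to copy what the browser sends: in the Network tab, pick the request fired by the form interaction, note its URL, method, form data and request headers, then reproduce them in code. A minimal sketch using the standard library's urllib.request (the Python 3 successor of urllib2); the URL, parameter names and header values are hypothetical placeholders, not the real site's API. With the Requests library the same call would be requests.post(url, data=..., headers=...).

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_ajax_request(url, form_data, referer):
    """Build (but do not send) a POST that mimics the browser's AJAX call."""
    body = urlencode(form_data).encode("utf-8")
    headers = {
        # Many servers use this header to tell AJAX calls apart from full page loads
        "X-Requested-With": "XMLHttpRequest",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        # Some sites reject requests that do not appear to come from their own page
        "Referer": referer,
        "User-Agent": "Mozilla/5.0",
    }
    return Request(url, data=body, headers=headers, method="POST")


# Hypothetical values: replace them with what the Network tab shows for the real call
req = build_ajax_request(
    "https://example.com/products/search",
    {"category": "all", "page": "1"},
    referer="https://example.com/products",
)
print(req.get_method(), req.full_url)
print(req.data)
# To actually send it: urllib.request.urlopen(req).read()
```

Building the request separately from sending it makes it easy to compare, field by field, against what the browser sent before hitting the server.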

Once the request is made properly, I hope we can extract the required data using BeautifulSoup.
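BeautifulSoup is only needed if the server answers with HTML; if the response is JSON, the json module turns it straight into Python objects. A small sketch that chooses between the two based on the Content-Type header, assuming a UTF-8 response; the sample payload is made up for illustration:

```python
import json


def decode_response(body, content_type):
    """Return Python objects for JSON responses, or decoded text (for BeautifulSoup) otherwise."""
    text = body.decode("utf-8")  # assumes the server sends UTF-8
    if "json" in content_type.lower():
        return json.loads(text)
    # HTML/XML fragment: pass this string to BeautifulSoup(text, "html.parser")
    return text


sample = b'{"products": [{"name": "Item 1", "price": 10}]}'
data = decode_response(sample, "application/json; charset=utf-8")
for product in data["products"]:
    print(product["name"], product["price"])
```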

It would be a great help if you could guide me in learning this. Kindly help me!

Once again, thanks for your reply!

Happy Holidays and Happy New Year!



@neel - you can also try import.io

They have some good automated solutions.



Thanks @kunal

I tried scraping with import.io, but the MCI website runs JavaScript when a button is clicked, which import.io is not able to handle.
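On ASP.NET WebForms pages (an assumption about this site, based on the .aspx extension), buttons and paging links often run javascript:__doPostBack('target','argument'). That function copies its two arguments into the hidden __EVENTTARGET and __EVENTARGUMENT fields and submits the form, so a scraper can reproduce the click without executing JavaScript. A minimal sketch that extracts the two arguments from such a link; the sample href is hypothetical:

```python
import re

# Matches __doPostBack('target','argument') with single or double quotes
DOPOSTBACK = re.compile(r"__doPostBack\(\s*['\"]([^'\"]*)['\"]\s*,\s*['\"]([^'\"]*)['\"]\s*\)")


def postback_fields(href):
    """Return the form fields a click on this link would set, or None if it is not a postback."""
    match = DOPOSTBACK.search(href)
    if match is None:
        return None
    target, argument = match.groups()
    return {"__EVENTTARGET": target, "__EVENTARGUMENT": argument}


# Hypothetical pager link as it might appear in a results grid on an .aspx page
link = "javascript:__doPostBack('ctl00$MainContent$GridView1','Page$2')"
print(postback_fields(link))
```

These two fields would be merged with the page's other hidden fields and POSTed back to the same URL. Each response typically carries a fresh __VIEWSTATE, which has to be reused for the next request (for example, the next page of results).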

Hard luck in this case!

I want to learn how to handle AJAX calls and JavaScript while scraping. I have gone through the Analytics Vidhya article on web scraping using Python and Selenium.

It would be great to learn from an Analytics Vidhya article on this!