Help required on Web Scraping with Python

python
webscraping

Hello Team,

I am following the Web Scraping with Python tutorial. I executed the suggested code, but I am not getting any result. Please help me.

I also don't understand what exactly if len(cells)==6 means in this code.

code
import urllib2
wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"
page = urllib2.urlopen(wiki)
from bs4 import BeautifulSoup
soup = BeautifulSoup(page)
soup.title
soup.title.string
soup.a
soup.find_all("a")
all_links = soup.find_all("a")
all_tables=soup.find_all('table')
right_table=soup.find('table', {"class" : 'wikitable sortable plainrowheaders'})
right_table

#Generate lists
A=[]
B=[]
C=[]
D=[]
E=[]
F=[]
G=[]
for row in right_table.find_all('tr'):
    cells=row.find_all('td')
    states=row.find_all('th') #To store second column data
    if len(cells)==6: #Only extract table body not heading
        A.append(cells[0].find(text=True))
        B.append(states[0].find(text=True))
        C.append(cells[1].find(text=True))
        D.append(cells[2].find(text=True))
        E.append(cells[3].find(text=True))
        F.append(cells[4].find(text=True))
        G.append(cells[5].find(text=True))

#import pandas to convert list to data frame
import pandas as pd
df=pd.DataFrame(A,columns=['Number'])
df['State/UT']=B
df['Admin_Capital']=C
df['Legislative_Capital']=D
df['Judiciary_Capital']=E
df['Year_Capital']=F
df['Former_Capital']=G
df

Output:

Empty DataFrame
Columns: [Number, State/UT, Admin_Capital, Legislative_Capital, Judiciary_Capital, Year_Capital, Former_Capital]
Index: []
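
One way to see what the len(cells)==6 check does, and why the result can come out empty, is to count the td cells in each row. The sketch below is only a rough illustration under stated assumptions: SAMPLE_HTML is a made-up three-row table (not the real Wikipedia markup), CellCounter and count_cells are hypothetical helper names, and the standard library's html.parser is used as a stand-in for BeautifulSoup's row.find_all('td') so that the example runs without extra packages.

```python
from html.parser import HTMLParser

# Hypothetical sample table, NOT the real Wikipedia markup.
SAMPLE_HTML = """
<table>
  <tr><th>No.</th><th>State</th><th>Capital</th></tr>
  <tr><td>1</td><th>Andhra Pradesh</th><td>Amaravati</td></tr>
  <tr><td>2</td><th>Assam</th><td>Dispur</td></tr>
</table>
"""


class CellCounter(HTMLParser):
    """Counts the <td> and <th> cells found in each <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []  # one (td_count, th_count) tuple per row
        self._td = 0
        self._th = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row starts: reset both counters.
            self._td = self._th = 0
        elif tag == "td":
            self._td += 1
        elif tag == "th":
            self._th += 1

    def handle_endtag(self, tag):
        if tag == "tr":
            # The row is complete: record its counts.
            self.rows.append((self._td, self._th))


def count_cells(html):
    """Return a list of (td_count, th_count) tuples, one per table row."""
    parser = CellCounter()
    parser.feed(html)
    return parser.rows


rows = count_cells(SAMPLE_HTML)

# Same idea as "if len(cells)==6": keep only rows with exactly N td cells.
kept_with_6 = [r for r in rows if r[0] == 6]
kept_with_2 = [r for r in rows if r[0] == 2]

print(rows)
print(len(kept_with_6), len(kept_with_2))
```

In this sample the header row contains only th cells, so it has zero td cells and any such check skips it. The data rows are kept only when the number in the check matches their td count: with 6 nothing matches and every list stays empty, while with 2 both data rows are kept. If no row on the live page has exactly 6 td cells (for example, because the table layout has changed since the tutorial was written), the lists A to G stay empty and the DataFrame is empty as well. Printing len(cells) for each row of right_table would be one way to check this.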