How to scrape data from 'dd' and 'dl' tags using BeautifulSoup?

python

#1

Hi friends,

I am trying to scrape company details from the following website:

https://opencorporates.com/companies/gb/09636367

I want to scrape all the fields, such as company number, company name, etc.

I tried BeautifulSoup, but the soup I get back does not contain any of the fields or their values.

How can I scrape this information? Can anyone please advise me?

Thank you in advance


#2

Hi @premsheth,

You can follow the article below, which explains scraping in detail:

It explains how BeautifulSoup can be used for scraping, along with its implementation in Python.


#3

@PulkitS Thanks for the reply, but I have already gone through it and my problem is still not resolved.


#4

Hi @premsheth,

You can find dt elements, or any other DOM elements, using the soup.find_all('dt') function. In any case, I have written the whole code for your use case. Please find it below and let us know whether you understand how it works. Thanks!

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://opencorporates.com/companies/gb/09636367"
page = requests.get(url)

soup = BeautifulSoup(page.text, "html.parser")

# The company details sit in a <dl class="attributes dl-horizontal"> block
info = soup.find_all("dl", {'class': 'attributes dl-horizontal'})

comp_info = pd.DataFrame()
# Each <dt> holds a field name
cleaned_id_text = []
for i in info[0].find_all('dt'):
    cleaned_id_text.append(i.text)
# Each <dd> holds the matching value
cleaned_id__attrb_text = []
for i in info[0].find_all('dd'):
    cleaned_id__attrb_text.append(i.text)

# Both lists must have the same length to fit into one DataFrame
comp_info['Id'] = cleaned_id_text
comp_info['Attribute'] = cleaned_id__attrb_text
comp_info


#5

@Shaz13 Thank you so much. I had tried, but I didn’t realize it needed for loops. Now I understand.
I guess you have a good command of scraping, so could you please help me with the following question?

Do you know any website from which we can scrape details of Indian companies?
I found http://www.mca.gov.in/mcafoportal/viewCompanyMasterData.do but I am not able to understand how to scrape from this website.

Thank you so much again


#6

Hi @premsheth,
The for loops target the children of the dl element that hold the required information. In this case the children were dt and dd. I then extracted the text from those elements and appended it to lists, so that I could fill the columns of an empty pandas DataFrame.
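
As a minimal sketch of the same idea, the snippet below collects the dt and dd text from a small hand-written HTML string instead of the live page, so it runs offline. The sample markup and its values are made up for illustration and only mimic the dl/dt/dd layout; zip is used here to pair each field name with its value.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not copied from the real page
sample_html = """
<dl class="attributes dl-horizontal">
  <dt>Company Number</dt><dd>01234567</dd>
  <dt>Status</dt><dd>Active</dd>
</dl>
"""

soup = BeautifulSoup(sample_html, "html.parser")
dl = soup.find("dl", {"class": "attributes dl-horizontal"})

# <dt> elements hold the field names, <dd> elements hold the values
names = [dt.get_text(strip=True) for dt in dl.find_all("dt")]
values = [dd.get_text(strip=True) for dd in dl.find_all("dd")]

# Pair them up by position; zip stops at the shorter list
details = dict(zip(names, values))
print(details)
```

A dict keeps each name next to its value, and it can still be turned into a DataFrame afterwards if a table is needed.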

Before jumping directly into scraping, I would recommend that you understand how HTML and the DOM structure work. That is key to understanding web scraping.
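
To make the parent, child, and sibling relationships concrete, here is a rough sketch on an invented HTML fragment (the markup and the "company" id are assumptions for illustration, not taken from any real page). It shows how BeautifulSoup exposes those relationships.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a <div> containing a <dl> with two name/value pairs
sample_html = """
<div id="company">
  <dl>
    <dt>Company Number</dt><dd>01234567</dd>
    <dt>Status</dt><dd>Active</dd>
  </dl>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
dl = soup.find("dl")

# Parent: the element that directly contains the <dl>
parent_name = dl.parent.name

# Children: only the direct child tags of the <dl>, in document order
child_tags = [child.name for child in dl.find_all(True, recursive=False)]

# Siblings: the <dd> that follows a given <dt> at the same level
first_dt = dl.find("dt")
first_dd = first_dt.find_next_sibling("dd")

print(parent_name, child_tags, first_dd.get_text(strip=True))
```

Walking from each dt to its next dd sibling is one way to keep names and values aligned even if the page has a dt without a matching dd.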

I am afraid that the other question is out of scope for this forum. However, I can link you to the open data website of the Indian Government.


#7

@Shaz13 Thanks for the reply. I had already gone through the basics of web scraping, but your answer was very good.

The link you provided gives datasets for the different states. I want a single link from which I can get information on any Indian company.