XML data table scraping

webscraping

#1

I am new to data scraping and am still stuck after a lot of research and experimentation.

I want to pull the table named "long build up" for "stock futures" from https://www.indiainfoline.com/markets/derivatives/futures-and-options into a Google Sheet. Any help would be greatly appreciated. Thanks in advance.


#2

The first thing you need is the source code of that page, so that you can scrape the table from it. You can download it with http.client and parse it with the Beautiful Soup library.
Here is some code to help you get started.

from bs4 import BeautifulSoup
import http.client

# Open an HTTPS connection to the site.
conn = http.client.HTTPSConnection("www.indiainfoline.com")

headers = {
    "cache-control": "no-cache",
}

# Request the futures-and-options page.
conn.request("GET", "/markets/derivatives/futures-and-options", headers=headers)

res = conn.getresponse()
data = res.read()
# Parse the raw HTML (the "lxml" parser requires the lxml package).
data = BeautifulSoup(data, "lxml")

print(data)
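
The snippet above prints the whole parsed page, so the next step is to narrow it down to the table itself. Here is a minimal sketch of that step. It runs on a small made-up HTML sample rather than the live page, the helper name table_rows is hypothetical, and the container class derivatives_long_buildup is borrowed from the XPath in post #3, so it may not match the page you are scraping. It also uses the built-in "html.parser" instead of "lxml" to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for the downloaded page source.
SAMPLE_HTML = """
<div class="derivatives_long_buildup mt10">
  <table>
    <tr><th>Symbol</th><th>Price</th></tr>
    <tr><td>ABC</td><td>101.5</td></tr>
    <tr><td>XYZ</td><td>57.2</td></tr>
  </table>
</div>
"""


def table_rows(page_source, container_class):
    """Return the first table inside the matching div as a list of rows."""
    soup = BeautifulSoup(page_source, "html.parser")
    # class_ matches a div that carries this class, even alongside others.
    container = soup.find("div", class_=container_class)
    if container is None:
        return []
    table = container.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Collect header and data cells alike, stripped of whitespace.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


rows = table_rows(SAMPLE_HTML, "derivatives_long_buildup")
for row in rows:
    print(row)
```

With the real page you would pass the downloaded bytes in place of SAMPLE_HTML. If the function returns an empty list, the table may be under a different class or may be filled in by JavaScript after the page loads.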


#3

Hi,

Use an HTTP client such as requests (for example requests.get or requests.post) to fetch the page source from the website,
and then apply an XPath expression to the page source to pick out the specific data you need, in this case the table.

Here is some example code.
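from io import StringIO
from time import sleep

import pandas as pd
import requests
from lxml import html

# Download the page source.
response = requests.get('https://www.indiainfoline.com/markets/derivatives/long-buildup/optidx/CE/07-Feb-2019')
# Pause before continuing (only useful if further requests follow).
sleep(10)
tree = html.fromstring(response.content)
# Select the first table inside the long build-up container.
table_node = tree.xpath('//div[@class="derivatives_long_buildup mt10"]//table')[0]
# Serialize the node back to HTML text and let pandas parse it into a DataFrame.
table = pd.read_html(StringIO(html.tostring(table_node, encoding='unicode')))[0]
print(table)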
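
Since the goal in the original question is a Google Sheet, one simple route from the DataFrame is to export it as CSV and load that file through File > Import in Google Sheets. Here is a minimal sketch of the export step. The sample DataFrame, its column names, the helper name to_sheet_csv, and the output file name are all made up for illustration; in practice you would pass in the table variable from the code above.

```python
import pandas as pd

# Made-up stand-in for the DataFrame returned by pd.read_html.
table = pd.DataFrame({"Symbol": ["ABC", "XYZ"], "Price": [101.5, 57.2]})


def to_sheet_csv(frame):
    """Return the frame as CSV text without the pandas index column."""
    return frame.to_csv(index=False)


csv_text = to_sheet_csv(table)
print(csv_text)

# To write a file for importing instead, pass a path (hypothetical name):
# table.to_csv("long_buildup.csv", index=False, encoding="utf-8")
```

Dropping the index keeps the sheet from getting an extra unnamed first column of row numbers.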