How to import Excel file in Python excluding first data row?

python

#1

Hi,

I am working on a project and one of the data file of the project has below structure. First Row is representing heading, second row is for unit and after that data values.

Now, I do not want to import row of units because it is creating a mismatch of data types. Please help me to import this data file in python excluding the unit row (first row of data values).

Thanks,
Imran


#2

First import the file :-

file_location = "C:/Users/DELL/Desktop/Book.xls"
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)  

Then try using the code below:

for s in workbook.sheets():
	**for row in range(1,s.nrows):**
		col_names = s.row(0)
		col_value = []
		for name,col in zip(col_names,range(s.ncols)):
			value = (s.cell(row,col).value)
			try : value = str(int(value))
			except : pass
			col_value.append((name.value,value))
		values.append(col_value)
	print values

May be some changes have to be made according to the row you want to exclude.
For example if you want to exclude the second row,
for row in range(2,s.nrows) has to be used.
I am not entirely sure.
Can you please check and let me know if this helps!!
You also have to install xlrd through cmd first to use it.


#3

@Imran

This can be achieved in multiple ways like:

  • First one, Delete the first data record in CSV file and after that import it (easiest one) :smile:
  • Other one, Import the CSV file, drop the first data row and after that convert the data type of each columns.

Import CSV file

import pandas as pd
df = pd.read_csv("C:/Users/Analytics Vidhya/Desktop/test.csv") 
df.head(3)

Drop first data row

> df=df.drop(df.index[[0]])

Change data type

> df1= df.convert_objects(convert_numeric=True)

Above, “convert_objects” is used to change the object columns in a DataFrame to a more suitable datatype. Main advantage of this method is that it can be applied to whole DataFrame in one go.

Hope this helps!

Regards,
Sunil


#4

You can use the this one
df = pd.read_csv(‘filename.csv’,header=1)