How to read zip file directly in Python?

ipython
python

#1

To read a zipped CSV file, I first unzip it and then read it using pandas read_csv. Is there any library in Python that can read a zip file directly?

Pravin


#2

@pravin,

You can use the “zipfile” module to read ZIP archive files.

import pandas as pd
import zipfile

zf = zipfile.ZipFile('C:/Users/Analytics Vidhya/Desktop/test.zip')  # archive that contains First.csv
df = pd.read_csv(zf.open('First.csv'))

For more detail, see http://pymotw.com/2/zipfile/
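
The same idea can be written with context managers so the archive and the member stream are closed when reading finishes. This is only a minimal sketch: the helper name read_csv_member and the throwaway archive built in a temporary directory are assumptions for illustration, not part of the original answer.

```python
import os
import tempfile
import zipfile

import pandas as pd


def read_csv_member(zip_path, member):
    """Read one named CSV member of a ZIP archive into a DataFrame."""
    # Both with-blocks close their resource once the DataFrame is built.
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as fh:
            return pd.read_csv(fh)


# Build a small demo archive so the sketch runs anywhere (hypothetical data).
tmp_dir = tempfile.mkdtemp()
demo_zip = os.path.join(tmp_dir, "test.zip")
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("First.csv", "id,name\n1,a\n2,b\n")

df = read_csv_member(demo_zip, "First.csv")
print(df)
```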

Regards,
Sunil


#3

@Sunil

Is there any way to read a CSV file directly from a zip file?

In line 4 you mentioned the filename. I don’t want to specify the filename, given that there is only one file in the zip file.


#4

Hi,

You can now read the CSV file inside a zip archive directly, provided the archive contains exactly one file. Here is the command:

df = pd.read_csv("folder_name.zip")
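
A small sketch of how this behaves, assuming a reasonably recent pandas version. The archive names single.zip and multi.zip and their contents are made up for the demo. By default pandas infers the compression from the ".zip" extension, and it refuses archives that hold more than one file.

```python
import os
import tempfile
import zipfile

import pandas as pd

tmp_dir = tempfile.mkdtemp()

# An archive with exactly one member can be passed straight to read_csv.
single_zip = os.path.join(tmp_dir, "single.zip")
with zipfile.ZipFile(single_zip, "w") as zf:
    zf.writestr("data.csv", "x,y\n1,2\n3,4\n")

df_inferred = pd.read_csv(single_zip)                      # compression inferred from ".zip"
df_explicit = pd.read_csv(single_zip, compression="zip")   # same result, stated explicitly

# An archive with two members is rejected with a ValueError.
multi_zip = os.path.join(tmp_dir, "multi.zip")
with zipfile.ZipFile(multi_zip, "w") as zf:
    zf.writestr("a.csv", "x\n1\n")
    zf.writestr("b.csv", "x\n2\n")

try:
    pd.read_csv(multi_zip)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(df_inferred)
print(error_message)
```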


#5

Here is your answer, @pulkitpahwa:

import pandas as pd
import zipfile

zf = zipfile.ZipFile('C:/Users/Analytics Vidhya/Desktop/test.zip')
df = pd.read_csv(zf.open(zf.namelist()[0]))  # open the first (and only) member

This works provided the zip file contains only one file and no nested folders.
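
If the archive might contain folders or extra files, one option is to pick the CSV member by extension instead of by position. The sketch below is an assumption-laden illustration: the helper name read_only_csv, the in-memory archive, and the member names are hypothetical.

```python
import io
import zipfile

import pandas as pd


def read_only_csv(zip_source):
    """Read the single .csv member of an archive, wherever it sits inside."""
    with zipfile.ZipFile(zip_source) as zf:
        # Keep only names ending in .csv; folder entries and other files are skipped.
        csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        if len(csv_names) != 1:
            raise ValueError(f"expected exactly one CSV member, found {csv_names}")
        with zf.open(csv_names[0]) as fh:
            return pd.read_csv(fh)


# Demo archive held in memory: one text file and one CSV inside folders.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("notes/readme.txt", "not a table")
    zf.writestr("data/First.csv", "a,b\n1,2\n")
buffer.seek(0)

df = read_only_csv(buffer)
print(df)
```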


#6

Hi @AishwaryaSingh,
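Yes, you can.
If you want to read a compressed file into a pandas DataFrame, read_csv supports it through the compression parameter. Note that compression='gzip' only removes the gzip layer: this works cleanly for a gzip-compressed CSV, but a .tar.gz archive still carries tar headers in the decompressed stream.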
df = pd.read_csv('filename.tar.gz', compression='gzip', header=0, sep=',', quotechar='"')
compression : {‘gzip’, ‘bz2’, ‘infer’, None}, default ‘infer’
For on-the-fly decompression of on-disk data. If ‘infer’, then use gzip or bz2 if filepath_or_buffer is a string ending in ‘.gz’ or ‘.bz2’, respectively, and no decompression otherwise. Set to None for no decompression.
Read the pandas docs for more.
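
A minimal sketch contrasting the two cases, with made-up file names and data. A gzip-compressed CSV is read directly by read_csv, while for a tar.gz archive the tar layer is opened here with the standard-library tarfile module and the member stream is handed to pandas.

```python
import gzip
import io
import os
import tarfile
import tempfile

import pandas as pd

csv_bytes = b"a,b\n1,2\n3,4\n"
tmp_dir = tempfile.mkdtemp()

# Case 1: a plain gzip-compressed CSV; compression is inferred from ".gz".
gz_path = os.path.join(tmp_dir, "filename.csv.gz")
with gzip.open(gz_path, "wb") as fh:
    fh.write(csv_bytes)
df_gz = pd.read_csv(gz_path)

# Case 2: a tar.gz archive holding one CSV member.
tar_path = os.path.join(tmp_dir, "filename.tar.gz")
with tarfile.open(tar_path, "w:gz") as tar:
    info = tarfile.TarInfo(name="data.csv")
    info.size = len(csv_bytes)
    tar.addfile(info, io.BytesIO(csv_bytes))

# Open the tar layer explicitly and pass the member's file object to pandas.
with tarfile.open(tar_path, "r:gz") as tar:
    member = tar.getmembers()[0]
    with tar.extractfile(member) as fh:
        df_tar = pd.read_csv(fh)

print(df_gz)
print(df_tar)
```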