How to read a bunch of .docx files in Python?



Hello everyone,
I am working on parsing a bunch of .docx files in Python. I have saved all the .docx files in a folder named .docx_files. I want to build a list, docx_list, in which each element is one of the documents saved in that folder. To do this, I am using the python-docx library. I created a list containing the paths of all the .docx files using os.walk(). The list looks like this:
['C:/Users/HP/Desktop/.docx files\AbhijeetPawar[3_7] (1).docx',
'C:/Users/HP/Desktop/.docx files\AbhijeetPolkamwar[0_0].docx',
'C:/Users/HP/Desktop/.docx files\AbhijeetRikame[6_0].docx']
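For reference, a list of paths like this could be built with os.walk() roughly as follows. This is only a sketch: collect_docx_paths is a hypothetical helper name, and skipping Word's temporary "~$" lock files is an assumption that may not matter for every folder.

```python
import os


def collect_docx_paths(root):
    """Return the full paths of all .docx files under root, sorted."""
    paths = []
    # os.walk() yields (directory, subdirectories, file names) for every
    # directory below root, including root itself
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Keep .docx files only and skip Word lock files such as ~$x.docx
            if name.lower().endswith('.docx') and not name.startswith('~$'):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

Calling collect_docx_paths('C:/Users/HP/Desktop/.docx files') would then give a list in the same shape as the one shown above.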
I passed each path in this list to Document() to get the required list. The code looks like this:

for path in document_list:
    resume_list = Document(path)

Now, when I print the list using the following code

for para in resume_list.paragraphs:
    print(para.text)

I can see only the document corresponding to the last path in my list. Can anyone help me find out what is going wrong? Also, suggest how to fix it.


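from os import listdir
from os.path import isfile, join
from docx import Document

def file_operation():
    # Folder that holds the .docx files (path taken from the question)
    mypath = 'C:/Users/HP/Desktop/.docx files'
    # mypath = '/home/catana/Downloads/doc/'
    # Keep only regular files whose names end in .docx
    list_files = [f for f in listdir(mypath)
                  if isfile(join(mypath, f)) and f.lower().endswith('.docx')]
    documents = []
    for name in list_files:
        # listdir() returns bare file names, so join each one with the folder
        # to get a full path, e.g. 'C:/Users/HP/Desktop/.docx files\AbhijeetPawar3_7.docx'
        document = Document(join(mypath, name))
        # Append instead of reassigning, so every document is kept
        documents.append(document)
        print([para.text for para in document.paragraphs])
    return documents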


Thanks for the answer!
It would be even more helpful if you could add comments explaining what each block of code is meant to do.
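As a commented walk-through of the main point, here is a minimal sketch. In the loop from the question, resume_list = Document(path) rebinds the same name on every pass, so only the last document survives; appending to a list keeps them all. The names load_documents and opener are hypothetical; opener is there only so the sketch can be tried without python-docx installed, and by default it falls back to docx.Document.

```python
def load_documents(paths, opener=None):
    """Open every path in paths and return the results as a list."""
    if opener is None:
        # Imported lazily so python-docx is only needed for real .docx files
        from docx import Document
        opener = Document

    documents = []
    for path in paths:
        # append() stores each opened document; a plain assignment here
        # would overwrite the previous one on every iteration
        documents.append(opener(path))
    return documents


# Possible usage with the list from the question (not run here):
# resume_list = load_documents(document_list)
# for document in resume_list:
#     for para in document.paragraphs:
#         print(para.text)
```

With this, resume_list is a list of Document objects, so printing needs an outer loop over the documents and an inner loop over each document's paragraphs.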