I'm new to data science, and I just read the great article by ARAVIND PAI.
I wonder how I can use a TXT file, rather than a CSV file, as the dataset.
Thank you for the help.
You can use txt files with pandas too:
import pandas as pd
data = pd.read_csv('data.txt')
Give the appropriate sep parameter to indicate how your columns are separated.
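As a rough illustration of the sep parameter, here is a minimal sketch that assumes the file is tab-separated. The sample content and the column names (id, text) are made up for the example, and io.StringIO stands in for a real data.txt on disk:

```python
import io

import pandas as pd

# Hypothetical tab-separated content standing in for the real data.txt
raw = "id\ttext\n1\thello world\n2\tanother row\n"

# sep="\t" tells pandas that the columns are separated by tabs;
# use sep="|" or sep=";" if your file uses those characters instead.
data = pd.read_csv(io.StringIO(raw), sep="\t")

print(data.columns.tolist())
print(data.shape)
```

With a real file you would pass the path instead, for example pd.read_csv('data.txt', sep='\t').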
Or, if you have a folder of individual text files, say
data/
    file1.txt
    file2.txt
    file3.txt
    file4.txt
    .....
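you can read them like this:
import os
text = []
for file in os.listdir("data"):
    # os.listdir returns bare file names, so join them with the folder name
    with open(os.path.join("data", file), "r") as f:
        text.append(f.read())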
Then you can create a pandas DataFrame from this list:
data = pd.DataFrame(text, columns=['text'])
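Putting the two steps together, a self-contained sketch could look like the following. The helper name read_text_folder is hypothetical, the .txt filter and the sorted order are my own additions, and the temporary folder only exists so the example runs anywhere:

```python
import os
import tempfile

import pandas as pd


def read_text_folder(folder):
    """Read every .txt file in `folder` into a one-column DataFrame."""
    texts = []
    # sorted() gives a stable row order; os.listdir alone has no guaranteed order
    for name in sorted(os.listdir(folder)):
        if name.endswith(".txt"):
            path = os.path.join(folder, name)
            with open(path, "r", encoding="utf-8") as f:
                texts.append(f.read())
    return pd.DataFrame(texts, columns=["text"])


# Demo with a temporary folder holding two small made-up files
with tempfile.TemporaryDirectory() as folder:
    for i, content in enumerate(["first document", "second document"], start=1):
        with open(os.path.join(folder, f"file{i}.txt"), "w", encoding="utf-8") as f:
            f.write(content)
    data = read_text_folder(folder)

print(data)
```

Note that the column is named 'text' in lowercase, so any later code has to refer to it with exactly that name.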
So after I create a pandas DataFrame from the TXT files,
should the code run without problems, or are minor changes needed?
I'm trying to read the files, but I'm getting this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
Another problem: on the data.drop_duplicates line I'm getting KeyError: 'Text'
Without seeing your actual code, it would be difficult to help you.
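In general, though, a UnicodeDecodeError like this one usually means the file was not saved as UTF-8, since 0x80 cannot start a UTF-8 character. One common workaround is to try a few encodings in turn. This is only a sketch: the helper name read_text_with_fallback and the list of candidate encodings are assumptions, and your files may need a different encoding entirely:

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in order and return (text, encoding_that_worked)."""
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            continue  # this encoding cannot decode the file, try the next one
    raise ValueError(f"none of {encodings} could decode {path}")


# Demo: a made-up file containing byte 0x80, which is invalid in UTF-8
# but is the euro sign in Windows-1252 (cp1252)
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    with open(path, "wb") as f:
        f.write(b"price: \x80 5")
    text, used = read_text_with_fallback(path)

print(used)
```

Keep in mind that latin-1 accepts any byte sequence, so it never fails but may produce the wrong characters. If you know how the files were created, passing that encoding directly to open() is safer than guessing.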
Maybe this could help you out -
KeyError: 'Text' probably means that the column 'Text' doesn't exist in your dataset, or that you are misspelling it.
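A quick way to check is to print the column names before calling drop_duplicates. Column names are case-sensitive, and the DataFrame built earlier in this thread used columns=['text'] in lowercase, so 'Text' would not match. The small DataFrame below is made up for illustration:

```python
import pandas as pd

# Made-up data with a lowercase 'text' column, as in the earlier snippet
data = pd.DataFrame({"text": ["a", "a", "b"]})

# See which column names actually exist
print(data.columns.tolist())

column = "text"  # 'Text' with a capital T would not be found here
if column not in data.columns:
    raise KeyError(f"{column!r} not found; available columns: {data.columns.tolist()}")

# Drop rows whose value in that column is a duplicate of an earlier row
deduped = data.drop_duplicates(subset=[column])
print(len(deduped))
```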