Error message while tokenizing a tweet text in Python

text_mining
python

#1

Hi,

I get an error message while tokenizing a tweet text in Python. Below is the code I used to tokenize it.

from nltk.tokenize import word_tokenize
tweet = 'Hi @Amit, please go through the url test@demo.com, it will solve your issues'
print(word_tokenize(tweet))

Error

LookupError                               Traceback (most recent call last)
<ipython-input-57-6f1793193e08> in <module>()
      1 from nltk.tokenize import word_tokenize
      2 tweet='sdfghjk:fgcvhbjnk'
----> 3 print(word_tokenize(tweet))

D:\Users\PRAVIN\Anaconda\lib\site-packages\nltk\tokenize\__init__.py in word_tokenize(text)
     91     along with :class:`.PunktSentenceTokenizer`).
     92     """
---> 93     return [token for sent in sent_tokenize(text)
     94             for token in _treebank_word_tokenize(sent)]
     95

D:\Users\PRAVIN\Anaconda\lib\site-packages\nltk\tokenize\__init__.py in sent_tokenize(text)
     79     (currently :class:`.PunktSentenceTokenizer`).
     80     """
---> 81     tokenizer = load('tokenizers/punkt/english.pickle')
     82     return tokenizer.tokenize(text)
     83

D:\Users\PRAVIN\Anaconda\lib\site-packages\nltk\data.py in load(resource_url, format, cache, verbose, logic_parser, fstruct_reader, encoding)
    772
    773     # Load the resource.
--> 774     opened_resource = _open(resource_url)
    775
    776     if format == 'raw':

D:\Users\PRAVIN\Anaconda\lib\site-packages\nltk\data.py in _open(resource_url)
    886
    887     if protocol is None or protocol.lower() == 'nltk':
--> 888         return find(path_, path + ['']).open()
    889     elif protocol.lower() == 'file':
    890         # urllib might not use mode='rb', so handle this one ourselves:

D:\Users\PRAVIN\Anaconda\lib\site-packages\nltk\data.py in find(resource_name, paths)
    616     sep = '*'*70
    617     resource_not_found = '\n%s\n%s\n%s' % (sep, msg, sep)
--> 618     raise LookupError(resource_not_found)
    619
    620 def retrieve(resource_url, filename=None, verbose=True):

LookupError:
**********************************************************************
  Resource u'tokenizers/punkt/english.pickle' not found.  Please
  use the NLTK Downloader to obtain the resource:  >>>
  nltk.download()
  Searched in:
    - 'C:\\Users\\PRAVIN/nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'D:\\Users\\PRAVIN\\Anaconda\\nltk_data'
    - 'D:\\Users\\PRAVIN\\Anaconda\\lib\\nltk_data'
    - 'C:\\Users\\PRAVIN\\AppData\\Roaming\\nltk_data'
    - u''
**********************************************************************

As suggested in the error message, I ran the commands below, but it is not working.

import nltk
nltk.download()

Please help me resolve this.

Thanks,
Pravin


#2

In case you have not noticed, nltk.download() opens a new popup window. Just choose "all" and click Download. It will download the required corpora and models, and then tokenization should work.
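If you would rather skip the popup (for example, on a machine without a GUI), you can fetch just the Punkt models named in the traceback from code. A minimal sketch, reusing the tweet from the original post:

import nltk

# Download only the Punkt sentence-tokenizer models that the
# LookupError is asking for, instead of everything via the GUI.
nltk.download('punkt')

from nltk.tokenize import word_tokenize
tweet = 'Hi @Amit, please go through the url test@demo.com, it will solve your issues'
print(word_tokenize(tweet))

Once the download finishes, word_tokenize can load tokenizers/punkt/english.pickle and the LookupError goes away.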