NLTK v3.2: Unable to nltk.pos_tag()
Solution 1:
EDITED
This issue has been resolved as of NLTK v3.2.1. Upgrading your NLTK version resolves it, e.g. pip install -U nltk
I faced the same issue, and the error I got was as follows:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\data.py", line 924, in _open
    return urlopen(resource_url)
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 391, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 414, in _open
    'unknown_open', req)
  File "C:\Python27\lib\urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1206, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib2.URLError: <urlopen error unknown url type: c>
The URLError that you mentioned is caused by a bug in NLTK's perceptron.py file that shows up on Windows. On my machine, the file is at this location:
C:\Python27\Lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\perceptron.py
(Look in the equivalent location on your machine, wherever your Python27 folder is.)
The bug is in the code that locates the averaged_perceptron_tagger model on your machine. The model path was handed to the loader without a URL scheme, so the drive letter in C:\... was read as the scheme, hence "unknown url type: c". Lines 801 and 924 of data.py, both in the traceback above, are where this happens.
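To see where the "unknown url type: c" comes from, here is a small sketch that uses only the Python 3 standard library, not NLTK's own code (on Python 2.7 the same function lives in the urlparse module). The path is a made-up example. It only illustrates that a URL parser takes the drive letter of a bare Windows path for a scheme, while a 'file:' prefix gives it a scheme it knows.

```python
from urllib.parse import urlparse

# Hypothetical model location, for illustration only.
bare_path = r"C:\nltk_data\taggers\averaged_perceptron_tagger\averaged_perceptron_tagger.pickle"

# Without a prefix, everything before the first ':' is read as the URL scheme.
bare_scheme = urlparse(bare_path).scheme

# With the 'file:' prefix, the scheme is an explicit, recognised one.
prefixed_scheme = urlparse("file:" + bare_path).scheme

print(bare_scheme)      # c
print(prefixed_scheme)  # file
```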
I think the NLTK developers fixed this bug recently. Have a look at this commit made to their code a few days ago.
The snippet where the change was made is as follows:
        self.tagdict = {}
        self.classes = set()
        if load:
            AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
            self.load(AP_MODEL_LOC)
            # Initially it was: AP_MODEL_LOC = str(find('taggers/averaged_perceptron_tagger/'+PICKLE))

    def tag(self, tokens):
Updating the file to the most recent commit worked for me, and I was able to use the nltk.pos_tag command. I believe this will resolve your problem as well (assuming you have everything else set up).
Solution 2:
EDITED
This issue has been resolved as of NLTK v3.2.1. Please upgrade your NLTK!
First read @MananVyas's answer for the why:
https://stackoverflow.com/a/35902494/610569
Here's the how: if you want to stay on NLTK 3.2 rather than downgrade to NLTK v3.1, you can use this "hack":
>>> from nltk.tag import PerceptronTagger
>>> from nltk.data import find
>>> PICKLE = "averaged_perceptron_tagger.pickle"
>>> AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
>>> tagger = PerceptronTagger(load=False)
>>> tagger.load(AP_MODEL_LOC)
>>> pos_tag = tagger.tag
>>> pos_tag('The quick brown fox jumps over the lazy dog'.split())
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
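If the same script has to run on machines with different NLTK versions, the hack can be wrapped so it only kicks in on 3.2. This is just a sketch: needs_file_prefix and get_pos_tag are names I made up, and the version check assumes (as the answers here say) that 3.2 is the only affected release. nltk is imported inside the function so the check itself can be used without NLTK installed.

```python
def needs_file_prefix(nltk_version):
    """Return True only for the release reported as broken here (3.2)."""
    return nltk_version.strip() == "3.2"


def get_pos_tag():
    """Return a callable that tags a list of tokens."""
    import nltk  # lazy import; assumes NLTK and the tagger model are installed

    if not needs_file_prefix(nltk.__version__):
        # Other versions: the stock function is used unchanged.
        return nltk.pos_tag

    from nltk.data import find
    from nltk.tag import PerceptronTagger

    # Same steps as the hack above: load the model via an explicit 'file:' URL.
    pickle_name = "averaged_perceptron_tagger.pickle"
    model_loc = "file:" + str(find("taggers/averaged_perceptron_tagger/" + pickle_name))
    tagger = PerceptronTagger(load=False)
    tagger.load(model_loc)
    return tagger.tag
```

Usage would then be pos_tag = get_pos_tag() followed by pos_tag('The quick brown fox'.split()).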
Solution 3:
I faced the same issue a while back. The fix was to download the tagger model:
import nltk
nltk.download('averaged_perceptron_tagger')
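To avoid downloading on every run, the call can be guarded by a lookup first. A minimal sketch, assuming the model lives under the same resource path used in the other answers; ensure_tagger_model is a made-up helper name, and its find/download parameters exist only so the logic can be exercised without NLTK or a network connection (by default they fall back to nltk.data.find and nltk.download).

```python
def ensure_tagger_model(find=None, download=None):
    """Download the tagger model only if looking it up fails.

    Returns True if a download was triggered, False if the model was found.
    """
    if find is None or download is None:
        import nltk  # lazy import, only needed for the default callables
        find = find or nltk.data.find
        download = download or nltk.download

    resource = "taggers/averaged_perceptron_tagger/averaged_perceptron_tagger.pickle"
    try:
        find(resource)  # nltk.data.find raises LookupError when missing
        return False
    except LookupError:
        download("averaged_perceptron_tagger")
        return True
```

Calling ensure_tagger_model() with no arguments before nltk.pos_tag() would then fetch the model at most once per machine.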