Inconsistent tagging for single word sentences #4

Closed
desilinguist opened this issue Jul 2, 2015 · 1 comment

Comments

@desilinguist
Collaborator

For single-word sentences, ZPar's tagging is not always consistent. For example, given the following input file:

REBELLION

I am going away .
The rebellion is just another word for change and change is necessary to live .
REBELLION
REBELLION
The rebellion is just another word for change and change is necessary to live .
REBELLION
This is just another sentence .
REBELLION

I get the following tagger output:

REBELLION/NN

I/PRP am/VBP going/VBG away/RB ./.
The/DT rebellion/NN is/VBZ just/RB another/DT word/NN for/IN change/NN and/CC change/NN is/VBZ necessary/JJ to/TO live/VB ./.
REBELLION/NNP
REBELLION/NNP
The/DT rebellion/NN is/VBZ just/RB another/DT word/NN for/IN change/NN and/CC change/NN is/VBZ necessary/JJ to/TO live/VB ./.
REBELLION/NNP
This/DT is/VBZ just/RB another/DT sentence/NN ./.
REBELLION/IN

As you can see, the word REBELLION is tagged as NN, NNP, and IN within the same text, even though each occurrence is an identical single-word sentence.

@desilinguist
Collaborator Author

Looks like the issue is an integer underflow. In tagger.cpp, we check whether there are 2 words between the current word and the end of the sentence, but we never check whether the sentence actually has more than 2 words. For a single-word sentence, i.e., where m_CacheSize is 1, m_CacheSize-2 underflows and becomes a large positive number, so the condition is satisfied when it shouldn't be.
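
To make the failure mode concrete, here is a minimal sketch of the wraparound. It is not the actual code from tagger.cpp: the function names and the exact form of the comparison are hypothetical, and it assumes m_CacheSize is an unsigned type (which is what "becomes a large positive number" suggests).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the lookahead check: is `index` at least two
// words away from the end of a sentence holding `cacheSize` words?
// Assumption: the size is unsigned, so `cacheSize - 2` wraps around to a
// huge value whenever cacheSize is 0 or 1.
bool hasTwoWordsAheadBuggy(std::size_t index, std::size_t cacheSize) {
    return index < cacheSize - 2;
}

// One possible guard: move the constant to the other side of the
// comparison so nothing is ever subtracted from the unsigned size.
bool hasTwoWordsAheadSafe(std::size_t index, std::size_t cacheSize) {
    return index + 2 < cacheSize;
}

// Self-check contrasting the two versions.
void demonstrateUnderflow() {
    // Single-word sentence: the buggy check wrongly reports lookahead words.
    assert(hasTwoWordsAheadBuggy(0, 1));
    assert(!hasTwoWordsAheadSafe(0, 1));
    // For longer sentences both versions agree.
    assert(hasTwoWordsAheadBuggy(0, 5) == hasTwoWordsAheadSafe(0, 5));
}
```

Calling demonstrateUnderflow() from a test harness should pass silently. An explicit check on the sentence length before the subtraction would work equally well; the rearranged comparison is just one way to avoid the wraparound.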
