GH-2720: handle consecutive whitespaces #2721

Merged
merged 1 commit into from
May 4, 2022

Conversation

mauryaland
Contributor

Related to the issue #2720.

One quick question: I do not fully understand the purpose of lines 789-791 in data.py:

if token.start_position == 0 and len(self) > 0:
    token.start_pos = len(self.to_original_text()) + self[-1].whitespace_after
    token.end_pos = token.start_pos + len(token.text)

Why do you take the length of the sentence text (self refers to the sentence here) to compute start_pos?

@alanakbik
Collaborator

@mauryaland thanks a lot for adding this and sorry for reviewing so late!

Regarding your question: taking the sentence length is admittedly a bit inefficient. The idea is that tokens are added to the sentence one after another, so each time a token is added, the sentence supplies its current text length as the start position of the new token. With your change, one could probably also use the end position of the last token plus its whitespace_after value to get the start position of the new token.
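
To make the two options concrete, here is a minimal sketch of both ways to compute the start position. It is not the actual flair code: `Tok`, `Sent`, `add_token_via_length`, `add_token_via_last`, and `offsets` are hypothetical names used only for illustration, and `whitespace_after` is assumed to be an integer count of spaces, as the snippet quoted above suggests by adding it to a length. The `start_position == 0` guard from the quoted lines is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Tok:
    # Hypothetical stand-in for a token, not flair's Token class.
    text: str
    whitespace_after: int = 1  # assumed: number of spaces after the token
    start_pos: int = 0
    end_pos: int = 0


class Sent:
    # Hypothetical stand-in for a sentence, not flair's Sentence class.
    def __init__(self) -> None:
        self.tokens: List[Tok] = []

    def to_original_text(self) -> str:
        # Rebuild the text seen so far; whitespace after the last token is omitted.
        parts = []
        for i, tok in enumerate(self.tokens):
            parts.append(tok.text)
            if i < len(self.tokens) - 1:
                parts.append(" " * tok.whitespace_after)
        return "".join(parts)

    def add_token_via_length(self, tok: Tok) -> None:
        # Option 1: rebuild the whole text on every call (quadratic overall).
        if self.tokens:
            tok.start_pos = len(self.to_original_text()) + self.tokens[-1].whitespace_after
        tok.end_pos = tok.start_pos + len(tok.text)
        self.tokens.append(tok)

    def add_token_via_last(self, tok: Tok) -> None:
        # Option 2: only look at the previous token (constant time per call).
        if self.tokens:
            last = self.tokens[-1]
            tok.start_pos = last.end_pos + last.whitespace_after
        tok.end_pos = tok.start_pos + len(tok.text)
        self.tokens.append(tok)


def offsets(spec: List[Tuple[str, int]], use_length: bool) -> List[Tuple[int, int]]:
    # Build a sentence from (text, whitespace_after) pairs and return the offsets.
    sent = Sent()
    for text, ws in spec:
        tok = Tok(text, ws)
        if use_length:
            sent.add_token_via_length(tok)
        else:
            sent.add_token_via_last(tok)
    return [(t.start_pos, t.end_pos) for t in sent.tokens]


# "Hello   world !" with three consecutive spaces after "Hello"
spec = [("Hello", 3), ("world", 1), ("!", 0)]
print(offsets(spec, use_length=True))
print(offsets(spec, use_length=False))
```

Under these assumptions both variants yield the same offsets, including across consecutive whitespace; the second simply avoids rebuilding the sentence text each time a token is added.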

@alanakbik merged commit 33b72e6 into flairNLP:master May 4, 2022