Word Sense Disambiguation: an academic NLP project completed for the B.Tech degree at College of Engineering Trivandrum, 2013

amarif1/WSD

Word Sense Disambiguation

Word sense disambiguation (WSD) is the task of selecting the appropriate sense of a word in a given context. It is essential to communication in natural language, and it is motivated by its use in many crucial applications such as information retrieval, information extraction, machine translation, and part-of-speech tagging. Issues such as scalability, ambiguity, the diversity of languages, and evaluation pose challenges to WSD solutions.

Véronis (2004) proposed HyperLex, an innovative unsupervised algorithm for word sense disambiguation based on small-world graphs. We extend this work by optimizing the free parameters and mapping the induced senses to a standard lexicon (WordNet). We also adapt the algorithm, which was originally designed to tackle WSD problems in information retrieval systems, as a human language disambiguation aid.
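To make the graph-based idea concrete, here is a minimal sketch of the first two HyperLex steps as we understand them from Véronis (2004): building a weighted co-occurrence graph from the contexts of a target word, then greedily picking high-degree "root hubs" as candidate senses. It is not the code in this repository. The function names, the toy contexts for the word "bank", and the `max_weight` and `min_degree` defaults are assumptions chosen for illustration, and the sketch uses only the standard library so it runs as-is.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(contexts, max_weight=0.9):
    """Return {word: {neighbour: weight}} from tokenized contexts.

    Edge weight is 1 - max(P(a|b), P(b|a)), so words that almost always
    appear together get a weight near 0. Edges heavier than max_weight
    (an assumed threshold) are dropped.
    """
    word_freq = Counter()
    pair_freq = Counter()
    for tokens in contexts:
        unique = sorted(set(tokens))
        word_freq.update(unique)
        pair_freq.update(combinations(unique, 2))

    graph = defaultdict(dict)
    for (a, b), n_ab in pair_freq.items():
        weight = 1.0 - max(n_ab / word_freq[a], n_ab / word_freq[b])
        if weight <= max_weight:
            graph[a][b] = weight
            graph[b][a] = weight
    return dict(graph)


def select_root_hubs(graph, min_degree=2):
    """Greedily pick the highest-degree node, then remove it and its
    neighbours, until no remaining node has at least min_degree edges."""
    remaining = {w: set(nbrs) for w, nbrs in graph.items()}
    hubs = []
    while remaining:
        # Iterate in sorted order so ties are broken alphabetically.
        hub = max(sorted(remaining), key=lambda w: len(remaining[w]))
        if len(remaining[hub]) < min_degree:
            break
        hubs.append(hub)
        removed = remaining[hub] | {hub}
        remaining = {
            w: nbrs - removed
            for w, nbrs in remaining.items()
            if w not in removed
        }
    return hubs


# Hypothetical contexts for the target word "bank" (target word removed).
contexts = [
    ["river", "water", "shore"],
    ["river", "fishing"],
    ["river", "water"],
    ["money", "loan", "account"],
    ["money", "interest"],
    ["money", "loan"],
]

graph = build_cooccurrence_graph(contexts)
hubs = select_root_hubs(graph)
print(hubs)
```

On this toy input the two hubs correspond to the financial and the riverside uses of "bank". In the full algorithm the remaining words are then attached to their nearest hub, and it is those hub-centred clusters that would be mapped to WordNet senses.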

A large corpus for the target word was extracted from the Web and clustered following the HyperLex algorithm, and the resulting graph was analyzed further. Even with this large volume of noisy data, the small-world property was found to hold. However, performance was too poor for commercial applications, so we modified various parameters and applied further consolidation techniques. The resulting system achieved around 87% accuracy on real-world data. The mapping of clusters to senses in a standard lexicon was done by hand.
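One common way to check the small-world property is to compare the graph's average clustering coefficient with the value expected for a random graph of the same size and mean degree, which is roughly k/N. The sketch below illustrates only that clustering half of the test; a full check would also compare average path lengths. It is an illustration under stated assumptions, not the analysis code used in this project: the function names and the toy adjacency data are hypothetical.

```python
from itertools import combinations


def average_clustering(adjacency):
    """Mean local clustering coefficient of an undirected graph given as
    {node: set_of_neighbours}. Nodes with fewer than two neighbours
    contribute 0."""
    total = 0.0
    for nbrs in adjacency.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count the edges that actually exist among this node's neighbours.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adjacency[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)


def random_graph_clustering(adjacency):
    """Approximate clustering coefficient of a random graph with the same
    node count N and mean degree k, i.e. k / N."""
    n = len(adjacency)
    mean_degree = sum(len(nbrs) for nbrs in adjacency.values()) / n
    return mean_degree / n


# Hypothetical toy graph: two triangles joined by a single edge (c-d).
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}

c_observed = average_clustering(toy)
c_random = random_graph_clustering(toy)
print(c_observed, c_random)
```

A clustering coefficient well above the random-graph estimate, together with a comparably short average path length, is what is usually taken as evidence of small-world structure.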

The resulting graph for each word contributes to a broad knowledge base that can be used to study and analyze real-world occurrences and patterns in the language.

Architecture

[Figure: system architecture diagram]

Python dependencies

  • NLTK
  • NetworkX
  • Numpy
  • BeautifulSoup4
  • Bleach
  • PyEnchant
