Implementation of RAKE algorithm in Chinese text

astridesa/rakeForChineseText

 
 

RAKE

Implementation of the RAKE algorithm for Chinese text

  • The implementation targets Python 3.6 and uses the jieba package for Chinese word segmentation. Please make sure jieba is installed.
  • For more details about the RAKE algorithm, please refer to the original paper by S. Rose, D. Engel, N. Cramer, and W. Cowley.
  • The quality of Chinese word segmentation strongly affects the extracted key phrases, so for better results consider a more advanced Chinese word segmentation algorithm than jieba alone.
  • Stop words also strongly affect the extracted key phrases. For satisfactory results, use a domain-specific stop word list rather than a general one.
  • Please note that the Chinese text must be UTF-8 encoded. The key phrase extraction results for the sample Chinese text are provided for reference only.
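
The notes above can be illustrated with a minimal sketch of RAKE-style scoring over segmented Chinese text. This is not the code from this repository: the function names (`segment`, `read_utf8`, `candidate_phrases`, `rake`) are hypothetical, and the scoring uses the common degree/frequency word score summed per phrase. Only `jieba.lcut` is a real jieba call, and it is imported lazily so that the scoring part can be run on tokens from any other segmenter.

```python
import re
from collections import defaultdict

# Tokens made up only of punctuation/symbols act as phrase delimiters.
# In Python 3, \W is Unicode-aware, so Chinese punctuation such as "，" matches.
_DELIMITER = re.compile(r"^[\W_]+$")


def read_utf8(path):
    """Read the input text, which is assumed to be UTF-8 encoded."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def segment(text):
    """Segment Chinese text into a list of words using jieba."""
    import jieba  # imported here so the scoring below works without jieba
    return jieba.lcut(text)


def candidate_phrases(tokens, stop_words):
    """Split the token stream into phrases at stop words and punctuation."""
    phrases, current = [], []
    for tok in tokens:
        is_break = (not tok.strip()) or tok in stop_words or _DELIMITER.match(tok)
        if is_break:
            if current:
                phrases.append(tuple(current))
                current = []
        else:
            current.append(tok)
    if current:
        phrases.append(tuple(current))
    return phrases


def rake(tokens, stop_words):
    """Return (phrase, score) pairs sorted by descending score."""
    phrases = candidate_phrases(tokens, stop_words)
    freq = defaultdict(int)    # number of occurrences of each word
    degree = defaultdict(int)  # summed length of every phrase containing the word
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # A phrase that occurs several times keeps a single entry.
    phrase_score = {p: sum(word_score[w] for w in p) for p in phrases}
    return sorted(phrase_score.items(), key=lambda kv: kv[1], reverse=True)
```

A typical call would be `rake(segment(read_utf8(path)), stop_words)`, where `path` and `stop_words` (a set of strings loaded from a domain-specific list) are supplied by the caller. Swapping `segment` for a different segmenter only requires that it return a list of word strings.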

If you have any questions, please feel free to contact me.

Zhe wuzhe94@gmail.com
