geoff-nixon/charlotte

Charlotte

Ruby's "econv" (1.9.3+) provides native character set conversion, but it cannot detect encodings. This gem is a small, pure Ruby library that quickly detects a character set encoding and automatically converts the text to UTF-8. The main goal is to be lightweight and fast rather than pedantic and exhaustive. It covers common encodings (UTF-8/16/32, ISO-8859-1, MacRoman, etc.) and reports rare legacy encodings and binary files as "ASCII-8BIT" / "BINARY", so they can be processed further if needed. It was written primarily as a potential alternative to charlock_holmes (used in linguist), which leverages the ICU library via a C++ extension, and rchardet, which is exhaustive but quite slow.
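To illustrate the general idea of lightweight detection followed by conversion, here is a minimal sketch in plain Ruby. It is not Charlotte's API or its actual algorithm: the names `BOMS`, `guess_encoding`, and `to_utf8` are hypothetical, and the heuristic (byte order mark, then NUL bytes, then UTF-8 validity, then an ISO-8859-1 fallback) is an assumption chosen for brevity.

```ruby
# Hypothetical sketch, NOT part of Charlotte: a cheap encoding guess
# followed by conversion to UTF-8 using Ruby's built-in transcoding.

# Byte order marks, longest first so UTF-32LE is tested before UTF-16LE.
BOMS = {
  "\xEF\xBB\xBF".b     => Encoding::UTF_8,
  "\xFF\xFE\x00\x00".b => Encoding::UTF_32LE,
  "\x00\x00\xFE\xFF".b => Encoding::UTF_32BE,
  "\xFF\xFE".b         => Encoding::UTF_16LE,
  "\xFE\xFF".b         => Encoding::UTF_16BE
}.freeze

# Returns an Encoding object guessed from the raw bytes.
def guess_encoding(bytes)
  raw = bytes.b # binary (ASCII-8BIT) copy; the caller's string is untouched

  # 1. A byte order mark is the strongest signal.
  BOMS.each { |bom, enc| return enc if raw.start_with?(bom) }

  # 2. Assumption: NUL bytes without a BOM mean binary data
  #    (this also misclassifies BOM-less UTF-16/32 as binary).
  return Encoding::BINARY if raw.include?("\x00")

  # 3. Valid UTF-8 (which includes plain ASCII) is taken at face value.
  return Encoding::UTF_8 if raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?

  # 4. Assumed fallback: every byte sequence is valid ISO-8859-1.
  Encoding::ISO_8859_1
end

# Returns a UTF-8 string, or the untouched binary bytes if nothing fits.
def to_utf8(bytes)
  enc = guess_encoding(bytes)
  return bytes.b if enc == Encoding::BINARY

  bytes.b.force_encoding(enc).encode(Encoding::UTF_8)
end
```

For example, `to_utf8("caf\xE9".b)` takes the ISO-8859-1 fallback and transcodes the bytes to UTF-8. A real detector would need further heuristics to tell single-byte encodings such as ISO-8859-1 and MacRoman apart, which this sketch does not attempt.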

Pull requests, contributions, optimizations, and other ideas are greatly welcomed.

Install

gem install charlotte
