
SMMTT

Social Media Machine Translation Toolkit: everything you need to start building machine translation systems for social media.

This toolkit was proposed as a project at MT Marathon 2013.

Proposed and maintained by: Wang Ling (http://www.cs.cmu.edu/~lingwang/)

Contributors: Carolin Haas, Chris Dyer, Adam Lopez

Requirements: Moses, GIZA++ and KenLM. Following the guide at http://www.statmt.org/moses/?n=Moses.Baseline will get all three installed.

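Usage: scripts/runExperiment.sh source target rootdir mosesdir mosesexternaldir model

source - source language code (e.g. en)

target - target language code (e.g. zh)

rootdir - directory where this package is located

mosesdir - Moses installation directory

mosesexternaldir - directory where GIZA++ is located (probably mosesdecoder/tools)

model - directory where the model and results will be generated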
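As a rough illustration of the usage line above, a call might be assembled as follows. Every path here is a hypothetical placeholder (the checkout location, the Moses directory and the output directory are assumptions, not defaults of the toolkit); only the argument order and the en/zh examples come from the usage description.

```shell
# Hypothetical locations; edit these to match your own setup.
SRC=en                                  # source language (example from the usage line)
TGT=zh                                  # target language (example from the usage line)
ROOT_DIR="$HOME/SMMTT"                  # assumed checkout of this package
MOSES_DIR="$HOME/mosesdecoder"          # assumed Moses installation
MOSES_EXTERNAL_DIR="$MOSES_DIR/tools"   # where GIZA++ probably lives
MODEL_DIR="$HOME/smmtt-model"           # assumed output directory for model and results

RUN="$ROOT_DIR/scripts/runExperiment.sh"

# Only launch the experiment if the script is really there.
if [ -x "$RUN" ]; then
  "$RUN" "$SRC" "$TGT" "$ROOT_DIR" "$MOSES_DIR" "$MOSES_EXTERNAL_DIR" "$MODEL_DIR"
else
  echo "runExperiment.sh not found at $RUN; edit ROOT_DIR first" >&2
fi
```

Keeping the six arguments in named variables makes it harder to swap two directories by accident, since the script takes them positionally.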

Structure:

data/parallel/ - parallel dataset directory. To add more data, add files ending with ".en-cn" in which the two sides of each sentence pair are separated by " ||| " (see the existing data/parallel/microtopia.en-cn for an example).

scripts/runExperiment.sh - script that builds an MT system from the existing data and evaluates it on the microblog test set. Run it without arguments for a description.

scripts/tokenize/ - directory containing tokenizers for different languages.
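Before dropping a new file into data/parallel/, it can help to check that every line really splits into two parts around " ||| ". The sketch below is not part of the toolkit: `check_parallel` is a hypothetical helper, the sample sentences are made up, and it assumes one sentence pair per line with exactly one separator.

```shell
# Hypothetical helper: list lines that do not consist of exactly two
# non-empty fields separated by " ||| ". Returns non-zero if any are found.
check_parallel() {
  awk -F ' [|][|][|] ' '
    NF != 2 || $1 == "" || $2 == "" {
      printf "malformed line %d: %s\n", NR, $0
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Made-up sample data: one well-formed pair and one line missing the separator.
tmp=$(mktemp)
printf '%s\n' 'hello world ||| 你好 世界' 'missing separator' > "$tmp"

check_parallel "$tmp" || echo "fix the lines listed above before adding the file to data/parallel/"
rm -f "$tmp"
```

The bracket expression `[|][|][|]` is used for the field separator so that the pipes are matched literally regardless of how a given awk handles backslash escapes.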

The parallel data is obtained from http://www.cs.cmu.edu/~lingwang/microtopia/. If you use this toolkit, please cite:

@inproceedings{wangling:acl2013,
  author = {Ling, Wang and Xiang, Guang and Dyer, Chris and Black, Alan and Trancoso, Isabel},
  title = {Microblogs as Parallel Corpora},
  booktitle = {Proceedings of the 51st Annual Meeting on Association for Computational Linguistics},
  series = {ACL '13},
  year = {2013},
  location = {Sofia, Bulgaria},
  numpages = {8},
  publisher = {Association for Computational Linguistics}
}
