thammegowda/005-nmt-imbalance

NMT Imbalance

This repository contains tools to reproduce the experiments reported in "Finding the Optimal Vocabulary Size for Neural Machine Translation": https://aclanthology.org/2020.findings-emnlp.352

Tools:

Analysis

Available under the analysis directory.

Get the datasets

./get_data.sh

Run experiments

Assumption: you have installed rtg via pip, and either the rtg-pipe command or python -m rtg.pipeline is available.
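To confirm this assumption before launching anything, a small check along the following lines may help. It is only a sketch: the function name pick_rtg_runner is hypothetical, and it assumes that a successful import of rtg.pipeline means python -m rtg.pipeline is usable.

```shell
# Print the launcher that is available on this machine:
# "rtg-pipe" if the command is on PATH, otherwise "python -m rtg.pipeline"
# if the module can be imported. Returns non-zero if neither is found.
pick_rtg_runner() {
    if command -v rtg-pipe >/dev/null 2>&1; then
        echo "rtg-pipe"
    elif python -c "import rtg.pipeline" >/dev/null 2>&1; then
        echo "python -m rtg.pipeline"
    else
        echo "rtg not found; install it via pip first" >&2
        return 1
    fi
}
```

Calling pick_rtg_runner prints the launcher to use, or an error message on stderr if rtg is not installed.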

Each directory that has a conf.yml is an RTG experiment directory; you can run rtg on it. The only hiccup you need to resolve is making sure the paths in the config files are valid. Since I used symlinks with relative paths, those relative paths may break if you run from a different directory.
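One way to spot broken symlinks ahead of time is sketched below. The helper name find_broken_links is hypothetical and not part of this repository; it only reports links whose targets are missing and does not inspect paths written inside conf.yml.

```shell
# List symlinks under a directory (default: current directory)
# whose targets do not exist.
find_broken_links() {
    # "test -e" follows the link, so it fails when the target is missing;
    # the negation keeps only those dangling links.
    find "${1:-.}" -type l ! -exec test -e {} \; -print
}
```

Running find_broken_links from the directory you intend to launch experiments from should print nothing if all links resolve.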

Please run each experiment from its runs-* directory.

You have two options.

  1. You can simply run rtg-pipe on one of the directories that has a conf.yml file in it. Example: rtg-pipe 055-ende-08k08k-r1

  2. You may use rtg-pipeline.sh if you are submitting a job to slurm, e.g. sbatch rtg-pipeline.sh -d 055-ende-08k08k-r1. Please adapt #SBATCH to your cluster.

Recommendation:

  1. cd to one of the run dirs, e.g. cd runs-deen-all
  2. for i in 0*; do sbatch -J "$i" rtg-pipeline.sh -d "$i"; done
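Before submitting a whole batch, it can be useful to preview which jobs the loop would create. The sketch below prints the sbatch commands without running them, and restricts itself to subdirectories that actually contain a conf.yml. The function name list_submit_commands is hypothetical; the -J and -d flags are the ones used in the loop above.

```shell
# Print one sbatch command per experiment directory under a run dir.
# An experiment directory is any subdirectory that contains conf.yml.
# Nothing is submitted; the commands are only echoed.
list_submit_commands() {
    run_dir="${1:-.}"
    for conf in "$run_dir"/*/conf.yml; do
        [ -f "$conf" ] || continue    # skip when the glob matched nothing
        exp=$(basename "$(dirname "$conf")")
        echo "sbatch -J $exp rtg-pipeline.sh -d $exp"
    done
}
```

For example, list_submit_commands runs-deen-all lists the commands for that run directory; once the list looks right, the commands can be run from inside the run directory as recommended above.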

Citation:

https://aclanthology.org/2020.findings-emnlp.352

@inproceedings{gowda-may-2020-finding,
    title = "Finding the Optimal Vocabulary Size for Neural Machine Translation",
    author = "Gowda, Thamme  and
      May, Jonathan",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.findings-emnlp.352",
    doi = "10.18653/v1/2020.findings-emnlp.352",
    pages = "3955--3964",
}