Commit 869b792

Add interscript command (#4)
* Add “interscript” command and fix maps
1 parent d985282 commit 869b792

13 files changed: 140 additions, 25 deletions

Gemfile

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+source 'https://rubygems.org'

README.adoc

Lines changed: 19 additions & 14 deletions
@@ -1,4 +1,4 @@
-= Interoperable Transliteration Schemes and a Ruby implementation
+= Interscript: Interoperable Transliteration Schemes and a Ruby implementation
 
 == Introduction
 
@@ -7,29 +7,34 @@ This repository contains a number of transliteration schemes from:
 * BGN/PCGN
 * ICAO
 * ISO
+* UN (by UNGEGN)
 
-The goal is to achieve quality comparison and easily swappable transliteration schemes.
+The goal is to achieve interoperable transliteration schemes allowing quality comparisons.
 
-== Covered languages
 
-Currently the schemes cover Cyrillic, Armenian, Greek, Arabic and Hebrew.
+== STATUS (work in progress!)
+
+These transliteration systems currently work:
+
+`bgnpcgn-rus-Cyrl-Latn`:: BGN/PCGN Romanization of Russian
+`iso-rus-Cyrl-Latn`:: ISO 9 Romanization of Russian
+`icao-rus-Cyrl-Latn`:: ICAO MRZ Romanization of Russian
+`bas-rus-Cyrl-Latn`:: Bulgaria Academy of Science Streamlined System for Russian
 
 
-== Initial work
+== Usage
 
-The initial work is to:
 
-1. Write a Ruby script that allows transliterating some text (under `samples/`)
-into the target writing system via the files in `maps/`.
+[source,sh]
+----
+interscript samples/rus-Cyrl.txt --system=bas-rus-Cyrl-Latn --output=rus-Latn.txt
+----
 
-2. Initially we only want to compare Russian transcriptions. There are the following definition maps:
 
-.. BGN/PCGN Romanization of Russian (`bgnpcgn-rus-Cyrl-Latn.yaml`)
-.. ISO 9 Romanization of Russian (`iso-rus-Cyrl-Latn.yaml`)
-.. ICAO MRZ Romanization of Russian (`icao-rus-Cyrl-Latn.yaml`)
-.. Bulgaria Academy of Science Streamlined System for Russian (`bas-rus-Cyrl-Latn.yaml`)
 
-3. There should be a command to generate all 4 transliterations at once.
+== Covered languages
+
+Currently the schemes cover Cyrillic, Armenian, Greek, Arabic and Hebrew.
 
 
 == Sources
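
The system codes listed under STATUS appear to follow an authority, language, source script, target script pattern (for example `bgnpcgn-rus-Cyrl-Latn`), and the map filenames in this commit carry one extra trailing component (`-1947`, `-9303`, `-bss`). A minimal sketch of splitting such a code, assuming that reading of the pattern; `SystemCode` and `parse_system_code` are hypothetical names and not part of the commit:

```ruby
# Hypothetical value object; the field names are assumptions for illustration.
SystemCode = Struct.new(:authority, :language, :source_script, :target_script, :variant)

# Split a code such as "bgnpcgn-rus-Cyrl-Latn" into its components.
# The optional fifth component (e.g. "9303") is kept as the variant.
def parse_system_code(code)
  authority, language, source_script, target_script, variant = code.split("-", 5)
  SystemCode.new(authority, language, source_script, target_script, variant)
end

plain   = parse_system_code("bgnpcgn-rus-Cyrl-Latn")
with_id = parse_system_code("icao-rus-Cyrl-Latn-9303")

puts plain.authority      # bgnpcgn
puts plain.target_script  # Latn
puts with_id.variant      # 9303
```

Keeping the variant separate would make it possible to look up `bas-rus-Cyrl-Latn` and `bas-rus-Cyrl-Latn-bss` as the same system, but whether the project wants that is not stated in the commit.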

Rakefile

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+require "bundler/gem_tasks"
+

bin/interscript

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+#!/usr/bin/env ruby
+require 'rubygems'
+require_relative '../lib/interscript'
+
+if ARGV.empty?
+  puts "write source file, source format, and output file"
+else
+  args = Hash[ ARGV.flat_map{|s| s.scan(/--?([^=\s]+)(?:=(\S+))?/) } ]
+  input = ARGV[0]
+  system_code = args["system"]
+  output_file = args["output"]
+
+  raise "Please enter the system code with --system={system_code}" unless system_code
+
+  if output_file
+    Interscript.instance.transliterate_file(system_code, input, output_file)
+  else
+    puts Interscript.instance.transliterate(system_code, IO.read(input))
+  end
+end
+
+
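
The option parsing in `bin/interscript` scans every argument for a dash anywhere in the string, so a positional path such as `samples/rus-Cyrl.txt` also yields a stray `"Cyrl.txt"` key. That is harmless here because only `system` and `output` are read. A sketch of an anchored variant that only treats arguments starting with a dash as options; `parse_options` is a hypothetical helper, not part of the commit:

```ruby
# Hypothetical helper: same option syntax as bin/interscript (--key=value or -key),
# but anchored so only arguments that begin with a dash are treated as options.
def parse_options(argv)
  argv.each_with_object({}) do |arg, opts|
    match = arg.match(/\A--?([^=\s]+)(?:=(\S+))?\z/)
    opts[match[1]] = match[2] if match
  end
end

argv = ["samples/rus-Cyrl.txt", "--system=bas-rus-Cyrl-Latn", "--output=rus-Latn.txt"]

# The unanchored scan used in the commit, for comparison.
loose  = Hash[argv.flat_map { |s| s.scan(/--?([^=\s]+)(?:=(\S+))?/) }]
strict = parse_options(argv)

puts loose.keys.inspect   # includes "Cyrl.txt" from the input path
puts strict.inspect
```

Ruby's standard `optparse` library would be another option if the command grows more flags.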

interscript.gemspec

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'interscript/version'
+require 'rake'
+
+Gem::Specification.new do |spec|
+  spec.name = "interscript"
+  spec.version = Interscript::VERSION
+  spec.required_rubygems_version = Gem::Requirement.new('>= 0') if spec.respond_to? :required_rubygems_version=
+  spec.summary = %q{Interoperable script conversion systems}
+  spec.description = %q{Interoperable script conversion systems}
+  spec.authors = ['project_contibutors']
+  spec.date = %q{2019-11-17}
+  spec.homepage = ""
+  spec.license = "MIT"
+  spec.files = FileList['{bin,lib,test}/**/*', 'README.adoc'].to_a
+  spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
+  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
+  spec.require_paths = ["lib"]
+  spec.bindir = 'bin'
+  spec.add_development_dependency "bundler", "~> 1.7"
+  spec.add_development_dependency "rake", "~> 10.0"
+  spec.add_development_dependency "rspec"
+end

lib/interscript.rb

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+require 'yaml'
+require 'singleton'
+
+class Interscript
+  include Singleton
+
+  SYSTEM_DEFINITIONS_PATH = File.expand_path('../../maps', __FILE__)
+
+  def initialize
+    @systems = {}
+  end
+
+  def transliterate_file(system_code, input_file, output_file)
+    input = File.read(input_file)
+    output = transliterate(system_code, input)
+
+    File.open(output_file, "w") do |f|
+      f.puts(output)
+    end
+    puts "Output written to: #{output_file}"
+  end
+
+  def load_system_definition(system_code)
+    @systems[system_code] ||= YAML.load_file(File.join(SYSTEM_DEFINITIONS_PATH, "#{system_code}.yaml"))
+  end
+
+  def get_system(system_code)
+    @systems[system_code]
+  end
+
+  def system_char_map(system_code)
+    get_system(system_code)["map"]["characters"]
+  end
+
+  def system_rules(system_code)
+    get_system(system_code)["map"]["rules"]
+  end
+
+  def transliterate(system_code, string)
+    load_system_definition(system_code)
+
+    # TODO: also need to support regular expressions via system_rules(system_code), before system_char_map
+
+    character_map = system_char_map(system_code)
+
+    string.each_char.map do |char|
+      # Unmapped characters pass through; the caller's string is not mutated.
+      character_map[char] || char
+    end.join
+  end
+
+end
+
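
The TODO in `transliterate` asks for the regular-expression rules to run before the character map. A minimal sketch of one way to do that, assuming rules are hashes with `"pattern"` and `"result"` keys as in the `maps/` files and that patterns are wrapped in `/.../`; `apply_rules` and the standalone `transliterate` below are hypothetical and not the commit's implementation:

```ruby
# Hypothetical sketch: apply each rule in order, then map the remaining characters.
def apply_rules(string, rules)
  rules.reduce(string) do |text, rule|
    # Patterns in the map files are written as "/.../"; strip the delimiters.
    source = rule["pattern"].delete_prefix("/").delete_suffix("/")
    # A result such as "\\1YE" uses gsub's backreference syntax for group 1.
    text.gsub(Regexp.new(source), rule["result"])
  end
end

def transliterate(string, rules, character_map)
  apply_rules(string, rules).each_char.map { |char| character_map[char] || char }.join
end

# Inline sample data modelled on the BGN/PCGN map; only a few entries are shown.
rules = [
  { "pattern" => "/^\u0415/",        "result" => "YE" },     # word-initial Е
  { "pattern" => "/([йьъ])\u0415/", "result" => "\\1YE" }   # Е after й, ь, ъ
]
character_map = {
  "\u0410" => "A", # А
  "\u0415" => "E", # Е
  "\u041B" => "L", # Л
  "\u041D" => "N"  # Н
}

puts transliterate("ЕЛЕНА", rules, character_map) # => "YELENA"
```

Running the rules first matters because the character map would otherwise turn every `Е` into `E` before the context-sensitive `YE` cases could match.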

lib/interscript/version.rb

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
module Interscript
2+
VERSION = "0.9"
3+
end

maps/bas-rus-Cyrl-Latn-bss.yaml

Lines changed: 2 additions & 1 deletion
@@ -57,7 +57,7 @@ tests:
 
 map:
   characters:
-    "\u0027": "" # '
+    "\u0027": "" # '
     "\u0410": "A" # А
     "\u0411": "B" # Б
     "\u0412": "V" # В
@@ -146,3 +146,4 @@ map:
 
     # ъ -> (none)
     "\u044a": ""
+

maps/bgnpcgn-rus-Cyrl-Latn-1947.yaml

Lines changed: 4 additions & 4 deletions
@@ -154,17 +154,17 @@ tests:
 map:
   rules:
     - pattern: "/([ЄФІЦАаЕеИиЙйОоУуЫыЮюЯяії])\u0415/"
-      result: "\1YE"
+      result: "\\1YE"
     - pattern: "/^\u0415/"
       result: "YE"
     - pattern: "/([йьъ])\u0415/"
-      result: "\1YE"
+      result: "\\1YE"
     - pattern: "/([ЄФІЦАаЕеИиЙйОоУуЫыЮюЯяії])\u0435/"
-      result: "\1ye"
+      result: "\\1ye"
     - pattern: "/^\u0435/"
       result: "ye"
     - pattern: "/([йьъ])\u0435/"
-      result: "\1ye"
+      result: "\\1ye"
 
   characters:
     "\u0410": "A"
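
The `"\1YE"` to `"\\1YE"` change matters because YAML double-quoted scalars process backslash escapes: `\\` yields a single backslash, while `\1` is not a defined YAML escape. A small sketch of how the corrected value is expected to load and then behave as a `gsub` replacement; the inline documents are illustrative and are not the full map file:

```ruby
require 'yaml'

# Single-quoted heredocs keep the backslashes exactly as they appear in a map file.
fixed_yaml = <<~'DOC'
  result: "\\1YE"
DOC
old_yaml = <<~'DOC'
  result: "\1YE"
DOC

# The corrected form loads as backslash, "1", "Y", "E".
replacement = YAML.load(fixed_yaml)["result"]
converted   = "ъЕ".gsub(/([йьъ])Е/, replacement)
puts converted # => "ъYE"

# The old form is expected to be rejected by the parser; rescued here so the
# sketch runs either way.
old_error = begin
  YAML.load(old_yaml)
  nil
rescue Psych::SyntaxError => e
  e.class
end
puts old_error.inspect
```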

maps/icao-rus-Cyrl-Latn-9303.yaml

Mode changed: 100644 → 100755
Lines changed: 3 additions & 2 deletions
@@ -11,8 +11,8 @@ map:
     "\u0410": "A" # А
     "\u0411": "B" # Б
     "\u0414": "D" # Д
-    "\u0401": "E" # Ё
-    "\u0415": "E" # Е
+    "\u0401": "E" # Ё
+    "\u0415": "E" # Е
     "\u042D": "E" # Э
     "\u0424": "F" # Ф
     "\u0413": "G" # Г
@@ -97,3 +97,4 @@ map:
     "\u0454": "ie" # є
     "\u0457": "i" # ї
     "\u0453": "g" # ѓ
+
