Arithmetic Transformer

A compact, high-performance neural architecture for multi-digit addition built around a hybrid Transformer-LSTM model. The repository contains the model code, the training pipeline, evaluation metrics, and insights from an extensive ablation and error analysis, and investigates how such a hybrid can learn to perform symbolic arithmetic (addition) with high accuracy. It covers:

  • Curriculum-based training on digit lengths 1 to 7
  • Evaluation of error types and model generalization
  • Ablation studies on architecture and hidden dimensions
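
A minimal sketch of how such curriculum data could be generated, assuming operands are sampled uniformly at a target digit length and serialized as character sequences; the exact tokenization and sampling used in this repository may differ.

```python
import random

def make_addition_example(num_digits: int) -> tuple[str, str]:
    """Sample one addition problem whose operands have exactly `num_digits` digits."""
    lo = 10 ** (num_digits - 1) if num_digits > 1 else 0
    hi = 10 ** num_digits - 1
    a, b = random.randint(lo, hi), random.randint(lo, hi)
    return f"{a}+{b}", str(a + b)

def make_stage(num_digits: int, size: int) -> list[tuple[str, str]]:
    """Build one curriculum stage: `size` problems at a fixed digit length."""
    return [make_addition_example(num_digits) for _ in range(size)]

# Example: 10,000 training pairs for the 3-digit stage of the curriculum.
train_pairs = make_stage(num_digits=3, size=10_000)
```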

Model Architecture

  • Base: Transformer encoder + LSTM decoder
  • Trainable Parameters: 102,991
  • Curriculum Learning: Gradual increase in digit length (1→7)
  • Evaluation Metrics: Exact match accuracy, character accuracy, perplexity
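
A minimal sketch of the hybrid layout described above, assuming a PyTorch TransformerEncoder over the input expression feeding an LSTM decoder; the sizes below are illustrative and are not tuned to reproduce the exact 102,991-parameter count.

```python
import torch
import torch.nn as nn

class HybridAdder(nn.Module):
    """Transformer encoder over the 'a+b' string, LSTM decoder over the sum."""
    def __init__(self, vocab_size: int, d_model: int = 64, nhead: int = 2,
                 num_layers: int = 2, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead,
                                               dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        # Decoder is a plain LSTM; assumes hidden_dim == d_model so the pooled
        # encoder memory can seed its initial hidden state.
        self.decoder = nn.LSTM(d_model, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        # Positional encoding omitted for brevity.
        memory = self.encoder(self.embed(src))        # (B, S, d_model)
        h0 = memory.mean(dim=1).unsqueeze(0)          # (1, B, hidden_dim)
        c0 = torch.zeros_like(h0)
        dec_out, _ = self.decoder(self.embed(tgt), (h0, c0))
        return self.out(dec_out)                      # (B, T, vocab_size)
```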

Performance by Digit Length

Digit Length | Epochs Required | Final Accuracy | Initial Accuracy
1-digit | 1 | 100% | 100%
2-digit | 1 | 98.83% | 98.83%
3-digit | 2 | 97.86% | 89.25%
4-digit | 3 | 96.84% | 85.36%
5-digit | 10 | 93.57% | 8.32%
6-digit | 1 | 82.52% | 82.52%
7-digit | 2 | 97.23% | 1.03%
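
The accuracies in this table follow the metrics listed under Model Architecture; a rough sketch of how they can be computed, assuming predictions are decoded back to digit strings and perplexity is taken from the mean per-token cross-entropy:

```python
import math

def exact_match(preds: list[str], targets: list[str]) -> float:
    """Fraction of sums predicted exactly right (the accuracies reported above)."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)

def char_accuracy(preds: list[str], targets: list[str]) -> float:
    """Fraction of output characters that match the target, position by position."""
    correct = total = 0
    for p, t in zip(preds, targets):
        total += len(t)
        correct += sum(pc == tc for pc, tc in zip(p, t))
    return correct / total

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity from the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)
```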

Error Analysis

Error Types

  • Digit Substitution Errors (38%)
  • Carry Propagation Failures (45%)
  • Length Miscalculations (12%)
  • Positional Errors (5%)

Observations

  • 92% of errors involve carry operations
  • Errors increase exponentially with digit length
  • Mid-positions (3rd–5th digits) are most error-prone
  • Operands containing repeated 8s and 9s cause more confusion
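
One way the error buckets above could be assigned is sketched below; this is an illustrative classifier, not the analysis script used here, and it only distinguishes length, carry-related, and substitution errors (positional errors would need an extra check for correct digits in shifted positions).

```python
def has_carry(a: int, b: int) -> bool:
    """True if adding a and b column by column produces at least one carry."""
    while a or b:
        if a % 10 + b % 10 >= 10:
            return True
        a, b = a // 10, b // 10
    return False

def classify_error(a: int, b: int, prediction: str) -> str:
    """Rough bucket for a wrong prediction, mirroring the categories above."""
    target = str(a + b)
    if len(prediction) != len(target):
        return "length miscalculation"
    if has_carry(a, b):
        return "carry propagation failure"
    return "digit substitution"
```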

Training Dynamics

  • Fast convergence on simpler sequences (1–4 digits)
  • 5-digit tasks required 10 epochs for 93.57% accuracy
  • Perplexity spikes when increasing digit length
  • 6-digit accuracy anomalously low despite training
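
A rough sketch of the curriculum schedule behind these dynamics, assuming training advances to the next digit length once exact-match accuracy on a validation split clears a threshold; the threshold, epoch cap, and the `train_one_epoch` / `evaluate` helpers are stand-ins for the actual pipeline.

```python
def train_with_curriculum(model, train_one_epoch, evaluate,
                          max_digits: int = 7, accuracy_target: float = 0.95,
                          max_epochs_per_stage: int = 10) -> None:
    """Train on digit lengths 1..max_digits, advancing once accuracy is high enough.

    `train_one_epoch(model, num_digits)` runs one epoch at a fixed digit length;
    `evaluate(model, num_digits)` returns exact-match accuracy on held-out data.
    """
    for num_digits in range(1, max_digits + 1):
        for epoch in range(1, max_epochs_per_stage + 1):
            train_one_epoch(model, num_digits)
            accuracy = evaluate(model, num_digits)
            print(f"{num_digits}-digit, epoch {epoch}: {accuracy:.2%}")
            if accuracy >= accuracy_target:
                break  # move on to the next digit length
```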

Ablation Studies

1. Hybrid vs Pure Transformer

Metric | Transformer-LSTM | Pure Transformer
5-digit Accuracy | 93.57% | 89.42%
7-digit Accuracy | 97.23% | 94.81%
Epochs to Converge | 17 | 23
Carry Error Rate | 45% | 58%

Takeaway: The LSTM decoder improves carry handling across the sequence and converges in fewer epochs.
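
For reference, the pure-Transformer baseline in this ablation can be pictured as the same encoder with the LSTM swapped out for a TransformerDecoder; a sketch of that variant with illustrative sizes (not the repository's exact configuration):

```python
import torch
import torch.nn as nn

class PureTransformerAdder(nn.Module):
    """Ablation baseline: a TransformerDecoder replaces the LSTM decoder."""
    def __init__(self, vocab_size: int, d_model: int = 64, nhead: int = 2,
                 num_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=256,
                                       batch_first=True), num_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, dim_feedforward=256,
                                       batch_first=True), num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        memory = self.encoder(self.embed(src))
        # Causal mask keeps the decoder autoregressive over the output digits.
        t = tgt.size(1)
        tgt_mask = torch.triu(torch.full((t, t), float("-inf"),
                                         device=tgt.device), diagonal=1)
        dec = self.decoder(self.embed(tgt), memory, tgt_mask=tgt_mask)
        return self.out(dec)
```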


2. Hidden Dimension Size

Metric | Hidden Dim 64 | Hidden Dim 128
5-digit Accuracy | 99.96% | 98.29%
7-digit Accuracy | 99.95% | 82.50%
Epochs to Converge | 4 | 8
Perplexity | 1.0005 | 1.0088

Takeaway: Surprisingly, the smaller hidden dimension (64) outperforms the larger one (128).


Hyperparameter Search Results

d_model | nhead | num_encoder_layers | num_decoder_layers | dim_feedforward | learning_rate | best_epoch | val_loss | val_char_accuracy | val_seq_accuracy | training_time
64 | 2 | 2 | 2 | 256 | 0.0001 | 30 | 1.4036 | 0.4950 | 0.0 | 31.42
64 | 2 | 2 | 2 | 256 | 0.0005 | 30 | 1.0752 | 0.6151 | 0.0 | 30.99
64 | 2 | 2 | 2 | 256 | 0.001 | 30 | 1.0032 | 0.6377 | 0.0 | 29.89
64 | 2 | 2 | 2 | 512 | 0.0001 | 30 | 1.3603 | 0.5072 | 0.0 | 29.19
64 | 2 | 2 | 2 | 512 | 0.0005 | 30 | 1.0545 | 0.6170 | 0.0 | 29.39
64 | 2 | 2 | 2 | 512 | 0.001 | 30 | 0.9144 | 0.6602 | 0.0 | 30.80
64 | 2 | 2 | 2 | 1024 | 0.0001 | 30 | 1.3309 | 0.5238 | 0.0 | 28.77
64 | 2 | 2 | 2 | 1024 | 0.0005 | 30 | 1.0259 | 0.6233 | 0.0 | 25.84
64 | 2 | 2 | 2 | 1024 | 0.001 | 30 | 0.8331 | 0.6877 | 0.001 | 28.32
64 | 2 | 2 | 3 | 256 | 0.0001 | 29 | 1.3940 | 0.4903 | 0.001 | 26.37
64 | 2 | 2 | 3 | 256 | 0.0005 | 30 | 1.0783 | 0.6039 | 0.0 | 24.94
64 | 2 | 2 | 3 | 256 | 0.001 | 30 | 0.9740 | 0.6458 | 0.001 | 24.82
64 | 2 | 2 | 3 | 512 | 0.0001 | 30 | 1.3332 | 0.5069 | 0.0 | 24.78
64 | 2 | 2 | 3 | 512 | 0.0005 | 30 | 1.0625 | 0.6070 | 0.0 | 24.96
64 | 2 | 2 | 3 | 512 | 0.001 | 30 | 0.9154 | 0.6646 | 0.0 | 25.09
64 | 2 | 2 | 3 | 1024 | 0.0001 | 30 | 1.2751 | 0.5526 | 0.0 | 25.41
64 | 2 | 2 | 3 | 1024 | 0.0005 | 30 | 0.9939 | 0.6324 | 0.001 | 24.97
64 | 2 | 2 | 3 | 1024 | 0.001 | 30 | 0.4097 | 0.8473 | 0.014 | 24.99
64 | 2 | 2 | 4 | 256 | 0.0001 | 30 | 1.3721 | 0.5019 | 0.001 | 28.32
64 | 2 | 2 | 4 | 256 | 0.0005 | 30 | 1.0805 | 0.5973 | 0.0 | 29.53
64 | 2 | 2 | 4 | 256 | 0.001 | 30 | 0.9904 | 0.6414 | 0.0 | 27.34
64 | 2 | 2 | 4 | 512 | 0.0001 | 30 | 1.3286 | 0.5141 | 0.0 | 35.52
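
The table enumerates every combination of the listed hyperparameters; a minimal sketch of such a sweep, assuming a `train_and_evaluate` helper (hypothetical) that trains one configuration and returns the logged metrics:

```python
from itertools import product

SEARCH_SPACE = {
    "d_model": [64],
    "nhead": [2],
    "num_encoder_layers": [2],
    "num_decoder_layers": [2, 3, 4],
    "dim_feedforward": [256, 512, 1024],
    "learning_rate": [1e-4, 5e-4, 1e-3],
}

def grid_search(train_and_evaluate, search_space=SEARCH_SPACE):
    """Run every configuration in the grid and rank by validation loss.

    `train_and_evaluate(**config)` is assumed to train for up to 30 epochs and
    return a dict with best_epoch, val_loss, val_char_accuracy, val_seq_accuracy,
    and training_time, matching the columns above.
    """
    results = []
    for values in product(*search_space.values()):
        config = dict(zip(search_space.keys(), values))
        results.append({**config, **train_and_evaluate(**config)})
    return sorted(results, key=lambda row: row["val_loss"])
```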

🧠 Human vs Model Arithmetic

Criteria | Human | Transformer Model
Generalizable Rules | ✅ | ❌ (Length-specific)
Error Recovery | ✅ |
Carry Propagation | ✅ | ❌ (Unstable)
Systematic Thinking | ✅ | ❌ (Pattern-based)

The model reaches near-perfect accuracy but lacks the procedural arithmetic reasoning that humans apply.


Key Takeaways

  • Transformer-LSTM hybrid > Pure Transformer
  • Smaller hidden dim (64) performs better than larger (128)
  • Carry handling is the model’s main weakness
  • Curriculum learning is essential for training success
  • Performance is sensitive to digit length transitions
