Use Unicode Escape Sequence to replace encoded characters #2814

Merged: 1 commit, Aug 26, 2023

@drasticactions drasticactions commented Aug 26, 2023

Special characters in source files can break compilation on computers with different region and language settings. On my ja-JP Windows 11 setup, compiling the current master branch fails at find_bpe_rank because of the special characters recently introduced there. Note that running a compiled build is fine; only compilation itself fails.

Using Unicode escape sequences should allow the code to compile on all setups without changing your computer's settings or switching regions. I tried out my changes and everything seems to process as it should, but hopefully others with more C++ experience can tell me if I screwed something else up here.

Edit: Searching through the other repos, I see that similar techniques have been used before, so I'm feeling more confident in this now.
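As a rough illustration of the idea (not the actual diff), the sketch below replaces a space and a newline with characters written as Unicode escape sequences, so the source file itself stays pure ASCII. The helper names `replace_all_copy` and `escape_for_bpe` are hypothetical, and the specific code points (U+0120 and U+010A) are an assumption about which special characters were involved. It also assumes the compiler's execution character set is UTF-8, which is the default for GCC and Clang and requires `/utf-8` on MSVC.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the llama.cpp function): return a copy of `s`
// with every occurrence of `from` replaced by `to`.
static std::string replace_all_copy(std::string s, const std::string & from, const std::string & to) {
    if (from.empty()) {
        return s;
    }
    for (std::size_t pos = 0; (pos = s.find(from, pos)) != std::string::npos; pos += to.size()) {
        s.replace(pos, from.size(), to);
    }
    return s;
}

// Hypothetical example: map ' ' to U+0120 and '\n' to U+010A.
// Writing "\u0120" instead of the raw character keeps this file ASCII-only,
// so the compiler never has to guess the source file's encoding.
// Assumption: the execution character set is UTF-8, so each escape
// becomes a two-byte UTF-8 sequence in the resulting std::string.
static std::string escape_for_bpe(const std::string & token) {
    std::string out = replace_all_copy(token, " ", "\u0120");
    out = replace_all_copy(out, "\n", "\u010A");
    return out;
}
```

With a UTF-8 execution character set, `escape_for_bpe("a b")` yields the bytes `a`, `0xC4`, `0xA0`, `b`, regardless of the region or code page of the machine doing the compiling.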

The use of special characters within source files can break compiling on some computers with different region and language settings. Using Unicode escape sequences should allow the code to be compiled on all setups without needing to change your computer's settings or switch regions.
@ggerganov ggerganov merged commit c7d92e6 into ggerganov:master Aug 26, 2023
25 checks passed
mattgauf added a commit to mattgauf/llama.cpp that referenced this pull request Aug 26, 2023
* master: (773 commits)
  server : add `/detokenize` endpoint (ggerganov#2802)
  convert.py : advanced option (ggerganov#2753)
  llama : use Unicode Escape Sequence to replace encoded characters (ggerganov#2814)
  flake.nix : add rocm support and cleanup (ggerganov#2808)
  llama : move #includes out of _GNU_SOURCE conditional (ggerganov#2817)
  main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (ggerganov#1528)
  llama : use std::abs in llama_sample_tail_free (ggerganov#2800)
  k-quants : remove unnecessary tensor shape restrictions (ggerganov#2811)
  Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B (ggerganov#2807)
  Fix HellaSwag (ggerganov#2805)
  flake : build llama.cpp on Intel with nix (ggerganov#2795)
  Handle null rope scaling value (ggerganov#2793)
  Fix spm whitespaces (ggerganov#2806)
  examples : skip unnecessary external lib in server README.md how-to (ggerganov#2804)
  llama : fix struct decl (ggerganov#2790)
  Faster perplexity computation (ggerganov#2786)
  llama : add llama_beam_search() (ggerganov#2267)
  convert.py : Get rope scale from HuggingFace models (ggerganov#2772)
  llama-bench : add model sizes (ggerganov#2771)
  convert.py : export rope freq_base when converting CodeLlama from an HF model (ggerganov#2773)
  ...
@drasticactions drasticactions deleted the unicode-escape-sequence branch August 27, 2023 00:58
akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request Aug 29, 2023
…erganov#2814)
