Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set console to UTF-8 encoding on Windows #3951

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kelleyma49
Copy link
Contributor

@kelleyma49 kelleyma49 mentioned this pull request Aug 4, 2024
10 tasks
@junegunn
Copy link
Owner

junegunn commented Aug 4, 2024

Thank you very much for the patch.

I've tested the patch on cmd.exe on my Windows VM, but when I start fzf with --height option and paste 한글 to the prompt, fzf just terminates. It works fine without --height. Any idea?

2024-08-04.1.52.57.mov

@kelleyma49
Copy link
Contributor Author

I did all of my testing in Windows Terminal, so let me test some more. It could be because of this:

API Differences
The major known difference between legacy and current is the implementation of UTF-8. The legacy host has extremely rudimentary and often incorrect support of UTF-8 with code page 65001. The current console host contains incremental improvements release-over-release of Windows 10 to improve this support. Applications that are attempting to rely on predicting "known incorrect" interpretations of UTF-8 from the legacy console will find themselves receiving different answers as support is improved.

Other differences experienced with APIs should be reported to the microsoft/terminal GitHub repository or via the Feedback Hub for triage and possible remediation.

https://learn.microsoft.com/en-us/windows/console/legacymode#api-differences

@junegunn
Copy link
Owner

I've finally had a chance to test this on Windows Terminal. It works, however, on Windows Terminal, I don't seem to have any problem entering non-ASCII characters even without the patch (binary built from master branch).

@junegunn
Copy link
Owner

junegunn commented Aug 25, 2024

Sorry, I was wrong. This does fix the problem on Windows Terminal. Before the patch, I had to manually run chcp 65001 to enable non-ASCII input.

But one thing I noticed is that it permanently changes the chcp of the current session to 65001. Do you think it's okay to leave it that way? Or should we just tell the users to run chcp 65001 if they want to input non-ASCII query?

@DanSM-5
Copy link
Contributor

DanSM-5 commented Aug 25, 2024

I would advice to avoid changing the code page permanently as it may lead to side effects with other tools that expect the default code page.

I think a simple solution is to query the initial code page with GetConsoleCP and restore it at the end.

Or should we just tell the users to run chcp 65001 if they want to input non-ASCII query instead of doing it for them?

It is a solution but users need to be aware of the meaning of changing chcp. Because of some issues I've had in the past, I think it is usually better to always restore the code page to whatever it was before. I even created a powershell script which only purpose is to run things with a code page change and revert it back once it is done.

# With-UTF8.ps1
# Usage
# ```powershell
# With-UTF8 { command }
# ```

[CmdletBinding()]
Param (
  [scriptblock] $block
)

try {
  # Ref: https://stackoverflow.com/questions/49476326/displaying-unicode-in-powershell
  # Save the current settings and temporarily switch to UTF-8.
  $oldOutputEncoding = $OutputEncoding;
  $oldConsoleEncoding = [Console]::OutputEncoding
  $OutputEncoding = [Console]::OutputEncoding = New-Object System.Text.Utf8Encoding

  # Execute block with utf-8 encoding
  return & $block
} finally {
  # Restore the previous settings.
  $OutputEncoding = $oldOutputEncoding;
  [Console]::OutputEncoding = $oldConsoleEncoding
}

I use this script very frequently on some custom functions that use fzf to handle the encoding of file names correctly (for display rather than for input) and I've never had an issue with fzf itself, so I'd say that changing the code page internally temporarily should be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't input non ascii characters
3 participants