PowerShell Core now commendably defaults to UTF-8 encoding, including when sending strings to external programs, as reflected in $OutputEncoding's default value.
However, because the console-window shortcut file / taskbar entry still defaults to the OEM code page implied by the legacy system locale (e.g. 437 on US-English systems), it misinterprets strings from external programs; e.g., with Node.js installed:
```powershell
PSCoreOnWin> $captured = '€' | node -pe "require('fs').readFileSync(0).toString().trim()"; $captured
Γé¼ # !! node's UTF-8 output was misinterpreted.
```
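The garbling can be reproduced without Node.js by encoding a string as UTF-8 and then decoding those bytes with the OEM code page. The sketch below assumes code page 437 is available to .NET in your session; `ConvertTo-Mojibake` is a hypothetical helper name used only for illustration.

```powershell
# Hypothetical helper (not part of PowerShell): shows what UTF-8 bytes
# look like when they are decoded with the wrong code page.
function ConvertTo-Mojibake {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [int] $CodePage = 437   # OEM code page on US-English systems
    )
    # Encode the text the way a UTF-8 program such as node emits it ...
    $utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($Text)
    # ... then decode the bytes the way a code-page-437 console reads them.
    [System.Text.Encoding]::GetEncoding($CodePage).GetString($utf8Bytes)
}

# U+20AC (EURO SIGN) is the three UTF-8 bytes 0xE2 0x82 0xAC,
# which code page 437 maps to three separate characters.
$euro = [string][char]0x20AC
ConvertTo-Mojibake $euro
```

Using `[char]0x20AC` rather than a literal `'€'` keeps the demonstration independent of the encoding of the script file itself.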
This currently requires the following workaround, which in turn requires the console window to use a TrueType font (the default on Windows 10):
```powershell
[console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
```
Prepend $OutputEncoding = to make a Windows PowerShell console fully UTF-8-aware.
The above implicitly switches to the UTF-8 code page (65001), as then reflected in chcp.
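For persistence, the workaround could be placed in `$PROFILE`. The following is a minimal sketch, not a definitive recipe: the `try`/`catch` guard rests on the assumption that assigning the console encodings can fail when no console is attached, and `$utf8NoBom` is just an illustrative variable name.

```powershell
# Sketch for $PROFILE: make the current session UTF-8 throughout.
# The parameterless UTF8Encoding constructor yields a BOM-less encoding.
$utf8NoBom = New-Object System.Text.UTF8Encoding

try {
    # Switches the console to code page 65001 for both input and output.
    [console]::InputEncoding = [console]::OutputEncoding = $utf8NoBom
} catch {
    # Assumption: this may throw when no console is attached
    # (for example, when the standard handles are redirected).
    Write-Warning "Could not switch the console to UTF-8: $_"
}

# Encoding PowerShell uses when piping strings to external programs;
# already UTF-8 in PowerShell Core, but not in Windows PowerShell.
$OutputEncoding = $utf8NoBom

# Quick check of the result.
[console]::OutputEncoding.CodePage
$OutputEncoding.WebName
```

Note that the setting applies only to sessions that load the profile; a session started with `-NoProfile` still sees the OEM code page.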
This obscure workaround shouldn't be necessary, and I think it would make sense for PowerShell to automatically set [console]::InputEncoding and [console]::OutputEncoding to (BOM-less) UTF-8 on startup.
Update: When this issue was originally created, there was no mechanism for presetting code page 65001 (UTF-8) system-wide, which necessitated the awkward workaround. In recent versions of Windows 10 it is now possible to switch to code page 65001 as the system locale and therefore system-wide, although as of Windows 10 version 1909 that feature is still in beta - see this SO answer.
- Caveat: In addition to defaulting the OEM code page to 65001 in all console windows (including cmd.exe windows), this invariably also makes Windows PowerShell's ANSI-encoding-default cmdlets default to UTF-8, notably Get-Content and Set-Content, which can be problematic from a backward-compatibility perspective.
Additionally, there is a bug - see below.
The change, which can also be made programmatically (see below), requires administrative privileges and a reboot.
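To see which code pages are currently in effect, a read-only check along these lines may help. It is a sketch under assumptions: `Get-CodePageInfo` is a hypothetical helper name, and the registry lookup assumes the system-locale code pages are stored as the `ACP` and `OEMCP` values under the `Nls\CodePage` key, which applies to Windows only.

```powershell
# Hypothetical helper: reports the code pages in effect without changing anything.
function Get-CodePageInfo {
    $info = [ordered]@{
        # Code page of the current console window (65001 means UTF-8).
        ConsoleOutputCodePage = [console]::OutputEncoding.CodePage
        # Encoding used when piping strings to external programs.
        OutputEncodingName    = $OutputEncoding.WebName
    }
    if ($env:OS -eq 'Windows_NT') {
        # Assumption: the system locale's ANSI and OEM code pages live here.
        $nls = Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage'
        $info.SystemAnsiCodePage = $nls.ACP
        $info.SystemOemCodePage  = $nls.OEMCP
    }
    [pscustomobject]$info
}

Get-CodePageInfo
```

If the system-wide UTF-8 setting is active, both system values should read 65001; otherwise they reflect the legacy system locale.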
Environment data
PowerShell Core 7.1.0-preview.3 on Windows 10