
@jiasli
Member

@jiasli jiasli commented May 7, 2021

Context

Reported by Azure/azure-cli#17994

On Windows with English as the system language (and without the UTF-8 option enabled), the default system encoding is cp1252 (Western Europe), and Python uses cp1252 as the default file encoding.

Writing characters that cp1252 cannot represent, such as 汉字, to the log file results in an error:

Traceback (most recent call last):
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\logging\__init__.py", line 1084, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 87-88: character maps to <undefined>

The error can also be easily reproduced with the following script:

with open("test.txt", "w") as f:
    print(f.encoding)
    f.write("汉字")

Output:

cp1252
Traceback (most recent call last):
  File "D:/cli/testproj/test1.py", line 2, in <module>
    f.write("汉字")
  File "C:\Users\jiasli\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>
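
For comparison, here is a minimal sketch of the same write with the encoding passed explicitly, so the result no longer depends on the system default. The temporary path and the variable names are illustrative assumptions and are not part of this PR.

```python
import os
import tempfile

text = "汉字"

# cp1252 has no mapping for these characters, which is what the
# traceback above reports.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Passing encoding="utf-8" makes the write independent of the system default.
path = os.path.join(tempfile.mkdtemp(), "test.txt")  # illustrative temp path
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

The file is read back with the same explicit encoding; reading it with the cp1252 default would not return the original text.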

Change

This PR forces log files to use UTF-8 so that file logging works on non-UTF-8 systems as well.
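
As a rough illustration of the idea (not the actual diff), a file handler from the standard library can be given an explicit encoding. The helper name, logger name, rotation settings, and format string below are assumptions chosen for the sketch.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def add_utf8_file_handler(logger, log_path):
    """Attach a rotating file handler that always encodes records as UTF-8."""
    handler = RotatingFileHandler(
        log_path,
        maxBytes=10 * 1024 * 1024,  # assumed rotation size for this sketch
        backupCount=5,              # assumed number of backups
        encoding="utf-8",           # the important part: do not rely on cp1252
    )
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    return handler


log_path = os.path.join(tempfile.mkdtemp(), "demo.log")  # illustrative path
logger = logging.getLogger("utf8_demo")  # hypothetical logger name
logger.propagate = False  # keep the demo record out of the root logger
handler = add_utf8_file_handler(logger, log_path)
logger.warning("汉字")

# Close the handler so the file is flushed before it is read back.
handler.close()
logger.removeHandler(handler)

with open(log_path, encoding="utf-8") as f:
    log_contents = f.read()
```

With the encoding fixed on the handler, the record is written the same way regardless of the system code page.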

Alternative solution

One may also follow #178 to change the default encoding of the system to UTF-8.
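
Whichever route is taken, the following sketch shows one way to check which encoding Python would pick when open() is called without an encoding argument. It uses only the standard library; the variable names are illustrative. Note that Python's own UTF-8 mode (the PYTHONUTF8=1 environment variable or the -X utf8 option) is a separate, Python-level switch from the system-wide setting.

```python
import locale
import sys

# The encoding open() falls back to when no encoding argument is given.
default_encoding = locale.getpreferredencoding(False)

# True when Python's UTF-8 mode is active (PYTHONUTF8=1 or -X utf8).
utf8_mode = bool(sys.flags.utf8_mode)

print(default_encoding, utf8_mode)
```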

@jiasli
Member Author

jiasli commented May 7, 2021

If you want to test, simply enable file logging with az config set logging.enable_log_file=true and write some Chinese characters to the log:

logger.warning("汉字")

@jiasli jiasli requested review from avanigupta, evelyn-ys and yonzhan May 7, 2021 04:54
Collaborator

@yonzhan yonzhan left a comment


LGTM

@jiasli jiasli merged commit f377ca8 into microsoft:dev May 8, 2021
@jiasli jiasli deleted the encoding branch May 8, 2021 06:50