Skip to content

Conversation

@Raudius
Copy link
Contributor

@Raudius Raudius commented May 11, 2022

Summary

Previously we would just assume files would use UTF-8. Now we make sure to convert files to UTF-8 when we open them. Without configuration just a small set of encodings are supported:

 'UTF-8', 'GB2312', 'GBK', 'BIG-5', 'SJIS-win', 'EUC-JP', 'Windows-1252', 'ISO-8859-15', 'ISO-8859-1', 'ASCII'

More encodings can be configured via the PHP config value: mbstring.detect_order

@Raudius Raudius force-pushed the enh/support_encodings branch from 7f9ffe4 to 69c3236 Compare May 11, 2022 14:11
@Raudius Raudius requested review from a team, max-nextcloud and vinicius73 and removed request for a team May 11, 2022 14:12
@Raudius Raudius force-pushed the enh/support_encodings branch 6 times, most recently from de3fb09 to 4a725a3 Compare May 12, 2022 15:30
@vinicius73
Copy link
Member

As we are using mbstring functions, it is important to define it in composer.json
Although it is already required by server.

https://github.com/nextcloud/server/blob/9e1d4d17ec792d8abcec2f9262472f5bb7a43acf/composer.json#L21-L29

--- a/composer.json
+++ b/composer.json
@@ -17,7 +17,8 @@
         }
     ],
     "require": {
-      "php": "^7.3|^8.0"
+      "php": "^7.3|^8.0",
+      "ext-mbstring": "*"
     },
     "scripts": {

What do you think?

@Raudius Raudius force-pushed the enh/support_encodings branch 2 times, most recently from 207a4da to 2919658 Compare May 13, 2022 10:37
@juliusknorr
Copy link
Member

@Raudius Could you do a rebase to update the branch rather then merge? That would make the commit history linear and a bit cleaner :)

@juliusknorr
Copy link
Member

Found two more reportings on other encoding issues when doing a bit of triage:
#1633
#824

@Raudius Raudius force-pushed the enh/support_encodings branch 5 times, most recently from 5299d57 to ff8f19a Compare May 16, 2022 16:45
Comment on lines 47 to 58
$detected_encoding = $this->encodingService->detectEncoding(file_get_contents($file));
if ($this->isSupportedEncoding($encoding)) {
$this->assertNotNull($detected_encoding);
}

$original_order = mb_detect_order();
$this->assertNotFalse(mb_detect_order($encoding));

$detected_encoding = $this->encodingService->detectEncoding(file_get_contents($file));
$this->assertEquals($encoding, $detected_encoding);

mb_detect_order($original_order);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nitpick on the code style 😉 I guess we currently don't run php-cs-fixer on the tests directory.

Suggested change
$detected_encoding = $this->encodingService->detectEncoding(file_get_contents($file));
if ($this->isSupportedEncoding($encoding)) {
$this->assertNotNull($detected_encoding);
}
$original_order = mb_detect_order();
$this->assertNotFalse(mb_detect_order($encoding));
$detected_encoding = $this->encodingService->detectEncoding(file_get_contents($file));
$this->assertEquals($encoding, $detected_encoding);
mb_detect_order($original_order);
$detectedEncoding = $this->encodingService->detectEncoding(file_get_contents($file));
if ($this->isSupportedEncoding($encoding)) {
$this->assertNotNull($detectedEncoding);
}
$originalOrder = mb_detect_order();
$this->assertNotFalse(mb_detect_order($encoding));
$detectedEncoding = $this->encodingService->detectEncoding(file_get_contents($file));
$this->assertEquals($encoding, $detectedEncoding);
mb_detect_order($originalOrder);

@Raudius Raudius force-pushed the enh/support_encodings branch from ff8f19a to 749cc0a Compare May 16, 2022 16:49
Copy link
Member

@juliusknorr juliusknorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor code style comment left on the tests, otherwise LGTM

@Raudius Raudius force-pushed the enh/support_encodings branch 2 times, most recently from cdd653a to 86a2c38 Compare May 17, 2022 06:54
@Raudius Raudius force-pushed the enh/support_encodings branch from 86a2c38 to 42a448c Compare May 17, 2022 07:12
@juliusknorr juliusknorr merged commit c72b9e1 into master May 17, 2022
@delete-merged-branch delete-merged-branch bot deleted the enh/support_encodings branch May 17, 2022 07:51
@juliusknorr
Copy link
Member

/backport to stable24

@juliusknorr
Copy link
Member

/backport to stable23

@juliusknorr
Copy link
Member

/backport to stable22

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backported successfully backported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Can't show Chinese word corrlecty, Maybe it's because GB2312 and GBk encoding is not supported

5 participants