More table formatting wonkiness #66

Merged: danielbachhuber merged 4 commits into master from fix-66 on Sep 10, 2014

@danielbachhuber
Member

Thar be bugs

image

@danielbachhuber
Member Author

@szepeviktor any insight into what the root of these character encoding issues is?

@szepeviktor

<?php echo mb_strlen("日本語", "UTF-8");
3

@szepeviktor

Strange

<?php echo mb_detect_encoding("日本語");
UTF-8

$length = mb_strlen( $str, mb_detect_encoding( $str ) );

@szepeviktor

Got it.
Technically it works fine: "日本語" really is 3 characters long, but it takes up 6 columns of width in the terminal!
You can see it in the row above: "It-al-ia" = 6

@szepeviktor

Please use mb_strwidth()
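
To make the difference concrete, here is a small sketch comparing byte count, character count, and display width for the two strings mentioned above. It assumes the mbstring extension is loaded; `measure()` is a hypothetical helper name used only for this illustration, not part of the project.

```php
<?php
/**
 * Hypothetical helper for illustration only.
 * Returns the byte count, code point count, and display width of a UTF-8 string.
 */
function measure( $str ) {
    return array(
        'bytes'      => strlen( $str ),               // raw bytes
        'characters' => mb_strlen( $str, 'UTF-8' ),   // code points
        'columns'    => mb_strwidth( $str, 'UTF-8' ), // terminal columns
    );
}

foreach ( array( 'Italia', '日本語' ) as $str ) {
    $m = measure( $str );
    printf(
        "%s: bytes=%d, characters=%d, columns=%d\n",
        $str,
        $m['bytes'],
        $m['characters'],
        $m['columns']
    );
}
// Italia: bytes=6, characters=6, columns=6
// 日本語: bytes=9, characters=3, columns=6
```

Both strings need 6 columns in a table cell, even though they differ in byte and character counts.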

@szepeviktor

And $length = preg_match_all( '/.{1}/us', $str ); instead of iconv.

@danielbachhuber danielbachhuber added this to the next milestone Sep 9, 2014
Because `safe_strlen()` gives us the string length for output, we need the true length to determine how much we should pad the string
@danielbachhuber
Member Author

It's actually a problem with safe_strpad(). I've worked out a fix, but it still doesn't work with Hebrew and Burmese:
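
As a rough sketch of what width-aware padding can look like: PHP's built-in `str_pad()` counts bytes, so a multibyte string gets too little padding. The helper below pads by display width instead. `pad_to_width()` is a hypothetical name, and this is not the actual `safe_strpad()` fix from this pull request; it assumes UTF-8 input and the mbstring extension.

```php
<?php
/**
 * Hypothetical helper, NOT the project's safe_strpad().
 * Pads $str with spaces on the right until it occupies $columns terminal columns.
 * Strings already at or beyond $columns are returned unchanged.
 */
function pad_to_width( $str, $columns ) {
    $width   = mb_strwidth( $str, 'UTF-8' );  // columns the string occupies
    $missing = max( 0, $columns - $width );   // spaces still needed
    return $str . str_repeat( ' ', $missing );
}

// Both cells end up 10 columns wide between the bars.
echo '|' . pad_to_width( 'Italia', 10 ) . "|\n";
echo '|' . pad_to_width( '日本語', 10 ) . "|\n";
```

This only lines up when the terminal agrees with `mb_strwidth()` about each character's width, which is exactly where the Hebrew and Burmese strings cause trouble.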

image

@szepeviktor

Burmese stacks multiple signs in one position; how that works is unclear to me.
This Hebrew text has two (separate) accents (actually vowel marks) under the letters.
I think this is why those strings are calculated as two characters longer than their actual width, which makes the padding too narrow.
PHP i18n seems shallow. Java i18n too.

@szepeviktor

You can strip out the Hebrew vowels before padding the string: http://blog.shaftek.org/2005/06/03/removing-vowels-from-hebrew-unicode-text/
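
A minimal sketch of that idea, with one swap: instead of listing the Hebrew vowel code points as the linked post does, it removes every Unicode nonspacing mark (category `Mn`), which includes the Hebrew vowel points. `strip_nonspacing_marks()` is a hypothetical name, and the sample string is just an illustrative pointed Hebrew word. The `\u{...}` escapes need PHP 7.0 or later. The stripped copy is meant only for measuring, not for output.

```php
<?php
/**
 * Hypothetical helper for illustration only.
 * Removes Unicode nonspacing marks (category Mn), such as Hebrew vowel points,
 * so that the remaining code points roughly match the occupied columns.
 */
function strip_nonspacing_marks( $str ) {
    return preg_replace( '/\p{Mn}/u', '', $str );
}

// Four Hebrew letters carrying three combining marks.
$pointed = "\u{05E9}\u{05B8}\u{05C1}\u{05DC}\u{05D5}\u{05B9}\u{05DD}";
$plain   = strip_nonspacing_marks( $pointed );

echo mb_strlen( $pointed, 'UTF-8' ), "\n"; // 7 code points
echo mb_strlen( $plain, 'UTF-8' ), "\n";   // 4 code points
```

The padding would then be computed from `$plain` while `$pointed` is what gets printed.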

@szepeviktor

What font do you use on your terminal?

@danielbachhuber
Member Author

> You can strip out hebrew vowels

Ugh. Could we do similar detection for Burmese?

> What font do you use on your terminal?

Source Code Pro.

@szepeviktor

Please do not support Burmese language.
I write the code when you find me a Burmese wp-cli user.

Hebrew writing has two separate accents / vowels under letters. In testing, all fonts properly handle this
@danielbachhuber
Member Author

> I write the code when you find me a Burmese wp-cli user.

It's a deal :) Thanks for your help with this.

@szepeviktor

> And $length = preg_match_all( '/.{1}/us', $str ); instead of iconv.

@danielbachhuber
Member Author

This is why we're using iconv(). If we used preg_match_all(), we'd have to roll our own error notice. Is there something I'm missing?

danielbachhuber added a commit that referenced this pull request Sep 10, 2014
More table formatting wonkiness
@danielbachhuber danielbachhuber merged commit c35014e into master Sep 10, 2014
@danielbachhuber danielbachhuber deleted the fix-66 branch September 10, 2014 12:22
@szepeviktor

> if non-ascii encoding is present

In what case should preg_match_all return an error?
/.{1}/us does length measurement only.

@szepeviktor

With the /u modifier, preg_match_all returns false on input that is not valid UTF-8.

php -r 'var_export( preg_match_all( "/.{1}/us", "'$(echo -n óra|recode utf8..latin2)'") );'
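
The same behaviour can be reproduced without `recode` by writing the Latin-2 byte directly. This is only a sketch; `count_utf8_chars()` is a hypothetical wrapper name, and `0xF3` is the Latin-2 byte for "ó", which is not a valid UTF-8 sequence when followed by "ra".

```php
<?php
/**
 * Hypothetical helper for illustration only.
 * Counts characters with the regex suggested in this thread.
 * Returns an integer for valid UTF-8 input and false otherwise.
 */
function count_utf8_chars( $str ) {
    return preg_match_all( '/.{1}/us', $str );
}

var_export( count_utf8_chars( '日本語' ) );  // 3
echo "\n";

var_export( count_utf8_chars( "\xF3ra" ) ); // false: "óra" encoded as Latin-2
echo "\n";

// The failure is silent; the reason is only available via preg_last_error().
var_export( preg_last_error() === PREG_BAD_UTF8_ERROR ); // true
echo "\n";
```

Because no notice is raised, a caller has to check for `false` explicitly and report the problem itself.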

@danielbachhuber
Member Author

Ok. I think this is good enough for now. We can revisit later as needed.
