UTF8

Question

Noah Young

Attached: Photoshop_2019-02-26_12-13-44.png (1106x296, 50K)

February 26, 2019 - 20:06

Tyler Torres

anyone noticed the recent 9front patches?
The fellas have a non-standard %.*s in printf, where the precision counts UTF-8 codepoints instead of bytes.
It's so non-standard that their own code has been using it the byte-counting way all over the place, and now they are adding O(n) length lookups to compensate.
Fucking idiots.

February 26, 2019 - 20:10

Jackson Foster

I CAN'T BELIEVE IT ACTUALLY EXISTS FUCK

February 26, 2019 - 20:12

Ryder Davis

amazing

February 26, 2019 - 20:23

Jonathan Fisher

Why would you use printf for bytes?

February 26, 2019 - 21:19

Ethan Gomez

>oh noes, UTF-16 with only 2 bytes isn't enough to store all these pointless characters no one uses and emojis with skin-tone variants
a slightly modified and saner UTF-16 would be perfection
fuck the unicode consortium

February 26, 2019 - 21:23

Lincoln Cox

I was parsing a UTF-8 string recently and kept track of offset and length values for whatever symbol I was dealing with, so using printf("%.*s", len, str+offset); made sense, and it doesn't need to calculate pointless codepoints and shit

February 26, 2019 - 21:25

Cameron Powell

because all data is bytes
the precision is there so you don't have to rely on NUL termination (or so you can defensively avoid it and do bounded access instead)
why would you count it in UTF-8 codepoints? codepoints don't reliably correspond to anything useful about the text

February 26, 2019 - 21:27

Alexander Fisher

There are a lot of shenanigans you can pull with Unicode "homoglyphs" like that. For example, a lot of spam/abuse filters aren't smart enough to normalize lookalikes (meaning replacing every 'a'-looking character with the ASCII 'a'). Or, many systems will render URLs written in Unicode differently depending on how they normalize (or don't normalize) the strings before running URL detectors on them.

February 26, 2019 - 21:31

Cooper Butler

lmao what a nerd u'll never get any pusy

February 26, 2019 - 21:37

Dylan Harris

UTF-8 is horrible honestly, UTF-32 should be the standard.

February 26, 2019 - 21:39

Aaron Nelson

fuck you, you stole my catchphrase

February 26, 2019 - 21:51

Asher Howard

what does an encoding have to do with the unicode consortium adding pointless characters
do you even know what you're talking about

February 26, 2019 - 21:52

Dominic Murphy

UTF-8 is not your average encoding.

February 26, 2019 - 21:55

Bentley Clark

This is the turning point where I go from being angry at the chaos to enjoying it. Burn it all.

February 26, 2019 - 21:57

Blake Scott

that's why I said we should use the UTF-16 encoding but with a sane set of codepoints not chosen by the Unicode Consortium

February 26, 2019 - 22:15

Jonathan Garcia

Good. That's how it should be and makes total sense in that context. You're determining how many lexical characters to print; it has nothing to do with the char type or sizes in bytes.

If you want to print 1 character, you want to print 1 character, regardless of how many bytes wide it is. ASCII-dependent applications deserve to be broken.

February 26, 2019 - 22:17

Grayson Jones

>>deleted
kek

February 26, 2019 - 22:25

Dylan Clark

no
it's for bounded memory access, so the limit is in bytes
glyphs or encoded characters can be composed from multiple codepoints
specifying a number of codepoints when you only know the buffer length in bytes can cause a buffer overflow, or again an O(n) lookup

February 26, 2019 - 22:35

Caleb Cox

also what you want with your claim is the %.*S extension (which works over an array of Runes, i.e. decoded codepoints)

February 26, 2019 - 22:37

Bentley Lopez

DELETE
THIS

February 26, 2019 - 22:42
