Who cares? It’s going to be hashed anyway. If the same user can generate the same input, it will result in the same hash. If another user can’t generate the same input, well, that’s really rather the point. And I can’t think of a single backend, language, or framework that doesn’t treat a single Unicode character as one character. Byte length of the character is irrelevant as long as you’re not doing something ridiculous like intentionally parsing your input in binary and blithely assuming that every character must be 8 bits in length.
It matters for bcrypt/scrypt. They have a 72 byte limit. Not characters, bytes.
That said, I also think it doesn’t matter much. Reasonable length passphrases that could be covered by the old Latin-1 charset can easily fit in that. If you’re talking about KJC languages, then each character is actually a whole word, and you’re packing a lot of entropy into one character. 72 bytes is already beyond what’s needed for security; it’s diminishing returns at that point.
Hmm. I wonder about this one. Different ways to encode the same character. Different ways to calculate the length. No obvious max byte size.
Who cares? It’s going to be hashed anyway. If the same user can generate the same input, it will result in the same hash. If another user can’t generate the same input, well, that’s really rather the point. And I can’t think of a single backend, language, or framework that doesn’t treat a single Unicode character as one character. Byte length of the character is irrelevant as long as you’re not doing something ridiculous like intentionally parsing your input in binary and blithely assuming that every character must be 8 bits in length.
It matters for bcrypt/scrypt. They have a 72 byte limit. Not characters, bytes.
That said, I also think it doesn’t matter much. Reasonable length passphrases that could be covered by the old Latin-1 charset can easily fit in that. If you’re talking about KJC languages, then each character is actually a whole word, and you’re packing a lot of entropy into one character. 72 bytes is already beyond what’s needed for security; it’s diminishing returns at that point.