locale
Solar Designer
2015-10-21 18:07:32 UTC
>>>> BTW, magnum, can we please get rid of the UTF-8 char for degrees? Don't
>>>> assume everyone has their terminal set to UTF-8 all the time, especially
>>>> as it's a totally unnecessary assumption here.
>>> I made it configurable but it still defaults to UTF-8. I dislike the
>>> idea of dropping it by default - users might not realize that "GPU:73C"
>>> is a temp reading at all.
>> Maybe check the current locale and default to plain "C" if the current
>> locale is not UTF-8? To avoid checking env vars explicitly, maybe use
>> mbrtowc() and see what it returns for the UTF-8 character under the
>> current locale?
> I'm now checking/setting locale (if autoconf says I can) and fall back
> to skipping the degree sign. Let me know if it misbehaves.

This is:

https://github.com/magnumripper/JohnTheRipper/issues/1841
https://github.com/magnumripper/JohnTheRipper/commit/5acb98062d25efb319e9ac4dbd04555693b1d739

Looking at these changes, I realize that my idea was probably bad:
initializing the locale with setlocale() affects lots of things,
including the ctype macros. With some cracking modes, this might affect
what candidate passwords they generate. IIRC, we avoided using the
ctype macros in our wordlist rules engine, but now that I grep e.g. for
"islower", I find uses in dynamic_compiler.c, jumbo.c, mask.c.
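
One way to keep such code independent of whatever setlocale() has done is to use ASCII-only helpers instead of the ctype macros. The following is only a sketch: the function names are hypothetical, not taken from the JtR tree, and it assumes an ASCII-compatible execution character set.

```c
/*
 * ASCII-only replacements for islower()/isupper(). None of these
 * consult the current locale, so their results cannot change after
 * a setlocale() call. Names are hypothetical.
 */
int ascii_islower(int c)
{
    return c >= 'a' && c <= 'z';
}

int ascii_isupper(int c)
{
    return c >= 'A' && c <= 'Z';
}

/* Flip the case of an ASCII letter; return any other value unchanged. */
int ascii_toggle_case(int c)
{
    if (ascii_islower(c) || ascii_isupper(c))
        return c ^ 0x20;
    return c;
}
```

With helpers like these, a byte such as 0xE5 is never treated as a letter, regardless of the terminal's locale.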

While we might later choose to add initializing locale to JtR for other
reasons, I think DEGREE_SIGN alone isn't a sufficient reason, and if we
do add locale support, we should make it consistent: initialize it all
the time and do so early on, and not only do it for OpenCL and CUDA
formats like the current code does.

For now, maybe we should in fact check env vars explicitly to decide on
DEGREE_SIGN.
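
A rough sketch of what an explicit env var check could look like, without calling setlocale() at all. It follows the POSIX precedence for the character type category (LC_ALL, then LC_CTYPE, then LANG; the first one that is set and non-empty wins) and looks at the codeset part of the name. Both function names are hypothetical, and accepting "UTF-8" and "utf8" in any letter case is an assumption about how locale names are spelled in practice.

```c
#include <stdlib.h>
#include <string.h>

/* ASCII-only lowercase, deliberately avoiding <ctype.h>. */
static char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/*
 * Returns 1 if a locale name such as "en_US.UTF-8" or "sv_SE.utf8@euro"
 * names a UTF-8 codeset, 0 otherwise (including for NULL, "C", "POSIX").
 * Hyphens in the codeset are ignored, so "UTF-8" and "utf8" both match.
 */
int locale_name_is_utf8(const char *name)
{
    const char *want = "utf8";
    const char *p;

    if (!name || !(p = strchr(name, '.')))
        return 0;
    for (p++; *p && *p != '@'; p++) {
        if (*p == '-')
            continue;
        if (ascii_lower(*p) != *want)
            return 0;
        want++;
    }
    return *want == '\0';
}

/* Applies the check to the first of LC_ALL, LC_CTYPE, LANG that is set. */
int env_wants_utf8(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    size_t i;

    for (i = 0; i < sizeof(vars) / sizeof(vars[0]); i++) {
        const char *v = getenv(vars[i]);

        if (v && *v)
            return locale_name_is_utf8(v);
    }
    return 0;
}
```

DEGREE_SIGN could then default to the UTF-8 sequence only when env_wants_utf8() returns 1, and to nothing otherwise.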

A possibly acceptable hack (for jumbo) is to do something like:

setlocale(LC_ALL, "");
... check for UTF-8 here ...
setlocale(LC_CTYPE, "C");

so that ctype macros are unaffected by the current locale (since our
uses of them appear to be of the kind where we prefer consistency over
customization; arguably, this means they are misuses). But we'll need
to do it all the time, and early on, to ensure consistent behavior
regardless of whether an OpenCL or CUDA format is run.
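
Filled in, the three lines above might look like the sketch below. It uses the mbrtowc() probe mentioned earlier in the thread as the UTF-8 check. The function name is hypothetical, and the comparison against 0xB0 assumes that wchar_t holds Unicode code points, which is true on common platforms but not guaranteed by the C standard.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, meant to be called once, early in main().
 * Returns 1 if the user's locale decodes the bytes C2 B0 as the
 * single character U+00B0 (degree sign), i.e. it looks like UTF-8.
 * On return LC_CTYPE is "C" again, so the ctype macros behave as
 * they would if setlocale() had never been called.
 */
int probe_utf8_then_reset_ctype(void)
{
    static const char degree[] = "\xC2\xB0";
    mbstate_t state;
    wchar_t wc = 0;
    int is_utf8 = 0;

    if (setlocale(LC_ALL, "")) {
        memset(&state, 0, sizeof(state));
        is_utf8 = mbrtowc(&wc, degree, 2, &state) == 2 && wc == 0xB0;
    }
    setlocale(LC_CTYPE, "C");
    return is_utf8;
}
```

As in the three-line version, only LC_CTYPE is reset; the other categories, such as LC_NUMERIC, keep whatever the environment selected.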

Also, the current checks for strchr(setlocale(LC_ALL, NULL), '.') do not
tell us whether the locale is UTF-8 or not. We'll need to do better.
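
For example, "ru_RU.KOI8-R" contains a dot but is not UTF-8. On POSIX systems, one candidate for a better check is nl_langinfo(CODESET), which reports the codeset of the current LC_CTYPE locale. This is a sketch only: the function name is hypothetical, and the exact spelling of the returned string ("UTF-8" on glibc and macOS; "utf8" is accepted here as a guess for other systems) varies between platforms.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/*
 * Returns 1 if the codeset of the current LC_CTYPE locale is UTF-8.
 * Only meaningful after setlocale() has been called; in the startup
 * "C" locale it returns 0.
 */
int current_codeset_is_utf8(void)
{
    const char *codeset = nl_langinfo(CODESET);

    return codeset != NULL &&
        (strcmp(codeset, "UTF-8") == 0 || strcmp(codeset, "utf8") == 0);
}
```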

Alexander
magnum
2015-10-21 19:50:25 UTC
Post by Solar Designer
>> I'm now checking/setting locale (if autoconf says I can) and fall back
>> to skipping the degree sign. Let me know if it misbehaves.
> initializing the locale with setlocale() affects lots of things,
> including the ctype macros. With some cracking modes, this might affect
> what candidate passwords they generate. IIRC, we avoided using the
> ctype macros in our wordlist rules engine, but now that I grep e.g. for
> "islower", I find uses in dynamic_compiler.c, jumbo.c, mask.c.
I wasn't aware of these uses and we should replace them. Actually, the
one in mask.c is kind of correct: it's for case-toggling the base word
in hybrid mode, and being able to do so only with ASCII is a limitation.
But we must honor our encoding options, not the terminal locale.
Post by Solar Designer
> While we might later choose to add initializing locale to JtR for other
> reasons, I think DEGREE_SIGN alone isn't a sufficient reason, and if we
> do add locale support, we should make it consistent: initialize it all
> the time and do so early on, and not only do it for OpenCL and CUDA
> formats like the current code does.
I agree that introducing a locale for the degree sign alone is overkill.
I was just moving slowly: I actually had some vague idea that the
arguable UTF-8 defaults (just the parts that affect screen output, in
particular "AlwaysReportUTF8 = Y") could be made to depend on the
locale. But maybe we should back away from setlocale instead, at least
for now.
Post by Solar Designer
> For now, maybe we should in fact check env vars explicitly to decide on
> DEGREE_SIGN.
>
> setlocale(LC_ALL, "");
> ... check for UTF-8 here ...
> setlocale(LC_CTYPE, "C");
>
> so that ctype macros are unaffected by the current locale (since our
> uses of them appear to be of the kind where we prefer consistency over
> customization; arguably, this means they are misuses). But we'll need
> to do it all the time, and early on, to ensure consistent behavior
> regardless of whether an OpenCL or CUDA format is run.
>
> Also, the current checks for strchr(setlocale(LC_ALL, NULL), '.') do not
> tell us whether the locale is UTF-8 or not. We'll need to do better.
The current implementation is not limited to UTF-8; it will also get you
a proper degree sign for legacy codepages like ISO-8859-*, CP* or
KOI8-R. For this to work I can't reset it back to C, and checking for
UTF-8 is irrelevant (the current check for '.' is mostly a check for
'neither "C" nor "POSIX" but some complete "aa_BB.CCCC" setting').
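
To illustrate why staying in the user's locale covers legacy codepages too, here is a sketch that asks the C library to encode U+00B0 for whatever locale is active, using wcrtomb(). This is not necessarily how the commit above does it. The function name is hypothetical, and the code assumes that wchar_t holds Unicode code points.

```c
#include <limits.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/*
 * Writes the degree sign, encoded for the current LC_CTYPE locale,
 * into buf as a NUL-terminated string and returns its length in bytes.
 * buf must have room for MB_LEN_MAX + 1 bytes. If the locale cannot
 * represent the character, buf becomes the empty string and 0 is
 * returned, which matches the "skip the degree sign" fallback.
 */
size_t degree_sign_for_locale(char *buf)
{
    mbstate_t state;
    size_t n;

    memset(&state, 0, sizeof(state));
    n = wcrtomb(buf, (wchar_t)0xB0, &state);
    if (n == (size_t)-1)
        n = 0;
    buf[n] = '\0';
    return n;
}
```

Under a UTF-8 locale this yields the two bytes C2 B0, and under a single-byte codepage that contains the character it should yield that codepage's one-byte encoding.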

Anyway, you point out potential problems I did not realize. I think I'll
just drop the use of setlocale for now but I'll sleep on it.

magnum
