
ugettext charset

ugettext charset

My Th
Hi!

I'm using Python 2.6.5 and gettext. Currently ugettext() and ungettext()
don't respect the 'codeset' setting and return only ASCII-encoded strings.
Is this by design, or is it a bug? There seems to be no issue about it in
the issue tracker. As it stands, it contradicts the documentation,
which says: "If provided, codeset will change the charset used to encode
translated strings".

This breaks some things, because ASCII-encoded unicode strings are not
considered equivalent to unicode strings in different encodings, even if
they contain exactly the same characters. Also, the unicode() function
returns ASCII-encoded strings by default. In this case it should be given
an encoding argument.


Cheers,
Reinis


_______________________________________________
I18n-sig mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/i18n-sig
Re: ugettext charset

"Martin v. Löwis"

> I'm using Python 2.6.5 and gettext. Currently ugettext() and ungettext()
> doesn't respect 'codeset' setting

Of course not. It returns Unicode strings instead.

> and return only ASCII encoded strings.

I can't reproduce that. It certainly returns non-ASCII strings.

> Is it by design or is it a bug?

I think you misinterpret what you are seeing (although it's not really
clear what it is that you are seeing). AFAICT, the current behavior is
by design.

> This breaks some things, because, ASCII encoded unicode strings

This doesn't make sense. Unicode strings *cannot* be ASCII-encoded.
They are always Unicode-encoded - that's why they are called unicode
strings.

> are not
> considered equivalent to unicode strings in different encodings even if
> they contain exactly the same characters.

Unicode strings don't have different encodings. They are encoded in
Unicode.
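
A minimal sketch of this point, not taken from the thread: the same
character can arrive as different byte sequences depending on the
encoding, but once decoded, the resulting Unicode strings compare equal.
The variable names are illustrative only, and the character used
(U+00E9) is just an example.

```python
# The same character (U+00E9, e with acute accent) as bytes in two encodings.
latin1_bytes = b'\xe9'        # encoded as Latin-1
utf8_bytes = b'\xc3\xa9'      # encoded as UTF-8

# Decoding turns each byte string into a Unicode string.
from_latin1 = latin1_bytes.decode('latin-1')
from_utf8 = utf8_bytes.decode('utf-8')

# The byte strings differ, but the decoded Unicode strings are equal.
different_bytes = (latin1_bytes != utf8_bytes)
same_text = (from_latin1 == from_utf8)
```

The encoding is a property of the byte string, not of the Unicode string
obtained from it.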

> And unicode() function by
> default returns ASCII encoded strings. In this case it should get an
> argument for encoding.

The call to unicode only applies to the msgid, not the translation.
This should be safe, since the msgid will only contain ASCII characters.
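
As a rough illustration of the fallback case described above, the sketch
below uses gettext.NullTranslations, which has no catalog and so always
returns the msgid itself. It assumes nothing beyond the standard library;
the getattr() lookup is only there so the snippet also runs on Python 3,
where ugettext() no longer exists and gettext() already returns text.

```python
import gettext

# NullTranslations has no catalog, so every lookup falls back to the msgid.
translations = gettext.NullTranslations()

# ugettext() exists only in Python 2; fall back to gettext() on Python 3.
translate = getattr(translations, 'ugettext', translations.gettext)

# The msgid is plain ASCII, so converting it to Unicode is safe.
result = translate('Hello')
is_unicode = isinstance(result, type(u''))
```

With a real catalog, the translated strings are decoded using the charset
declared in the catalog, so they come back as Unicode regardless of the
'codeset' setting.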

Regards,
Martin
Re: ugettext charset

My Th
Thanks, Martin!

I understand where my issue is: I'm mixing Unicode strings with 8-bit
strings. The latter are equivalent to ASCII if they contain only bytes
below 128, but if they contain higher bytes they cannot be converted to
Unicode using the default ASCII encoding, so an encoding has to be given.

I was basically doing something like this (where 'a' comes from
gettext):
a = unicode('ā', encoding='utf-8')
b = unicode('ā', encoding='utf-8').encode('utf-8')
a+b
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)

<ipython console> in <module>()

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

But 'b' is no longer a Unicode string after encode(); encode() should be
called only just before writing to a file.
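
One way to apply this, sketched under assumptions: decode 8-bit strings
explicitly as soon as they enter the program, work with Unicode strings
throughout, and let the file object encode on write. The helper to_text()
and the file name out.txt are hypothetical, chosen only for illustration;
the snippet is written to run on both Python 2.6+ and Python 3.

```python
import io
import os
import tempfile


def to_text(value, encoding='utf-8'):
    """Return value as a Unicode string, decoding 8-bit strings explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


a = u'\u0101'              # Unicode string, as returned by ugettext()
b = a.encode('utf-8')      # 8-bit string: the bytes 0xc4 0x81

# Decode explicitly instead of relying on the implicit ASCII default.
combined = a + to_text(b)

# Encode only at the boundary: io.open() encodes the text when writing.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')
with io.open(path, 'w', encoding='utf-8') as handle:
    handle.write(combined)
```

Because to_text() names the encoding, the concatenation no longer depends
on the ASCII default that caused the UnicodeDecodeError above.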


Cheers,
Reinis

On Sat, 2010-06-26 20:37 +0200, "Martin v. Löwis" wrote:

> > I'm using Python 2.6.5 and gettext. Currently ugettext() and ungettext()
> > doesn't respect 'codeset' setting
>
> Of course not. It returns Unicode strings instead.
>
> > and return only ASCII encoded strings.
>
> I can't reproduce that. It certainly returns non-ASCII strings.
>
> > Is it by design or is it a bug?
>
> I think you misinterpret what you are seeing (although it's not really
> clear what it is that you are seeing). AFAICT, the current behavior is
> by design.
>
> > This breaks some things, because, ASCII encoded unicode strings
>
> This doesn't make sense. Unicode strings *cannot* be ASCII-encoded.
> They are always Unicode-encoded - that's why they are called unicode
> strings.
>
> > are not
> > considered equivalent to unicode strings in different encodings even if
> > they contain exactly the same characters.
>
> Unicode strings don't have different encodings. They are encoded in
> Unicode.
>
> > And unicode() function by
> > default returns ASCII encoded strings. In this case it should get an
> > argument for encoding.
>
> The call to unicode only applies to the msgid, not the translation.
> This should be safe, since the msgid will only contain ASCII characters.
>
> Regards,
> Martin

