How to verify if a string is properly encoded in UTF-8?

Krishna Srinivasan

Greetings.

A quick question. How can I verify if a given
string is properly encoded in UTF-8?

The use case is this: I have a web form that
I serve with the charset set to UTF-8. But the
user might change the browser's encoding, in
which case the submitted form data could come
back to me in a different encoding.

Thanks,
Krishna.

_______________________________________________
Baypiggies mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/baypiggies

Re: How to verify if a string is properly encoded in utf-8 ?

Tung Wai Yip

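bytestring = b'caf\xc3\xa9'  # the raw bytes to check (example value)
try:
    unicodetext = bytestring.decode('utf-8')
    # decoding succeeded: bytestring is very likely UTF-8 encoded
except UnicodeDecodeError:
    # decoding failed: bytestring is not valid UTF-8
    unicodetext = None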
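
Applied to the web-form case from the question, the same try/except idea can guard the form data itself. This is a minimal sketch for Python 3, assuming the form is posted as application/x-www-form-urlencoded and the raw body is available as a string; `parse_form_strict` is a hypothetical helper name. It relies on `urllib.parse.parse_qs`, whose default `errors='replace'` would silently turn invalid bytes into U+FFFD, so `errors='strict'` is passed to make them raise `UnicodeDecodeError` instead.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs


def parse_form_strict(body: str) -> Optional[Dict[str, List[str]]]:
    """Parse a urlencoded form body (hypothetical helper).

    Returns None if any percent-escaped field is not valid UTF-8.
    """
    try:
        # errors='strict' makes invalid UTF-8 raise instead of becoming U+FFFD
        return parse_qs(body, encoding='utf-8', errors='strict')
    except UnicodeDecodeError:
        return None


# 'é' sent as UTF-8 (%C3%A9) vs. as Latin-1 (%E9)
print(parse_form_strict('name=caf%C3%A9'))  # {'name': ['café']}
print(parse_form_strict('name=caf%E9'))     # None
```

If the user switched the page to a Latin-1 encoding, a field like "café" would typically arrive as caf%E9, and the helper rejects it rather than passing along a mangled value.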

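UTF-8 is designed with redundancy: a multibyte character starts with a lead
byte that announces the sequence length, followed by continuation bytes of
the form 10xxxxxx. Non-ASCII text in another encoding (Latin-1, for example)
almost never satisfies these rules by accident, so if you can decode it, it
is very likely that the text stream is UTF-8 encoded. Pure ASCII always
decodes, but ASCII is also valid UTF-8, so that case is harmless.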
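
The redundancy argument can be checked directly. This is a minimal sketch that wraps the try/except in a helper (the name `is_utf8` is hypothetical, not a standard-library function) and feeds it the same word encoded two ways, plus a pure-ASCII input.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode('utf-8')  # errors='strict' is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_utf8('café'.encode('utf-8')))    # True
# Latin-1 gives b'caf\xe9': 0xE9 starts a 3-byte sequence,
# but no continuation bytes follow, so decoding fails
print(is_utf8('café'.encode('latin-1')))  # False
print(is_utf8(b'plain ascii'))            # True: ASCII is a subset of UTF-8
```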

Wai Yip
