Support for Devanagari Script


Support for Devanagari Script

poudyal
Hi,
I have recently discovered the power of Python.  I started by trying to implement a Sanskrit transliteration translation program.  I did accomplish it, but the Unicode Devanagari script is not displaying as I expect on the Python interpreter output lines.  The same sequence of Unicode does render as expected if I write it to an HTML file and open it with a web browser.

The attached code does both. I cannot figure out whether I am doing something wrong, not setting up the fonts correctly in Python, or whether Python does not fully implement the Unicode standard (for this script).
 
I hope this is the right group to ask the question.  Thanks for any help.
 
vjktm
 
 


#Unicode Example
#vjktm
#
kSa = u'Hello \u0915\u094D\u0937'
fout = open('ex2.html', 'w')
head = """ Example 2

Example 2

"""
foot = ''
fout.write(head)
fout.write(kSa.encode('UTF-8'))
fout.write(foot)
fout.close()
print kSa
_______________________________________________
I18n-sig mailing list
I18n-sig@...
http://mail.python.org/mailman/listinfo/i18n-sig

Re: Support for Devanagari Script

John Machin
On 8/09/2006 11:06 AM, Vijaya Poudyal wrote:

> Hi,
> I have recently discovered the power of Python.  I started by trying to
> implement a Sanskrit transliteration translation program.  I did
> accomplish it but the Unicode Devanagari script is not displaying as I
> expect on the python interpreter output lines.  The same sequence of
> unicode does render as expected if I write it to an html file and open
> it with a web browser.
>  
> The attached code does both, I cannot figure out if I am doing something
> wrong, or not setting up the fonts correctly in python, or python does
> not fully implement the unicode standard (for this script).
>  
> I hope this is the right group to ask the question.  Thanks for any help.
>  
> vjktm
>  

It's not that much to do with Python. The concept of "setting up the
fonts ... in Python" is rather novel -- what do you mean?

The main determining factor is whether the stdout can render the
bytestream that's thrown at it, and that depends on where you are
running your script. For example, on Windows, IDLE renders your UTF16
exactly the same as Firefox, Opera and IE6 render the UTF8 in the
created ex2.html. However, running the script at the (DOS) command prompt
will throw an exception (unless there's a Devanagari DOS codepage).
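
The exception mentioned above can be reproduced without a console. This is a minimal sketch in Python 3 syntax (the thread itself uses Python 2.4); cp437 is only an assumed example of a legacy DOS code page, and the helper name try_encode is hypothetical.

```python
K_SA = "Hello \u0915\u094D\u0937"


def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


# cp437 stands in for a legacy console code page with no Devanagari.
print(try_encode(K_SA, "cp437"))
print(try_encode(K_SA, "utf-8"))

# A lossy fallback: escape whatever the code page cannot represent.
escaped = K_SA.encode("cp437", errors="backslashreplace")
print(escaped)
```

The backslashreplace handler keeps a script running on such a console, at the cost of showing escape sequences instead of the script itself.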

[Aside: the result from IDLE and the browsers appears (to someone
knowing very little about how characters combine in Indic scripts) as
one character which looks nothing like the 1st & 3rd input characters --
presumably that is expected(?)]
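
For reference, the three code points after "Hello " can be named with the standard unicodedata module. A small sketch in Python 3 syntax (the variable name CLUSTER is illustrative): the sequence is a consonant, a virama, and a second consonant, which is why a renderer that supports Devanagari shows a single conjunct form.

```python
import unicodedata

CLUSTER = "\u0915\u094D\u0937"

# Look up the official Unicode name of each code point in the sequence.
names = [unicodedata.name(ch) for ch in CLUSTER]
for ch, name in zip(CLUSTER, names):
    print("U+%04X  %s" % (ord(ch), name))
```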

You will need to give more details about your environment.

I know little about Unix or Linux, but I'd expect better results from
throwing utf8 at the stdout, rather than utf16 -- have you tried
    print kSa.encode('utf_8')
?

HTH
Cheers
John

Re: Support for Devanagari Script

Andy Robinson-2
> The main determining factor is whether the stdout can render the
> bytestream that's thrown at it, and that depends on where you are
> running your script. For example, on Windows, IDLE renders your UTF16
> exactly the same as Firefox, Opera and IE6 render the UTF8 in the
> created ex2.html. However running the script at the (DOS) command prompt
>   will throw an exception (unless there's a Devanagari DOS codepage).

Regrettably not all fonts have the character set you want and the DOS
prompt is not a smart enough display device. However, browsers and IDLE
are smart enough to switch to a 'fallback font' for characters they
cannot display.

In IDLE, which uses Courier (300kb on Windows), I get:

 >>> print kSa.encode('UTF-8')

works fine.

 >>> print kSa.encode('UTF-16')

prints rubbish.

 >>> print kSa

works too, but is almost certainly converting to UTF-8.
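
One plausible reason the UTF-16 output looks like rubbish on a byte-oriented display is visible in the raw bytes. A sketch in Python 3 syntax, under the assumption that the display treats each byte as a character; utf-16-le is used here only to get a fixed byte order without a byte order mark.

```python
K_SA = "Hello \u0915\u094D\u0937"

utf8_bytes = K_SA.encode("utf-8")
# utf-16-le gives a fixed byte order and no byte order mark.
utf16_bytes = K_SA.encode("utf-16-le")

print(len(utf8_bytes), utf8_bytes)
print(len(utf16_bytes), utf16_bytes)

# In UTF-16-LE every ASCII letter of "Hello " is followed by a NUL byte.
print(utf16_bytes[:4])
```

In UTF-8 the ASCII part stays byte-for-byte ASCII and each Devanagari code point becomes three bytes, so an ASCII-compatible consumer at least shows "Hello " intact.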

From a DOS prompt, the UTF8 version prints rubbish.  The command prompt
font properties only give me two font choices, 'Raster Fonts' and
'Lucida Console'. When I switch IDLE to a variety of different fonts, I
still get the Devanagari character, IN THE SAME TYPEFACE, whichever font
I choose.

Conclusion:  DOS prompt does not have the display routines needed to
handle Unicode output.

- Andy Robinson



Re: Support for Devanagari Script

poudyal
In reply to this post by John Machin
Hi John,
Thank you for the suggestions.
I am working on Windows.  I did try encoding to UTF-8, but that did not help.

I am new to Python, and thought that there might be a way to change the fonts used to display those characters.  The reason I wanted to try a different font is that if the font does not contain the glyph for the correct ligature, then the characters will not render as expected.  BTW, I also tried writing a Label to a Tkinter window and that did not work either; I got the same sequence of two glyphs instead of a single glyph.

The IDLE rendering is allowed only if the correct glyph is not available in the font.
I think it may also occur if consonant clusters are not handled correctly as per the Unicode standard for Devanagari (I don't know what part of the code does this after I use the print statement).
The IE rendering is required if the correct glyph does exist.
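
On the question of which part of the code forms the cluster: the string itself never contains a single ligature code point, so joining is left to whatever displays the text (the browser, Tk, or the console), not to the print statement. A small sketch in Python 3 syntax that checks this with the standard unicodedata module; the variable names are illustrative.

```python
import unicodedata

CLUSTER = "\u0915\u094D\u0937"

# Canonical composition (NFC) leaves the sequence as three code points.
nfc = unicodedata.normalize("NFC", CLUSTER)
print(len(nfc), nfc == CLUSTER)

# The virama is a nonspacing combining mark; joining is up to the renderer.
virama = CLUSTER[1]
category = unicodedata.category(virama)
combining_class = unicodedata.combining(virama)
print(category, combining_class)
```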
 
Thanks for the suggestions.
vjktm


Re: Support for Devanagari Script

poudyal
In reply to this post by Andy Robinson-2
Andy,
Thanks for investigating this.  I am using IDLE in Windows 2000 and XP.
 
I will try the variations you tried and will report the results.
 
Thanks.
vjktm


Re: Support for Devanagari Script

poudyal
In reply to this post by Andy Robinson-2
Hi Andy,
I tried my example at the command prompt in XP and Xbash (cygwin).  Python cannot decode the Unicode in either case.  In IDLE I get the wrong rendering.  I was not able to reproduce the "renders ... exactly the same ..." behavior you mentioned.
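
When a console rejects the text, one thing worth checking is which encoding Python believes stdout uses. A hedged sketch in Python 3 syntax (the thread uses Python 2.4, where the details differ); the helper name can_encode and the "ascii" fallback are assumptions made for illustration.

```python
import sys

K_SA = "Hello \u0915\u094D\u0937"


def can_encode(text, encoding):
    """Report whether text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


# sys.stdout.encoding may be None when output is redirected, hence the fallback.
stdout_encoding = getattr(sys.stdout, "encoding", None) or "ascii"
print("stdout encoding:", stdout_encoding)
print("can print sample:", can_encode(K_SA, stdout_encoding))
```

A successful encode only means the bytes can be sent; whether the conjunct is drawn correctly still depends on the display.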
 
When you say kSa.encode('utf8') do you get "Hello" followed by two characters joined along the top or just one character (as in the html)?  When I run it I get two characters and this is a wrong rendering of the code point sequence.
 
vjktm


Re: Support for Devanagari Script

Andy Robinson-2
Vijaya Poudyal wrote:

> Hi Andy,
> I tried my example at the command prompt in XP and Xbash (cygwin).  
> Python cannot decode the unicode in both cases.  In IDLE I get the wrong
> rendering.  I was not able to the "renders ... exactly the same ..."
> behavior you mentioned.
>  
> When you say kSa.encode('utf8') do you get "Hello" followed by two
> characters joined along the top or just one character (as in the html)?  
> When I run it I get two characters and this is a wrong rendering of the
> code point sequence.

I get exactly the same thing in both, attached. I called this 'one
character' in my ignorance, but I guess the characters get combined.
Strangely, if I move the cursor through it with the right-arrow key, it
"explodes" into 3 characters while it has the focus - these are the same
ones shown in the Unicode standard's charts for those code points.  But
normally it appears as above.

Best Regards,


Andy

Attachment: dev.png (1K)

Re: Support for Devanagari Script

poudyal
Andy,
The attached image is the desired, correct rendition.  So there is hope for my project! Now why do you think I don't get it in IDLE (Win XP, Win 2000)?  Any suggestions on what I should investigate to fix this?
 
The Tk "Python Shell" window shows:
Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
IDLE 1.1.3  
Thanks,
vjktm
