python implementation of unicode collation algorithm

James Tauber

I've made a start on a pure Python implementation of the Unicode
Collation Algorithm (UTS #10), but I thought I'd best check with this
SIG whether such a thing already exists.

James
--
James Tauber                       http://jtauber.com/
journeyman of some   http://jtauber.com/blog/


_______________________________________________
I18n-sig mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/i18n-sig

Re: python implementation of unicode collation algorithm

M.-A. Lemburg
James Tauber wrote:
> I've made a start on a pure python implementation of the Unicode  
> Collation Algorithm (UTS #10) but I thought I'd best check with this  
> SIG whether such a thing already exists.

Not that I'm aware of.

Note that given the sizes of the collation tables, it's probably
better to have them defined in a C module, rather than a Python
data structure.
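
For a sense of what the pure Python data structure would be, here is a
minimal sketch that parses lines in the style of the UCA allkeys.txt file
into a dict. The SAMPLE text, its weights, and the parse_table name are
assumptions made up for illustration; the real DUCET has many thousands of
entries, which is where the size concern comes from.

```python
import re

# Illustrative lines in the allkeys.txt style. The weights are invented
# for this sketch and are NOT the real DUCET values.
SAMPLE = """\
# comment line
@version 0.0.0
0061 ; [.0010.0020.0002] # LATIN SMALL LETTER A
0041 ; [.0010.0020.0008] # LATIN CAPITAL LETTER A
0062 ; [.0011.0020.0002] # LATIN SMALL LETTER B
0063 0068 ; [.0012.0020.0002] # hypothetical contraction "ch"
00E4 ; [.0010.0020.0002][.0000.0030.0002] # expands to two elements
"""

# One collation element: "[", a "." or "*" marker, dot-separated hex weights, "]".
ELEMENT = re.compile(r"\[([.*])([0-9A-Fa-f.]+)\]")


def parse_table(text):
    """Map each character sequence to its list of weight tuples."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("@"):  # skip blanks and directives
            continue
        chars, _, elements = line.partition(";")
        key = "".join(chr(int(cp, 16)) for cp in chars.split())
        table[key] = [
            tuple(int(w, 16) for w in weights.split("."))
            for _marker, weights in ELEMENT.findall(elements)
        ]
    return table


table = parse_table(SAMPLE)
print(table["a"])
```

Each entry becomes a Python string key plus a list of tuples, so memory
use grows quickly with table size; a C module could store the same data
far more compactly.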

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 23 2006)

Re: python implementation of unicode collation algorithm

Jim Fulton
In reply to this post by James Tauber
James Tauber wrote:
> I've made a start on a pure python implementation of the Unicode  
> Collation Algorithm (UTS #10) but I thought I'd best check with this  
> SIG whether such a thing already exists.

I'm not aware of any pure Python implementations.

I've created a Pyrex-based C wrapper of the ICU collation library at:

   http://svn.zope.org/zope.ucol/trunk/

You don't need Pyrex to use this, and there is a distutils
setup script to install it.

I'd be happy to make an official release of this if anyone is
interested.

There is also a SWIG-based C++ wrapper of a much larger portion of the
ICU library, including collation, at:

   http://pyicu.osafoundation.org/

This requires SWIG, hand editing of makefiles, and dynamic-library
machinations, which is in large part why I ended up writing my own
wrapper.
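
As a rough illustration of what sorting through an ICU wrapper looks
like, here is a minimal sketch using PyICU. It assumes the present-day
PyICU API (icu.Collator.createInstance, icu.Locale, getSortKey), which
may differ from the version at the URL above; the icu_sorted helper and
the "en_US" default are made up for this example, and the import is
guarded because PyICU is a third-party package.

```python
try:
    import icu  # PyICU, an optional third-party dependency
except ImportError:
    icu = None


def icu_sorted(words, locale_name="en_US"):
    """Return words sorted by ICU collation for the given locale."""
    if icu is None:
        raise RuntimeError("PyICU is not installed")
    collator = icu.Collator.createInstance(icu.Locale(locale_name))
    # getSortKey returns a byte string; comparing the byte strings
    # reproduces the collation order.
    return sorted(words, key=collator.getSortKey)


if icu is not None:
    print(icu_sorted(["b", "A", "a"]))
```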

Jim

--
Jim Fulton           mailto:[hidden email]       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

Re: python implementation of unicode collation algorithm

James Tauber
In reply to this post by M.-A. Lemburg
On 23/01/2006, at 4:49 AM, M.-A. Lemburg wrote:

> James Tauber wrote:
>> I've made a start on a pure python implementation of the Unicode
>> Collation Algorithm (UTS #10) but I thought I'd best check with this
>> SIG whether such a thing already exists.
>
> Not that I'm aware of.
>
> Note that given the sizes of the collation tables, it's probably
> better to have them defined in a C module, rather than a Python
> data structure.

Yes, this is certainly true of the DUCET (the Default Unicode Collation
Element Table), although language-specific collation element tables
would be more manageable as Python data.

I'll probably start with a pure Python implementation and then take
it from there (or let someone with better C extension experience
optimize it).
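
To make the idea concrete, here is a minimal sketch of the core UCA step
of forming a sort key from a small collation element table. The TABLE
contents and weights are hypothetical, and the sketch leaves out
normalization, variable weighting, discontiguous matches, and implicit
weights for unlisted characters, all of which UTS #10 requires.

```python
# Hypothetical miniature table: character sequence -> list of
# (primary, secondary, tertiary) weights. The values are made up.
TABLE = {
    "a": [(0x10, 0x20, 0x02)],
    "A": [(0x10, 0x20, 0x08)],
    "b": [(0x11, 0x20, 0x02)],
    "c": [(0x12, 0x20, 0x02)],
    "ch": [(0x13, 0x20, 0x02)],  # contraction that sorts after plain "c"
    "d": [(0x14, 0x20, 0x02)],
}
MAX_KEY_LEN = max(len(key) for key in TABLE)


def collation_elements(text):
    """Look up collation elements, preferring the longest table match."""
    elements = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in TABLE:
                elements.extend(TABLE[chunk])
                i += length
                break
        else:
            # A full implementation would derive implicit weights here.
            raise KeyError("no collation element for %r" % text[i])
    return elements


def sort_key(text):
    """Concatenate non-zero weights level by level, separated by 0."""
    elements = collation_elements(text)
    key = []
    for level in range(3):
        if level:
            key.append(0)  # level separator
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return tuple(key)


print(sorted(["ch", "b", "A", "cd", "a"], key=sort_key))
# → ['a', 'A', 'b', 'cd', 'ch']
```

Because the key is an ordinary tuple, it plugs straight into sorted() and
list.sort(); a language-specific table would only replace TABLE.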

James


Re: python implementation of unicode collation algorithm

James Tauber
In reply to this post by Jim Fulton
On 23/01/2006, at 6:37 AM, Jim Fulton wrote:

> James Tauber wrote:
>> I've made a start on a pure python implementation of the Unicode  
>> Collation Algorithm (UTS #10) but I thought I'd best check with  
>> this  SIG whether such a thing already exists.
>
> I'm not aware of any pure python implementations.
>
> I've created a pyrex-based C wrapper of the ICU collation library at:
>
>   http://svn.zope.org/zope.ucol/trunk/
>
> You don't need pyrex to use this and there is a distutils
> setup script to install it.
>
> I'd be happy to make an official release of this if anyone is
> interested.

I'd like to see an official release, even if I do end up doing a pure  
Python implementation myself.

James
--
James Tauber                       http://jtauber.com/
journeyman of some   http://jtauber.com/blog/



