Is anyone implementing EXI in Python?

Stanley A. Klein
Efficient XML Interchange (EXI) is moving toward adoption by W3C.  It
provides a format for efficiently representing XML documents with
schema-informed and schema-less modes.

There is an open-source Java implementation available.

Is anyone working to implement EXI in Python?


Stan Klein





_______________________________________________
XML-SIG maillist  -  [hidden email]
http://mail.python.org/mailman/listinfo/xml-sig

Re: Is anyone implementing EXI in Python?

Henry S. Thompson
Stanley A. Klein writes:

> Efficient XML Interchange (EXI) is moving toward adoption by W3C.  It
> provides a format for efficiently representing XML documents with
> schema-informed and schema-less modes.
>
> There is an open-source Java implementation available.
>
> Is anyone working to implement EXI in Python?

Don't get me wrong, I think EXI is useful, in the right places, but,
could I ask, why would you want to implement it in Python?  I'd be
very surprised if any Python XML application is spending anything like
enough time in the raw parsing activity (as opposed to the
structure-building activity) to make the marginal gain you might get
from EXI worth it. . .

EXI is, IMO, for closely coupled systems in particular messaging
environments where every bit counts, and I guess I'm having difficulty
imagining Python in such a context. . .

ht
--
       Henry S. Thompson, School of Informatics, University of Edinburgh
                         Half-time member of W3C Team
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 651-1426, e-mail: [hidden email]
                       URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

Re: Is anyone implementing EXI in Python?

Stanley A. Klein
EXI is for data interchange.  That can mean messaging or document/data
storage.  SOAP messages are very verbose, and SOAP messaging can benefit
from EXI, especially if the communications channels have bandwidth or
transit time considerations.  SOAP is increasingly being considered in a
variety of control system applications for which Python makes sense as an
implementation language.  Similarly, scientific applications involving
large amounts of XML-formatted data could benefit from EXI in storing the
data or interchanging it for purposes such as grid processing.

The original application that contributed the technology for EXI was
sending web pages to cell phones.

In general, any application implemented in Python that involves messaging
or data storage with either bandwidth or storage volume concerns could
benefit from EXI.  And as best I know there are a growing number of such
applications implemented in Python.

Also, why would Java make sense and Python not?


Stan Klein






Re: Is anyone implementing EXI in Python?

Stefan Behnel
Hi,

Stanley A. Klein wrote:

> On Wed, July 15, 2009 1:37 pm, Henry S. Thompson wrote:
>> Stanley A. Klein writes:
>>
>>> Efficient XML Interchange (EXI) is moving toward adoption by W3C.  It
>>> provides a format for efficiently representing XML documents with
>>> schema-informed and schema-less modes.
>>>
>>> There is an open-source Java implementation available.
>>>
>>> Is anyone working to implement EXI in Python?
>>
>> Don't get me wrong, I think EXI is useful, in the right places, but,
>> could I ask, why would you want to implement it in Python?  I'd be
>> very surprised if any Python XML application is spending anything like
>> enough time in the raw parsing activity (as opposed to the
>> structure-building activity) to make the marginal gain you might get
>> from EXI worth it. . .
>>
>> EXI is, IMO, for closely coupled systems in particular messaging
>> environments where every bit counts, and I guess I'm having difficulty
>> imagining Python in such a context. . .
>
> EXI is for data interchange.  That can mean messaging or document/data
> storage.  SOAP messages are very verbose, and SOAP messaging can benefit
> from EXI, especially if the communications channels have bandwidth or
> transit time considerations.
>
> SOAP is increasingly being considered in a
> variety of control system applications for which Python makes sense as an
> implementation language.  Similarly, scientific applications involving
> large amounts of XML-formatted data could benefit from EXI in storing the
> data or interchanging it for purposes such as grid processing.
>
> The original application that contributed the technology for EXI was
> sending web pages to cell phones.
>
> In general, any application implemented in Python that involves messaging
> or data storage with either bandwidth or storage volume concerns could
> benefit from EXI.  And as best I know there are a growing number of such
> applications implemented in Python.

Any XML transmission or storage can benefit from *compression*, often
shrinking the data volume by factors up to 100. I doubt that the savings of
EXI are sufficiently large compared to a well compressed XML stream that
they compensate for the drawbacks of yet another new non-readable format.

A well chosen compression method is a lot better suited to such
applications and is already supported by most available XML parsers (or
rather outside of the parsers themselves, which is a huge advantage).
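
To make the "outside of the parsers" point concrete, here is a minimal sketch using only the Python standard library. The payload is a made-up, highly repetitive document chosen for illustration; real data may compress better or worse.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Hypothetical, highly repetitive XML payload used only for illustration.
records = "".join(
    f"<reading sensor='s{i % 5}'><value>{i % 10}</value></reading>"
    for i in range(1000)
)
xml_bytes = f"<readings>{records}</readings>".encode("utf-8")

# Compression is applied to the serialized bytes, outside the parser ...
compressed = gzip.compress(xml_bytes)

# ... and the parser simply reads from a decompressing file object.
with gzip.open(io.BytesIO(compressed), "rb") as stream:
    root = ET.parse(stream).getroot()

print("raw bytes:       ", len(xml_bytes))
print("gzip bytes:      ", len(compressed))
print("parsed children: ", len(root))
```

The parser never learns that the stream was compressed, which is why this approach works with any XML library that accepts a file-like object.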


> Also, why would Java make sense and Python not?

Because pretty much all XML technologies come from the Java environment?
That doesn't mean that Java is a suitable language for working with them.
It only means that it supports them because Java is used for developing
them (often as a reference implementation).

Stefan

Re: Is anyone implementing EXI in Python?

Stanley A. Klein
On Wed, 2009-07-15 at 22:26 +0200, Stefan Behnel wrote:
> Hi,
>
> Stanley A. Klein wrote:
> > On Wed, July 15, 2009 1:37 pm, Henry S. Thompson wrote:
> >> Stanley A. Klein writes:
> >>
> >>> Efficient XML Interchange (EXI) is moving toward adoption by W3C.  It
> >>> provides a format for efficiently representing XML documents with
> >>> schema-informed and schema-less modes.
> >>>
> >>> There is an open-source Java implementation available.
> >>>
> >>> Is anyone working to implement EXI in Python?
> >>
> >> Don't get me wrong, I think EXI is useful, in the right places, but,
> >> could I ask, why would you want to implement it in Python?  I'd be
> >> very surprised if any Python XML application is spending anything like
> >> enough time in the raw parsing activity (as opposed to the
> >> structure-building activity) to make the marginal gain you might get
> >> from EXI worth it. . .
> >>
> >> EXI is, IMO, for closely coupled systems in particular messaging
> >> environments where every bit counts, and I guess I'm having difficulty
> >> imagining Python in such a context. . .
> >
> > EXI is for data interchange.  That can mean messaging or document/data
> > storage.  SOAP messages are very verbose, and SOAP messaging can benefit
> > from EXI, especially if the communications channels have bandwidth or
> > transit time considerations.
> >
> > SOAP is increasingly being considered in a
> > variety of control system applications for which Python makes sense as an
> > implementation language.  Similarly, scientific applications involving
> > large amounts of XML-formatted data could benefit from EXI in storing the
> > data or interchanging it for purposes such as grid processing.
> >
> > The original application that contributed the technology for EXI was
> > sending web pages to cell phones.
> >
> > In general, any application implemented in Python that involves messaging
> > or data storage with either bandwidth or storage volume concerns could
> > benefit from EXI.  And as best I know there are a growing number of such
> > applications implemented in Python.
>
> Any XML transmission or storage can benefit from *compression*, often
> shrinking the data volume by factors up to 100. I doubt that the savings of
> EXI are sufficiently large compared to a well compressed XML stream that
> they compensate for the drawbacks of yet another new non-readable format.
>
> A well chosen compression method is a lot better suited to such
> applications and is already supported by most available XML parsers (or
> rather outside of the parsers themselves, which is a huge advantage).
>
>
> > Also, why would Java make sense and Python not?
>
> Because pretty much all XML technologies come from the Java environment?
> That doesn't mean that Java is a suitable language for working with them.
> It only means that it supports them because Java is used for developing
> them (often as a reference implementation).
>
> Stefan


It depends on the nature of the XML application.  One feature of EXI is to
support representation of numeric data as bits rather than characters.
That is very useful in appropriate applications.  There is a measurements
document that shows the compression that was achieved on a wide variety of
test cases.  Straight use of a common compression algorithm does not
necessarily achieve the best results.  Besides, EXI incorporates elements
of common compression algorithm(s) as both a fallback for its schema-less
mode and an additional capability in its schema-informed mode.
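
As a rough illustration of why a binary numeric representation can be smaller than its character form, the sketch below packs a few hypothetical measurements as fixed-width IEEE 754 doubles with Python's struct module. The fixed-width double layout is an assumption made for the example; it is not a description of how EXI itself encodes numbers.

```python
import struct

# Hypothetical measurement values, chosen only for illustration.
values = [1234567.25, -0.000125, 98765.4321]

# Character form: what would appear as text content in an XML document.
as_text = " ".join(repr(v) for v in values).encode("ascii")

# Binary form: little-endian IEEE 754 doubles (an assumed layout, not EXI's).
layout = f"<{len(values)}d"
as_binary = struct.pack(layout, *values)

print("text bytes:  ", len(as_text))
print("binary bytes:", len(as_binary))

# The binary form round-trips exactly; no decimal parsing is involved.
assert struct.unpack(layout, as_binary) == tuple(values)
```

Whether the binary form wins depends on the values: short integers such as "7" are smaller as text than as an 8-byte double, which is the kind of trade-off discussed later in this thread.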

EXI is intended for use outboard of the parser, and that would apply
equally well to a Python version.  For example, EXI gets rid of the need
to constantly resend over-the-wire all the namespace definitions with each
message.  The relevant strings would just go into the string table and get
restored from there when the message is converted back.
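
The string-table idea can be sketched with a toy encoder. The class name, the token format and the namespace list below are all hypothetical; the code models the idea described above (a table kept in step by sender and receiver), not the EXI wire format.

```python
class StringTable:
    """Toy string table: first occurrence sent literally, repeats as an index."""

    def __init__(self):
        self.strings = []  # index -> string
        self.index = {}    # string -> index

    def _add(self, s):
        self.index[s] = len(self.strings)
        self.strings.append(s)

    def encode(self, s):
        if s in self.index:
            return ("ref", self.index[s])
        self._add(s)
        return ("lit", s)

    def decode(self, token):
        kind, payload = token
        if kind == "ref":
            return self.strings[payload]
        self._add(payload)
        return payload


# Sender and receiver each keep their own table, updated in lockstep.
sender, receiver = StringTable(), StringTable()
namespaces = [
    "http://www.w3.org/2003/05/soap-envelope",
    "http://example.org/measurements",  # hypothetical application namespace
]

first = [sender.encode(ns) for ns in namespaces]   # literals on first use
second = [sender.encode(ns) for ns in namespaces]  # small references afterwards

decoded_first = [receiver.decode(t) for t in first]
decoded_second = [receiver.decode(t) for t in second]
print(first)
print(second)
```

The saving comes entirely from both sides remembering earlier strings, so it only applies for as long as the two tables stay in sync.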

However, for something like SOAP in certain applications, it may be
eventually desirable to integrate the EXI implementation within the
communications system.  The message sender could reasonably create a
schema-informed EXI version without actually starting from and converting
an XML object.  The recipient would have to convert the EXI back to XML,
parse it, and use the data.

Regarding the format readability, it converts to XML and is readable
there.  Numeric data is most efficiently sent as bits, so that data is
necessarily unreadable until converted.  The value of EXI necessarily
depends on the application.


Stan Klein



Re: Is anyone implementing EXI in Python?

Stefan Behnel
Hi,

Stanley A. Klein wrote:
> On Wed, 2009-07-15 at 22:26 +0200, Stefan Behnel wrote:
>> A well chosen compression method is a lot better suited to such
>> applications and is already supported by most available XML parsers (or
>> rather outside of the parsers themselves, which is a huge advantage).
>
> It depends on the nature of the XML application.  One feature of EXI is to
> support representation of numeric data as bits rather than characters.
> That is very useful in appropriate applications.

One drawback is that this requires a schema to make sure the number of bits
is sufficient. Otherwise, you'd need to add information about how many bits
you use for the representation, which would add to the data volume.


> There is a measurements
> document that shows the compression that was achieved on a wide variety of
> test cases.  Straight use of a common compression algorithm does not
> necessarily achieve the best results.

Repetitive data like an XML byte stream compresses extremely well, though,
and the 'best' compression isn't always required anyway. I worked on a
Python SOAP application where we sent some 3MB of XML as a web service
response. That took a couple of seconds to transmit. Injecting the standard
gzip algorithm into the WSGI stack got it down to some 48KB. Nothing more
to do here.
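
For readers who have not done this, a minimal sketch of gzip in a WSGI stack follows. `GzipMiddleware` and `demo_app` are hypothetical names written for this example, not the code from the application mentioned above, and the sketch buffers the whole response, which a production middleware would usually avoid.

```python
import gzip


class GzipMiddleware:
    """Hypothetical WSGI middleware: gzip the body if the client accepts it."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if "gzip" not in environ.get("HTTP_ACCEPT_ENCODING", ""):
            return self.app(environ, start_response)

        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold back the status line and headers until the body is compressed.
            captured["status"] = status
            captured["headers"] = headers
            return lambda data: None  # legacy write() is ignored in this sketch

        result = self.app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        compressed = gzip.compress(body)
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() not in ("content-length", "content-encoding")]
        headers += [("Content-Encoding", "gzip"),
                    ("Content-Length", str(len(compressed))),
                    ("Vary", "Accept-Encoding")]
        start_response(captured["status"], headers)
        return [compressed]


def demo_app(environ, start_response):
    """Hypothetical application returning a repetitive XML response."""
    body = b"<items>" + b"<item>42</item>" * 1000 + b"</items>"
    start_response("200 OK", [("Content-Type", "text/xml"),
                              ("Content-Length", str(len(body)))])
    return [body]


application = GzipMiddleware(demo_app)
```

The application itself is unchanged; compression is added purely by wrapping it, which is what makes this approach cheap to try.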

If you need 'the best' compression, there's no way around benchmarking a
couple of different algorithms that are suitable for your application, and
choosing the one that works best for your data. That may or may not include
EXI.
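
Such a benchmark can be sketched with the compressors that ship with Python. The `benchmark` helper and the sample document are assumptions made for the example; EXI is not included because no Python implementation is assumed to be available, and the numbers only mean something when measured on your own data.

```python
import bz2
import lzma
import time
import zlib


def benchmark(data):
    """Return {name: (compressed_size, seconds)} for several stdlib codecs."""
    codecs = {
        "zlib": zlib.compress,
        "bz2": bz2.compress,
        "lzma": lzma.compress,
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        size = len(compress(data))
        results[name] = (size, time.perf_counter() - start)
    return results


# Hypothetical sample document; substitute a representative message or file.
sample = (
    b"<readings>"
    + b"".join(b"<r id='%d'><v>%d</v></r>" % (i, i % 97) for i in range(5000))
    + b"</readings>"
)

results = benchmark(sample)
print("raw", len(sample))
for name, (size, seconds) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:5s} {size:8d} bytes  {seconds * 1000:8.2f} ms")
```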


> Besides, EXI incorporates elements
> of common compression algorithm(s) as both a fallback for its schema-less
> mode and an additional capability in its schema-informed mode.

Makes sense, as compression also applies to text content, for example.


> EXI is intended for use outboard of the parser, and that would apply
> equally well to a Python version.  For example, EXI gets rid of the need
> to constantly resend over-the-wire all the namespace definitions with each
> message.  The relevant strings would just go into the string table and get
> restored from there when the message is converted back.

That's how any dictionary-based compression algorithm (gzip's DEFLATE, for
example) works anyway: repeated strings are replaced by back-references. Plus,
namespace definitions usually only happen once in a document, so they are
pretty much negligible in a larger XML document.


> However, for something like SOAP in certain applications, it may be
> eventually desirable to integrate the EXI implementation within the
> communications system.  The message sender could reasonably create a
> schema-informed EXI version without actually starting from and converting
> an XML object.  The recipient would have to convert the EXI back to XML,
> parse it, and use the data.

Ok, that's where I see it, too. At the level where you'd normally apply a
compression algorithm anyway.


> Numeric data is most efficiently sent as bits

Depends on how you select the bits. When I write into my schema that I use
a 32-bit integer value in my XML, and all I really send happens to be
within [0-9] in, say, 95% of the cases with a few exceptions that really
require 32 bits, a general-purpose compression algorithm will easily
beat anything that sends the value as a 4-byte sequence. That's the
advantage of general compression: it sees the real data, not only its schema.
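
This claim is easy to check on synthetic data. The sketch below generates values with roughly the distribution described above (the seed, the value ranges and the element names are assumptions) and compares zlib-compressed XML text against a plain fixed-width 4-byte encoding. The fixed-width layout models the example in the paragraph above and is not necessarily what an EXI encoder would emit.

```python
import random
import struct
import zlib

random.seed(0)  # fixed seed so repeated runs print the same numbers

# About 95% single-digit values, the rest genuinely needing up to 32 bits.
values = [
    random.randint(0, 9) if random.random() < 0.95
    else random.randint(10**6, 2**31 - 1)
    for _ in range(10_000)
]

# The values as XML text content (hypothetical element names).
xml_text = ("<vs>" + "".join(f"<v>{v}</v>" for v in values) + "</vs>").encode("ascii")

# The values as a fixed-width sequence of big-endian signed 32-bit integers.
fixed_width = struct.pack(f">{len(values)}i", *values)

compressed_text = zlib.compress(xml_text)
compressed_fixed = zlib.compress(fixed_width)

print("XML text, raw:     ", len(xml_text))
print("XML text, zlib:    ", len(compressed_text))
print("4-byte ints, raw:  ", len(fixed_width))
print("4-byte ints, zlib: ", len(compressed_fixed))
```

The last line is included because compressing the binary form as well is the fairer comparison; which of the two compressed forms is smaller depends on the data.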

I do not question EXI in general, I'm fine with it having its niche
(wherever that turns out to be). I'm just saying that common compression
algorithms are a lot more broadly available and achieve similar results. So
EXI is just another way of compressing XML, with the disadvantage of not
being as widely implemented. Compare it to the ubiquity of the gzip
compression algorithm, for example. It's just the usual trade-off that you
make between efficiency and cross-platform compatibility.

Stefan

Re: Is anyone implementing EXI in Python?

Stanley A. Klein
I think the issue here is the nature of the data exchange.  EXI
essentially provides a compression algorithm that saves information
between instances of a message or file and can be seeded with what is
known in advance about certain characteristics of the instances.  The gzip
algorithm learns the characteristics of each instance separately from that
instance and does not retain information between instances.

If you are occasionally sending a large file, gzip makes sense.  There is
little gain from retaining information.  However, if you have frequent
small messages or separate small files based on a schema, the namespace
definitions are repeated for each instance and can take up an appreciable
fraction of what is sent over-the-wire for each instance.  There isn't
much for gzip to learn, and it has to start all over for the next
instance.  Similarly, the tags recur across instances but gzip will only
learn them as it encounters them in a particular instance.  Again, gzip
forgets between instances.

I think in the absence of prior information and when used only
occasionally (without information retention between instances), EXI
provides something close to gzip compression.  What EXI provides is a
variant of compression technology that has information retention between
instances and the ability to use prior information across instances.  In
applications with frequent repetitive data exchanges, the information
retention and ability to use prior information can provide significant
benefits.
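
The effect of prior information on small messages can be demonstrated with zlib's preset-dictionary feature, used here only as an analogy for the seeding described above; it is not EXI. The shared dictionary, the message and the example.org namespace are hypothetical, and both ends must hold the same dictionary.

```python
import zlib

# Hypothetical knowledge shared in advance by sender and receiver.
shared = (
    b'<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" '
    b'xmlns:m="http://example.org/measurements"><soap:Body><m:reading>'
    b'<m:sensor></m:sensor><m:value></m:value></m:reading>'
    b'</soap:Body></soap:Envelope>'
)

# One small message of the kind that is sent frequently.
message = (
    b'<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" '
    b'xmlns:m="http://example.org/measurements"><soap:Body><m:reading>'
    b'<m:sensor>pump-7</m:sensor><m:value>42</m:value></m:reading>'
    b'</soap:Body></soap:Envelope>'
)


def deflate(data, zdict=None):
    """Compress one message, optionally seeded with a preset dictionary."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(data) + comp.flush()


def inflate(data, zdict=None):
    """Decompress one message; the same dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(data) + decomp.flush()


without_prior = deflate(message)
with_prior = deflate(message, shared)

print("raw message:        ", len(message))
print("zlib, no dictionary:", len(without_prior))
print("zlib, with shared:  ", len(with_prior))
assert inflate(with_prior, shared) == message
```

Without the dictionary the compressor has to learn the envelope and namespace strings from this one small message; with it, most of the message is a back-reference into data the receiver already holds.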


Stan Klein


On Fri, July 17, 2009 4:06 am, Stefan Behnel wrote:

> Hi,
>
> I do not question EXI in general, I'm fine with it having its niche
> (wherever that turns out to be). I'm just saying that common compression
> algorithms are a lot more broadly available and achieve similar results. So
> EXI is just another way of compressing XML, with the disadvantage of not
> being as widely implemented. Compare it to the ubiquity of the gzip
> compression algorithm, for example. It's just the usual trade-off that you
> make between efficiency and cross-platform compatibility.
>
> Stefan
>

