
my own entity defs when parsing with etree?

Stuart McGraw-2
Hello,

I could use some really basic help about using Etree.
I have tried reading the etree and expat doc but I
don't understand most of it.

I have an xml file that contains a dtd that defines a
number of entities that are subsequently referenced
in the xml.

What I would like to do:

1) Parse the xml file but override some or all of the
entity definitions in the dtd with my own definitions.

2) Parse strings containing elements extracted from
the full xml file, without the dtd, and supplying my
own entity map to resolve any entities.

I am nearly clueless when it comes to XML processing,
so if I could get a code snippet illustrating how to
do the above, that would be wonderful!  I am currently
using the stock Python 2.6 ElementTree, but could
switch to lxml's if that would help.
_______________________________________________
XML-SIG maillist  -  [hidden email]
http://mail.python.org/mailman/listinfo/xml-sig
Re: my own entity defs when parsing with etree?

Stefan Behnel-3

Stuart McGraw wrote:
> I could use some really basic help about using Etree.
> I have tried reading the etree and expat doc but I
> don't understand most of it.

In that case, you should read up on XML in general first. The Wikipedia
article isn't all that bad:

http://en.wikipedia.org/wiki/XML


> I have an xml file that contains a dtd that defines a
> number of entities that are subsequently referenced
> in the xml.
>
> What I would like to do:
>
> 1) Parse the xml file but override some or all of the
> entity definitions in the dtd with my own definitions.
>
> 2) Parse strings containing elements extracted from
> the full xml file, without the dtd, and supplying my
> own entity map to resolve any entities.

http://effbot.org/elementtree/elementtree-xmlparser.htm#tag-ET.XMLParser.entity


> I am nearly clueless when it comes to XML processing,
> so if I could get a code snippet illustrating how to
> do the above, that would be wonderful!  I am currently
> using the stock Python 2.6 ElementTree, but could
> switch to lxml's if that would help.

ElementTree (i.e. the xml.etree package) does not support DTDs at all. If
you want to use DTDs, e.g. to do validation, to inject default attributes,
or to resolve entity references, you can switch to the external lxml.etree
package. Note, however, that lxml does not support the ".entity" dictionary
on parsers. It doesn't currently have a way to override entity definitions
outside of a DTD.

Stefan

Re: my own entity defs when parsing with etree?

Josh English
I gave up on Entities ages ago, but thought I'd try it after seeing your link.

I tried this simple code:

from elementtree import ElementTree as ET

p = ET.XMLParser()

p.entity["me"] = "Josh"

text = """<test>&me;</test>"""

p.feed(text)

e = p.close()

print e
ET.dump(e)

And got an error:

>pythonw -u "ETParserWithEntities.py"
Traceback (most recent call last):
  File "ETParserWithEntities.py", line 9, in <module>
    p.feed(text)
  File "C:\Python26\lib\site-packages\elementtree\ElementTree.py",
line 1524, in feed
    self._raiseerror(v)
  File "C:\Python26\lib\site-packages\elementtree\ElementTree.py",
line 1426, in _raiseerror
    raise err
elementtree.ElementTree.ParseError: undefined entity: line 1, column 6
>Exit code: 1


As far as I can tell, the XMLParser is using pyexpat, which only comes
as a .pyd file, so I can't look into this.

Any ideas?

Windows XP, Python 2.6, elementtree 1v3a2

Josh English



--
Josh English
[hidden email]
http://joshenglish.livejournal.com

Re: my own entity defs when parsing with etree?

Stefan Behnel-3
Hi,

Josh English wrote:

> I gave up on Entities ages ago, but thought I'd try it after seeing your link.
>
> I tried this simple code:
>
> from elementtree import ElementTree as ET
>
> p = ET.XMLParser()
>
> p.entity["me"] = "Josh"
>
> text = """<test>&me;</test>"""
>
> p.feed(text)
>
> e = p.close()
>
> print e
> ET.dump(e)
>
> And got an error:
>
>> pythonw -u "ETParserWithEntities.py"
> Traceback (most recent call last):
>   File "ETParserWithEntities.py", line 9, in <module>
>     p.feed(text)
>   File "C:\Python26\lib\site-packages\elementtree\ElementTree.py",
> line 1524, in feed
>     self._raiseerror(v)
>   File "C:\Python26\lib\site-packages\elementtree\ElementTree.py",
> line 1426, in _raiseerror
>     raise err
> elementtree.ElementTree.ParseError: undefined entity: line 1, column 6
>> Exit code: 1

Interesting. I just tried and got the same result. I guess I never even
tried to do this, given that I knew lxml won't support it anyway...

Without debugging into this, it seems that expat raises that exception
before ElementTree even gets to handle the unknown entity.

I just found this post, but didn't try it:

http://mail.python.org/pipermail/python-list/2007-April/607256.html

Stefan