I am trying to find a working, maintained DTD parser written in Python. The
main Python XML distribution does not include one. I was using the PyXML
parser, but (1) (at least on SourceForge) it hasn't been maintained for
years, and (2) (as of the last version I could find, 0.8.4) it has a bug
that sometimes causes it to not process the very last line of a DTD -- which
in a well-modularized DTD is often an entity reference that pulls in the
main content of the DTD!
I've written my own DTD parser, but it omits some features, I really don't
know how well it conforms to the spec (which is important because I'm also
writing some DTDs of my own), and I'd much rather use a well-tested one
written by others.
On Sat, Apr 3, 2010 at 9:45 AM, L Peter Deutsch <[hidden email]> wrote:
> I was using the PyXML parser, but (1) (at least on SourceForge)
> it hasn't been maintained for years,
Nor anywhere else; the SourceForge site is "current", as it goes.
> and (2) (as of the last version I could find, 0.8.4) it has a bug
> that sometimes causes it to not process the very last line of a DTD -- which
> in a well-modularized DTD is often an entity reference that pulls in the
> main content of the DTD!
Are you attempting to only parse the DTD itself, or validate a
document according to the DTD?
If the former, xml.parsers.expat can be made to serve this purpose.
I'm not sure what people are using for DTD-based document validation
in Python these days.
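For illustration, here is a minimal sketch of how expat can report DTD declarations without validating anything. It assumes a current Python 3 and a DTD small enough to embed as an internal subset of a throwaway document; the function name parse_internal_dtd and the wrapper element name "root" are made up for this example, not part of any library.

```python
import xml.parsers.expat


def parse_internal_dtd(dtd_text, root="root"):
    """Collect element and attribute-list declarations from DTD text.

    Hypothetical helper: the DTD text is wrapped in a dummy document so
    that expat sees it as an internal subset. expat does not validate,
    so the dummy root element need not match the DTD.
    """
    elements = {}
    attributes = []

    def on_element(name, model):
        # model is expat's nested tuple describing the content model
        elements[name] = model

    def on_attlist(elname, attname, type_, default, required):
        attributes.append((elname, attname, type_, default, required))

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attlist

    doc = "<!DOCTYPE %s [\n%s\n]>\n<%s/>" % (root, dtd_text, root)
    parser.Parse(doc, True)
    return elements, attributes


SAMPLE_DTD = """
<!ELEMENT root (item*)>
<!ELEMENT item (#PCDATA)>
<!ATTLIST item id CDATA #REQUIRED>
"""

elements, attributes = parse_internal_dtd(SAMPLE_DTD)
```

Note that the XML spec restricts parameter entity references inside declarations in the internal subset, so this shortcut probably only suits simple, unmodularized DTDs.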
> Are you attempting to only parse the DTD itself, or validate a document
> according to the DTD?
Only the former.
> If the former, xml.parsers.expat can be made to serve this purpose.
Thanks very much. I just now read the xml.parsers.expat documentation, and
I see the callbacks that correspond to the callbacks from the PyXML DTD
parser. I think I was using the PyXML parser because that's what Stefan
Behnel's dtd2py used, which was the starting point for my current project;
but in any case, I see no reason to use it any longer. Sorry to have taken
your time for this.
I'm sorry to impose on you, but I've had a very frustrating afternoon trying
to report a couple of Python expat bugs on the PSF bug tracker. SourceForge
appears to have lost all of my account data (for the second time), and when
I tried to register separately on the bug tracker site, the registration
process said "An unexpected error occurred during the processing of your
message" and failed to complete.
I'm using the Ubuntu Linux 8.04 distribution, which includes Python 2.5.2.
The libexpat1 version is 2.0.1-0ubuntu1.1 (hardy-updates), but I don't know
whether Python uses this or includes its own copy of expat.
The smaller problem -- but one that still led me to waste a fair bit of time
-- is that the SetParamEntityParsing method of xmlparser objects is simply
missing from the documentation of xml.parsers.expat. Unfortunately, the
default is to not parse parameter entities, even when reading external DTDs,
so calling this method is required for DTDs that use parameter entities. I
finally discovered this method by going to the expat Web site and looking at
the C API.
I checked the Python doc for 2.6.5, and this method is still missing.
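To make the point concrete, here is a rough sketch (for a current Python 3, not the 2.5.2 I'm running) of calling SetParamEntityParsing so that an external DTD using parameter entities gets read. The function name, the in-memory DTD string, and the system identifier "in-memory.dtd" are invented for the example; a real driver would open the file named by system_id instead.

```python
import xml.parsers.expat as expat


def element_names_from_external_dtd(dtd_text):
    """Return the element names declared in dtd_text (hypothetical helper)."""
    names = []

    parser = expat.ParserCreate()
    # Without this call the external subset is not read at all.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.ElementDeclHandler = lambda name, model: names.append(name)

    def on_external_entity(context, base, system_id, public_id):
        # A sub-parser handles the external entity; it inherits the
        # handlers already set on the parent parser.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(dtd_text, True)
        return 1  # nonzero tells expat the entity was handled

    parser.ExternalEntityRefHandler = on_external_entity

    # Dummy document whose only job is to reference the external DTD.
    parser.Parse('<!DOCTYPE root SYSTEM "in-memory.dtd"><root/>', True)
    return names


MODULAR_DTD = """
<!ENTITY % content "(item*)">
<!ELEMENT root %content;>
<!ENTITY % decls "<!ELEMENT item (#PCDATA)>">
%decls;
"""

declared = element_names_from_external_dtd(MODULAR_DTD)
```

The same handler would also be invoked for nested external parameter entities, which is how a modularized DTD pulls in its pieces; in that case the sub-parser should read whatever system_id points at.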
The larger problem is that often (but not always) when the Parse() method of
xmlparser returns after completely parsing a file, something happens at the
implementation level that results in a completely bogus Python error
"TypeError: An integer is required." The error may occur a few Python
statements later, which suggests to me that it is a memory bookkeeping
problem of some kind, but I have no idea how to track it down. However, it
is totally repeatable, and I can provide a very simple example (a 220-line
Python driver, most of which isn't executed, and a 7-line DTD) that triggers
it.
I checked the PSF bug tracker, and I thought that this might be the same bug
as #6676, but my test case doesn't call ParseFile more than once on the
same parser instance.
I would really prefer not to upgrade to a later Python version, especially
not to 2.6 or later, but if this bug has been fixed, I'm willing to consider
it.
As soon as the registration issue gets cleared up, I'll report these issues
properly, but meanwhile, I was wondering if either of them (especially the
execution error, which has me stalled right now) rings a bell with anyone.