Re: cpython (2.7): #14538: HTMLParser can now parse correctly start tags that contain a bare /.

Georg Brandl
On 19.04.2012 03:36, ezio.melotti wrote:
> http://hg.python.org/cpython/rev/36c901fcfcda
> changeset:   76413:36c901fcfcda
> branch:      2.7
> user:        Ezio Melotti <[hidden email]>
> date:        Wed Apr 18 19:08:41 2012 -0600
> summary:
>   #14538: HTMLParser can now parse correctly start tags that contain a bare /.

> diff --git a/Misc/NEWS b/Misc/NEWS
> --- a/Misc/NEWS
> +++ b/Misc/NEWS
> @@ -50,6 +50,9 @@
>  Library
>  -------
>  
> +- Issue #14538: HTMLParser can now parse correctly start tags that contain
> +  a bare '/'.
> +

I think that's misleading: there's no way to "correctly" parse malformed HTML.

Georg

_______________________________________________
Python-Dev mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/lists%2B1324100855712-1801473%40n6.nabble.com

Benjamin Peterson
2012/4/24 Georg Brandl <[hidden email]>:

> On 19.04.2012 03:36, ezio.melotti wrote:
>> http://hg.python.org/cpython/rev/36c901fcfcda
>> changeset:   76413:36c901fcfcda
>> branch:      2.7
>> user:        Ezio Melotti <[hidden email]>
>> date:        Wed Apr 18 19:08:41 2012 -0600
>> summary:
>>   #14538: HTMLParser can now parse correctly start tags that contain a bare /.
>
>> diff --git a/Misc/NEWS b/Misc/NEWS
>> --- a/Misc/NEWS
>> +++ b/Misc/NEWS
>> @@ -50,6 +50,9 @@
>>  Library
>>  -------
>>
>> +- Issue #14538: HTMLParser can now parse correctly start tags that contain
>> +  a bare '/'.
>> +
>
> I think that's misleading: there's no way to "correctly" parse malformed HTML.

There is in the since that you can follow the HTML5 algorithm, which
can "parse" any junk you throw at it.



--
Regards,
Benjamin

Fred Drake
On Tue, Apr 24, 2012 at 2:34 PM, Benjamin Peterson <[hidden email]> wrote:
> There is in the since that you can follow the HTML5 algorithm, which
> can "parse" any junk you throw at it.

This whole can of worms is why I gave up on HTML years ago (well, one
reason among many).

There are markup languages, and there's soup.


  -Fred

--
Fred L. Drake, Jr.    <fdrake at acm.org>
"A person who won't read has no advantage over one who can't read."
   --Samuel Langhorne Clemens

Georg Brandl
In reply to Benjamin Peterson
On 24.04.2012 20:34, Benjamin Peterson wrote:

> 2012/4/24 Georg Brandl <[hidden email]>:
>> On 19.04.2012 03:36, ezio.melotti wrote:
>>> http://hg.python.org/cpython/rev/36c901fcfcda
>>> changeset:   76413:36c901fcfcda
>>> branch:      2.7
>>> user:        Ezio Melotti <[hidden email]>
>>> date:        Wed Apr 18 19:08:41 2012 -0600
>>> summary:
>>>   #14538: HTMLParser can now parse correctly start tags that contain a bare /.
>>
>>> diff --git a/Misc/NEWS b/Misc/NEWS
>>> --- a/Misc/NEWS
>>> +++ b/Misc/NEWS
>>> @@ -50,6 +50,9 @@
>>>  Library
>>>  -------
>>>
>>> +- Issue #14538: HTMLParser can now parse correctly start tags that contain
>>> +  a bare '/'.
>>> +
>>
>> I think that's misleading: there's no way to "correctly" parse malformed HTML.
>
> There is in the since that you can follow the HTML5 algorithm, which
> can "parse" any junk you throw at it.

Ah, good. Then I hope we are following the algorithm here (and are slowly
coming to use it for htmllib in general).

Georg

Benjamin Peterson
In reply to Benjamin Peterson
2012/4/24 Benjamin Peterson <[hidden email]>:
> There is in the since

This is confusing, since I meant "sense".


--
Regards,
Benjamin

Éric Araujo
In reply to Georg Brandl
On 24/04/2012 15:02, Georg Brandl wrote:
> On 24.04.2012 20:34, Benjamin Peterson wrote:
>> 2012/4/24 Georg Brandl<[hidden email]>:
>>> I think that's misleading: there's no way to "correctly" parse malformed HTML.
>> There is in the since that you can follow the HTML5 algorithm, which
>> can "parse" any junk you throw at it.
> Ah, good. Then I hope we are following the algorithm here (and are slowly
> coming to use it for htmllib in general).

Yes, Ezio’s commits on html.parser/HTMLParser in the last months have
been following the HTML5 spec.  Ezio, RDM and I have had some discussion
about that on some bug reports, IRC and private mail and reached the
agreement to do the useful thing, that is follow HTML5 and not pretend
that the stdlib parser is strict or validating.

Ezio was thinking about a blog.python.org post to advertise this.

Regards
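
The tolerant, HTML5-style behavior described above can be illustrated with a small sketch. It assumes Python 3, where the 2.7 `HTMLParser` module lives on as `html.parser`; `StartTagRecorder` and `start_tags` are hypothetical helpers written for this example, not part of the standard library. Edge-case results on Python 2.7 or on a particular 3.x release may differ, since the tolerant start-tag handling has been revised over time.

```python
# Sketch only: assumes Python 3's html.parser (the 2.7 module was named HTMLParser).
# StartTagRecorder and start_tags are hypothetical names for illustration.
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Collect (tag, attrs) for every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # attributes written without "=value".
        self.seen.append((tag, attrs))


def start_tags(markup):
    """Feed markup to a fresh parser and return the start tags it reported."""
    parser = StartTagRecorder()
    parser.feed(markup)
    parser.close()  # flush any buffered, unterminated input
    return parser.seen


if __name__ == "__main__":
    # A bare '/' between attributes is skipped like whitespace, mirroring
    # the HTML5 tokenizer, instead of derailing the whole tag.
    print(start_tags('<a / href="x">link</a>'))
    print(start_tags('<img src="a.png" / alt="b">'))
    # "<br/>" goes through handle_startendtag, whose default implementation
    # calls handle_starttag and then handle_endtag.
    print(start_tags("<br/>"))
    # Unclosed tags do not raise: the parser is tolerant, not validating.
    print(start_tags("<p>unclosed <b>tags"))
```

Overriding `handle_starttag` in a subclass is the documented way to receive events from `HTMLParser`; collecting them in a list simply makes the parser's interpretation of each malformed tag easy to inspect.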

Brian Curtin
On Tue, Apr 24, 2012 at 14:34, Éric Araujo <[hidden email]> wrote:

> On 24/04/2012 15:02, Georg Brandl wrote:
>>
>> On 24.04.2012 20:34, Benjamin Peterson wrote:
>>>
>>> 2012/4/24 Georg Brandl<[hidden email]>:
>>>>
>>>> I think that's misleading: there's no way to "correctly" parse malformed
>>>> HTML.
>>>
>>> There is in the since that you can follow the HTML5 algorithm, which
>>> can "parse" any junk you throw at it.
>>
>> Ah, good. Then I hope we are following the algorithm here (and are slowly
>> coming to use it for htmllib in general).
>
>
> Yes, Ezio’s commits on html.parser/HTMLParser in the last months have been
> following the HTML5 spec.  Ezio, RDM and I have had some discussion about
> that on some bug reports, IRC and private mail and reached the agreement to
> do the useful thing, that is follow HTML5 and not pretend that the stdlib
> parser is strict or validating.
>
> Ezio was thinking about a blog.python.org post to advertise this.

Please do this, and I welcome anyone else who wants to write about
their work on the blog to do so. Contact me for info.