[Django] #30686: Truncator.chars splits HTML entities

Django
#30686: Truncator.chars splits HTML entities
--------------------------------------+------------------------
               Reporter:  tdhooper    |          Owner:  nobody
                   Type:  Bug         |         Status:  new
              Component:  Utilities   |        Version:  2.2
               Severity:  Normal      |       Keywords:
           Triage Stage:  Unreviewed  |      Has patch:  0
    Needs documentation:  0           |    Needs tests:  0
Patch needs improvement:  0           |  Easy pickings:  0
                  UI/UX:  0           |
--------------------------------------+------------------------
 I'm using Truncator to truncate wikis, and it sometimes truncates in the
 middle of &quot; entities, resulting in '<p>some text &qu</p>'
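To make the failure mode concrete, here is a simplified model of a regex-driven HTML truncator. It is an assumption for illustration, not Django's actual `_truncate_html()`. Tags are skipped, and every other character counts as one visible character, including each of the six characters of `&quot;`, so the cut can land inside an entity. `regex_truncate` is a hypothetical name. Unlike the real method, it neither closes open tags nor appends an ellipsis.

```python
import re

# Assumed, simplified model: the first alternative matches a tag (not counted);
# the captured second alternative matches any other single character (counted).
TOKEN = re.compile(r"<.*?>|(.)", re.S)


def regex_truncate(markup, limit):
    """Keep markup up to `limit` non-tag characters (no tag closing, no ellipsis)."""
    count = 0
    end = 0
    for match in TOKEN.finditer(markup):
        if match.group(1) is not None:  # a "visible" character
            count += 1
            if count > limit:
                break
        end = match.end()
    return markup[:end]


print(regex_truncate("<p>some text &quot;hi&quot;</p>", 13))
# prints <p>some text &qu  (the entity has been cut in half)
```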

--
Ticket URL: <https://code.djangoproject.com/ticket/30686>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

--
You received this message because you are subscribed to the Google Groups "Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/051.67ad6069d95a9566bf5a8759ce9d48bc%40djangoproject.com.

Re: [Django] #30686: Truncator.chars splits HTML entities

Django
#30686: Truncator.chars splits HTML entities
---------------------------+--------------------------------------
     Reporter:  tdhooper   |                    Owner:  nobody
         Type:  Bug        |                   Status:  new
    Component:  Utilities  |                  Version:  2.2
     Severity:  Normal     |               Resolution:
     Keywords:             |             Triage Stage:  Unreviewed
    Has patch:  0          |      Needs documentation:  0
  Needs tests:  0          |  Patch needs improvement:  0
Easy pickings:  0          |                    UI/UX:  0
---------------------------+--------------------------------------
Description changed by tdhooper:

Old description:

> I'm using Truncator to truncate wikis, and it sometimes truncates in the
> middle of &quot; entities, resulting in '<p>some text &qu</p>'

New description:

 I'm using Truncator.chars to truncate wikis, and it sometimes truncates in
 the middle of &quot; entities, resulting in '<p>some text &qu</p>'

--

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:1>

Re: [Django] #30686: Truncator.chars splits HTML entities

Django
#30686: Truncator.chars splits HTML entities
-------------------------------+--------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  2.2
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Unreviewed
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+--------------------------------------

Comment (by Carlton Gibson):

 Hi Thomas. Any chance of an example string (hopefully minimal) that
 creates the behaviour so we can have a look?

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:2>

Re: [Django] #30686: Truncator.chars splits HTML entities

Django
#30686: Truncator.chars splits HTML entities
-------------------------------+--------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  2.2
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Unreviewed
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+--------------------------------------

Comment (by Florian Apolloner):

 I think now that the security releases are out, let's just add bleach as
 a dependency on master and be done with it?

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:3>

Re: [Django] #30686: Truncator.chars splits HTML entities

Django
#30686: Truncator.chars splits HTML entities
-------------------------------+--------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  2.2
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Unreviewed
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+--------------------------------------

Comment (by Thomas Hooper):

 Here's an example https://repl.it/@tdhooper/Django-truncate-entities-bug

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:4>

Re: [Django] #30686: Truncator.chars splits HTML entities

Django
#30686: Truncator.chars splits HTML entities
-------------------------------+--------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  2.2
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Unreviewed
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+--------------------------------------

Comment (by Florian Apolloner):

 BTW, I confused `Truncator` with `strip_tags`. So in this case the answer
 would be to rewrite the parser using `html5lib`, while `strip_tags` would
 use `bleach`, which in turn uses `html5lib` as well.

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:5>

Re: [Django] #30686: Truncator.chars splits HTML entities

Django
#30686: Truncator.chars splits HTML entities
-------------------------------+--------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  2.2
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Unreviewed
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+--------------------------------------

Comment (by Thomas Hooper):

 Looks like it can be fixed with this regex change
 https://github.com/django/django/pull/11633/files
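A regex stop-gap along these lines would treat a complete character reference as a single visible character. This is a hypothetical sketch of the idea, not the contents of the linked PR; `entity_safe_truncate` is an illustrative name. The tag pattern is still naive (a `>` inside a quoted attribute value would break it), which is part of why a real parser is preferred later in the thread.

```python
import re

# Hypothetical stop-gap: the first alternative matches a tag (not counted); the
# captured alternative matches ONE visible character, where a complete character
# reference such as &quot; / &#34; / &#x22; counts as a single character.
TOKEN = re.compile(
    r"<[^>]+>"
    r"|(&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);|.)",
    re.S,
)


def entity_safe_truncate(markup, limit):
    """Like a naive regex truncator, but never cuts inside a character reference."""
    count = 0
    end = 0
    for match in TOKEN.finditer(markup):
        if match.group(1) is not None:  # one visible character (or a whole entity)
            count += 1
            if count > limit:
                break
        end = match.end()
    return markup[:end]


print(entity_safe_truncate("<p>some text &quot;hi&quot;</p>", 11))
# prints <p>some text &quot;
```

A bare `&` that does not start a complete reference (as in `AT&T`) still counts as one ordinary character. Legacy named references written without the trailing semicolon are not recognised by this pattern and could still be split.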

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:6>

Re: [Django] #30686: Truncator.chars splits HTML entities

Django
#30686: Truncator.chars splits HTML entities
-------------------------------+--------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  2.2
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Unreviewed
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+--------------------------------------
Changes (by Carlton Gibson):

 * Attachment "possible-html5lib-truncator-implementation.patch" added.

 Example implementation of _truncate_html() using html5lib, by Florian
 Apolloner

--
Ticket URL: <https://code.djangoproject.com/ticket/30686>

Re: [Django] #30686: Improve utils.text.Truncator &co to use a full HTML parser. (was: Truncator.chars splits HTML entities)

Django
#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  master
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+------------------------------------
Changes (by Carlton Gibson):

 * version:  2.2 => master
 * stage:  Unreviewed => Accepted


Old description:

> I'm using Truncator.chars to truncate wikis, and it sometimes truncates
> in the middle of &quot; entities, resulting in '<p>some text &qu</p>'

New description:

 Original description:

 > I'm using Truncator.chars to truncate wikis, and it sometimes truncates
 in the middle of &quot; entities, resulting in '<p>some text &qu</p>'

 This is a limitation of the regex-based implementation (which has had
 security issues, and presents an intractable problem).

 It would be better to move to an HTML parser for Truncator and
 strip_tags(), via html5lib and bleach respectively.

--

Comment:

 Right, good news is this isn't a regression from
 7f65974f8219729c047fbbf8cd5cc9d80faefe77.

 * The new example case fails on v2.2.3 &co.
 * The suggestion for the regex change is in the part not changed as part
 of 7f65974f8219729c047fbbf8cd5cc9d80faefe77. (Which is why the new case
 fails, I suppose :)

 I don't want to accept a tweak to the regex here. Rather, we should
 move to using `html5lib` as Florian suggests.
 Possibly this would entail small changes in behaviour around edge cases,
 to be called out in the release notes, but it would be a big win overall.

 This has previously been discussed by the Security Team as the required
 way forward.
 I've updated the title/description and will Accept accordingly.

 I've attached an initial WIP patch by Florian of an `html5lib`
 implementation of the core `_truncate_html()` method.

 An implementation of `strip_tags()` using `bleach` would go something
 like:

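 {{{
 import bleach
 def strip_tags(text):
     # Allow no tags; strip (rather than escape) them, and drop comments too.
     return bleach.clean(text, tags=[], strip=True, strip_comments=True)
 }}}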
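For comparison, a similar tag-stripping pass can be sketched with only the standard library's `html.parser`. This is an illustrative sketch (`strip_tags_sketch` and `_TextCollector` are hypothetical names), not Django's `strip_tags()` and not equivalent to bleach. It collects text nodes, lets tags and comments fall through to `HTMLParser`'s no-op handlers, and re-escapes the result so it can go back into HTML.

```python
import html
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes; tags and comments fall through to no-op handlers."""

    def __init__(self):
        # With convert_charrefs=True (the default), entities arrive already decoded.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags_sketch(value):
    parser = _TextCollector()
    parser.feed(value)
    parser.close()
    # Re-escape &, < and > so the plain text is safe to put back into HTML.
    return html.escape("".join(parser.parts), quote=False)


print(strip_tags_sketch("<p>1 &lt; 2 <!-- hidden --><b>bold</b></p>"))
# prints 1 &lt; 2 bold
```

Text inside `<script>` or `<style>` is passed to `handle_data` and is therefore kept as ordinary text by this sketch; a production version would have to decide how to treat it.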



 Thomas, would you be willing/keen to take on changes like these? If so,
 I'm very happy to help in any way. :)

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:7>

Re: [Django] #30686: Improve utils.text.Truncator &co to use a full HTML parser.

Django
#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  master
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+------------------------------------

Comment (by Thomas Hooper):

 Hi Carlton, that would be fun, but this is bigger than I have time for
 now. It looks like you all have it in hand.

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:8>

Re: [Django] #30686: Improve utils.text.Truncator &co to use a full HTML parser.

Django
#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  master
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+------------------------------------

Comment (by Claude Paroz):

 Do we want to make both html5lib and bleach required dependencies of
 Django?
 html5lib's latest release was 20 months ago, and when I read issues like
 https://github.com/html5lib/html5lib-python/issues/419 without any
 maintainer feedback, I'm a bit worried. What about the security reporting
 workflow for those libs? What if a security issue is discovered in html5lib
 and the maintainers are unresponsive? Sorry to sound a bit negative,
 but I think those questions must be asked.

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:9>

Re: [Django] #30686: Improve utils.text.Truncator &co to use a full HTML parser.

Django
#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  master
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+------------------------------------

Comment (by Carlton Gibson):

 Yep Claude, absolutely.

 I think there are two difficulties we could face:

 * trying to successfully sanitize HTML with regexes.
 * making sure (or helping to make sure) html5lib-python is maintained.

 The first of these is intractable. The second is not. 🙂

 I've put out some feelers to try and find out more.

 * This is pressing for Python and pip **now**, not for us for a while yet.
 * If we look at https://github.com/html5lib/html5lib-python/issues/361, it
 seems there's potentially some money on the table from Tidelift.
 * We COULD allocate some time in a pinch I think.
 * AND it's **just** a wrapper around the underlying C library, so whilst
 20 months seems a long time, I'm not sure the release cadence is really an
 issue.

 BUT, yes, absolutely. Let's hammer this out properly before we commit. 👍
 I will open a mailing list thread when I know more.

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:10>

Re: [Django] #30686: Improve utils.text.Truncator &co to use a full HTML parser.

Django
#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  master
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+------------------------------------

Comment (by Carlton Gibson):

 > AND it's just (even with the emphasis, cough) a wrapper around the
 underlying C library, so whilst 20 months seems a long time, I'm not sure
 the release cadence is really an issue.

 OK, that last one isn't at all true. (Looking at the source it's the
 entire implementation.)

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:11>

Re: [Django] #30686: Improve utils.text.Truncator &co to use a full HTML parser.

Django
#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  master
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+------------------------------------

Comment (by Claude Paroz):

 To be clear, I'm also convinced parsing is more reliable than regexes. I
 just think we have to think twice before adding a dependency, because, as
 the name implies, we depend on it and therefore we must be able to trust
 its maintainers. Some guarantees about the security process and the fixing
 of serious bugs should be obtained. Without that, we are just outsourcing
 problems.

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:12>

Re: [Django] #30686: Improve utils.text.Truncator &co to use a full HTML parser.

Django
#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  master
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+------------------------------------

Comment (by Carlton Gibson):

 @Claude: 💯👍 Totally agree.

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:13>

Re: [Django] #30686: Improve utils.text.Truncator &co to use a full HTML parser.

Django
#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  master
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+------------------------------------

Comment (by Carlton Gibson):

 Duplicate in #30700, with [https://github.com/django/django/pull/11660
 failing test case provided].

 I've tried contacting the maintainers of html5lib, with no success.

 I've re-opened https://github.com/django/django/pull/11633 (original regex
 based suggestion) so we can at least assess it as a possible stop-gap.

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:14>

Re: [Django] #30686: Improve utils.text.Truncator &co to use a full HTML parser.

Django
#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  master
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+------------------------------------
Changes (by Carlton Gibson):

 * cc: Jon Dufresne (added)


Comment:

 Paging Jon, to ask his opinion on this.

 Hey Jon, I see you've made a number of PRs to both html5lib, and bleach.

 To me, at this point, html5lib essentially looks unmaintained. I don't
 have personal capacity to give to it, as cool as it is as a project.
 Arguably we (Fellows) could allocate it _some_ time, since we already spend
 a fair bit messing around with regexes, but that would be small, and we
 couldn't take it on wholesale. So can I ask your thoughts?

 Is html5lib in trouble? If so, as a user, what are your plans, if any? And
 from that, what do you think about Django adopting it? What's the
 alternative?

 Thanks for the thought and insight.

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:15>

Re: [Django] #30686: Improve utils.text.Truncator &co to use a full HTML parser.

Django
#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  master
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+------------------------------------

Comment (by Jon Dufresne):

 > To me, at this point, html5lib essentially looks unmaintained.

 I agree with this observation. The previous main maintainer looks to have
 stopped working on the project. Responses to issues and PRs have stopped.

 > Is html5lib in trouble? If so, as a user, what are your plans, if any?
 And from that, what do you think about Django adopting it? What's the
 alternative?

 For my own projects, I'll probably continue using html5lib until its
 staleness creates an observable bug for me. I haven't hit that point yet.

 Bleach, on the other hand, looks like maintenance has slowed, but not
 stopped. I believe they have vendored html5lib to allow them to make
 changes internally. FWIW, I also still use Bleach.

 ---

 I'm not familiar with all the details of this ticket, but would the stdlib
 HTML parser be sufficient?

 https://docs.python.org/3/library/html.parser.html
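Jon's question can be explored with a rough sketch of an HTML-aware truncator built on the stdlib `html.parser.HTMLParser`. The following are assumptions, not taken from the thread: `truncate_html` and `TruncatingParser` are hypothetical names, the limit counts rendered characters (so `&quot;` counts as one), no ellipsis is added, comments are dropped, and text is re-escaped with `html.escape`, so entities may be normalised (e.g. `&#34;` comes out as `&quot;`). Because `convert_charrefs=True` hands text to `handle_data` already decoded, an entity is emitted whole or not at all.

```python
import html
from html.parser import HTMLParser

# HTML void elements take no closing tag, so they never go on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class TruncatingParser(HTMLParser):
    """Copies markup through until `limit` rendered characters have been emitted."""

    def __init__(self, limit):
        # convert_charrefs=True: handle_data receives text with entities decoded,
        # so "&quot;" arrives as a single '"' character.
        super().__init__(convert_charrefs=True)
        self.limit = limit
        self.count = 0       # rendered characters emitted so far
        self.out = []        # output fragments
        self.open_tags = []  # stack of currently open, non-void tag names
        self.done = False    # set once the limit has been reached

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: copy it through, nothing to track.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return  # after the cut, or a stray end tag: ignore it
        # Close everything down to and including the matching start tag.
        while self.open_tags:
            name = self.open_tags.pop()
            self.out.append(f"</{name}>")
            if name == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        piece = data[: self.limit - self.count]
        self.count += len(piece)
        # Re-escape the decoded text before writing it back out as HTML.
        self.out.append(html.escape(piece))
        if self.count >= self.limit:
            self.done = True


def truncate_html(markup, limit):
    parser = TruncatingParser(limit)
    parser.feed(markup)
    parser.close()
    # Close whatever is still open, innermost first.
    closing = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closing


print(truncate_html("<p>some text &quot;hi&quot;</p>", 11))
# prints <p>some text &quot;</p>
```

The open-tag stack is what keeps the output well-formed: whatever is still open when the limit is hit gets closed innermost-first. Misnested or unclosed source markup is handled only crudely here, which is one of the edge cases a full HTML5 parser such as html5lib would cover more thoroughly.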

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:16>

Re: [Django] #30686: Improve utils.text.Truncator &co to use a full HTML parser.

Django
#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  Utilities      |                  Version:  master
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+------------------------------------

Comment (by Carlton Gibson):

 Hi Jon,

 Thank you for the comments. I will email Will, the maintainer of Bleach,
 and ask his thoughts too. Bleach has slowed down, but I would have thought
 that's because it's stable/mature now.

 > ...would the stdlib HTML parser be sufficient?

 Yes. Maybe. Ideally, we thought we'd just bring in Bleach, and html5lib
 with it, since in theory that's already working code. (Florian already
 had a Truncator prototype...)

 Anyhow... will follow-up.

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:17>