Should we move to replace re with regex?

Should we move to replace re with regex?

Guido van Rossum
I just made a pass of all the Unicode-related bugs filed by Tom
Christiansen, and found that in several, the response was "this is
fixed in the regex module [by Matthew Barnett]". I started replying
that I thought that we should fix the bugs in the re module (i.e.,
really in _sre.c) but on second thought I wonder if maybe regex is
mature enough to replace re in Python 3.3. It would mean that we won't
fix any of these bugs in earlier Python versions, but I could live
with that.

However, I don't know much about regex -- how compatible is it, how
fast is it (including extreme cases where the backtracking goes
crazy), how bug-free is it, and so on. Plus, how much work would it be
to actually incorporate it into CPython as a complete drop-in
replacement of the re package (such that nobody needs to change their
imports or the flags they pass to the re module).
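At the name level, "drop-in" can at least be checked mechanically. The sketch below targets a current Python 3 (so it includes `fullmatch`, which `re` only gained later). The lists of required names and the helpers `missing_api` and `check_candidate` are illustrative assumptions, not an official compatibility spec, and matching names say nothing about matching semantics.

```python
import importlib
import re

# Public names a drop-in replacement for re would need to expose.
# These lists are an illustrative assumption, not an official spec.
REQUIRED_FUNCTIONS = ("compile", "search", "match", "fullmatch", "split",
                      "findall", "finditer", "sub", "subn", "escape", "purge")
REQUIRED_FLAGS = ("ASCII", "IGNORECASE", "LOCALE", "MULTILINE",
                  "DOTALL", "VERBOSE", "UNICODE")


def missing_api(module):
    """Return the required names that `module` does not define."""
    return [name for name in REQUIRED_FUNCTIONS + REQUIRED_FLAGS
            if not hasattr(module, name)]


def check_candidate(name):
    """Import a candidate engine by name and report its missing names.

    Returns None if the module is not installed.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return missing_api(module)


if __name__ == "__main__":
    print("re:", missing_api(re))              # baseline: expected to be empty
    print("regex:", check_candidate("regex"))  # None if regex is not installed
```

A clean result here is a necessary condition for "nobody needs to change their imports or flags", not a sufficient one; behavioural differences need tests.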

We'd also probably have to train some core developers to be familiar
enough with the code to maintain and evolve it -- I assume we can't
just volunteer Matthew to do so forever... :-)

What's the alternative? Is adding the requested bug fixes and new
features to _sre.c really that hard?

--
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-Dev mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/lists%40nabble.com

Re: Should we move to replace re with regex?

M.-A. Lemburg
Guido van Rossum wrote:

> I just made a pass of all the Unicode-related bugs filed by Tom
> Christiansen, and found that in several, the response was "this is
> fixed in the regex module [by Matthew Barnett]". I started replying
> that I thought that we should fix the bugs in the re module (i.e.,
> really in _sre.c) but on second thought I wonder if maybe regex is
> mature enough to replace re in Python 3.3. It would mean that we won't
> fix any of these bugs in earlier Python versions, but I could live
> with that.
>
> However, I don't know much about regex -- how compatible is it, how
> fast is it (including extreme cases where the backtracking goes
> crazy), how bug-free is it, and so on. Plus, how much work would it be
> to actually incorporate it into CPython as a complete drop-in
> replacement of the re package (such that nobody needs to change their
> imports or the flags they pass to the re module).
>
> We'd also probably have to train some core developers to be familiar
> enough with the code to maintain and evolve it -- I assume we can't
> just volunteer Matthew to do so forever... :-)
>
> What's the alternative? Is adding the requested bug fixes and new
> features to _sre.c really that hard?

Why not simply add the new lib, see whether it works out and
then decide which path to follow.

We've done that with the old regex lib. It took a few years
and releases to have people port their applications to the
then new re module and syntax, but in the end it worked.

With a new regex library there are likely going to be quite
a few subtle differences between re and regex - even if it's
just doing things in a more Unicode compatible way.

I don't think anyone can actually list all the differences given
the complex nature of regular expressions, so people will
likely need a few years and releases to get used to it before
a switch can be made.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 27 2011)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2011-10-04: PyCon DE 2011, Leipzig, Germany                38 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

Re: Should we move to replace re with regex?

Guido van Rossum
On Fri, Aug 26, 2011 at 3:09 PM, M.-A. Lemburg <[hidden email]> wrote:

> Guido van Rossum wrote:
>> I just made a pass of all the Unicode-related bugs filed by Tom
>> Christiansen, and found that in several, the response was "this is
>> fixed in the regex module [by Matthew Barnett]". I started replying
>> that I thought that we should fix the bugs in the re module (i.e.,
>> really in _sre.c) but on second thought I wonder if maybe regex is
>> mature enough to replace re in Python 3.3. It would mean that we won't
>> fix any of these bugs in earlier Python versions, but I could live
>> with that.
>>
>> However, I don't know much about regex -- how compatible is it, how
>> fast is it (including extreme cases where the backtracking goes
>> crazy), how bug-free is it, and so on. Plus, how much work would it be
>> to actually incorporate it into CPython as a complete drop-in
>> replacement of the re package (such that nobody needs to change their
>> imports or the flags they pass to the re module).
>>
>> We'd also probably have to train some core developers to be familiar
>> enough with the code to maintain and evolve it -- I assume we can't
>> just volunteer Matthew to do so forever... :-)
>>
>> What's the alternative? Is adding the requested bug fixes and new
>> features to _sre.c really that hard?
>
> Why not simply add the new lib, see whether it works out and
> then decide which path to follow.
>
> We've done that with the old regex lib. It took a few years
> and releases to have people port their applications to the
> then new re module and syntax, but in the end it worked.
>
> With a new regex library there are likely going to be quite
> a few subtle differences between re and regex - even if it's
> just doing things in a more Unicode compatible way.
>
> I don't think anyone can actually list all the differences given
> the complex nature of regular expressions, so people will
> likely need a few years and releases to get used to it before
> a switch can be made.

I can't say I liked how that transition was handled last time around.
I really don't want to have to tell people "Oh, that bug is fixed but
you have to use regex instead of re" and then a few years later have
to tell them "Oh, we're deprecating regex, you should just use re".

I'm really hoping someone has more actual technical understanding of
re vs. regex and can give us some facts about the differences, rather
than, frankly, FUD.

--
--Guido van Rossum (python.org/~guido)

Re: Should we move to replace re with regex?

Antoine Pitrou
On Fri, 26 Aug 2011 15:18:35 -0700
Guido van Rossum <[hidden email]> wrote:
>
> I can't say I liked how that transition was handled last time around.
> I really don't want to have to tell people "Oh, that bug is fixed but
> you have to use regex instead of re" and then a few years later have
> to tell them "Oh, we're deprecating regex, you should just use re".
>
> I'm really hoping someone has more actual technical understanding of
> re vs. regex and can give us some facts about the differences, rather
> than, frankly, FUD.

The best way would be to contact the author, Matthew Barnett, or to ask
on the tracker on http://bugs.python.org/issue2636. He has been quite
willing to answer such questions in the past, AFAIR.

Regards

Antoine.

Re: Should we move to replace re with regex?

Guido van Rossum
On Fri, Aug 26, 2011 at 3:33 PM, Antoine Pitrou <[hidden email]> wrote:

> On Fri, 26 Aug 2011 15:18:35 -0700
> Guido van Rossum <[hidden email]> wrote:
>>
>> I can't say I liked how that transition was handled last time around.
>> I really don't want to have to tell people "Oh, that bug is fixed but
>> you have to use regex instead of re" and then a few years later have
>> to tell them "Oh, we're deprecating regex, you should just use re".
>>
>> I'm really hoping someone has more actual technical understanding of
>> re vs. regex and can give us some facts about the differences, rather
>> than, frankly, FUD.
>
> The best way would be to contact the author, Matthew Barnett,

I had added him to the beginning of this thread but someone took him off.

> or to ask
> on the tracker on http://bugs.python.org/issue2636. He has been quite
> willing to answer such questions in the past, AFAIR.

So, that issue is about something called "regexp". AFAIK Matthew
(MRAB) wrote something called "regex"
(http://pypi.python.org/pypi/regex). Are they two different things???

--
--Guido van Rossum (python.org/~guido)

Re: Should we move to replace re with regex?

Dan Stromberg
In reply to this post by Guido van Rossum

On Fri, Aug 26, 2011 at 2:45 PM, Guido van Rossum <[hidden email]> wrote:
> ...but on second thought I wonder if maybe regex is
> mature enough to replace re in Python 3.3.

I agree that the move from regex to re was kind of painful.

It seems someone should merge the unit tests for re and regex, and apply the merged result to each for the sake of comparison.  There might also be a need to expand the merged result to include new things.
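A minimal sketch of that idea, assuming the shared cases can be written as (pattern, subject, expected `findall()` result) triples. The three cases and the helper names `available_engines` and `run_cases` are hypothetical; a real merged suite would be assembled from both projects' existing tests.

```python
import importlib

# Hypothetical shared cases: (pattern, subject, expected findall() result).
# A real merged suite would be assembled from both projects' existing tests.
CASES = [
    (r"\d+", "a1b22c333", ["1", "22", "333"]),
    (r"(\w+)@(\w+)", "x@y z@w", [("x", "y"), ("z", "w")]),
    (r"(?i)abc", "ABC abc", ["ABC", "abc"]),
]


def available_engines(names=("re", "regex")):
    """Import each named engine that is installed, skipping the rest."""
    engines = {}
    for name in names:
        try:
            engines[name] = importlib.import_module(name)
        except ImportError:
            pass
    return engines


def run_cases(engine, cases=CASES):
    """Return (pattern, subject, expected, got) for every case that disagrees."""
    failures = []
    for pattern, subject, expected in cases:
        got = engine.findall(pattern, subject)
        if got != expected:
            failures.append((pattern, subject, expected, got))
    return failures


if __name__ == "__main__":
    for name, engine in available_engines().items():
        print(name, "failures:", run_cases(engine))
```

Running the same table against every installed engine makes any divergence show up as a concrete (pattern, subject) pair rather than a vague worry.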

Then there probably should be a from __future__ import for a while.

Re: Should we move to replace re with regex?

"Martin v. Löwis"
In reply to this post by Guido van Rossum
> However, I don't know much about regex

The problem really is: nobody does (except for Matthew Barnett
probably). This means that this contribution might be stuck
"forever": somebody would have to review the module, identify
issues, approve it, and take the blame if something breaks.
That takes considerable time and has a considerable risk, for
little expected glory - so nobody has volunteered to
mentor/manage integration of that code.

I believe most core contributors (who have run into this code)
consider it worthwhile, but are just too scared to take action.

Among us, some are more "regex gurus" than others; you know
who you are. I guess the PSF would pay for the review, if that
is what it would take.

Regards,
Martin

Re: Should we move to replace re with regex?

Guido van Rossum
On Fri, Aug 26, 2011 at 3:54 PM, "Martin v. Löwis" <[hidden email]> wrote:

>> However, I don't know much about regex
>
> The problem really is: nobody does (except for Matthew Barnett
> probably). This means that this contribution might be stuck
> "forever": somebody would have to review the module, identify
> issues, approve it, and take the blame if something breaks.
> That takes considerable time and has a considerable risk, for
> little expected glory - so nobody has volunteered to
> mentor/manage integration of that code.
>
> I believe most core contributors (who have run into this code)
> consider it worthwhile, but are just too scared to take action.
>
> Among us, some are more "regex gurus" than others; you know
> who you are. I guess the PSF would pay for the review, if that
> is what it would take.

Makes sense. I noticed Ezio seems quite in favor of regex. Maybe he knows more?

--
--Guido van Rossum (python.org/~guido)

Re: Should we move to replace re with regex?

M.-A. Lemburg
In reply to this post by Guido van Rossum
Guido van Rossum wrote:

> On Fri, Aug 26, 2011 at 3:09 PM, M.-A. Lemburg <[hidden email]> wrote:
>> Guido van Rossum wrote:
>>> I just made a pass of all the Unicode-related bugs filed by Tom
>>> Christiansen, and found that in several, the response was "this is
>>> fixed in the regex module [by Matthew Barnett]". I started replying
>>> that I thought that we should fix the bugs in the re module (i.e.,
>>> really in _sre.c) but on second thought I wonder if maybe regex is
>>> mature enough to replace re in Python 3.3. It would mean that we won't
>>> fix any of these bugs in earlier Python versions, but I could live
>>> with that.
>>>
>>> However, I don't know much about regex -- how compatible is it, how
>>> fast is it (including extreme cases where the backtracking goes
>>> crazy), how bug-free is it, and so on. Plus, how much work would it be
>>> to actually incorporate it into CPython as a complete drop-in
>>> replacement of the re package (such that nobody needs to change their
>>> imports or the flags they pass to the re module).
>>>
>>> We'd also probably have to train some core developers to be familiar
>>> enough with the code to maintain and evolve it -- I assume we can't
>>> just volunteer Matthew to do so forever... :-)
>>>
>>> What's the alternative? Is adding the requested bug fixes and new
>>> features to _sre.c really that hard?
>>
>> Why not simply add the new lib, see whether it works out and
>> then decide which path to follow.
>>
>> We've done that with the old regex lib. It took a few years
>> and releases to have people port their applications to the
>> then new re module and syntax, but in the end it worked.
>>
>> With a new regex library there are likely going to be quite
>> a few subtle differences between re and regex - even if it's
>> just doing things in a more Unicode compatible way.
>>
>> I don't think anyone can actually list all the differences given
>> the complex nature of regular expressions, so people will
>> likely need a few years and releases to get used to it before
>> a switch can be made.
>
> I can't say I liked how that transition was handled last time around.
> I really don't want to have to tell people "Oh, that bug is fixed but
> you have to use regex instead of re" and then a few years later have
> to tell them "Oh, we're deprecating regex, you should just use re".

No, you tell them: "If you want Unicode 6 semantics, use regex,
if you're fine with Unicode 2.0/3.0 semantics, use re". After all,
it's not like re suddenly stopped working :-)

> I'm really hoping someone has more actual technical understanding of
> re vs. regex and can give us some facts about the differences, rather
> than, frankly, FUD.

The good part is that it's based on the re code, the FUD comes
from the fact that the new lib is 380kB larger than the old one
and that's not even counting the generated 500kB of lookup
tables.

If no one steps up to do a review or analysis, I think the
only practical way to test the lib is to give it a prominent
chance to prove itself.

The other aspect is maintenance.

Perhaps we could have a summer of code student do a review and
analysis to get familiar with the code and then have at least
two developers know the code well enough to support it for
a while.

--
Marc-Andre Lemburg
eGenix.com

Re: Should we move to replace re with regex?

Tom Christiansen
"M.-A. Lemburg" <[hidden email]> wrote
   on Sat, 27 Aug 2011 01:00:31 +0200:

> The good part is that it's based on the re code, the FUD comes
> from the fact that the new lib is 380kB larger than the old one
> and that's not even counting the generated 500kB of lookup
> tables.

Well, you have to put the property tables somewhere, somehow.
There are various schemes for demand loading them as needed,
but I don't know whether those are used.
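One such scheme, sketched at the Python level purely for illustration: build each table the first time it is needed and cache it. The table names and the `property_table`/`has_property` helpers are hypothetical, and this says nothing about how `regex` or `_sre` actually store their data.

```python
import functools

# Hypothetical builders standing in for large generated property tables.
# A real engine would load precomputed data rather than build sets like this.
_TABLE_BUILDERS = {
    "ascii_digit": lambda: frozenset(range(ord("0"), ord("9") + 1)),
    "ascii_upper": lambda: frozenset(range(ord("A"), ord("Z") + 1)),
}


@functools.lru_cache(maxsize=None)
def property_table(name):
    """Build the named table on first use; later calls return the cached one."""
    return _TABLE_BUILDERS[name]()


def has_property(ch, name):
    """Check whether character `ch` belongs to the named (hypothetical) property."""
    return ord(ch) in property_table(name)


if __name__ == "__main__":
    print(has_property("7", "ascii_digit"))
    print(property_table.cache_info())
```

Only tables that a pattern actually uses ever get built, which is the trade-off demand loading buys: lower startup cost in exchange for a one-time delay on first use.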

--tom

Re: Should we move to replace re with regex?

MRAB
On 27/08/2011 00:08, Tom Christiansen wrote:

> "M.-A. Lemburg"<[hidden email]>  wrote
>     on Sat, 27 Aug 2011 01:00:31 +0200:
>
>> The good part is that it's based on the re code, the FUD comes
>> from the fact that the new lib is 380kB larger than the old one
>> and that's not even counting the generated 500kB of lookup
>> tables.
>
> Well, you have to put the property tables somewhere, somehow.
> There are various schemes for demand loading them as needed,
> but I don't know whether those are used.
>
FYI, the .pyd for Python v3.2 is 227KB, about half of which is property
tables.

Re: Should we move to replace re with regex?

Guido van Rossum
On Fri, Aug 26, 2011 at 4:21 PM, MRAB <[hidden email]> wrote:

> On 27/08/2011 00:08, Tom Christiansen wrote:
>>
>> "M.-A. Lemburg"<[hidden email]>  wrote
>>    on Sat, 27 Aug 2011 01:00:31 +0200:
>>
>>> The good part is that it's based on the re code, the FUD comes
>>> from the fact that the new lib is 380kB larger than the old one
>>> and that's not even counting the generated 500kB of lookup
>>> tables.
>>
>> Well, you have to put the property tables somewhere, somehow.
>> There are various schemes for demand loading them as needed,
>> but I don't know whether those are used.
>>
> FYI, the .pyd for Python v3.2 is 227KB, about half of which is property
> tables.

I wouldn't hold the size of the generated tables against you. :-)

--
--Guido van Rossum (python.org/~guido)

Re: Should we move to replace re with regex?

Antoine Pitrou
In reply to this post by Guido van Rossum
On Fri, 26 Aug 2011 15:47:21 -0700
Guido van Rossum <[hidden email]> wrote:

> > The best way would be to contact the author, Matthew Barnett,
>
> I had added him to the beginning of this thread but someone took him off.
>
> > or to ask
> > on the tracker on http://bugs.python.org/issue2636. He has been quite
> > willing to answer such questions in the past, AFAIR.
>
> So, that issue is about something called "regexp". AFAIK Matthew
> (MRAB) wrote something called "regex"
> (http://pypi.python.org/pypi/regex). Are they two different things???

No, it's the same.  The source is at
https://code.google.com/p/mrab-regex-hg/, btw.

Regards

Antoine.

Re: Should we move to replace re with regex?

Antoine Pitrou
In reply to this post by M.-A. Lemburg
On Sat, 27 Aug 2011 01:00:31 +0200
"M.-A. Lemburg" <[hidden email]> wrote:
> >
> > I can't say I liked how that transition was handled last time around.
> > I really don't want to have to tell people "Oh, that bug is fixed but
> > you have to use regex instead of re" and then a few years later have
> > to tell them "Oh, we're deprecating regex, you should just use re".
>
> No, you tell them: "If you want Unicode 6 semantics, use regex,
> if you're fine with Unicode 2.0/3.0 semantics, use re". After all,
> it's not like re suddenly stopped working :-)

It has a whole lot of new features in addition to better unicode
support. See for yourself:
https://code.google.com/p/mrab-regex-hg/wiki/GeneralDetails
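One concrete, easy-to-check example of the gap is Unicode property escapes such as `\p{Greek}`: `regex` accepts them, while `re` on recent CPython versions rejects `\p` as a bad escape. The helper name `supports_property_escapes` is hypothetical; the check assumes only that each module exposes `compile()` and an `error` exception class.

```python
import importlib
import re


def supports_property_escapes(engine):
    """Return True if `engine` compiles a Unicode property escape like \\p{Greek}."""
    try:
        engine.compile(r"\p{Greek}")
    except engine.error:
        return False
    return True


if __name__ == "__main__":
    print("re:", supports_property_escapes(re))  # recent CPython rejects \p as a bad escape
    try:
        regex = importlib.import_module("regex")
    except ImportError:
        regex = None  # the PyPI package is optional for this sketch
    if regex is not None:
        print("regex:", supports_property_escapes(regex))
        print(regex.findall(r"\p{Greek}+", "abc αβγ"))
```

Feature probes like this are cheap to run in a test suite and document which engine a piece of code actually depends on.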

> Perhaps we could have a summer of code student do a review and
> analysis to get familiar with the code and then have at least
> two developers know the code well enough to support it for
> a while.

I'm not sure a GSoC student would be the best candidate to do a review
matching our expectations.

Regards

Antoine.

Re: Should we move to replace re with regex?

Antoine Pitrou
In reply to this post by Dan Stromberg
On Fri, 26 Aug 2011 15:48:42 -0700
Dan Stromberg <[hidden email]> wrote:
>
> Then there probably should be a from __future__ import for a while.

If you are willing to use a "from __future__ import", why not simply

    import regex as re

? We're not Perl, we don't have built-in syntactic support for regular
expressions.
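Applied to a single module, the suggestion above might look like this: prefer the new engine when it is installed and fall back to the standard library otherwise. This assumes the PyPI `regex` package behaves like `re` for the patterns the module uses; the `words()` helper is only an illustration. Because it is an ordinary import, nothing about the language itself has to change.

```python
# Prefer the third-party regex module (from PyPI) when it is installed;
# otherwise fall back to the standard library re module. The code below
# only uses functions that both modules provide.
try:
    import regex as re
except ImportError:
    import re


def words(text):
    """Return the runs of word characters in text, using whichever engine loaded."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    print(re.__name__, words("hello, world"))
```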

Regards

Antoine.

Re: Should we move to replace re with regex?

Dan Stromberg

On Fri, Aug 26, 2011 at 5:08 PM, Antoine Pitrou <[hidden email]> wrote:
> On Fri, 26 Aug 2011 15:48:42 -0700
> Dan Stromberg <[hidden email]> wrote:
>>
>> Then there probably should be a from __future__ import for a while.
>
> If you are willing to use a "from __future__ import", why not simply
>
>    import regex as re
>
> ? We're not Perl, we don't have built-in syntactic support for regular
> expressions.
>
> Regards

If you add regex as "import regex" and the new regex module doesn't work out, regex might be harder to get rid of. "from __future__ import" is an established way of trying something for a while to see if it's going to work.

E.g.: "from __future__ import re", where re is really the new module.

But whatever.

Re: Should we move to replace re with regex?

Ben Finney
In reply to this post by M.-A. Lemburg
"M.-A. Lemburg" <[hidden email]> writes:

> Guido van Rossum wrote:

> > I really don't want to have to tell people "Oh, that bug is fixed
> > but you have to use regex instead of re" and then a few years later
> > have to tell them "Oh, we're deprecating regex, you should just use
> > re".
>
> No, you tell them: "If you want Unicode 6 semantics, use regex, if
> you're fine with Unicode 2.0/3.0 semantics, use re".

What do we say, then, to those who are unaware of the different
semantics between those versions of Unicode, and want regular expression
to “just work” in Python?

To which document can we direct them to understand what semantics they
want?

> After all, it's not like re suddenly stopped working :-)

For some value of “working”, that is. The trick is to know whether that
value is what one wants.

--
 \        “The fact of your own existence is the most astonishing fact |
  `\    you'll ever have to confront. Don't dare ever see your life as |
_o__)    boring, monotonous, or joyless.” —Richard Dawkins, 2010-03-10 |
Ben Finney

Re: Should we move to replace re with regex?

Ezio Melotti
In reply to this post by Guido van Rossum


On Sat, Aug 27, 2011 at 1:57 AM, Guido van Rossum <[hidden email]> wrote:
> On Fri, Aug 26, 2011 at 3:54 PM, "Martin v. Löwis" <[hidden email]> wrote:
>> [...]
>> Among us, some are more "regex gurus" than others; you know
>> who you are. I guess the PSF would pay for the review, if that
>> is what it would take.
>
> Makes sense. I noticed Ezio seems quite in favor of regex. Maybe he knows more?

Matthew has always been responsive on the tracker, usually fixing reported bugs in a matter of days, and I think he's willing to keep doing so once the regex module is included.  Even though I haven't yet tried the module myself (I'm planning to do so, though), it seems quite popular out there (the download number on PyPI apparently gets reset for each new release, so I don't know the exact total), and apparently people are already using it as a replacement for re.

I'm not sure it's worth doing an extensive review of the code, a better approach might be to require extensive test coverage  (and a review of tests).  If the code seems well written, commented, documented (I think proper rst documentation is still missing), and tested (both with unittest and out in the wild), and Matthew is willing to maintain it, I think we can include it.  We will get familiar with the code once we start contributing to it and fixing bugs, as it already happens with most of the other modules.

See also the "New regex module for 3.2?" thread ( http://mail.python.org/pipermail/python-dev/2010-July/101606.html ).

Best Regards,
Ezio Melotti

Re: Should we move to replace re with regex?

Steven D'Aprano
In reply to this post by Ben Finney
Ben Finney wrote:
> "M.-A. Lemburg" <[hidden email]> writes:

>> No, you tell them: "If you want Unicode 6 semantics, use regex, if
>> you're fine with Unicode 2.0/3.0 semantics, use re".
>
> What do we say, then, to those who are unaware of the different
> semantics between those versions of Unicode, and want regular expression
> to “just work” in Python?
>
> To which document can we direct them to understand what semantics they
> want?

Presumably, like all modules, both the re and the regex module will have
their own individual pages in the library reference. As the newcomer,
regex should include a discussion of differences between the two. This
can then be quietly dropped once re becomes formally deprecated.

(Assuming that the std lib keeps re and regex in parallel for a few
releases, which is not a given.)

However, I note that last time, the old regex module was just documented
as obsolete with little detailed discussion of the differences:

http://docs.python.org/release/1.5/lib/node69.html#SECTION005300000000000000000


--
Steven

Re: Should we move to replace re with regex?

Antoine Pitrou
In reply to this post by Ezio Melotti
On Sat, 27 Aug 2011 04:37:21 +0300
Ezio Melotti <[hidden email]> wrote:
>
> I'm not sure it's worth doing an extensive review of the code, a better
> approach might be to require extensive test coverage  (and a review of
> tests).  If the code seems well written, commented, documented (I think
> proper rst documentation is still missing),

Isn't this precisely what a review is supposed to assess?

> We will get familiar with the code once we start contributing
> to it and fixing bugs, as it already happens with most of the other modules.

I'm not sure it's a good idea for a module with more than 10000 lines
of C code (and 4000 lines of pure Python code). This is several times
the size of multiprocessing. The C code looks very cleanly written, but
it's still a big chunk of algorithmically sophisticated code.

Another "interesting" question is whether it's easy to port to the PEP
393 string representation, if it gets accepted.

Regards

Antoine.