|
Hi,
I noticed that there is a PEP (3154) and a GSoC proposal about improving Pickle. Given the recent discussion on this list about using Cython for the import module, I wonder if it wouldn't make even more sense to switch from a C (accelerator) implementation to Cython for _pickle.

The rationale is that C code that deals a lot with object operations tends to be rather verbose, and _pickle specifically looks very verbose in many places. Some of this is optimised I/O, ok, but most of it seems to take its complexity from code specialisations for builtin types and a lot of error handling code. A Cython reimplementation would take a lot of weight out of this.

Note that the approach won't be as simple as compiling pickle.py. _pickle uses a lot of optimisations that only work at the C level, at least efficiently. So the idea would be to rewrite _pickle in Cython instead. It's currently about 6500 lines of C. Even if we divide that only by a rather conservative factor of 3, we'd end up with some 2000 lines of Cython code, all extracted straight from the existing C code. That sounds like less than two weeks of work, maybe even if we add the marshal module to it. In less than a month of GSoC time, this could easily reach a point where it's "close to the speed of what we have" and "fast enough", but a lot more accessible and maintainable, thus also making it easier to add the extensions described in the PEP.

What do you think?

Stefan

_______________________________________________
Python-Dev mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/lists%2B1324100855712-1801473%40n6.nabble.com
|
|
> What do you think?
I think I know what Jim Fulton thinks (as we talked about something like this at PyCon): don't. He is already sad that cPickle grew so many pickle features when it was designed as a real fast implementation.

pickle speed is really important to some users, and any loss of performance needs serious justification. Easier maintenance is not a sufficient reason.

Regards,
Martin
|
|
In reply to this post by Stefan Behnel-3
On Thu, Apr 19, 2012 at 6:55 PM, Stefan Behnel <[hidden email]> wrote:
> What do you think?

I think the possible use of Cython for standard library extension modules is potentially worth looking into for the 3.4 timeframe (c.f. the recent multiple checkins sorting out the refcounts for the new ImportError helper function).

There are obviously a lot of factors to consider before actually proceeding with such an approach (even for the extension modules), but a side-by-side comparison of pickle.py, the existing C accelerated pickle module and a Cython accelerated pickle module (including benchmark numbers) would be a valuable data point in any such discussion.

However, it would definitely have to be pitched to any interested students as a proof-of-concept exercise, with a real possibility that the outcome will end up supporting MvL's reply.

Regards,
Nick.

--
Nick Coghlan | [hidden email] | Brisbane, Australia
|
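To give a rough idea of what one half of such a side-by-side comparison could look like, here is a minimal sketch that times the C accelerated pickler against the pure-Python one from pickle.py. It makes several assumptions that are not from the thread: the sample payload `DATA` is made up, `dumps_accelerated`, `dumps_pure_python` and `benchmark` are hypothetical helper names, and `pickle._Pickler` is a private name in pickle.py that is used here only for comparison. A Cython version would be a third column that does not exist here.

```python
import io
import pickle
import timeit

# Sample payload; purely illustrative, not a representative workload.
DATA = [{"id": i, "name": "item%d" % i, "values": list(range(10))}
        for i in range(200)]


def dumps_accelerated(obj):
    # pickle.dumps uses the C accelerator (_pickle) when it is available.
    return pickle.dumps(obj, protocol=2)


def dumps_pure_python(obj):
    # pickle._Pickler is the pure-Python class kept in pickle.py.
    # It is a private name, used here only for comparison.
    buf = io.BytesIO()
    pickle._Pickler(buf, protocol=2).dump(obj)
    return buf.getvalue()


def benchmark(func, number=50):
    # Best of three runs, in seconds for `number` calls.
    return min(timeit.repeat(lambda: func(DATA), number=number, repeat=3))


if __name__ == "__main__":
    print("C accelerated: %.4f s" % benchmark(dumps_accelerated))
    print("pure Python:   %.4f s" % benchmark(dumps_pure_python))
```

The numbers depend heavily on the payload and the machine, so a real comparison would need a set of workloads that pickle users actually care about.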
|
In reply to this post by Stefan Behnel-3
On Thu, 19 Apr 2012 10:55:24 +0200
Stefan Behnel <[hidden email]> wrote:
>
> I noticed that there is a PEP (3154) and a GSoC proposal about improving
> Pickle. Given the recent discussion on this list about using Cython for the
> import module, I wonder if it wouldn't make even more sense to switch from
> a C (accelerator) implementation to Cython for _pickle.

I think that's quite orthogonal to PEP 3154 (which shouldn't add a lot of new code IMHO).

> Note that the approach won't be as simple as compiling pickle.py. _pickle
> uses a lot of optimisations that only work at the C level, at least
> efficiently. So the idea would be to rewrite _pickle in Cython instead.
> It's currently about 6500 lines of C. Even if we divide that only by a
> rather conservative factor of 3, we'd end up with some 2000 lines of Cython
> code, all extracted straight from the existing C code. That sounds like
> less than two weeks of work, maybe even if we add the marshal module to it.

I think this all needs someone to demonstrate the benefits, in terms of both readability/maintainability, and performance.

Also, while C is a low-level language, Cython is a different language than Python when you start using its optimization features. This means core developers have to learn that language.

Regards

Antoine.
|
|
In reply to this post by Nick Coghlan
On Thu, Apr 19, 2012 at 05:38, Nick Coghlan <[hidden email]> wrote:
> On Thu, Apr 19, 2012 at 6:55 PM, Stefan Behnel <[hidden email]> wrote:
>> What do you think?
>
> I think the possible use of Cython for standard library extension
> modules is potentially worth looking into for the 3.4 timeframe (c.f.
> the recent multiple checkins sorting out the refcounts for the new
> ImportError helper function).

I'd rather just "rtfm" as was suggested and get it right than switch everything around to Cython.
|
|
In reply to this post by Antoine Pitrou
On Thu, 19 Apr 2012 14:44:06 +0200, Antoine Pitrou <[hidden email]> wrote:
> Also, while C is a low-level language, Cython is a different language
> than Python when you start using its optimization features. This means
> core developers have to learn that language.

Hmm. On the other hand, perhaps some core developers (present or future) would prefer to learn Cython over learning C [*].

--David

[*] For this you may actually want to read "learning to modify the Python C codebase", since in fact I know how to program in C, I just prefer to do as little of it as possible, and so haven't really learned the Python C codebase.
|
|
Personally I find the unholy product of C and Python that is Cython to be more complex than the sum of the complexities of its parts. Is it really wise to be learning Cython without already knowing C, Python, and the CPython object model?

While code generation alleviates the burden of tedious languages, it's also infinitely more complex, makes debugging very difficult and adds to prerequisite knowledge, among other drawbacks.
|
|
Matt Joiner, 19.04.2012 16:13:
> Personally I find the unholy product of C and Python that is Cython to be
> more complex than the sum of the complexities of its parts. Is it really
> wise to be learning Cython without already knowing C, Python, and the
> CPython object model?

The main obstacle that I regularly see for users of the C-API is actually reference counting and an understanding of what borrowed references and owned references imply in a given code context. In fact, I can't remember seeing any C extension code getting posted on Python mailing lists (core developers excluded) that has no ref-counting bugs or at least a severe lack of error handling. Usually, such code is also accompanied by a comment that the author is not sure if everything is correct and asks for advice, and that's rather independent of the functional complexity of the code snippet. OTOH, I've also seen a couple of really dangerous code snippets already that posters apparently meant to show off with, so not everyone is aware of these obstacles.

Also, the C code by inexperienced programmers tends to be fairly inefficient because they simply do not know what impact some convenience functions have. So they tend to optimise prematurely in places where they feel more comfortable, but that can never make up for the overhead that simple and very conveniently looking C-API functions introduce in other places. Value packing comes to mind.

So, from my experience, there is a serious learning curve beyond knowing C, right from the start when trying to work on C extensions, including CPython's own code, because the C-API is far from trivial.

And that's the kind of learning curve that Cython tries to lower. It makes it substantially easier to write correct code, simply by letting you write Python code instead of C plus C-API code. And once it works, you can start making it explicitly faster by applying "I know what I'm doing" schemes to proven hot spots or by partially rewriting it.

And if you do not know yet what you're doing, then *that's* where the learning curve begins. But by then, your code is basically written, works more or less and can be benchmarked.

> While code generation alleviates the burden of tedious languages, it's also
> infinitely more complex, makes debugging very difficult and adds to
> prerequisite knowledge, among other drawbacks.

You can use gdb for source level debugging of Cython code and cProfile to profile it. Try that with C-API code.

Stefan
|
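For readers who have not written C-API code, the bookkeeping that the reference counting remark above refers to can be observed from Python itself. This is only a small sketch and it is specific to CPython: `sys.getrefcount` reports the count that C code has to keep correct by hand with `Py_INCREF` and `Py_DECREF`, and that the interpreter (or Cython-generated code) maintains automatically. The variable names are made up for illustration.

```python
import sys

payload = object()            # a fresh object, owned by the name `payload`
before = sys.getrefcount(payload)

alias = payload               # a second owned reference (Py_INCREF in C terms)
with_alias = sys.getrefcount(payload)

del alias                     # giving it up again (Py_DECREF in C terms)
after = sys.getrefcount(payload)

# The absolute numbers are an implementation detail; only the differences
# between them are meaningful here.
print(before, with_alias, after)
```

In hand-written C, forgetting the equivalent of the `del` leaks the object, and performing it once too often can crash the interpreter, which is the class of bug described in the message above.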
|
On Thu, Apr 19, 2012 at 16:08, Stefan Behnel
>> While code generation alleviates the burden of tedious languages, it's also
>> infinitely more complex, makes debugging very difficult and adds to
>> prerequisite knowledge, among other drawbacks.
>
> You can use gdb for source level debugging of Cython code and cProfile to
> profile it. Try that with C-API code.

I know I'm in the minority of committers being on Windows, but we do receive a good amount of reports and contributions from Windows users who dive into the C code. The outside contributors actually gave the strongest indication that we needed to move to VS2010.

Visual Studio by itself makes debugging unbelievably easy, and with the Python Tools for VS plugin it even allows Visual Studio's built-in profiler to work. I know Windows is not on most people's maps, but if we have to scrap the debugger, that's another learning curve attachment to evaluate.
|
|
Brian Curtin, 19.04.2012 23:19:
> On Thu, Apr 19, 2012 at 16:08, Stefan Behnel
>>> While code generation alleviates the burden of tedious languages, it's also
>>> infinitely more complex, makes debugging very difficult and adds to
>>> prerequisite knowledge, among other drawbacks.
>>
>> You can use gdb for source level debugging of Cython code and cProfile to
>> profile it. Try that with C-API code.
>
> I know I'm in the minority of committers being on Windows, but we do
> receive a good amount of reports and contributions from Windows users
> who dive into the C code.

Doesn't match my experience at all - different software target audiences, I guess.

> Visual Studio by itself makes debugging unbelievably easy, and with
> the Python Tools for VS plugin it even allows Visual Studio's built-in
> profiler to work. I know Windows is not on most people's maps, but if
> we have to scrap the debugger, that's another learning curve
> attachment to evaluate.

What I meant was that there's pdb for debugging Python code (which doesn't know about the C code it executes) and gdb (or VS) for debugging C code, from which you can barely infer the Python code it executes. For Cython code, you can use gdb for both Cython and C, and within limits also for Python code. Here's a quick intro to see what I mean:

http://docs.cython.org/src/userguide/debugging.html

For profiling, you can use cProfile for Python code (which doesn't tell you about the C code it executes) and oprofile, callgrind, etc. (incl. VS) for C code, from which it's non-trivial to infer the relation to the Python code. With Cython, you can use cProfile for both Cython and Python code as long as you stay at the source code level, and only need to descend to a low-level profiler when you care about the exact details, usually assembly jumps and branches.

Anyway, I guess this is getting off-topic for this list.

Stefan
|
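The profiling point above can be illustrated with a minimal sketch that runs cProfile over Python code calling into the C accelerated pickle module. The function names `roundtrip` and `profile_roundtrip` are hypothetical and only exist for this example. In the report, the Python-level function appears by name, while the accelerated calls show up as single built-in entries with no view of what happens inside them.

```python
import cProfile
import io
import pickle
import pstats


def roundtrip(obj):
    # Python-level function; the real work happens inside the C accelerator.
    return pickle.loads(pickle.dumps(obj))


def profile_roundtrip(obj):
    # Profile a single round trip and return the result plus the text report.
    profiler = cProfile.Profile()
    profiler.enable()
    result = roundtrip(obj)
    profiler.disable()

    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats()
    return result, out.getvalue()


if __name__ == "__main__":
    _, report = profile_roundtrip(list(range(1000)))
    print(report)
```

Seeing inside those built-in entries is where a low-level profiler such as oprofile or callgrind (or Visual Studio's) comes in for hand-written C.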
|
On Thu, Apr 19, 2012 at 17:21, Stefan Behnel <[hidden email]> wrote:
> Brian Curtin, 19.04.2012 23:19:
>> On Thu, Apr 19, 2012 at 16:08, Stefan Behnel
>>>> While code generation alleviates the burden of tedious languages, it's also
>>>> infinitely more complex, makes debugging very difficult and adds to
>>>> prerequisite knowledge, among other drawbacks.
>>>
>>> You can use gdb for source level debugging of Cython code and cProfile to
>>> profile it. Try that with C-API code.
>>
>> I know I'm in the minority of committers being on Windows, but we do
>> receive a good amount of reports and contributions from Windows users
>> who dive into the C code.
>
> Doesn't match my experience at all - different software target audiences, I
> guess.

I don't know what this means. I work on CPython, which is the target audience at hand, and I come across reports and contributions from Windows users for C extensions.

>> Visual Studio by itself makes debugging unbelievably easy, and with
>> the Python Tools for VS plugin it even allows Visual Studio's built-in
>> profiler to work. I know Windows is not on most people's maps, but if
>> we have to scrap the debugger, that's another learning curve
>> attachment to evaluate.
>
> What I meant was that there's pdb for debugging Python code (which doesn't
> know about the C code it executes) and gdb (or VS) for debugging C code,
> from which you can barely infer the Python code it executes. For Cython
> code, you can use gdb for both Cython and C, and within limits also for
> Python code. Here's a quick intro to see what I mean:
>
> http://docs.cython.org/src/userguide/debugging.html

I know what you meant. What I meant is "easy debugging on Windows goes away, now I have to set up and learn GDB on Windows". *I* can do that. Does the rest of the community want to have to do that as well? We should also take into consideration how something like this affects the third-party IDEs and their debugger support.
|
|
In reply to this post by Stefan Behnel-3
On Thu, Apr 19, 2012 at 4:55 AM, Stefan Behnel <[hidden email]> wrote:
> That sounds like less than two weeks of work, maybe even if we add the
> marshal module to it.

As others have pointed out, many users of pickle depend on its performance. The main reason why _pickle.c is so big is all the low-level optimizations we have in there. We have custom stack and dictionary implementations just for the sake of speed. We also have fast paths for I/O operations and function calls. These optimizations alone easily take 2000 lines of code and they are not micro-optimizations. Each of these was shown to give speedups from one to several orders of magnitude.

So I disagree that we could easily reach the point where it's "close to the speed of what we have." And if we were to attempt this, it would be a multi-month undertaking. I would rather see that time spent on improving pickle than on yet another reimplementation.

-- Alexandre
|
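As background on why pickle needs a dictionary at all, here is a small sketch of the memo behaviour that is visible from Python. It assumes that the custom dictionary mentioned above is the pickler's memo table, which the message does not state explicitly; the variable names are made up for illustration.

```python
import pickle

shared = ["payload"]
data = [shared, shared]       # the same list object appears twice

restored = pickle.loads(pickle.dumps(data))

# The pickler's memo records objects it has already written, so the second
# occurrence is stored as a back-reference rather than as a second copy.
# After loading, both slots therefore refer to one and the same list.
print(restored[0] is restored[1])
```

Every object that might be shared goes through that lookup while pickling, which is why the speed of the table matters for overall performance.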
|
> So I disagree that we could easily reach the point where it's "close to the
> speed of what we have." And if we were to attempt this, it would be a
> multiple months undertaking. I would rather see that time spent on
> improving pickle than on yet another reimplementation.

Of course, this being free software, anybody can spend time on whatever they please, and this should not make anybody feel sad. You just don't get merits if you work on stuff that nobody cares about.

Regards,
Martin
|
|
In reply to this post by Alexandre Vassalotti-4
Alexandre Vassalotti wrote:
>
> We have custom stack and
> dictionary implementations just for the sake of speed. We also have fast
> paths for I/O operations and function calls.

All of that could very likely be carried over almost unchanged into a Cython version. I don't see why it should take multiple months. It's not a matter of rewriting it from scratch, just translating it from one dialect (C) to another (the C subset of Cython).

--
Greg
|
|
In reply to this post by "Martin v. Löwis"
On Sun, Apr 22, 2012 at 6:12 PM, <[hidden email]> wrote:
Yes, of course. I don't want to discourage anyone from investigating this option; in fact, I would very much like to see myself proven wrong. But, if I understood Stefan correctly, he is proposing to have a GSoC student do the work, which I would feel uneasy about since we have no idea how valuable this would be as a contribution.

-- Alexandre
|
|
On Mon, Apr 23, 2012 at 9:27 AM, Alexandre Vassalotti
<[hidden email]> wrote:
> On Sun, Apr 22, 2012 at 6:12 PM, <[hidden email]> wrote:
>> Of course, this being free software, anybody can spend time on whatever
>> they please, and this should not make anybody feel sad. You just don't get
>> merits if you work on stuff that nobody cares about.
>
> Yes, of course. I don't want to discourage anyone to investigate this
> option; in fact, I would very much like to see myself proven wrong. But, if I
> understood Stefan correctly, he is proposing to have a GSoC student to do
> the work, to which I would feel uneasy about since we have no idea how
> valuable this would be as a contribution.

So long as it's made clear to the students applying that it's a proof of concept that may return a negative result (i.e. "it was tried, it proved to be a bad idea") I don't see a problem with it. The freedom to try out multiple ideas in parallel is one of the great strengths of open source.

We've had GSoC students try unsuccessful experiments in the past and have gained useful information as a result (e.g. the main reason I know the Import Engine API proposed in the deferred PEP 406 isn't adequate as currently written is because of the design level problems Greg found when implementing it last summer. The currently documented design simply doesn't achieve the full objectives of the PEP).

Cheers,
Nick.

--
Nick Coghlan | [hidden email] | Brisbane, Australia
|
|
On Sun, Apr 22, 2012 at 6:34 PM, Nick Coghlan <[hidden email]> wrote:
> On Mon, Apr 23, 2012 at 9:27 AM, Alexandre Vassalotti
> <[hidden email]> wrote:
>> On Sun, Apr 22, 2012 at 6:12 PM, <[hidden email]> wrote:
>>> Of course, this being free software, anybody can spend time on whatever
>>> they please, and this should not make anybody feel sad. You just don't get
>>> merits if you work on stuff that nobody cares about.
>>
>> Yes, of course. I don't want to discourage anyone to investigate this
>> option; in fact, I would very much like to see myself proven wrong. But, if I
>> understood Stefan correctly, he is proposing to have a GSoC student to do
>> the work, to which I would feel uneasy about since we have no idea how
>> valuable this would be as a contribution.
>
> So long as it's made clear to the students applying that it's a proof
> of concept that may return a negative result (i.e. "it was tried, it
> proved to be a bad idea") I don't see a problem with it. The freedom
> to try out multiple ideas in parallel is one of the great strengths of
> open source.
>
> We've had GSoC students try unsuccessful experiments in the past and
> have gained useful information as a result (e.g. the main reason I
> know the Import Engine API proposed in the deferred PEP 406 isn't
> adequate as currently written is because of the design level problems
> Greg found when implementing it last summer. The currently documented
> design simply doesn't achieve the full objectives of the PEP)

However, I think that in this case the success may be predetermined, or at least not determined by technical success alone. I have a lot of respect for Cython, but I don't think it is right to have any part of core Python depend on it. Cython is an incredibly complex and relatively young (and still fast evolving) piece of technology, while I think that core dependencies should be minimized and limited to absolutely fundamental building blocks.

--
--Guido van Rossum (python.org/~guido)
|
