Buffer protocol - direct vs. JVM ByteBuffers


Buffer protocol - direct vs. JVM ByteBuffers

Jeff Allen-2

Stefan:

https://github.com/jythontools/jython/pull/39

What difference does it make in your use case whether a NIO ByteBuffer is direct or non-direct? I can see why a client might want to know which it had been given, but not why it might want an exception raised in one or other case.

Nothing I'm doing seems to depend on what kind of memory the exporting object has, therefore on the implementation type of ByteBuffer storage. It may depend on storage.hasArray(), but storage.isDirect() seems to make no difference.

Jeff
-- 
Jeff Allen

_______________________________________________
Jython-dev mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/jython-dev

Re: Buffer protocol - direct vs. JVM ByteBuffers

Stefan Richthofer
Hey Jeff,
 
Whether a NIO ByteBuffer is direct determines whether you can access it from native code via JNI:
JNI provides access to the data in a direct NIO ByteBuffer via a C-style pointer. This means a native library can read or write the very same memory the buffer exposes on the Java side.
So support for the DIRECT_NIO flag would open up the whole world of JNI/C use cases for Jython's buffer protocol.
 
See
 
http://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html#nio_support
 
for the official documentation on this feature (i.e. the related JNI API).

My concrete use case is that JyNI can put some sugar around this API so that it resembles CPython's buffer protocol. Ultimately this allows a PyObject that supports the Jython-style buffer protocol to be handed from Jython to a CPython-style native C extension, which will recognize it as an object supporting the CPython-style buffer protocol.

Honestly, a direct NIO ByteBuffer is not the only solution to this. JNI can also offer a C-style memory pointer to a byte array's data (under certain conditions). This has some drawbacks, but might be suitable as a fallback to some extent.
 
See
 
http://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html#wp17314
(scroll down to "Get<PrimitiveType>ArrayElements Routines")
 
https://rkennke.wordpress.com/2007/07/28/efficient-jni-programming-iii-array-access
 
The main issue here is that GetByteArrayElements does not guarantee a view onto the data (i.e. if array pinning is not possible for whatever reason); it might provide only a copy. While modified data can be copied back, this would not be sufficient to support CPython's PyObject_GetBuffer function with the PyBUF_WRITABLE flag set.
As a fallback I can (and maybe will) apply a copy-back operation on the PyBuffer_Release call. This might work for some extensions, but I suppose the semantics of the PyBUF_WRITABLE flag do not intend writes to take effect only at the PyBuffer_Release call (although the CPython documentation does not state these semantics explicitly).
 
To make the JVM try hard to provide the underlying array without copying I can use GetPrimitiveArrayCritical, but let me cite from its doc:
 
"""
However, there are significant restrictions on how these functions can be used.
After calling GetPrimitiveArrayCritical, the native code should not run for an extended period of time before it calls ReleasePrimitiveArrayCritical. We must treat the code inside this pair of functions as running in a "critical region." Inside a critical region, native code must not call other JNI functions, or any system call that may cause the current thread to block and wait for another Java thread. (For example, the current thread must not call read on a stream being written by another Java thread.)
These restrictions make it more likely that the native code will obtain an uncopied version of the array, even if the VM does not support pinning. For example, a VM may temporarily disable garbage collection when the native code is holding a pointer to an array obtained via GetPrimitiveArrayCritical.
"""
 
Remember that the pointer would be passed to a native C-extension and that the extension's call to PyBuffer_Release would let JyNI trigger ReleasePrimitiveArrayCritical. Do I need to say that we must assume the native extension was designed to work with CPython and doesn't know or care about running in a "critical region" with the implied restrictions? So I think this fallback should only be applied to certain harmless extensions previously checked and white-listed, if at all.
 
As I currently perceive it, a direct NIO ByteBuffer is the only safe and sufficient way to provide full native BufferProtocol support.
 
-Stefan
 
 
Sent: Sunday, 15 May 2016 at 11:57
From: "Jeff Allen" <[hidden email]>
To: "Stefan Richthofer" <[hidden email]>, "Jython Developers" <[hidden email]>
Subject: [Jython-dev] Buffer protocol - direct vs. JVM ByteBuffers


Re: Buffer protocol - direct vs. JVM ByteBuffers

Stefan Richthofer
In reply to this post by Jeff Allen-2
Just an add-on to my recent post:
 
>It may depend on storage.hasArray(), but storage.isDirect() seems to make no difference.
 
I think it is likely that the JVM does not offer a backing array if the buffer is created as direct (i.e. these flags likely exclude each other), because this would imply array pinning and all the restrictions that come with it. I didn't test it, but in any case we cannot rely on either behavior, as the documentation explicitly does not guarantee a backing array for direct buffers, saying this is "implementation specific".
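
As a quick check of the point above, here is a minimal Java sketch that prints what `isDirect()` and `hasArray()` report for three kinds of buffer. The class name `BufferKinds` and the helper `describe` are made up for illustration. Only the heap and read-only results are fixed by the `ByteBuffer` documentation; whether a direct buffer reports a backing array is left open there, so no particular output is claimed for that case.

```java
import java.nio.ByteBuffer;

final class BufferKinds {
    /** Describes a buffer using only the two queries discussed in the thread. */
    static String describe(ByteBuffer b) {
        return "isDirect=" + b.isDirect() + ", hasArray=" + b.hasArray();
    }

    public static void main(String[] args) {
        // Backed by a byte[] on the Java heap.
        ByteBuffer heap = ByteBuffer.allocate(16);
        // Memory outside the Java heap, addressable from JNI.
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        // Heap storage again, but the array is not accessible to the client.
        ByteBuffer readOnly = heap.asReadOnlyBuffer();

        System.out.println("heap:      " + describe(heap));
        System.out.println("direct:    " + describe(direct));
        System.out.println("read-only: " + describe(readOnly));
    }
}
```

The read-only case shows that the two queries are independent: a buffer can be non-direct and still offer no backing array, so a client cannot infer one property from the other.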
 
 

Re: Buffer protocol - direct vs. JVM ByteBuffers

Jeff Allen-2

Thanks for that. So, the (possible) copy-back semantics of Get<PrimitiveType>ArrayElements effectively make a direct buffer copy of the contents. If the C-code runs in the thread of the Java execution that calls it (or equivalently it is suspended) then to first order the copy causes no problem. But it is not difficult to cook up a scenario where another thread or call-back into Java sees a different state from C.

Alternatively, one uses the "critical" methods, and suffers restrictions that are, I expect, unenforceable on arbitrary CPython extension modules, such as being short and not yielding the CPU.

I'm reminded of the relationship in CPython between C-code and interpreted code, where the GIL must be held, proving all other threads are "resting" between instructions, and a context switch is only allowed when surrounded by the appropriate magical incantations. I think the Universe is trying to tell us something.

The problem I see with the DIRECT_NIO flag is that one cannot expect to choose, at the point of getting a PyBuffer, whether that buffer should be direct or heap. The data that hold the state of an object have a certain implementation in Java, and so the buffer will be a heap buffer. Or one can imagine a PyObject whose state is always in a direct ByteBuffer (representing an image mapped from disk, say) and then the PyBuffer would always be direct. Just possibly objects whose main purpose is to be native-friendly would have that implementation. Just possibly, this is a thing you get to choose when the object is constructed.
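
To illustrate the "image mapped from disk" case, here is a small Java sketch in which the storage is a memory-mapped file and is therefore direct from the moment it is created. The class `MappedStorage` and its method `mapTempFile` are hypothetical names, and the temporary file merely stands in for a real image file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedStorage {
    /** Creates a temporary file and maps its first 'size' bytes read-write. */
    static MappedByteBuffer mapTempFile(int size) {
        try {
            Path file = Files.createTempFile("image", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // The mapping stays valid after the channel is closed.
                return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MappedByteBuffer storage = mapTempFile(1024);
        storage.put(0, (byte) 7);
        // A MappedByteBuffer is a direct buffer by definition.
        System.out.println("isDirect=" + storage.isDirect());
    }
}
```

An object built on such storage never has to choose a buffer kind at export time, because the choice was made when the object was constructed.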

The only way I can imagine an object with Java fields as storage giving you a direct ByteBuffer on demand is to allocate one and copy its state there. This seems to bring us full circle to the behaviour of Get<PrimitiveType>ArrayElements, except that I suppose the object knows it has done it and can handle intervening access via the Java API ... effectively a change of implementation on the fly.

Jeff Allen
On 15/05/2016 16:39, Stefan Richthofer wrote:
> Just an add-on to my recent post:
>
>> It may depend on storage.hasArray(), but storage.isDirect() seems to make no difference.
>
> I think it is likely the JVM does not offer a backing array, if the buffer is created as direct (i.e. these flags likely exclude each other), because this would imply array pinning and all the restrictions coming with it. I didn't test it though, but anyway we cannot rely on the one or other behavior, as doc explicitly does not guarantee a backing array for direct buffers, saying this is "implementation specific".
 
 

Re: Buffer protocol - direct vs. JVM ByteBuffers

Stefan Richthofer
>So, the (possible) copy-back semantics of Get<PrimitiveType>ArrayElements effectively make a direct buffer copy of the contents
 
Except that a direct buffer instantaneously reflects modifications on the Java side.
 
>If the C-code runs in the thread of the Java execution that calls it (or equivalently it is suspended) then to first order the copy causes no problem.
 
Good point: a single-threaded environment would (usually) not notice the difference between direct and copy-back semantics. Given that JyNI holds a native GIL and native extensions are usually designed for a GIL'ed environment, chances are good that we would get away with the copy-back fallback for a writable PyBuffer in the majority of cases.

Still, note that even a single-threaded setup could break this:
The thread could create two (or more) PyBuffer views of the same object and hand both to various functions that read and write on them without calling release (and thus triggering copy-back) in between. The extension would expect that if view 'A' was modified, view 'B' already reflects this modification when passed to another function. One could argue this is an insane design, but I cannot actually assess what good reasons some programmer might have for it.
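
The two-view scenario can be imitated in plain Java, without any JNI. In this sketch `clone()` stands in for the copies that Get<PrimitiveType>ArrayElements may hand out when pinning is not possible, and `duplicate()` on a direct ByteBuffer stands in for two native views of the same memory. The class and method names are invented for illustration.

```java
import java.nio.ByteBuffer;

final class TwoViews {
    /** Copy semantics: each "view" is a private copy of the storage. */
    static byte viaCopies(byte[] storage) {
        byte[] viewA = storage.clone();
        byte[] viewB = storage.clone();
        viewA[0] = 42;     // write through view A, no copy-back yet
        return viewB[0];   // view B still holds the old value
    }

    /** Shared semantics: both views address the same direct memory. */
    static byte viaSharedBuffer(byte[] storage) {
        ByteBuffer direct = ByteBuffer.allocateDirect(storage.length);
        direct.put(storage);
        ByteBuffer viewA = direct.duplicate();
        ByteBuffer viewB = direct.duplicate();
        viewA.put(0, (byte) 42);   // absolute write through view A
        return viewB.get(0);       // view B sees it immediately
    }

    public static void main(String[] args) {
        byte[] storage = new byte[4];
        System.out.println("copies: " + viaCopies(storage));
        System.out.println("shared: " + viaSharedBuffer(storage));
    }
}
```

With copies, the write through view A stays invisible to view B until some copy-back happens; with the shared direct buffer it is visible at once, which is the behaviour a CPython extension would presumably expect.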

> The only way I can imagine an object with Java fields as storage giving you a direct ByteBuffer on demand is to allocate one and copy its state there. [...] effectively a change of implementation on the fly.
 
This is what I have in mind. Changing the backend on the fly shouldn't be much more expensive than creating a copy, but would then save the cost of copy-back and the entire copy cost for future calls. Using the bulk put method of ByteBuffer this should be easy and efficient to do (bulk get to convert in the other direction). The only infeasible situation would be if an AS_DIRECT_NIO buffer was requested while an array-backed buffer is exported and not yet released, or vice versa. In this case the request should just fail. For the sake of debugging, I suppose we should add a verbose mode/flag that makes Jython print out the exact reason why a buffer request failed (or append it to the error message in bufferErrorFromSyndrome), so that a user is able to identify the design flaw.
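
A minimal sketch of the on-the-fly backend change, assuming a simple export counter decides whether a switch is allowed. `MigratingStorage`, `export` and `release` are hypothetical names, not Jython API, and `IllegalStateException` merely stands in for whatever buffer error Jython would raise.

```java
import java.nio.ByteBuffer;

/** Hypothetical exporter whose storage can migrate between heap and direct memory. */
final class MigratingStorage {
    private ByteBuffer storage;   // current backend: heap or direct
    private int exports = 0;      // views handed out and not yet released

    MigratingStorage(int size) {
        storage = ByteBuffer.allocate(size);
    }

    /** Hands out a view, switching the backend first if the other kind is requested. */
    synchronized ByteBuffer export(boolean wantDirect) {
        if (storage.isDirect() != wantDirect) {
            if (exports > 0) {
                // The infeasible case from the text: the other kind is still exported.
                throw new IllegalStateException("cannot switch backend while "
                        + exports + " export(s) are outstanding");
            }
            ByteBuffer replacement = wantDirect
                    ? ByteBuffer.allocateDirect(storage.capacity())
                    : ByteBuffer.allocate(storage.capacity());
            ByteBuffer source = storage.duplicate();
            source.clear();            // position 0, limit = capacity
            replacement.put(source);   // bulk transfer of the whole content
            replacement.clear();
            storage = replacement;
        }
        exports++;
        return storage.duplicate();    // shares content, has its own position
    }

    synchronized void release() {
        if (exports == 0) {
            throw new IllegalStateException("nothing to release");
        }
        exports--;
    }

    public static void main(String[] args) {
        MigratingStorage s = new MigratingStorage(8);
        s.export(false).put(3, (byte) 9);
        s.release();
        ByteBuffer direct = s.export(true);   // migrates, content preserved
        System.out.println(direct.isDirect() + " " + direct.get(3));
    }
}
```

Once migrated, later direct requests cost nothing extra, which is the saving over a copy per call. The exception message carries the reason for the failure, in the spirit of the verbose mode suggested above.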
 
> This seems to bring us full circle to the behaviour of Get<PrimitiveType>ArrayElements, except that I suppose the object knows it has done it and can handle intervening access via the Java API
 
Since actual views of the same memory would be shared with (native) extensions, unlike with Get<PrimitiveType>ArrayElements (in the copy case, i.e. no array pinning), this would not break the case described above, where multiple PyBuffer views of the same object are in play.
 
I'm not sure what you mean by "full circle to the behaviour of Get<PrimitiveType>ArrayElements", so let me lay out how I assume the fallback (in BaseBuffer etc.) for a non-array-backed ByteBuffer backend would work
(I gave it some thought recently):
 
- use bulk get to copy the buffer content to a temporary array (maybe a reused one, but that's an optimization; also it might often be sufficient to copy a limited subsection of the buffer, which is another optimization)
- operate on the temporary array just as one would usually do on the backing array
- use bulk put to copy the temporary array back to the buffer
- glue these operations together in a synchronized block (on the ByteBuffer backend) so that they appear as one atomic operation
(This could actually still interfere with native buffer access in the multi-threaded case, but multiple threads writing to the same buffer would not end up sanely anyway. ByteBuffer does not prohibit this, e.g. by ConcurrentModificationException, does it?)
(Obviously the bulk get can be skipped for write-only access and the bulk put can be skipped for read-only access, etc.)
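
The steps above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: `ArrayFallback` and `withArray` are invented names, the `read` and `write` flags model the skip-the-copy optimizations, and nothing here is taken from Jython's BaseBuffer.

```java
import java.nio.ByteBuffer;
import java.util.function.Consumer;

final class ArrayFallback {
    /**
     * Applies an array-based operation to the bytes [offset, offset + length)
     * of a buffer that may have no accessible backing array.
     */
    static void withArray(ByteBuffer buf, int offset, int length,
                          boolean read, boolean write, Consumer<byte[]> op) {
        synchronized (buf) {                      // glue the steps together
            byte[] tmp = new byte[length];
            ByteBuffer window = buf.duplicate();  // shared content, own position
            window.clear();
            if (read) {                           // skipped for write-only access
                window.position(offset);
                window.get(tmp, 0, length);       // bulk get
            }
            op.accept(tmp);                       // work as if on a backing array
            if (write) {                          // skipped for read-only access
                window.position(offset);
                window.put(tmp, 0, length);       // bulk put (copy-back)
            }
        }
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        direct.put(0, (byte) 1).put(1, (byte) 2);
        withArray(direct, 0, 2, true, true, a -> { a[0] += 10; a[1] += 10; });
        System.out.println(direct.get(0) + " " + direct.get(1));
    }
}
```

Working on a duplicate leaves the position and limit of the original buffer untouched. The synchronized block only protects against other Java code that locks the same object; as noted in the list, native code writing through a raw pointer is not held back by it.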

This would in a sense come "full circle to the behaviour of Get<PrimitiveType>ArrayElements", but it would move the copy-back issue to the Java side, which is good: on the Java side we have much better control of copy-back timing and the corresponding thread synchronization than inside a foreign C extension. Also, Java code would either be aware of these semantics or use the AS_ARRAY flag, both of which would be fine.
 
 
 
Sent: Monday, 16 May 2016 at 22:04
From: "Jeff Allen" <[hidden email]>
To: "Stefan Richthofer" <[hidden email]>
Cc: "Jython Developers" <[hidden email]>
Subject: Re: [Jython-dev] Buffer protocol - direct vs. JVM ByteBuffers


Re: Buffer protocol - direct vs. JVM ByteBuffers

Jeff Allen-2
On 17/05/2016 01:52, Stefan Richthofer wrote:

> The thread could create two (or more) PyBuffer-views of the same object and hand both to various functions that read and write on them without calling release (and thus trigger copy-back) in between. The extension would expect if view 'A' was modified, view 'B' already reflects this modification when passed to another function.

The object can hand out a second reference to the same PyBuffer. (It's not required to, but the built-ins do.)

>> The only way I can imagine an object with Java fields as storage giving you a direct ByteBuffer on demand is to allocate one and copy its state there. [...] effectively a change of implementation on the fly.
>
> This is what I have in mind. Changing the backend on the fly shouldn't be much more expensive than creating a copy, but would then save the cost of copy-back and entire copy cost for future calls. Using the bulk-set method of ByteBuffer this should be easy and efficient to do (bulk-get to convert the other direction). The only infeasible situation would be if an AS_DIRECT_NIO buffer was requested while an array-backed buffer is exported and not yet released or vice versa.

Aye, there's the rub, if at least one is writable.

> In this case the request should just fail. I suppose for sake of debugging we should add a verbose mode/flag that makes Jython print out (or append it to the error message in bufferErrorFromSyndrome) the exact reason why some buffer-request failed so a user is able to identify the design flaw.

Exceptions should always be that clear. However, it's not really a design flaw: hold a memoryview, and call a numpy function on the array: it's hardly faulty logic.

>> This seems to bring us full circle to the behaviour of Get<PrimitiveType>ArrayElements, except that I suppose the object knows it has done it and can handle intervening access via the Java API
>
> Since actual views to the same memory would be shared with (native) extensions unlike Get<PrimitiveType>ArrayElements (in copy-case/no array pinning) this would not break the case described above, where multiple PyBuffer views to the same object are in the game.
>
> I'm not sure what you mean by "full circle to the behaviour of Get<PrimitiveType>ArrayElements",

I meant in the sense that we have made a copy especially for C and may have to copy it back. You're correct that there are still a number of delicate problems to solve during the period the implementation has changed.

Jeff

Re: Buffer protocol - direct vs. JVM ByteBuffers

Stefan Richthofer
> The object can hand out a second reference to the same PyBuffer. (It's not required to, but the built-ins do.)
 
Good point. Now that you mention it, I think it should be possible to identify the actual backend even if distinct PyBuffer objects sharing the same backend (array or ByteBuffer) were exposed. Seen this way, I agree that the single-threaded case is well solvable (and should be solved) this way, i.e. with potential copy-back.
 

Thinking of multi-threaded PyBuffer use:
 
I wondered how CPython deals with this (e.g. when using the threading module) and tried to find some (official) statement from the CPython world about sharing a buffer between multiple threads, but no luck so far. If anyone has resources or an example of such a use case, I'd appreciate a pointer.
 
So I suppose users are fully responsible for synchronizing their buffer transactions.
 
Given that a multi-threaded BufferProtocol use case is much more natural in Jython, I propose that we define a recommended standard process, and maybe even an API, for typical tasks in this setting, e.g. locking a buffer (or a section of it) for write access, or for an atomic-like read/process/write-back transaction.
Having such a standard would yield lock compatibility between distinct frameworks sharing a buffer-exposing object.
Also, behavior in the multi-threaded case would become much more predictable and controllable from JyNI's perspective. Last but not least, it would help to avoid errors (deadlocks etc.) in this difficult area; consider that Python users usually have little experience with multi-threading.
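
To make the proposal a little more concrete, here is one possible shape for such a lock convention, sketched with `java.util.concurrent`. Everything in it (`LockedBuffer`, `read`, `update`) is hypothetical and only meant to show what a shared read/process/write-back transaction could look like; it is not an existing Jython or JyNI API.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Hypothetical wrapper pairing a shared buffer with one lock all clients agree on. */
final class LockedBuffer {
    private final ByteBuffer storage;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    LockedBuffer(ByteBuffer storage) {
        this.storage = storage;
    }

    /** Runs a read-only operation; several readers may proceed at once. */
    <T> T read(Function<ByteBuffer, T> op) {
        lock.readLock().lock();
        try {
            return op.apply(storage.asReadOnlyBuffer());
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Atomic-like read/process/write-back of one section under the write lock. */
    void update(int offset, int length, UnaryOperator<byte[]> op) {
        lock.writeLock().lock();
        try {
            ByteBuffer view = storage.duplicate();
            view.clear();
            byte[] section = new byte[length];
            view.position(offset);
            view.get(section);                 // read
            byte[] result = op.apply(section); // process
            view.position(offset);
            view.put(result, 0, length);       // write back
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedBuffer shared = new LockedBuffer(ByteBuffer.allocateDirect(8));
        Runnable worker = () -> {
            for (int i = 0; i < 50; i++) {
                shared.update(0, 1, s -> { s[0]++; return s; });
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(shared.<Byte>read(v -> v.get(0)));
    }
}
```

The benefit described above only materializes if every framework touching the buffer goes through the same lock object, which is why a standard would have to say where that lock lives. Native code that writes through a raw pointer is outside the reach of this sketch.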
 
 
-Stefan
 
 
 
Sent: Tuesday, 17 May 2016 at 21:08
From: "Jeff Allen" <[hidden email]>
To: "Stefan Richthofer" <[hidden email]>
Cc: "Jython Developers" <[hidden email]>
Subject: Re: [Jython-dev] Buffer protocol - direct vs. JVM ByteBuffers
On 17/05/2016 01:52, Stefan Richthofer wrote:
 
The thread could create two (or more) PyBuffer views of the same object and hand both to various functions that read and write on them without calling release (and thus triggering copy-back) in between. The extension would expect that, if view 'A' was modified, view 'B' already reflects this modification when passed to another function.
The object can hand out a second reference to the same PyBuffer. (It's not required to, but the built-ins do.)
 
> The only way I can imagine an object with Java fields as storage giving you a direct ByteBuffer on demand is to allocate one and copy its state there. [...] effectively a change of implementation on the fly.
 
This is what I have in mind. Changing the backend on the fly shouldn't be much more expensive than creating a copy, but would then save the cost of copy-back and the entire copy cost for future calls. Using the bulk put method of ByteBuffer, this should be easy and efficient to do (bulk get to convert in the other direction). The only infeasible situation would be if an AS_DIRECT_NIO buffer was requested while an array-backed buffer is exported and not yet released, or vice versa.
Aye, there's the rub, if at least one is writable.
 
In this case the request should just fail. I suppose that, for the sake of debugging, we should add a verbose mode/flag that makes Jython print out (or append to the error message in bufferErrorFromSyndrome) the exact reason why a buffer request failed, so that a user is able to identify the design flaw.
Exceptions should always be that clear. However, it's not really a design flaw: hold a memoryview, and call a numpy function on the array: it's hardly faulty logic. 
> This seems to bring us full circle to the behaviour of Get<PrimitiveType>ArrayElements, except that I suppose the object knows it has done it and can handle intervening access via the Java API
 
Since actual views of the same memory would be shared with (native) extensions, unlike with Get<PrimitiveType>ArrayElements (in the copy case, i.e. no array pinning), this would not break the case described above, where multiple PyBuffer views of the same object are in play.
 
I'm not sure what you mean by "full circle to the behaviour of Get<PrimitiveType>ArrayElements",
I meant in the sense that we have made a copy especially for C and may have to copy it back. You're correct that there are still a number of delicate problems to solve during the period the implementation has changed.
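
As a rough illustration of the on-the-fly backend change discussed above, the following sketch copies the state into a direct (or heap) ByteBuffer on demand and refuses the request while exports of the other kind are outstanding. SwitchableStorage, export and release are hypothetical names, not Jython's PyBuffer API. The sketch is also stricter than the rule suggested above: it refuses whenever any export is outstanding, not only when one of them is writable.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of an exporter that switches its backend on demand. */
public final class SwitchableStorage {
    private ByteBuffer storage;   // current backend, heap or direct
    private int exports = 0;      // views handed out and not yet released

    public SwitchableStorage(byte[] initial) {
        this.storage = ByteBuffer.wrap(initial.clone());
    }

    /** Hand out a view of the current backend, converting it first if necessary. */
    public synchronized ByteBuffer export(boolean wantDirect) {
        if (wantDirect != storage.isDirect()) {
            if (exports > 0) {
                // State the exact reason, so the caller can see why it failed.
                throw new IllegalStateException("cannot change backend while "
                        + exports + " export(s) are outstanding");
            }
            ByteBuffer replacement = wantDirect
                    ? ByteBuffer.allocateDirect(storage.capacity())
                    : ByteBuffer.allocate(storage.capacity());
            ByteBuffer source = storage.duplicate();
            source.clear();            // position 0, limit = capacity
            replacement.put(source);   // bulk copy of the whole content
            replacement.clear();
            storage = replacement;
        }
        exports++;
        return storage.duplicate();    // shares bytes with the backend
    }

    /** Signal that one previously exported view is no longer in use. */
    public synchronized void release() {
        if (exports > 0) {
            exports--;
        }
    }
}
```

After the switch, later requests of the same kind share the new backend, so no copy-back is needed. Intervening access through the object's own Java API would have to go to the same storage field, which is one of the delicate points mentioned above.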

Jeff
Sent: Monday, 16 May 2016, 22:04
From: "Jeff Allen" <ja.py@...>
To: "Stefan Richthofer" <Stefan.Richthofer@...>
Cc: "Jython Developers" <jython-dev@...>
Subject: Re: [Jython-dev] Buffer protocol - direct vs. JVM ByteBuffers

Thanks for that. So, the (possible) copy-back semantics of Get<PrimitiveType>ArrayElements effectively make a direct buffer copy of the contents. If the C-code runs in the thread of the Java execution that calls it (or equivalently it is suspended) then to first order the copy causes no problem. But it is not difficult to cook up a scenario where another thread or call-back into Java sees a different state from C.

Alternatively, one uses the "critical" methods, and suffers restrictions that are, I expect, unenforcible on arbitrary CPython extension modules, such as being short and not yielding the CPU.

I'm reminded of the relationship in CPython between C-code and interpreted code, where the GIL must be held, proving all other threads are "resting" between instructions, and a context switch is only allowed when surrounded by the appropriate magical incantations. I think the Universe is trying to tell us something.

The problem I see with the DIRECT_NIO flag is that one cannot expect to choose, at the point of getting a PyBuffer, whether that buffer should be direct or heap. The data that hold the state of an object have a certain implementation in Java, and so the buffer will be a heap buffer. Or one can imagine a PyObject whose state is always in a direct ByteBuffer (representing an image mapped from disk, say) and then the PyBuffer would always be direct. Just possibly objects whose main purpose is to be native-friendly would have that implementation. Just possibly, this is a thing you get to choose when the object is constructed.

The only way I can imagine an object with Java fields as storage giving you a direct ByteBuffer on demand is to allocate one and copy its state there. This seems to bring us full circle to the behaviour of Get<PrimitiveType>ArrayElements, except that I suppose the object knows it has done it and can handle intervening access via the Java API ... effectively a change of implementation on the fly.

Jeff Allen
On 15/05/2016 16:39, Stefan Richthofer wrote:
Just an add-on to my recent post:
 
>It may depend on storage.hasArray(), but storage.isDirect() seems to make no difference.
 
I think it is likely that the JVM does not offer a backing array if the buffer is created as direct (i.e. these flags likely exclude each other), because this would imply array pinning and all the restrictions that come with it. I didn't test it, though; in any case we cannot rely on one behavior or the other, as the documentation explicitly does not guarantee a backing array for direct buffers, saying this is "implementation specific".
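
Since the two flags cannot be assumed to exclude each other, a client would have to query them independently. A small sketch of such a check, using only the standard ByteBuffer methods isDirect() and hasArray(); the class name BackendProbe and the returned strings are made up for illustration:

```java
import java.nio.ByteBuffer;

/** Hypothetical helper that reports both properties without assuming they exclude each other. */
public final class BackendProbe {
    public static String describe(ByteBuffer b) {
        if (b.isDirect()) {
            // Usable from native code via JNI; an array may or may not be offered.
            return b.hasArray() ? "direct, with accessible array"
                                : "direct, no accessible array";
        }
        // A heap buffer hides its array when it is read-only.
        return b.hasArray() ? "heap, array-backed"
                            : "heap, no accessible array";
    }
}
```

For example, describe(ByteBuffer.allocate(8)) reports an array-backed heap buffer, while a read-only view of the same buffer reports no accessible array even though it is not direct.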
 
 
Sent: Sunday, 15 May 2016, 11:57
From: "Jeff Allen" <ja.py@...>
To: "Stefan Richthofer" <Stefan.Richthofer@...>, "Jython Developers" <jython-dev@...>
Subject: [Jython-dev] Buffer protocol - direct vs. JVM ByteBuffers

Stefan:

https://github.com/jythontools/jython/pull/39

What difference does it make in your use case whether a NIO ByteBuffer is direct or non-direct? I can see why a client might want to know which it had been given, but not why it might want an exception raised in one or other case.

Nothing I'm doing seems to depend on what kind of memory the exporting object has, therefore on the implementation type of ByteBuffer storage. It may depend on storage.hasArray(), but storage.isDirect() seems to make no difference.

Jeff
-- 
Jeff Allen
