Reading all buffered bytes without blocking

Reading all buffered bytes without blocking

Paul Moore
Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().

I need this because I want to decode the returned bytes from UTF-8, and I *might* get a character split across the boundary of any arbitrary block size I choose. (I'm happy to ignore the possibility that the *source* did a flush part-way through a character.) I don't really want to have to do incremental decoding if I can avoid it - it looks hard...
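To make the concern concrete, here is a minimal sketch of the failure (the 1000-byte boundary is an arbitrary choice):

payload = ('a' * 999 + '\u20ac').encode('utf-8')   # the euro sign is 3 bytes in UTF-8
chunk = payload[:1000]                             # the read boundary lands inside the euro sign
chunk.decode('utf-8')   # raises UnicodeDecodeError: unexpected end of data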

Thanks,
Paul


Reading all buffered bytes without blocking

Serhiy Storchaka-2
On 03.03.15 18:07, Paul Moore wrote:
> Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().

Just specify a large size.
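For instance (64 KiB here is an arbitrary choice; read1() returns what is already buffered, or makes at most one call to the underlying raw stream):

chunk = fd.read1(64 * 1024)   # at most one OS-level read; returns only what is available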



Reading all buffered bytes without blocking

Paul Moore
In reply to this post by Paul Moore
On Tuesday, 3 March 2015 19:29:19 UTC, Serhiy Storchaka  wrote:
> On 03.03.15 18:07, Paul Moore wrote:
> > Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().
>
> Just specify large size.

Thanks. Looking at the source, it appears that a large size will allocate a buffer of that size for the data even if the amount actually read is small (thinking about it, of course it has to, doh, because the syscall needs it).

Anyway, it's a pretty microscopic risk in practice, and when I looked at them, the incremental codecs (codecs.iterdecode) really aren't that hard to use, so I can do it that way if it matters enough.

For what it's worth, in case anyone wants to know, incremental decoding looks like this:

import codecs
import sys

# Assumes 'process' is a subprocess.Popen with stdout=PIPE, and 'encoding'
# names the expected encoding of its output (e.g. "utf-8").
def get():
    while True:
        data = process.stdout.read(1000)
        if not data:
            break
        yield data

for data in codecs.iterdecode(get(), encoding):
    sys.stdout.write(data)
    sys.stdout.flush()
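An equivalent sketch that drives the incremental decoder by hand rather than through a generator (same assumed process and encoding names as above):

decoder = codecs.getincrementaldecoder(encoding)()
while True:
    data = process.stdout.read(1000)
    if not data:
        break
    sys.stdout.write(decoder.decode(data))
    sys.stdout.flush()
sys.stdout.write(decoder.decode(b'', final=True))   # flush any pending partial character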

Thanks.
Paul


Reading all buffered bytes without blocking

Oscar
In article <f137c6cb-81ea-41bd-8387-2542a7fae5f9 at googlegroups.com>,
 <wxjmfauth at gmail.com> wrote:
>>>> buffer = ('a'*998 + '\u20ac').encode('utf-8')[:1000]
>>>> buffer.decode('utf-8')
>Traceback (most recent call last):
>  File "<eta last command>", line 1, in <module>
>UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 998-999:
>unexpected end of data
>>>>
>>>> # BOUM

hmm...

>>> import sys as jmr
>>> input = jmr.stdin.fileno()
>>> output = jmr.stdout.fileno()
>>> value = output / input
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> # BOUM

--
[J|O|R] <- .signature.gz