Non-blocking IO update.


Alan Kennedy-2
Hi All,

I'm sure some of you are aware that I have committed to updating the
jython socket, select and asyncore modules to support non-blocking I/O.

I have the code fully written, and the test suites passing, on the first
run at least.

The problem arises on the second and subsequent runs of the test suite:
for some reason, many of the sockets created leave open sockets in
either the TIME_WAIT or CLOSE_WAIT state. These cause subsequent bind
calls to the test suite server port (50007) to hang, causing the entire
test suite to hang, and thus fail.

All of my reading on the subject has failed to elucidate a solution to
the problem. Threads which offer further information are here

Trying to avoid TIME_WAIT buildup when using SocketChannels
http://forum.java.sun.com/thread.jspa?threadID=556212&messageID=2726932

Taming the NIO circus
http://forum.java.sun.com/thread.jspa?threadID=459338&start=105

NIO and CLOSE_WAIT on connections
http://forum.java.sun.com/thread.jspa?forumID=4&hilite=false&start=30&threadID=478802&range=15&q=

I have tried every possible sequence of opening and closing connections,
registering and deregistering selection keys before closing sockets and
selectors, etc, etc, etc, etc. But nothing solves the problem.

I am beginning to wonder if this is a problem specific to the Windows
platform: I have developed and run the code on Windows 2000 Server and
Windows 2003 Server. My next plan is to run up either a Linux or
OpenSolaris installation and try it on those.

Unfortunately, I have little time available to go into the excruciating
detail required to diagnose and solve these problems: e.g. running
packet captures on the client and server ends and tracing every fine
detail of RST, SYN, ACK flags, etc, and comparing them with the state
machines of the socket RFCs. I have a suspicion that the whole issue is
timing related, since I do see slightly different behaviour when I put
in lots of time.sleeps between the various unit tests, i.e. the failures
become intermittent instead of "guaranteed to fail on the first run".

If anyone has come across such problems before, or knows of the cause of
or solution to such problems, I'd be delighted to hear about it.

So, in summary, my non-blocking I/O work is far from ready for
prime time.

Regards,

Alan.



-------------------------------------------------------
This SF.Net email is sponsored by:
Power Architecture Resource Center: Free content, downloads, discussions,
and more. http://solutions.newsforge.com/ibmarch.tmpl
_______________________________________________
Jython-dev mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/jython-dev

Re: Non-blocking IO update.

Richie Hindle

[Alan]
> The problem arises on the second and subsequent runs of the test suite:
> for some reason, many of the sockets created leave open sockets in
> either the TIME_WAIT or CLOSE_WAIT state.

I imagine I'm teaching my grandmother to suck eggs here, but I'll mention
this anyway: the usual fix for this sort of thing is to set SO_REUSEADDR
on any socket with which you use bind().
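
A minimal sketch of that suggestion, written against the CPython socket
API that the Jython module is meant to mirror. The helper name
make_listener and its defaults are assumptions for illustration, not
code from the test suite under discussion. Note that the exact meaning
of SO_REUSEADDR differs between platforms.

```python
import socket


def make_listener(port, host="127.0.0.1", backlog=5):
    """Return a listening TCP socket with SO_REUSEADDR set before bind()."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option has to be set before bind() to affect that bind.
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(backlog)
    except Exception:
        # Do not leak the descriptor if bind() or listen() fails.
        server.close()
        raise
    return server
```

For the test suite described above, the call would be something like
make_listener(50007).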

--
Richie Hindle
[hidden email]





Re: Non-blocking IO update.

Alan Kennedy-2
[Alan]
>>The problem arises on the second and subsequent runs of the test suite:
>>for some reason, many of the sockets created leave open sockets in
>>either the TIME_WAIT or CLOSE_WAIT state.

[Richie Hindle]
> I imagine I'm teaching my grandmother to suck eggs here, but I'll mention
> this anyway: the usual fix for this sort of thing is to set SO_REUSEADDR
> on any socket with which you use bind().

Yep, sucked those eggs dry already ;-)

I tried all possible combinations of client and server socket options,
i.e. SO_REUSEADDR, SO_LINGER, TCP_NODELAY, etc.

But thanks for trying anyway :-)

Cheers,

Alan.



Re: Non-blocking IO update.

Scott Lamb
In reply to this post by Alan Kennedy-2
On 9 Oct 2005, at 05:46, Alan Kennedy wrote:

> The problem arises on the second and subsequent runs of the test  
> suite: for some reason, many of the sockets created leave open  
> sockets in either the TIME_WAIT or CLOSE_WAIT state. These cause  
> subsequent bind calls to the test suite server port (50007) to  
> hang, causing the entire test suite to hang, and thus fail.

You can pass 0 as the port number when creating server sockets. The  
kernel will dynamically assign a port. Then use getsockname(2)  
(Socket.getLocalSocketAddress()) to determine the port to connect to.  
Then you don't have to assume a specific port is available for  
testing. As a bonus, these TIME_WAIT sorts of problems become  
irrelevant, since it will use a different address each time.
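
The port-0 approach can be sketched as follows with the CPython socket
API. The function name start_ephemeral_server is hypothetical; the only
calls relied on are bind(), listen() and getsockname().

```python
import socket


def start_ephemeral_server(host="127.0.0.1"):
    """Bind to port 0 so the kernel picks a free port; return (socket, port)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind((host, 0))      # port 0 means "any free port"
        server.listen(1)
    except Exception:
        server.close()
        raise
    # getsockname() reports the address actually assigned by the kernel.
    port = server.getsockname()[1]
    return server, port
```

A client in the same test then connects to ("127.0.0.1", port) instead
of a hard-coded port number.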

Otherwise...hmm, I think SO_REUSEPORT is necessary on some platforms.  
(I'm not sure what purpose it serves except "something you have to  
set in addition to SO_REUSEADDR when it's defined on the platform".)  
I don't know if there's a way to set it in Java.

I'm working on a socket test suite to figure out these sorts of  
problems. <http://www.slamb.org/svn/repos/trunk/projects/socket_tests/>
Patches welcome.

--
Scott Lamb <http://www.slamb.org/>




Re: Non-blocking IO update.

Alan Kennedy-2
[Alan Kennedy]
>> The problem arises on the second and subsequent runs of the test  
>> suite: for some reason, many of the sockets created leave open  
>> sockets in either the TIME_WAIT or CLOSE_WAIT state. These cause  
>> subsequent bind calls to the test suite server port (50007) to  hang,
>> causing the entire test suite to hang, and thus fail.

[Scott Lamb]
> You can pass 0 as the port number when creating server sockets. The  
> kernel will dynamically assign a port. Then use getsockname(2)  
> (Socket.getLocalSocketAddress()) to determine the port to connect to.  
> Then you don't have to assume a specific port is available for  testing.
> As a bonus, these TIME_WAIT sorts of problems become  irrelevant, since
> it will use a different address each time.

Thanks Scott,

That sounds like a really promising approach to solving my current test
problems. It's the next thing I'll try.

> Otherwise...hmm, I think SO_REUSEPORT is necessary on some platforms.  
> (I'm not sure what purpose it serves except "something you have to  set
> in addition to SO_REUSEADDR when it's defined on the platform".)  I
> don't know if there's a way to set it in Java.

Hmmm, my googling on "REUSEPORT java" shows that you may be onto
something there. I think I'll have to get a look at the source for a JVM
that implements java.nio to get more details. AFAICT, there is no way to
set SO_REUSEPORT from java code.

> I'm working on a socket test suite to figure out these sorts of  
> problems. <http://www.slamb.org/svn/repos/trunk/projects/socket_tests/>
> Patches welcome.

I am unhappy with the structure of the cpython test_socket code, the 2.4
version of which I have tried to make my own code pass. I think a fresh
test suite, redesigned from the ground up, would be a good thing to have.

However, I see that your cpython test code makes use of python 2.4
decorators, which are not yet available in jython, so I'd have to
refactor your code to use pre-2.4 decorators (unless someone knows of
any syntactical hacks that make 2.4 decorators work under pre-2.4 syntax?)

Thanks again Scott.

Cheers,

Alan.




Re: Non-blocking IO update.

Paul D. Fernhout
In reply to this post by Alan Kennedy-2
I've been writing an old-fashioned(?) single-threaded message passing server
with long term client connections in Jython using the Java sockets and
aSocket.setSoTimeout(1) and BufferedReader.ready() [not sure how that
ready() call works but it seems to avoid a timeout?].

It would have been nice to have your stuff in Jython at the start though
instead (and maybe to use twisted etc.). I'll probably be moving to a
select type model with java nio if this server sees more development down
the road (and take a harder look at your stuff then).

I'll throw my two cents in to try to be helpful with more of the obvious
-- though I may be missing the point, not being an expert on sockets and
not fully understanding all aspects of the problem you mention.

The obvious question to me is: when you run your test suite once, and then
close the application to shut the JVM, can you then run it again within a
new JVM with no errors? Or is this, as the reply to the poster in the
first link you reference suggested, something to do with an expectation of
how sockets work at a low level (in which case, closing the JVM might have
no effect)? That reply was: "The TCP/IP spec indicates that a socket will
remain in TIME_WAIT for twice the maximum segment lifetime (MSL) before
being finally closed. This is regardless of whether all the ACKs have been
received, and is designed to protect against reuse of sequence numbers
when there may still be segments in the network that contain them. The
MSL is 2 minutes, so the total time in TIME_WAIT would be four minutes."

Being sure the JVM is shut down may not be as trivial as it seems. When I
was developing the initial code on my machine, using Eclipse 3.0 with JVM
1.4 under Linux and launching the code from within Eclipse, I found I
actually had to shut down (and restart) Eclipse to get the socket for
accept closed when I was first making lots of mistakes with closing both
sockets and windows. I guess Eclipse and the applications it ran shared a
JVM somehow? Or more likely, closing Eclipse ensured any JVM instances it
started got completely shut down? Restarting was time consuming and
annoying, so for a while I just kept bumping up the port number in my code
by one by hand with each new test. :-)

Anyway, if your test suite can run again after being sure the JVM it ran
in is really truly shut down, then it seems to me the sockets aren't
getting properly closed in your test suite after the first run, because
obviously the JVM can close them properly when it exits. Maybe there is a
missing .close() in your test code and the socket is then not garbage
collected right away (because Jython can't guarantee when that happens)?

This may be paranoid and overkill and obvious, but here is the code I use
for closing the java sockets:

        # SERVER SIDE:
        if not self.listenSocket.isClosed():
            self.listenSocket.close()
        for clientConnection in self.clientConnections:
            if not clientConnection.clientSocket.isClosed():
                clientConnection.clientSocket.shutdownInput()
                clientConnection.clientSocket.shutdownOutput()
                clientConnection.clientSocket.close()

        # CLIENT SIDE:
        if self.clientSocket:
            if not self.clientSocket.isClosed():
                self.clientSocket.shutdownInput()
                self.clientSocket.shutdownOutput()
                self.clientSocket.close()

A close probably is good enough, but it still seemed that shutting down
the input and output first before closing was the cautious thing to do.
Perhaps you could try that in your tests?
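
For comparison, the same shutdown-then-close pattern might look like
this with Python-level socket objects rather than java.net sockets. The
helper name close_socket is an assumption for illustration; it uses only
shutdown() and close().

```python
import socket


def close_socket(sock):
    """Shut down both directions, then close, tolerating unconnected sockets."""
    try:
        # Roughly equivalent to shutdownInput() plus shutdownOutput() in Java.
        sock.shutdown(socket.SHUT_RDWR)
    except socket.error:
        # Raised when the socket was never connected or the peer is already gone.
        pass
    finally:
        sock.close()
```

Swallowing the error from shutdown() is a deliberate choice here: the
goal is to guarantee that close() always runs.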

Anyway, probably too simple a solution. Still, perhaps you could also try
running the GC manually with a System.gc() after your test suite to see
what happens if it is indeed a missing close()? If the problem goes away
with a gc forcing a finalization on any open sockets, maybe something
isn't getting closed.

Or, maybe, as you imply, this is just a bug in the JVM under Windows and
it is just not responding completely to a close? I'm running Linux (Debian
unstable 2.6.10 kernel i686), and I am using the latest from the Jython
CVS and set up with JRE 1.4 and JRE 1.5, so I could help test maybe for
Linux if it was easy to try your code (though this week is fairly hectic
with a big deliverable on Monday, so I can't promise a speedy turnaround).

--Paul Fernhout

Alan Kennedy wrote:

> [Alan]
>
>>> The problem arises on the second and subsequent runs of the test
>>> suite: for some reason, many of the sockets created leave open
>>> sockets in either the TIME_WAIT or CLOSE_WAIT state.
>
>
> [Richie Hindle]
>
>> I imagine I'm teaching my grandmother to suck eggs here, but I'll mention
>> this anyway: the usual fix for this sort of thing is to set SO_REUSEADDR
>> on any socket with which you use bind().
>
>
> Yep, sucked those eggs dry already ;-)
>
> I tried all possible combinations of client and server socket options,
> i.e. SO_REUSEADDR, SO_LINGER, TCP_NODELAY, etc.
>
> But thanks for trying anyway :-)
>
> Cheers,
>
> Alan.



Re: Non-blocking IO update.

Scott Lamb
In reply to this post by Alan Kennedy-2
On Oct 12, 2005, at 3:14 AM, Alan Kennedy wrote:

> I am unhappy with the structure of the cpython test_socket code,  
> the 2.4 version of which I have tried to make my own code pass. I  
> think a fresh test suite, redesigned from the ground up, would be a  
> good thing to have.
>
> However, I see that your cpython test code makes use of python 2.4  
> decorators, which are not yet available in jython, so I'd have to  
> refactor your code to use pre-2.4 decorators (unless someone knows  
> of any syntactical hacks that make 2.4 decorators work under  
> pre-2.4 syntax?)

Hmm, interesting. I hadn't been thinking of my test suite as a  
potential replacement for CPython's test_socket. (I haven't even  
looked at that code.) I started writing it after being surprised by  
weird behavior in some C++ code. It has been a tool for me to poke at  
TCP/IP and the kernel's socket implementation. I chose Python because  
it's more fun than C or C++, yet its socket calls seem to map  
directly to the underlying system calls.

The decorators you mention toggle tests that dynamically manipulate  
firewall rules. They require root.

--
Scott Lamb <http://www.slamb.org/>





Re: Non-blocking IO update.

Kent Johnson
In reply to this post by Alan Kennedy-2
Alan Kennedy wrote:
> However, I see that your cpython test code makes use of python 2.4
> decorators, which are not yet available in jython, so I'd have to
> refactor your code to use pre-2.4 decorators (unless someone knows of
> any syntactical hacks that make 2.4 decorators work under pre-2.4 syntax?)

Maybe you know this, but

@someDecorator
def myFunc():
  ...

can be replaced with

def myFunc():
  ...
myFunc = someDecorator(myFunc)

The decorator syntax is just sugar for the second form.
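
A runnable illustration of the second form, including a decorator that
takes arguments. The names some_decorator, repeat, my_func and greet are
made up for the example and do not come from Scott's test suite.

```python
def some_decorator(func):
    """Hypothetical decorator: count calls, then delegate to func."""
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


# Pre-2.4 form: define the function, then rebind its name by hand.
def my_func(x):
    return x * 2
my_func = some_decorator(my_func)


def repeat(times):
    """Hypothetical decorator factory: call func `times` times."""
    def decorate(func):
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorate


# Equivalent of writing @repeat(3) above the def.
def greet():
    return "hi"
greet = repeat(3)(greet)
```

The rebinding line has to come after the complete function body, which
is why this form cannot be dropped into a module without editing it.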

Kent




Re: Non-blocking IO update.

Alan Kennedy-2
In reply to this post by Paul D. Fernhout
[Paul D. Fernhout]
> I'll throw my two cents in to try to be helpful with more of the obvious
> -- though I may be missing the point, not being an expert on sockets and
> not fully understanding all aspects of the problem you mention.

Thanks for your input Paul.

> The obvious question to me is: when you run your test suite once, and
> then close the application to shut the JVM, can you then run it again
> within a new JVM with no errors?

The JVM is *definitely* shut down between runs: I'm running from the
command line, not an IDE.

> Or is this, as the reply to the poster
> in the first link you reference suggested, something to do with an
> expectation of how sockets work at a low level (in which case, closing
> the JVM might have no effect)? That reply was: "The TCP/IP spec
> indicates that a socket will remain in TIME_WAIT for twice the maximum
> segment lifetime (MSL) before being finally closed. This is regardless
> of whether all the ACKs have been received, and is designed to protect
> against reuse of sequence numbers when there may still be segments in
> the network that contain them. The MSL is 2 minutes, so the total time
> in TIME_WAIT would be four minutes."

I'm pretty certain that it is some OS-level timeout issue, such as the
2*MSL timeout, because if I wait until netstat reports that all the
TIME_WAIT and CLOSE_WAIT state sockets have disappeared, which happens
after several minutes, the test suite runs cleanly again.
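
Rather than watching netstat by hand, that wait could be automated with
a probe along these lines. This is only a sketch: the name
wait_until_bindable and the 300-second default (a little over the
four minutes quoted above) are assumptions, and it presumes that a
blocked bind() fails promptly with an error instead of hanging.

```python
import socket
import time


def wait_until_bindable(port, host="127.0.0.1", timeout=300.0, interval=1.0):
    """Poll until (host, port) can be bound. Return True on success,
    False if the port is still unavailable after `timeout` seconds."""
    deadline = time.time() + timeout
    while True:
        probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            probe.bind((host, port))
            return True
        except socket.error:
            if time.time() >= deadline:
                return False
        finally:
            # Always release the probe so it does not hold the port itself.
            probe.close()
        time.sleep(interval)
```

A test runner could call wait_until_bindable(50007) before starting the
suite and skip the run if it returns False.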

> Anyway, if your test suite can run again after being sure the JVM it ran
> in is really truly shut down, then it seems to me the sockets aren't
> getting properly closed in your test suite after the first run, because
> obviously the JVM can close them properly when it exits. Maybe there is
> a missing .close() in your test code and the socket is then not garbage
> collected right away (because Jython can't guarantee when that happens)?

I'm virtually certain that I have closed all the sockets (beginning to
doubt my sanity at this stage %-), and that I have tried all variations
on socket shutdown, i.e. shutdownInput, shutdownOutput, etc. I have also
ensured to deregister all selection keys for the socket, close any
selectors (after one last select, to clear the selection key table, as
recommended in several messages on the Sun forums). Nothing seems to work.

> This may be paranoid and overkill and obvious, but here is the code I
> use for closing the java sockets:

[code snipped]

> A close probably is good enough, but it still seemed that shutting down
> the input and output first before closing was the cautious thing to do.
> Perhaps you could try that in your tests?

I'm pretty sure I already have. But I'll do it again, just in case I
missed out a possible combination in various code approaches I have tried.

> Anyway, probably too simple a solution. Still, perhaps you could also
> try running the GC manually with a System.gc() after your test suite to
> see what happens if it is indeed a missing close()? If the problem goes
> away with a gc forcing a finalization on any open sockets, maybe
> something isn't getting closed.

I'm thinking that I need to maintain my own internal state variable on
my socket objects to track exactly what state they are in.
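
One possible shape for that, sketched with a plain Python socket
underneath. The class TrackedSocket, its state names and its history
list are all hypothetical; they are not part of the Jython socket
module.

```python
import socket


class TrackedSocket(object):
    """Hypothetical wrapper that records each lifecycle transition."""

    NEW = "new"
    BOUND = "bound"
    LISTENING = "listening"
    CONNECTED = "connected"
    SHUTDOWN = "shutdown"
    CLOSED = "closed"

    def __init__(self, sock=None):
        if sock is None:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock = sock
        self.state = self.NEW
        self.history = [self.NEW]

    def _move(self, new_state):
        # Record the transition only after the underlying call succeeded.
        self.state = new_state
        self.history.append(new_state)

    def bind(self, address):
        self.sock.bind(address)
        self._move(self.BOUND)

    def listen(self, backlog=5):
        self.sock.listen(backlog)
        self._move(self.LISTENING)

    def connect(self, address):
        self.sock.connect(address)
        self._move(self.CONNECTED)

    def shutdown(self, how=socket.SHUT_RDWR):
        self.sock.shutdown(how)
        self._move(self.SHUTDOWN)

    def close(self):
        # Make close() idempotent so double closes show up only once.
        if self.state != self.CLOSED:
            self.sock.close()
            self._move(self.CLOSED)
```

Dumping the history of every socket at the end of a test run would show
which ones never reached the closed state.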

> Or, maybe, as you imply, this is just a bug in the JVM under Windows and
> it is just not responding completely to a close? I'm running Linux
> (Debian unstable 2.6.10 kernel i686), and I am using the latest from the
> Jython CVS and set up with JRE 1.4 and JRE 1.5, so I could help test
> maybe for Linux if it was easy to try your code (though this week is
> fairly hectic with a big deliverable on Monday, so I can't promise a
> speedy turnaround).

I'm going to clean the code up and make it available for others, to see
if they can reproduce on different platforms.

But that won't happen until the weekend, since I am doing all of this on
non-work time.

Thanks for the input.

Cheers,

Alan.



Re: Non-blocking IO update.

Alan Kennedy-2
In reply to this post by Kent Johnson
[Alan Kennedy]
>> However, I see that your cpython test code makes use of python 2.4
>> decorators, which are not yet available in jython, so I'd have to
>> refactor your code to use pre-2.4 decorators (unless someone knows of
>> any syntactical hacks that make 2.4 decorators work under pre-2.4
>> syntax?)

[Kent Johnson]

> Maybe you know this, but
>
> @someDecorator
> def myFunc():
>  ...
>
> can be replaced with
> def myFunc():
>  ...
> myFunc = someDecorator(myFunc)
>
> The decorator syntax is just sugar for the second form.

Thanks Kent,

I was aware of that. I was hoping there might be some hack that might
make it possible for me to run Scott's code without modification, i.e.
without changing the text of his modules.

But my primary approach for now is going to be using a different port
number for each test. I'll worry about other test suites later.
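
A sketch of what a per-test port could look like in a unittest fixture,
letting the kernel choose the port as Scott suggested. The class and
test names are invented for the example and are not taken from
test_socket.

```python
import socket
import unittest


class PerTestPortCase(unittest.TestCase):
    """Each test gets its own listening socket on a kernel-assigned port."""

    def setUp(self):
        self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.server.bind(("127.0.0.1", 0))   # fresh port for every test
        self.server.listen(1)
        self.server.settimeout(5)
        self.port = self.server.getsockname()[1]

    def tearDown(self):
        self.server.close()

    def test_port_is_assigned(self):
        self.assertTrue(0 < self.port < 65536)

    def test_client_can_connect(self):
        client = socket.create_connection(("127.0.0.1", self.port), timeout=5)
        try:
            conn, _ = self.server.accept()
            conn.close()
        finally:
            client.close()
```

Because no test depends on port 50007 being free, sockets left in
TIME_WAIT by an earlier test or an earlier run no longer block the next
bind.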

Regards,

Alan.


