Quantcast

Too many open files(session in files dead lock?)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Too many open files(session in files dead lock?)

bogun.dmitriy
Hello.

Linux 2.6.36-hardened-r6 #2 SMP Mon Jan 31 00:18:27 EET 2011 i686 AMD Phenom(tm) 9650 Quad-Core Processor
Python 2.7.1
cherrypy-3.2.0

Today my cherrypy app have failed 3 times, with error "Too many open files". It is limited to 1024 file descriptor. In usual situation it use ~10 file descriptors. When application was in "fail" state I have looked in /proc/$pid/fd to see what descriptors is used. And all of them was client sockets. So I am shure that it is was not fd leak in my app(it didn't use sockets).

According to nginx logs(it placed in front of app) - when app fail, it was unable to connect to and got error "connect() failed (111: Connection refused) while connecting to upstream". But app process does not died and ignore SIGTERM. From app logs I know that some request handles start to process but not ends. I know that they freeze after dispatcher choose handler, but before handler starts. I use _cp_dispatch method for routing and log route process, so I can see in what phase of request processing it freeze.

I can guess only one place where it cant freeze. App use cherrypy.session module, and configured to store them in files. I think that there is some kind of dead lock on session lock phase. But I can't reproduce problem on test server so it is only hypothesis. And not want it repeats on production server.

Now I switch to session in memory. If my hypothesis is correct - app will not fail any more.

I need advice of method how to reproduce and track this problem.

And one more thing - is it should stop working process in this situation? Because process not died, manager process did not restart app, and entire site become unavailable.

PS Sorry my ugly english.

Small part of app log:
2011-07-11 21:03:19,682 13091:-1599640720 [   DEBUG:root] route('res') => <route_node: [<rr: u'logo.png'>]>
2011-07-11 21:03:19,682 13091:-1599640720 [   DEBUG:root] route('logo.png') => <class 'puzzleit.http.migration.proxy_auth'>
2011-07-11 21:03:19,684 13091:-1599640720 [   DEBUG:root] Run <puzzleit.http.migration.proxy_auth object at 0x1ca235ac>, remote: httputil.Host('109.234.11.212', 50304, '')
2011-07-11 21:03:19,685 13091:-1599640720 [   DEBUG:sqlalchemy.pool.QueuePool.0x...98ec] Connection <_mysql.connection open to 'gen0.mysql.stabilis.int' at 1c57014c> checked out from pool
2011-07-11 21:03:19,685 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] BEGIN (implicit)
2011-07-11 21:03:19,686 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] SELECT session_share.user_idnr AS session_share_user_idnr
2011-07-11 21:03:19,686 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] ('f38075e60c7c019228f60ed95bdbc214',)
2011-07-11 21:03:19,687 13091:-1599640720 [   DEBUG:sqlalchemy.engine.base.Engine.0x...9aac] Col ('session_share_user_idnr',)
2011-07-11 21:03:19,688 13091:-1599640720 [   DEBUG:root] There is no session_share record for PHPSESSID="f38075e60c7c019228f60ed95bdbc214"
2011-07-11 21:03:19,688 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] COMMIT
2011-07-11 21:03:19,691 13091:-1599640720 [   DEBUG:sqlalchemy.pool.QueuePool.0x...98ec] Connection <_mysql.connection open to 'gen0.mysql.stabilis.int' at 1c57014c> being returned to pool
2011-07-11 21:03:19,691 13091:-1599640720 [   DEBUG:root] Perform redirect: HTTPRedirect([], 304)
2011-07-11 21:03:19,692 13091:-1599640720 [    INFO:cherrypy.access.465627660] 109.234.11.212 - - [11/Jul/2011:21:03:19] "GET /res/logo.png HTTP/1.0" 304 - "http://puzzleit.org/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30"
2011-07-11 21:03:19,694 13091:-1599640720 [   DEBUG:root] route('res') => <route_node: [<rr: u'logo.png'>]>
2011-07-11 21:03:19,694 13091:-1599640720 [   DEBUG:root] route('logo.png') => <class 'puzzleit.http.migration.proxy_auth'>
2011-07-11 21:03:19,695 13091:-1599640720 [   DEBUG:root] Run <puzzleit.http.migration.proxy_auth object at 0x1c8c5d8c>, remote: httputil.Host('95.58.122.0', 50306, '')
2011-07-11 21:03:19,696 13091:-1599640720 [   DEBUG:sqlalchemy.pool.QueuePool.0x...98ec] Connection <_mysql.connection open to 'gen0.mysql.stabilis.int' at 1c5be8fc> checked out from pool
2011-07-11 21:03:19,696 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] BEGIN (implicit)
2011-07-11 21:03:19,697 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] SELECT session_share.user_idnr AS session_share_user_idnr
2011-07-11 21:03:19,697 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] ('f19d58d61f16bd33b1eb13020db156af',)
2011-07-11 21:03:19,698 13091:-1599640720 [   DEBUG:sqlalchemy.engine.base.Engine.0x...9aac] Col ('session_share_user_idnr',)
2011-07-11 21:03:19,698 13091:-1599640720 [   DEBUG:root] There is no session_share record for PHPSESSID="f19d58d61f16bd33b1eb13020db156af"
2011-07-11 21:03:19,699 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] COMMIT
2011-07-11 21:03:19,701 13091:-1599640720 [   DEBUG:sqlalchemy.pool.QueuePool.0x...98ec] Connection <_mysql.connection open to 'gen0.mysql.stabilis.int' at 1c5be8fc> being returned to pool
2011-07-11 21:03:19,702 13091:-1599640720 [   DEBUG:root] Perform redirect: HTTPRedirect([], 304)
2011-07-11 21:03:19,702 13091:-1599640720 [    INFO:cherrypy.access.465627660] 95.58.122.0 - - [11/Jul/2011:21:03:19] "GET /res/logo.png HTTP/1.0" 304 - "http://puzzleit.org/54a6db690762fbc23fe3a648b7452d5e/rating/" "Opera/9.80 (Windows NT 6.1; U; MRA 5.8 (build 4157); ru) Presto/2.8.131 Version/11.11"
2011-07-11 21:03:20,486 13091:-1599640720 [   DEBUG:root] route('res') => <route_node: [<rr: u'logo.png'>]>
2011-07-11 21:03:20,486 13091:-1599640720 [   DEBUG:root] route('logo.png') => <class 'puzzleit.http.migration.proxy_auth'>
2011-07-11 21:26:23,238 13091:-1591248016 [   ERROR:cherrypy.error] [11/Jul/2011:21:26:23] ENGINE Error in HTTP server: shutting down
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 187, in _start_http_thread
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/__init__.py", line 1765, in start
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/__init__.py", line 1800, in tick
  File "/usr/lib/python2.7/socket.py", line 202, in accept
error: [Errno 24] Too many open files

2011-07-11 21:26:23,238 13091:-1591248016 [    INFO:cherrypy.error] [11/Jul/2011:21:26:23] ENGINE Bus STOPPING
... but even after this - process 13091 continue to live, and even ignore SIGTERM. There was not log cut between
21:03:20 and 21:26:23, it just now serve requests.

--
You received this message because you are subscribed to the Google Groups "cherrypy-devel" group.
To view this discussion on the web visit https://groups.google.com/d/msg/cherrypy-devel/-/N1fUk2yN5bgJ.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/cherrypy-devel?hl=en.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Too many open files(session in files dead lock?)

Shaun Roe-2
I had the same problem, with sql; I found that the open files from sql were never closed by closing the database. I ended up writing a routine which looked at open file descriptors and clsing them with system calls


HTH
shaun
Shaun Roe
AIM, iChat: shuanroe
Skype: shaunroe




On 11 Jul 2011, at 22:51, Дмитрий Богун wrote:

Hello.

Linux 2.6.36-hardened-r6 #2 SMP Mon Jan 31 00:18:27 EET 2011 i686 AMD Phenom(tm) 9650 Quad-Core Processor
Python 2.7.1
cherrypy-3.2.0

Today my cherrypy app have failed 3 times, with error "Too many open files". It is limited to 1024 file descriptor. In usual situation it use ~10 file descriptors. When application was in "fail" state I have looked in /proc/$pid/fd to see what descriptors is used. And all of them was client sockets. So I am shure that it is was not fd leak in my app(it didn't use sockets).

According to nginx logs(it placed in front of app) - when app fail, it was unable to connect to and got error "connect() failed (111: Connection refused) while connecting to upstream". But app process does not died and ignore SIGTERM. From app logs I know that some request handles start to process but not ends. I know that they freeze after dispatcher choose handler, but before handler starts. I use _cp_dispatch method for routing and log route process, so I can see in what phase of request processing it freeze.

I can guess only one place where it cant freeze. App use cherrypy.session module, and configured to store them in files. I think that there is some kind of dead lock on session lock phase. But I can't reproduce problem on test server so it is only hypothesis. And not want it repeats on production server.

Now I switch to session in memory. If my hypothesis is correct - app will not fail any more.

I need advice of method how to reproduce and track this problem.

And one more thing - is it should stop working process in this situation? Because process not died, manager process did not restart app, and entire site become unavailable.

PS Sorry my ugly english.

Small part of app log:
2011-07-11 21:03:19,682 13091:-1599640720 [   DEBUG:root] route('res') => <route_node: [<rr: u'logo.png'>]>
2011-07-11 21:03:19,682 13091:-1599640720 [   DEBUG:root] route('logo.png') => <class 'puzzleit.http.migration.proxy_auth'>
2011-07-11 21:03:19,684 13091:-1599640720 [   DEBUG:root] Run <puzzleit.http.migration.proxy_auth object at 0x1ca235ac>, remote: httputil.Host('109.234.11.212', 50304, '')
2011-07-11 21:03:19,685 13091:-1599640720 [   DEBUG:sqlalchemy.pool.QueuePool.0x...98ec] Connection <_mysql.connection open to 'gen0.mysql.stabilis.int' at 1c57014c> checked out from pool
2011-07-11 21:03:19,685 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] BEGIN (implicit)
2011-07-11 21:03:19,686 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] SELECT session_share.user_idnr AS session_share_user_idnr
2011-07-11 21:03:19,686 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] ('f38075e60c7c019228f60ed95bdbc214',)
2011-07-11 21:03:19,687 13091:-1599640720 [   DEBUG:sqlalchemy.engine.base.Engine.0x...9aac] Col ('session_share_user_idnr',)
2011-07-11 21:03:19,688 13091:-1599640720 [   DEBUG:root] There is no session_share record for PHPSESSID="f38075e60c7c019228f60ed95bdbc214"
2011-07-11 21:03:19,688 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] COMMIT
2011-07-11 21:03:19,691 13091:-1599640720 [   DEBUG:sqlalchemy.pool.QueuePool.0x...98ec] Connection <_mysql.connection open to 'gen0.mysql.stabilis.int' at 1c57014c> being returned to pool
2011-07-11 21:03:19,691 13091:-1599640720 [   DEBUG:root] Perform redirect: HTTPRedirect([], 304)
2011-07-11 21:03:19,692 13091:-1599640720 [    INFO:cherrypy.access.465627660] 109.234.11.212 - - [11/Jul/2011:21:03:19] "GET /res/logo.png HTTP/1.0" 304 - "http://puzzleit.org/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30"
2011-07-11 21:03:19,694 13091:-1599640720 [   DEBUG:root] route('res') => <route_node: [<rr: u'logo.png'>]>
2011-07-11 21:03:19,694 13091:-1599640720 [   DEBUG:root] route('logo.png') => <class 'puzzleit.http.migration.proxy_auth'>
2011-07-11 21:03:19,695 13091:-1599640720 [   DEBUG:root] Run <puzzleit.http.migration.proxy_auth object at 0x1c8c5d8c>, remote: httputil.Host('95.58.122.0', 50306, '')
2011-07-11 21:03:19,696 13091:-1599640720 [   DEBUG:sqlalchemy.pool.QueuePool.0x...98ec] Connection <_mysql.connection open to 'gen0.mysql.stabilis.int' at 1c5be8fc> checked out from pool
2011-07-11 21:03:19,696 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] BEGIN (implicit)
2011-07-11 21:03:19,697 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] SELECT session_share.user_idnr AS session_share_user_idnr
2011-07-11 21:03:19,697 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] ('f19d58d61f16bd33b1eb13020db156af',)
2011-07-11 21:03:19,698 13091:-1599640720 [   DEBUG:sqlalchemy.engine.base.Engine.0x...9aac] Col ('session_share_user_idnr',)
2011-07-11 21:03:19,698 13091:-1599640720 [   DEBUG:root] There is no session_share record for PHPSESSID="f19d58d61f16bd33b1eb13020db156af"
2011-07-11 21:03:19,699 13091:-1599640720 [    INFO:sqlalchemy.engine.base.Engine.0x...9aac] COMMIT
2011-07-11 21:03:19,701 13091:-1599640720 [   DEBUG:sqlalchemy.pool.QueuePool.0x...98ec] Connection <_mysql.connection open to 'gen0.mysql.stabilis.int' at 1c5be8fc> being returned to pool
2011-07-11 21:03:19,702 13091:-1599640720 [   DEBUG:root] Perform redirect: HTTPRedirect([], 304)
2011-07-11 21:03:19,702 13091:-1599640720 [    INFO:cherrypy.access.465627660] 95.58.122.0 - - [11/Jul/2011:21:03:19] "GET /res/logo.png HTTP/1.0" 304 - "http://puzzleit.org/54a6db690762fbc23fe3a648b7452d5e/rating/" "Opera/9.80 (Windows NT 6.1; U; MRA 5.8 (build 4157); ru) Presto/2.8.131 Version/11.11"
2011-07-11 21:03:20,486 13091:-1599640720 [   DEBUG:root] route('res') => <route_node: [<rr: u'logo.png'>]>
2011-07-11 21:03:20,486 13091:-1599640720 [   DEBUG:root] route('logo.png') => <class 'puzzleit.http.migration.proxy_auth'>
2011-07-11 21:26:23,238 13091:-1591248016 [   ERROR:cherrypy.error] [11/Jul/2011:21:26:23] ENGINE Error in HTTP server: shutting down
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 187, in _start_http_thread
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/__init__.py", line 1765, in start
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/__init__.py", line 1800, in tick
  File "/usr/lib/python2.7/socket.py", line 202, in accept
error: [Errno 24] Too many open files

2011-07-11 21:26:23,238 13091:-1591248016 [    INFO:cherrypy.error] [11/Jul/2011:21:26:23] ENGINE Bus STOPPING
... but even after this - process 13091 continue to live, and even ignore SIGTERM. There was not log cut between
21:03:20 and 21:26:23, it just now serve requests.

--
You received this message because you are subscribed to the Google Groups "cherrypy-devel" group.
To view this discussion on the web visit https://groups.google.com/d/msg/cherrypy-devel/-/N1fUk2yN5bgJ.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/cherrypy-devel?hl=en.

--
You received this message because you are subscribed to the Google Groups "cherrypy-devel" group.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/cherrypy-devel?hl=en.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Too many open files(session in files dead lock?)

Robert Brewer-4
Дмитрий Богун wrote:

> Today my cherrypy app have failed 3 times, with error
> "Too many open files". It is limited to 1024 file
> descriptor. In usual situation it use ~10 file
> descriptors. When application was in "fail" state I
> have looked in /proc/$pid/fd to see what descriptors
> is used. And all of them was client sockets. So I am
> shure that it is was not fd leak in my app(it didn't
> use sockets).

and Shaun Roe added:
> I had the same problem, with sql; I found that the open files
> from sql were never closed by closing the database. I ended up
> writing a routine which looked at open file descriptors and
> clsing them with system calls

Either of you might be running into http://www.cherrypy.org/ticket/1032, which was recently fixed in trunk and should be released with version 3.2.1 very soon.


Robert Brewer
[hidden email]

--
You received this message because you are subscribed to the Google Groups "cherrypy-devel" group.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/cherrypy-devel?hl=en.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Too many open files(session in files dead lock?)

bogun.dmitriy
12 июля 2011 г. 2:05 пользователь Robert Brewer <[hidden email]> написал:

> Дмитрий Богун wrote:
>
>> Today my cherrypy app have failed 3 times, with error
>> "Too many open files". It is limited to 1024 file
>> descriptor. In usual situation it use ~10 file
>> descriptors. When application was in "fail" state I
>> have looked in /proc/$pid/fd to see what descriptors
>> is used. And all of them was client sockets. So I am
>> shure that it is was not fd leak in my app(it didn't
>> use sockets).
>
> and Shaun Roe added:
>> I had the same problem, with sql; I found that the open files
>> from sql were never closed by closing the database. I ended up
>> writing a routine which looked at open file descriptors and
>> clsing them with system calls
>
> Either of you might be running into http://www.cherrypy.org/ticket/1032, which was recently fixed in trunk and should be released with version 3.2.1 very soon.
>
It can't be related to my problem. It happend on production server,
autorestart is disabled on that server.

From switchin on immemory session pass ~12hours - problem didn't
repeat. Looks like I was right about deadlock in sessions.

--
You received this message because you are subscribed to the Google Groups "cherrypy-devel" group.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/cherrypy-devel?hl=en.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Too many open files(session in files dead lock?)

bogun.dmitriy
12 июля 2011 г. 9:50 пользователь  <[hidden email]> написал:

> 12 июля 2011 г. 2:05 пользователь Robert Brewer <[hidden email]> написал:
>> Дмитрий Богун wrote:
>>
>>> Today my cherrypy app have failed 3 times, with error
>>> "Too many open files". It is limited to 1024 file
>>> descriptor. In usual situation it use ~10 file
>>> descriptors. When application was in "fail" state I
>>> have looked in /proc/$pid/fd to see what descriptors
>>> is used. And all of them was client sockets. So I am
>>> shure that it is was not fd leak in my app(it didn't
>>> use sockets).
>>
>> and Shaun Roe added:
>>> I had the same problem, with sql; I found that the open files
>>> from sql were never closed by closing the database. I ended up
>>> writing a routine which looked at open file descriptors and
>>> clsing them with system calls
>>
>> Either of you might be running into http://www.cherrypy.org/ticket/1032, which was recently fixed in trunk and should be released with version 3.2.1 very soon.
>>
> It can't be related to my problem. It happend on production server,
> autorestart is disabled on that server.
>
> From switchin on immemory session pass ~12hours - problem didn't
> repeat. Looks like I was right about deadlock in sessions.
>

Is I am only one, who use session in files on moderately loaded
project(by the way 1r/s is not so much)? Or only I got this problem?

--
You received this message because you are subscribed to the Google Groups "cherrypy-devel" group.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/cherrypy-devel?hl=en.

Loading...