Python Segfaults with Django 1.1

Python Segfaults with Django 1.1

Sven Bröckling
Hi,

I have a problem on a Debian lenny machine which is not easy to track
down. It started with Apache2 mod_wsgi crashes (segfault in libapr)
after the migration from mod_python. The server crashes sporadically,
each time with a different view being called, but always after a while
(2-7 days, depending on traffic).

I set up a lighttpd/fastcgi/runfcgi configuration to avoid this, but now
Python itself segfaults, IMHO for the same reason.

Is there any "right" way to find the reason for the segfaults beyond
letting strace dump hundreds of megabytes of log for a week? We have
dependencies on libcairo (python-cairo), librsvg, geoip and memcache,
so maybe the segfault happens in there.

This is the Kernel log for the apache crashes :

messages.2.gz:Jul 24 22:10:58 vshop kernel: [7816643.903087] apache2[4686]: segfault at 28 ip 7f6f8f4c2f04 sp 7f6f84043d40 error 4 in libapr-1.so.0.2.12[7f6f8f49f000+32000]
messages.2.gz:Jul 24 22:13:50 vshop kernel: [7816817.795137] apache2[4700]: segfault at 28 ip 7f6f8f4c2f04 sp 7f6f7f83ad40 error 4 in libapr-1.so.0.2.12[7f6f8f49f000+32000]
messages.4.gz:Jul  4 21:49:00 vshop kernel: [6079995.222199] apache2[7564]: segfault at 28 ip 7f98f256e655 sp 7f98e310bd40 error 4 in libapr-1.so.0.2.12[7f98f254b000+32000]
messages.4.gz:Jul  6 04:58:35 vshop kernel: [6192612.659789] apache2[10605]: segfault at 28 ip 7fae92380655 sp 7fae85f23d40 error 4 in libapr-1.so.0.2.12[7fae9235d000+32000]
messages.4.gz:Jul  6 04:58:35 vshop kernel: [6192612.701478] apache2[10565]: segfault at 28 ip 7fae92380655 sp 7fae86724d40 error 4 in libapr-1.so.0.2.12[7fae9235d000+32000]
messages.4.gz:Jul  7 04:31:08 vshop kernel: [6277733.106236] apache2[4798]: segfault at 28 ip 7f2c8d6dff04 sp 7f2c84a89d40 error 4 in libapr-1.so.0.2.12[7f2c8d6bc000+32000]
messages.4.gz:Jul  7 04:32:41 vshop kernel: [6277825.802407] apache2[4855]: segfault at 28 ip 7f2c8d6dff04 sp 7f2c84288d40 error 4 in libapr-1.so.0.2.12[7f2c8d6bc000+32000]
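
A note on reading these entries: every crash is a fault at address 0x28 (error 4 is a user-mode read of an unmapped page, so most likely a read through a near-NULL pointer) inside libapr. The library is mapped at a different base address in each process, so the raw ip values are not directly comparable, but ip minus the mapping base is. The following is only a rough sketch of that subtraction in Python 3; the function name `segfault_offsets` and the regular expression are illustrative and not taken from any existing tool.

```python
import re

# Matches the kernel segfault line format seen in the log entries above.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def segfault_offsets(log_text):
    """Return (pid, library, offset of ip inside the library) per crash."""
    # Collapse line wraps and repeated spaces so wrapped entries still match.
    normalized = " ".join(log_text.split())
    rows = []
    for m in SEGFAULT_RE.finditer(normalized):
        offset = int(m.group("ip"), 16) - int(m.group("base"), 16)
        rows.append((int(m.group("pid")), m.group("lib"), hex(offset)))
    return rows


# Two of the entries from the log, wrapped the way a mail client wraps them.
sample = """messages.2.gz:Jul 24 22:10:58 vshop kernel: [7816643.903087]
apache2[4686]: segfault at 28 ip 7f6f8f4c2f04 sp 7f6f84043d40 error 4
in libapr-1.so.0.2.12[7f6f8f49f000+32000] messages.4.gz:Jul  4 21:49:00
vshop kernel: [6079995.222199] apache2[7564]: segfault at 28 ip
7f98f256e655 sp 7f98e310bd40 error 4 in
libapr-1.so.0.2.12[7f98f254b000+32000]"""

for row in segfault_offsets(sample):
    print(row)
```

Applied to all seven entries, the subtraction gives only two distinct offsets, 0x23f04 and 0x23655, so the crashes hit the same two places in libapr every time. With debug symbols for the library installed, an offset like this can usually be resolved to a function name with a tool such as addr2line.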

Thanks in advance for any hint :)
  Sven

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.


Re: Python Segfaults with Django 1.1

Reinout van Rees
On 08/05/2010 12:50 PM, Sven Broeckling wrote:
>
> i have a problem on a debian lenny machine which is not easy to track
> down. It started with Apache2 mod_wsgi crashes (segfault in libapr)
> after the migration from mod_python. The server crashes sporadic and
> everytime with another view called, but everytime after a while (2-7
> Days, depending on traffic).

Probably not related, but you never know... I've seen this happen in ye
olde days with a zope server that you'd start in daemon mode from the
terminal.  After 2-4 days the terminal would die and suddenly the
perfectly-running zope server would have nowhere to print its console
output once an infrequent error occurred. And it would die.

The wsgi stuff works differently, so this shouldn't be the problem.
Mentioning it anyway, perhaps it rings some distant bell :-)


Reinout


--
Reinout van Rees - [hidden email] - http://reinout.vanrees.org
Programmer at http://www.nelen-schuurmans.nl
"Military engineers build missiles. Civil engineers build targets"


Re: Python Segfaults with Django 1.1

Sven Bröckling
> > i have a problem on a debian lenny machine which is not easy to
> > track down. It started with Apache2 mod_wsgi crashes (segfault in
> > libapr) after the migration from mod_python. The server crashes
> > sporadic and everytime with another view called, but everytime
> > after a while (2-7 Days, depending on traffic).
> Probably not related, but you never know... I've seen this happen in
> ye olde days with a zope server that you'd start in daemon mode from
> the terminal.  After 2-4 days the terminal would die and suddenly the
> perfectly-running zope server would have nowhere to print its console
> output once an infrequent error occured. And it would die.
> The wsgi stuff works differently, so this shouldn't be the problem.
> Mentioning it anyway, perhaps it rings some distant bell :-)

I got another clue: it seems that the Python process runs out of file
handles. After 10k requests (via ab -c 1 -n 10000) I got several "not
found" I/O exceptions such as "/dev/urandom not found", "TemplateError:
Template xy not found" and this one: "Error Opening
file /path/to/geoip/GeoIP.dat".

Maybe that is why the mod_python setup runs fine, due to the restart
for each request.
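
Unrelated "not found" errors for files that do exist are what one would expect once open() starts failing because the per-process descriptor limit has been reached (EMFILE, "Too many open files"). A hedged sketch for Python 3 on a Unix system: lowering the soft RLIMIT_NOFILE limit makes such a failure appear after a few dozen opens instead of 10k requests. The function name `exhaust_descriptors` and the limit of 64 are made up for illustration, and the sketch assumes the hard limit is at least 64.

```python
import errno
import os
import resource


def exhaust_descriptors(soft_limit=64):
    """Open /dev/null repeatedly under a lowered limit until open() fails.

    Returns (number of descriptors opened, errno of the failure).
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    opened = []
    failure = None
    try:
        while True:
            opened.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        failure = exc.errno
    finally:
        # Release everything and restore the original limit.
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
    return len(opened), failure


count, failure = exhaust_descriptors()
print(count, errno.errorcode[failure])
```

Running a test server under a similarly lowered limit (for example with ulimit -n in the shell that starts it) should make a descriptor leak show up quickly.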

Sven


Re: Python Segfaults with Django 1.1

Reinout van Rees
On 08/05/2010 02:06 PM, Sven Broeckling wrote:
>
> I got another clue, it seems that the python process runs out of file
> handles. After 10k requests (via ab -c 1 -n 10000) i got several "not
> found" io exceptions like "/dev/urandom not found",

Some tempfile that isn't getting closed? Watch your /tmp size, for
instance. Normally if you do it from views or so, the tempfile object
gets garbage collected and closed.
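
To illustrate the distinction (a small Python 3 sketch, not code from the application in question): a tempfile object owns its descriptor and releases it when closed, whereas tempfile.mkstemp() hands back a bare integer descriptor that nothing will close automatically.

```python
import os
import tempfile

# A tempfile *object* owns its descriptor and releases it on close().
with tempfile.NamedTemporaryFile(suffix=".png") as tmp:
    tmp.write(b"data")
# Leaving the with-block closed the file and removed it from disk.
object_closed = tmp.closed
object_removed = not os.path.exists(tmp.name)

# mkstemp() returns a raw integer descriptor plus a path; the caller is
# responsible for closing the descriptor and deleting the file.
raw_fd, path = tempfile.mkstemp(suffix=".png")
raw_is_int = isinstance(raw_fd, int)
os.close(raw_fd)
os.unlink(path)

print(object_closed, object_removed, raw_is_int)
```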

Anyway, you probably already know where to look :-)


Reinout



--
Reinout van Rees - [hidden email] - http://reinout.vanrees.org
Programmer at http://www.nelen-schuurmans.nl
"Military engineers build missiles. Civil engineers build targets"


Re: Python Segfaults with Django 1.1

Graham Dumpleton-2
In reply to this post by Sven Bröckling


On Aug 5, 10:06 pm, Sven Broeckling <[hidden email]> wrote:

> > > i have a problem on a debian lenny machine which is not easy to
> > > track down. It started with Apache2 mod_wsgi crashes (segfault in
> > > libapr) after the migration from mod_python. The server crashes
> > > sporadic and everytime with another view called, but everytime
> > > after a while (2-7 Days, depending on traffic).
> > Probably not related, but you never know... I've seen this happen in
> > ye olde days with a zope server that you'd start in daemon mode from
> > the terminal.  After 2-4 days the terminal would die and suddenly the
> > perfectly-running zope server would have nowhere to print its console
> > output once an infrequent error occured. And it would die.
> > The wsgi stuff works differently, so this shouldn't be the problem.
> > Mentioning it anyway, perhaps it rings some distant bell :-)
>
> I got another clue, it seems that the python process runs out of file
> handles. After 10k requests (via ab -c 1 -n 10000) i got several "not
> found" io exceptions like "/dev/urandom not found", TemplateError:
> Template xy not found and this one : Error Opening
> file /path/to/geoip/GeoIP.dat
>
> Maybe that is why the mod_python setup runs fine, due to the restart
> for each request.

But mod_python doesn't restart on each request.

Use lsof or ofiles to work out what open file handles still exist for
a process and thus what isn't being closed.
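
On Linux the same information can also be read from inside Python via /proc, which is handy for logging from the running application. This is only a sketch under that Linux assumption; `open_descriptors` is a hypothetical helper name, not part of any library.

```python
import os


def open_descriptors(pid="self"):
    """Map each open descriptor number of a process to what it points at.

    Linux-only: reads the symlinks under /proc/<pid>/fd.
    """
    fd_dir = "/proc/%s/fd" % pid
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink(),
            # e.g. the one listdir() itself used to read the directory.
            continue
    return targets


# Demonstration: a descriptor shows up while open and is gone once closed.
fd = os.open(os.devnull, os.O_RDONLY)
try:
    snapshot = open_descriptors()
finally:
    os.close(fd)

print(snapshot[fd])
```

Comparing two snapshots taken a few hundred requests apart shows which paths keep accumulating.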

Graham


> Sven


Re: Python Segfaults with Django 1.1

Graham Dumpleton-2
In reply to this post by Sven Bröckling


On Aug 5, 10:06 pm, Sven Broeckling <[hidden email]> wrote:

> > > i have a problem on a debian lenny machine which is not easy to
> > > track down. It started with Apache2 mod_wsgi crashes (segfault in
> > > libapr) after the migration from mod_python. The server crashes
> > > sporadic and everytime with another view called, but everytime
> > > after a while (2-7 Days, depending on traffic).
> > Probably not related, but you never know... I've seen this happen in
> > ye olde days with a zope server that you'd start in daemon mode from
> > the terminal.  After 2-4 days the terminal would die and suddenly the
> > perfectly-running zope server would have nowhere to print its console
> > output once an infrequent error occured. And it would die.
> > The wsgi stuff works differently, so this shouldn't be the problem.
> > Mentioning it anyway, perhaps it rings some distant bell :-)
>
> I got another clue, it seems that the python process runs out of file
> handles. After 10k requests (via ab -c 1 -n 10000) i got several "not
> found" io exceptions like "/dev/urandom not found", TemplateError:
> Template xy not found and this one : Error Opening
> file /path/to/geoip/GeoIP.dat
>
> Maybe that is why the mod_python setup runs fine, due to the restart
> for each request.


Also make sure you aren't still loading mod_python into Apache if
using mod_wsgi as the presence of mod_python can in some cases cause
mod_wsgi to misbehave.

Graham

> Sven


[solved] Re: Python Segfaults with Django 1.1

Sven Bröckling
In reply to this post by Reinout van Rees
> > I got another clue, it seems that the python process runs out of
> > file handles. After 10k requests (via ab -c 1 -n 10000) i got
> > several "not found" io exceptions like "/dev/urandom not found",
> Some tempfile that isn't getting closed? Watch your /tmp size, for
> instance. Normally if you do it from views or so, the tempfile object
> gets garbage collected and closed.
> Anyway, you probably already know where to look :-)
That was exactly the right hint, I guess. I was caught by this one:
http://bugs.python.org/issue6875
After adding os.close(fd) to an SVG -> PNG function, the app doesn't
leak file descriptors any more.

In about 30k requests I will know whether it was the cause of the segfaults :)

Thanks Reinout :)
  Sven


Re: Python Segfaults with Django 1.1

Sven Bröckling
In reply to this post by Graham Dumpleton-2
> > I got another clue, it seems that the python process runs out of
> > file handles. After 10k requests (via ab -c 1 -n 10000) i got
> > several "not found" io exceptions like "/dev/urandom not found",
> > TemplateError: Template xy not found and this one : Error Opening
> > file /path/to/geoip/GeoIP.dat
> > Maybe that is why the mod_python setup runs fine, due to the restart
> > for each request.
> Also make sure you aren't still loading mod_python into Apache if
> using mod_wsgi as the presence of mod_python can in some cases cause
> mod_wsgi to misbehave.
No, mod_python is deactivated, but the file descriptor leak was IMHO
the problem.

The fix, for the record :)

--- a/apps/catalog/templatetags/composedimage.py
+++ b/apps/catalog/templatetags/composedimage.py
@@ -105,7 +105,7 @@ class ComposedImage(object):
     def _open_svg_as_image(self, fn, width, height, position, factor, offset):
         import cairo
         import rsvg
-        file = tempfile.mkstemp(suffix='.png', prefix='tmc_svg_')[1]
+        fd,file = tempfile.mkstemp(suffix='.png', prefix='tmc_svg_')
         fn = smart_str(fn)
         svg = rsvg.Handle(file=fn)
 
@@ -123,7 +123,10 @@ class ComposedImage(object):
         surface.write_to_png(file)
         image = Image.open(file, "r")
         image.convert("RGBA")
+
+        os.close(fd)
         os.unlink(file)
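
For anyone who wants to check the effect of such a change in isolation, here is a stripped-down Python 3 sketch. It is not the application code: `render_leaky` and `render_fixed` are stand-ins that only create and remove the temporary file, and the descriptor count relies on /proc, so it is Linux-only. The try/finally is an addition that goes beyond the patch above, so the descriptor is also released if the rendering step raises.

```python
import os
import tempfile


def count_open_fds():
    """Count this process's open descriptors (Linux-only, via /proc)."""
    return len(os.listdir("/proc/self/fd"))


def render_leaky():
    # Same shape as the removed line: only the path is kept, so the
    # descriptor returned at index 0 is never closed.
    path = tempfile.mkstemp(suffix=".png", prefix="tmc_svg_")[1]
    os.unlink(path)


def render_fixed():
    fd, path = tempfile.mkstemp(suffix=".png", prefix="tmc_svg_")
    try:
        pass  # placeholder for the rendering work that writes to `path`
    finally:
        os.close(fd)
        os.unlink(path)


def fds_gained(func, calls=20):
    """Return how many descriptors remain open after calling func."""
    before = count_open_fds()
    for _ in range(calls):
        func()
    return count_open_fds() - before


print("leaky:", fds_gained(render_leaky))
print("fixed:", fds_gained(render_fixed))
```

The leaky variant gains one descriptor per call, while the fixed variant gains none.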

Thanks,
  Sven
