lfd segfault driving me nuts.

Posted: 13 Apr 2007, 04:05
by draknet
The segfault error I was seeing went away, and then it came back. I only find out about it once /tmp fills up with massive core dumps.

Apr 12 19:00:29 squirtle kernel: lfd[25810]: segfault at 0000000000000038 rip 000000000042433d rsp 0000007fbffff300 error 4
Apr 12 19:08:23 squirtle kernel: lfd[32002]: segfault at 0000000000000038 rip 000000000042433d rsp 0000007fbffff300 error 4
Apr 12 19:11:33 squirtle kernel: lfd[1593]: segfault at 0000000000000038 rip 000000000042433d rsp 0000007fbffff300 error 4
Apr 12 19:13:11 squirtle kernel: lfd[2550]: segfault at 0000000000000038 rip 000000000042433d rsp 0000007fbffff300 error 4
Apr 12 19:49:51 squirtle kernel: lfd[23352]: segfault at 0000000000000038 rip 000000000042433d rsp 0000007fbffff300 error 4
Apr 12 19:59:53 squirtle kernel: lfd[28890]: segfault at 0000000000000038 rip 000000000042433d rsp 0000007fbffff300 error 4
Apr 12 20:06:46 squirtle kernel: lfd[701]: segfault at 0000000000000038 rip 000000000042433d rsp 0000007fbffff300 error 4
Apr 12 20:11:31 squirtle kernel: lfd[3829]: segfault at 0000000000000038 rip 000000000042433d rsp 0000007fbffff300 error 4
Apr 12 20:25:16 squirtle kernel: lfd[13209]: segfault at 0000000000000038 rip 000000000042433d rsp 0000007fbffff300 error 4
Apr 12 21:05:58 squirtle kernel: lfd[7351]: segfault at 0000000000000038 rip 000000000042433d rsp 0000007fbffff300 error 4
Apr 12 21:17:59 squirtle kernel: lfd[14390]: segfault at 0000000000000038 rip 000000000042433d rsp 0000007fbffff300 error 4
Apr 12 21:40:44 squirtle kernel: lfd[28648]: segfault at 0000000000000038 rip 000000000042433d rsp 0000007fbffff300 error 

I see that it shut down:

Apr 12 18:45:08 squirtle lfd: lfd shutdown succeeded

and within 15 minutes, the segfaults started. I've checked the lfd logs for the corresponding times, but I'm not seeing any errors or anything else out of the ordinary.

Has anyone seen anything like this, or have some advice on where to look? I have CSF running on another identical server with the same kernel build, and I'm not seeing this problem on any of my other servers. Just this one.

Edited: Ok, I lied, all my other servers are running on 2.6.9-42.0.8.ELsmp, and this one's on 2.6.9-42.0.10.ELsmp. I think I'll be going back to 42.0.8, because I can't find the durn issue.
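For anyone reading the segfault lines above: the trailing "error 4" is the x86 page-fault error code, a small bitmask. The sketch below decodes the three low bits; the function name decode_pf_error is made up for illustration. Code 4 works out to a user-mode read of a page that isn't mapped, and a fault address as low as 0x38 often (not always) points to a null pointer being dereferenced at a small offset.

```shell
#!/bin/sh
# Decode the low three bits of an x86 page-fault error code, as printed
# after "error" in kernel segfault lines.
#   bit 0 (1): 0 = page not present, 1 = protection violation
#   bit 1 (2): 0 = read access,      1 = write access
#   bit 2 (4): 0 = kernel mode,      1 = user mode
# decode_pf_error is a hypothetical helper name, not part of lfd or csf.
decode_pf_error() {
    code=$1
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}

# The value from the log lines above.
decode_pf_error 4
```

This only tells you what kind of access failed, not why; the rip value would still need to be matched against the binary to find the faulting code.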

Posted: 13 Apr 2007, 11:00
by chirpy
I doubt it would be the kernel, but it's possible. Could be a glibc problem. Best thing to try would be to edit /etc/csf/lfd.pl and set $debug to 1 and then stop lfd:

service lfd stop

Then run lfd through strace:

strace -f /etc/csf/lfd.pl

You'll get a mass of information but hopefully when it terminates with a segfault it'll show the last activity that caused it.
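Since the output is so large, one option is to have strace write to a file and only look at the tail end once the crash happens. This is a rough sketch, not something from csf itself: the output path /root/lfd.strace and the helper name segv_context are assumptions for illustration, and the trace file should live somewhere other than the partition that is filling up.

```shell
#!/bin/sh
# Assumed capture command (run by hand after "service lfd stop"):
#   strace -f -tt -o /root/lfd.strace /etc/csf/lfd.pl
# -f follows child processes, -tt adds timestamps, -o writes to a file.
# /root/lfd.strace is a hypothetical path.

# Print the N lines leading up to (and including) the first line that
# mentions SIGSEGV in a trace file. Defaults to 20 lines of context.
segv_context() {
    trace=$1
    n=${2:-20}
    # Line number of the first SIGSEGV mention, empty if there is none.
    line=$(awk '/SIGSEGV/ { print NR; exit }' "$trace")
    if [ -z "$line" ]; then
        echo "no SIGSEGV found in $trace" >&2
        return 1
    fi
    start=$((line - n))
    if [ "$start" -lt 1 ]; then start=1; fi
    sed -n "${start},${line}p" "$trace"
}
```

Something like `segv_context /root/lfd.strace 40` would then show the last 40 calls before the crash instead of the whole log.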

Wasn't the kernel.

Posted: 13 Apr 2007, 13:50
by draknet
Changed the kernel, and still got a bunch of segfaults overnight. Bleh.

Running strace now - just my luck, it will "fix itself" again now that I'm recording. :D

Will report back when I have something for posterity - or help.

Posted: 13 Apr 2007, 16:44
by draknet
Well, that didn't help. The segfault didn't appear, and the log just got too massive to deal with. I reinstalled a few things, and I've set up a cron to clean out the core dumps since it doesn't seem to be affecting whether lfd works, and the dumps aren't impacting server performance. I did see one error where dirwatch was throttled, so I turned that off for now. But I'm still perplexed.
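For reference, a cleanup job along those lines could look roughly like the sketch below. It is not the exact script in use here: the core.* file naming, the 60-minute default, and the clean_cores name are all assumptions, so check what the dumps are actually called before pointing it at a real directory.

```shell
#!/bin/sh
# Delete core dump files older than a given age from one directory.
# Assumes dumps are named core.<pid>; adjust the pattern if yours differ.
# clean_cores is a hypothetical helper name.
clean_cores() {
    dir=$1
    max_age_min=${2:-60}
    # -maxdepth 1 keeps the search out of subdirectories;
    # -mmin +N matches files last modified more than N minutes ago.
    find "$dir" -maxdepth 1 -type f -name 'core.*' \
        -mmin +"$max_age_min" -exec rm -f {} \;
}

# Example crontab entry, assuming this is saved as a script that ends
# with a call such as: clean_cores /tmp 60
#   */15 * * * * /root/clean_cores.sh
```

Keeping an age threshold means the most recent dump survives long enough to be inspected if the crash ever needs to be debugged from a core file.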

For now, it's running and not stopping the server by filling up /tmp, but if anyone else has this issue and hits on a solution, I'd love to hear from you. I'll keep looking, but at the moment I'm rather frustrated.