Many "lfd - (child) Statistic" processes
-
- Moderator
- Posts: 1524
- Joined: 01 Oct 2008, 09:24
Re: Many "lfd - (child) Statistic" processes
I'll add a timeout to the iptables_log processing routine. I don't know why it is happening, but it can only be caused by a problem writing to /var/lib/csf/stats/iptables_log.
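lfd is a Perl daemon, so the real change belongs in csf itself. Purely as an illustration of the idea, here is a minimal Python sketch of bounding a write with a timeout. The function name, the use of SIGALRM, and the 10-second default are assumptions, not csf's actual code. A signal may not interrupt a process stuck in uninterruptible I/O (D state), so treat this as a best-effort guard.

```python
import signal

class WriteTimeout(Exception):
    """Raised by the SIGALRM handler when a write takes too long."""

def _raise_timeout(signum, frame):
    raise WriteTimeout()

def append_with_timeout(path, line, seconds=10):
    """Append one line to `path`; return False if it does not finish within `seconds`.

    Hypothetical helper. Unix only, and must be called from the main thread
    (a SIGALRM requirement). Other errors such as ENOSPC (disk full) still
    raise OSError so the caller can see them.
    """
    previous = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.alarm(seconds)  # whole seconds only
    try:
        with open(path, "a") as fh:
            fh.write(line + "\n")
            fh.flush()
        signal.alarm(0)  # cancel the alarm as soon as the write has finished
        return True
    except WriteTimeout:
        return False
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

The point of the design is that a stalled write makes the caller give up and move on, rather than leaving a child process hanging around indefinitely.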
Re: Many "lfd - (child) Statistic" processes
Hi Jonathan. I bet you are right, and it's something I hadn't thought about: /var may have become full. I'm looking into this. Thank you!!!
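To test the "/var is full" theory, it is worth checking free inodes as well as free space, because running out of inodes also makes writes fail. This is a minimal Python sketch. The function name and the 50 MB threshold are arbitrary assumptions. The shell equivalents are `df -h` and `df -i`.

```python
import os
import shutil

def check_stats_dir(path="/var/lib/csf/stats", min_free_bytes=50 * 1024 * 1024):
    """Return (ok, free_bytes, free_inodes) for the filesystem holding `path`.

    Hypothetical helper; the 50 MB default threshold is an arbitrary assumption.
    """
    usage = shutil.disk_usage(path)  # space available to non-root users, like df's "Avail"
    st = os.statvfs(path)
    free_inodes = st.f_favail        # free inodes, like `df -i`
    # Some filesystems (e.g. btrfs) report zero total inodes; treat that as "not tracked".
    inodes_ok = st.f_files == 0 or free_inodes > 0
    ok = usage.free >= min_free_bytes and inodes_ok and os.access(path, os.W_OK)
    return ok, usage.free, free_inodes
```

Run as root, `print(check_stats_dir())` reports on the directory that holds iptables_log.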
Re: Many "lfd - (child) Statistic" processes
Hello, I have been experiencing the same issue:
1) Version: csf: v7.57 (generic)
2) Running on Debian Wheezy, with an encrypted filesystem (LUKS). For us, Debian Wheezy + OpenVPN appears to be a toxic combination that triggers this issue (even when the instances are just sitting idle!).
3) Since this combination was set up on two small VPS instances (different providers, hardware, locations, etc.), the problem has occurred every few hours: the instances show extremely high load and numerous lfd child processes, and the console displays numerous "csf kill process child anon-rss" messages. At that point, the system is painfully slow, thrashing RAM, and unusable.
When this happens, an "lfd - resolving" process and about three "kworker" processes also generate higher-than-normal load.
4) On RARE occasions, I have been able to log in (it takes 5-10 minutes) and stop lfd ("/etc/init.d/lfd stop"), and *immediately* everything returns to normal and the system is usable again. Most of the time, however, a reboot via the VPS control panel is the only way out.
5) Both instances have plenty of RAM for their intended purpose (256MB of system RAM, with typical usage of 60MB according to "free -m" and the Python "ps_mem" script).
6) Both instances have at least 1GB of free disk space (one has 17GB free). Neither is using any swap.
I have never seen this behavior on Debian Squeeze (in fact, I am considering duplicating these instances on Squeeze to confirm this definitively). I have at least 10 other similar instances running Squeeze + LUKS (but not OpenVPN) that have never had any issues.
Any feedback would be helpful.
Thank you
(And thanks for a GREAT product!)
EDIT: If you need more detailed config info, please let me know.
And one of the (idle) instances just crashed from this while I was writing this post.
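When the box is still reachable, it can help to record how many lfd processes exist and what each one is doing. This is a minimal Python sketch. It assumes that lfd's process titles (such as "lfd - (child) Statistic") show up in /proc/<pid>/cmdline the same way they appear in `ps`. The function names are hypothetical.

```python
import os
from collections import Counter

def read_process_titles(proc_root="/proc"):
    """Yield the command line (process title) of each readable process under proc_root."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # the process exited or is not readable
        # Arguments are NUL-separated; kernel threads such as kworker have an empty cmdline.
        title = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
        if title:
            yield title

def count_lfd_processes(titles):
    """Group titles that start with "lfd" and count how many of each there are."""
    return Counter(t for t in titles if t.startswith("lfd"))
```

For example, `print(count_lfd_processes(read_process_titles()))` could be run from cron every minute so there is a record of how the process count grows before the system becomes unusable.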
Re: Many "lfd - (child) Statistic" processes
What you are describing sounds very different, in that it could only really happen if some resource is being depleted. There is not much I can suggest other than the following:
1. Set the DEBUG option in csf.conf to 1 (you can increase it up to 4, but that will create a huge log file) and keep a close eye on lfd.log for what is happening. In particular, check lfd.log around the times you see issues to see if anything stands out.
2. Running lsof and strace on some of the processes may help identify the cause of the issue, though that may be difficult on a slow-running system.
3. Ensure that the root account has no ulimits set.
4. If you are using Virtuozzo/OpenVZ, all bets are off: it can be a nightmare of a system to manage, and we would strongly recommend anything else (e.g. Xen, KVM).
5. Run csf with only the default settings, in case any changes you have made are causing timing conflicts.
6. Ensure that none of the monitored logs are being flooded and overloading lfd.
Edit: 7. The resolver processes might also point to slow-to-respond nameservers; ensure that the nameservers in /etc/resolv.conf are working correctly and responding quickly.
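For item 7, timing each resolver (as you would with dig) can also be scripted. This is a minimal Python sketch that sends one hand-built A query (RFC 1035 wire format) to each nameserver listed in /etc/resolv.conf and measures the round trip. The query name example.com, the 2-second timeout, and the function names are assumptions. The sketch does not validate the answer beyond matching the query ID.

```python
import random
import socket
import struct
import time

def parse_nameservers(resolv_conf_text):
    """Return the addresses from `nameserver` lines of resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers

def build_query(name, query_id):
    """Build a minimal DNS query packet asking for an A record."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)  # RD=1, one question
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\0"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def time_nameserver(server, name="example.com", port=53, timeout=2.0):
    """Return the round-trip time in milliseconds, or None on timeout/error."""
    query_id = random.randint(0, 0xFFFF)
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            start = time.monotonic()
            sock.sendto(build_query(name, query_id), (server, port))
            data, _ = sock.recvfrom(512)
            elapsed_ms = (time.monotonic() - start) * 1000.0
    except OSError:  # also covers socket.timeout
        return None
    # A genuine reply echoes our query ID in its first two bytes.
    if len(data) < 2 or struct.unpack("!H", data[:2])[0] != query_id:
        return None
    return elapsed_ms
```

To use it, read /etc/resolv.conf, pass its contents to `parse_nameservers`, and print `time_nameserver(ns)` for each entry. A result of `None` or several hundred milliseconds would support the slow-resolver theory.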
Re: Many "lfd - (child) Statistic" processes
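Hello ForumAdmin, thank you for your reply. My responses are below (each preceded by "--"):
1. Set the DEBUG option in csf.conf to 1 (you can increase it up to 4, but that will create a huge log file) and keep a close eye on lfd.log for what is happening. In particular, check lfd.log around the times you see issues to see if anything stands out.
-- OK, will try this.
2. Running lsof and strace on some of the processes may help identify the cause of the issue, though that may be difficult on a slow-running system.
-- Will do this as well.
3. Ensure that the root account has no ulimits set.
-- No ulimit is set ("unlimited"). I also verified that there is no ulimit setting in: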
/root/.profile
/etc/profile
/etc/pam.d/sshd
/etc/pam.d/su
/etc/rc.local
(I have about 40 instances running Squeeze that *do* have a ulimit set for fail2ban, and their csf has *never* exhibited this problem.)
4. If you are using Virtuozzo/OpenVZ, all bets are off: it can be a nightmare of a system to manage, and we would strongly recommend anything else (e.g. Xen, KVM).
-- The issue is occurring on KVM/Wheezy instances. Most of my instances (40+) run on KVM. I have only 3 OpenVZ instances (running Squeeze), which I am not reporting here because... they have *never* exhibited the problem.
5. Run csf with only the default settings, in case any changes you have made are causing timing conflicts.
-- I have been trying to track down the issue by switching *one* config item at a time between the default config and my config to see when it starts occurring. Yes, this is tedious and time-consuming.
6. Ensure that none of the monitored logs are being flooded and overloading lfd.
-- I don't believe this is occurring. BUT, Wheezy logs SSH login failures a bit differently (e.g., "Bye Bye [preauth]"), and this pattern was not covered by fail2ban's default sshd.conf / ssh-ddos.conf filters. As a result, quite a few of these lines show up in the logs for each unauthorized SSH login attempt before the source is null-routed or blocked by fail2ban. I have added a new regex to tighten up blocking of attempts logged this way. (These instances permit only public key authentication, do not permit root login, and need to have the SSH port open for their intended purpose. Again, there are no issues whatsoever on the Debian Squeeze instances.)
Edit: 7. The resolver processes might also point to slow-to-respond nameservers; ensure that the nameservers in /etc/resolv.conf are working correctly and responding quickly.
-- They appear to be working fine (query times as low as 1 msec; typically around 50 msec). I timed them with "dig", tried my own DNS resolvers, and installed "unbound" on each instance to resolve from localhost, to try to rule out DNS timing/latency issues.
Again, thank you for your quick reply. I will try some of your suggestions above as well.
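The actual regex added to fail2ban is not shown in the thread. As a hypothetical Python sketch, the following counts "Bye Bye [preauth]" disconnects per source address, which also gives a rough measure of whether auth.log is being flooded (item 6). It assumes lines of the form `sshd[1234]: Received disconnect from 203.0.113.5: 11: Bye Bye [preauth]`; check the exact wording in your own auth.log first. In a fail2ban failregex, the address group would be written as `<HOST>` instead.

```python
import re
from collections import Counter

# Hypothetical pattern, assuming lines such as:
#   sshd[1234]: Received disconnect from 203.0.113.5: 11: Bye Bye [preauth]
PREAUTH_BYE = re.compile(
    r"sshd\[\d+\]: Received disconnect from (?P<host>[0-9A-Fa-f.:]+): 11: Bye Bye \[preauth\]"
)

def count_preauth_disconnects(lines):
    """Count "Bye Bye [preauth]" disconnects per source address."""
    hits = Counter()
    for line in lines:
        match = PREAUTH_BYE.search(line)
        if match:
            hits[match.group("host")] += 1
    return hits
```

Feeding it the lines of /var/log/auth.log and printing `most_common(10)` shows which addresses generate the most of these entries.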