Michael Xavier | 14 Sep 21:20 2013

Quick Angel User's Survey

Hey Cafe,

I am the maintainer of Angel, the process monitoring daemon. Angel's job is to start a configured set of processes and restart them when they go away. While responding to a ticket, I realized that the correct behavior is not obvious in one case, so I figured I'd ask the stakeholders: the people who use Angel. As far as I know, most Angel users are Haskellers, so this seemed like the place.

When Angel is terminated, it tries to cleanly shut down every process it is monitoring. It also shuts down a process it spawned when that process is removed from the config and the config is reloaded via SIGHUP. In both cases it uses terminateProcess from System.Process, which sends SIGTERM to the program on *nix systems.

The trouble is that SIGTERM can be caught, so a process can still fail to shut down. Currently Angel sends SIGTERM and hopes for the best. It also cleans up any pidfiles, which may send a misleading message: the pidfile disappears even if the process is still running. There are a couple of routes I could take:

1. Leave it as it is, and leave it to the user to make sure stubborn processes go away. I don't much like this option, as it makes Angel harder to reason about from a user's perspective.
2. Send SIGTERM, wait a certain number of seconds, then send a signal that cannot be caught or ignored, such as SIGKILL.
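Option #2 can be sketched for the case Angel is in, where it spawned the program itself and so holds a ProcessHandle. This is only an illustration: `termThenKill`, `waitUpTo` and `StopResult` are hypothetical names, the 100 ms polling interval is arbitrary, and `getPid` assumes process >= 1.6.3. It polls with `getProcessExitCode` rather than blocking in `waitForProcess`, so the grace period can actually expire.

```haskell
import Control.Concurrent (threadDelay)
import System.Exit (ExitCode (..))
import System.Posix.Signals (sigKILL, signalProcess)
import System.Process

-- | How a shutdown attempt ended (hypothetical type, for illustration).
data StopResult
  = ExitedAfterTerm ExitCode  -- the process honoured SIGTERM in time
  | KilledAfterGrace ExitCode -- it had to be sent SIGKILL
  | StillRunning              -- exit could not be confirmed even after SIGKILL
  deriving (Show, Eq)

-- | Poll (without blocking) until the child exits or the budget runs out.
-- getProcessExitCode also reaps the child once it has exited.
waitUpTo :: Int -> ProcessHandle -> IO (Maybe ExitCode)
waitUpTo budgetMicros ph = do
  status <- getProcessExitCode ph
  case status of
    Just code -> pure (Just code)
    Nothing
      | budgetMicros <= 0 -> pure Nothing
      | otherwise -> do
          threadDelay pollMicros
          waitUpTo (budgetMicros - pollMicros) ph
  where
    pollMicros = 100000 -- check every 100 ms

-- | SIGTERM, wait up to graceSecs seconds, then SIGKILL.
termThenKill :: Int -> ProcessHandle -> IO StopResult
termThenKill graceSecs ph = do
  terminateProcess ph -- sends SIGTERM on POSIX systems
  afterTerm <- waitUpTo (graceSecs * 1000000) ph
  case afterTerm of
    Just code -> pure (ExitedAfterTerm code)
    Nothing -> do
      mpid <- getPid ph -- Nothing if the handle has already been closed
      mapM_ (signalProcess sigKILL) mpid
      -- SIGKILL cannot be caught, but the exit still takes a moment to observe
      afterKill <- waitUpTo 1000000 ph
      pure (maybe StillRunning KilledAfterGrace afterKill)
```

Because the child is not reaped until `getProcessExitCode` sees it exit, its PID cannot be recycled before the SIGKILL is sent.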

There are some caveats with #2. I'd prefer the timeout to be configurable per process. I'd also prefer that, if no timeout is specified, we assume the user does not want us to use SIGKILL. SIGKILL can be very dangerous for some processes, such as databases, so I want explicit permission from the user before doing something like this. If Angel generated a pidfile for the process, it should only be cleaned up if Angel can confirm the process is dead; otherwise it should be left in place so the user can handle it.
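Those caveats amount to a small policy, sketched below as pure functions. The `Program` record and its fields are hypothetical and are not Angel's actual configuration keys. A missing timeout means SIGKILL never appears in the plan, and a pidfile Angel generated is only released for removal once the process is confirmed dead.

```haskell
-- | Hypothetical per-program settings; the field names are illustrative only.
data Program = Program
  { progName         :: String
  , killGraceSecs    :: Maybe Int      -- Nothing: the user has not allowed SIGKILL
  , generatedPidfile :: Maybe FilePath -- a pidfile Angel wrote for this program
  } deriving (Show)

-- | One step of a shutdown.
data Step = SendTerm | WaitSecs Int | SendKill
  deriving (Show, Eq)

-- | SIGKILL only appears when the user opted in by giving a timeout.
shutdownPlan :: Program -> [Step]
shutdownPlan p = case killGraceSecs p of
  Nothing    -> [SendTerm]
  Just grace -> [SendTerm, WaitSecs grace, SendKill]

-- | What Angel could establish after running the plan.
data Outcome = ConfirmedDead | PossiblyAlive
  deriving (Show, Eq)

-- | Remove Angel's pidfile only when the process is known to be gone;
-- otherwise leave it for the user to investigate.
pidfileToRemove :: Program -> Outcome -> Maybe FilePath
pidfileToRemove p ConfirmedDead = generatedPidfile p
pidfileToRemove _ PossiblyAlive = Nothing
```

For example, a database entry with no timeout gets the plan `[SendTerm]`, and its pidfile survives unless the process is confirmed dead.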

So the real question: is the extra burden of an optional per-process configuration flag worth this feature? Are my assumptions about path #2 reasonable?

Thanks for your feedback!

--
Michael Xavier
http://www.michaelxavier.net
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Alexander V Vershilov | 14 Sep 22:41 2013

Re: Quick Angel User's Survey

Hello, Michael.

I'm a potential Angel user, and when I have time I'd like to add the option of using Angel as a
supervisor for OpenRC services.

Common practice is:

Send SIGTERM a couple of times, then SIGQUIT a couple of times, then SIGKILL, waiting for some
time between each action. If your program is the parent of the service, it's easy to wait for the
child's death; otherwise you need prctl PR_{SET,GET}_CHILD_SUBREAPER [1] in order to wait for the
service correctly. Alternatively, send signal 0 to check whether the process is still alive: that is
not reliable, but it is portable.
As an additional option (which may lead to problems), you can traverse the service's process tree
and kill processes starting with the leaves.
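The non-parent case above can be sketched by combining the signal-0 probe, the TERM/QUIT/KILL ladder, and a leaves-first walk. Assumptions: the process tree is already built as a `Data.Tree` of PIDs (on Linux it could be assembled from /proc, which is not shown), the 2/2/1 repeat counts are illustrative, and `isAlive`, `escalate`, `leavesFirst` and `killTree` are hypothetical names. The subreaper approach is not shown.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, try)
import Data.Tree (Tree (..))
import System.IO.Error (isPermissionError)
import System.Posix.Signals (Signal, nullSignal, sigKILL, sigQUIT, sigTERM, signalProcess)
import System.Posix.Types (ProcessID)

-- | Liveness probe with signal 0: kill(2) checks that the PID exists without
-- delivering anything. Portable but not fully reliable: an unreaped zombie
-- still counts as alive, and a recycled PID may belong to another process.
isAlive :: ProcessID -> IO Bool
isAlive pid = do
  r <- try (signalProcess nullSignal pid) :: IO (Either IOException ())
  case r of
    Right () -> pure True
    Left e -> pure (isPermissionError e) -- EPERM: exists but not ours; ESRCH: gone

-- | Illustrative ladder: SIGTERM twice, SIGQUIT twice, then SIGKILL once.
escalation :: [Signal]
escalation = concatMap (\(sig, times) -> replicate times sig)
  [(sigTERM, 2), (sigQUIT, 2), (sigKILL, 1)]

-- | Walk the ladder until the PID stops answering signal 0, pausing
-- pauseMicros after each signal. Returns True once the process is gone.
escalate :: Int -> ProcessID -> IO Bool
escalate pauseMicros pid = go escalation
  where
    go [] = not <$> isAlive pid
    go (sig : rest) = do
      alive <- isAlive pid
      if not alive
        then pure True
        else do
          -- the process may exit between the probe and the signal
          _ <- try (signalProcess sig pid) :: IO (Either IOException ())
          threadDelay pauseMicros
          go rest

-- | Children before parents, so no process is orphaned (and reparented to
-- init, out of our tree) before it has been signalled.
leavesFirst :: Tree a -> [a]
leavesFirst (Node x kids) = concatMap leavesFirst kids ++ [x]

-- | Escalate against every PID in the tree, leaves first.
-- Returns the PIDs that still appear to be alive afterwards.
killTree :: Int -> Tree ProcessID -> IO [ProcessID]
killTree pauseMicros tree = do
  results <- mapM (\p -> (,) p <$> escalate pauseMicros p) (leavesFirst tree)
  pure [p | (p, gone) <- results, not gone]
```

The "may lead to problems" caveat still applies: while a parent is alive it can fork or respawn children that the walk has already passed.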

In any case, the ability to override how a service is killed is badly needed, as most supervision
systems take a very limited approach to it.


--
Alexander
Alexander Kjeldaas | 14 Sep 23:43 2013

Re: Quick Angel User's Survey

You can use cgroups on Linux to ensure that everything is shut down. See systemd.
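systemd places each service in its own cgroup, so every process the service forks remains findable. A minimal sketch of the shutdown half, assuming the program was already started inside a dedicated cgroup directory (creating the cgroup and moving the process into it is not shown, and any path such as a per-program directory under /sys/fs/cgroup is hypothetical). It reads `cgroup.procs`, signals every listed PID, and repeats because members may fork between the read and the signal. `killCgroup`, `readPids` and `forceStop` are illustrative names. On cgroup v2 with a recent kernel, writing to `cgroup.kill` kills the whole group in one step instead.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, try)
import System.Posix.Signals (Signal, sigKILL, signalProcess)
import System.Posix.Types (ProcessID)

-- | cgroup.procs lists one PID per line.
parseProcs :: String -> [ProcessID]
parseProcs = map (fromIntegral . (read :: String -> Integer)) . words

-- | Read the PIDs, forcing the whole file so the handle is closed promptly.
readPids :: FilePath -> IO [ProcessID]
readPids cgroupDir = do
  contents <- readFile (cgroupDir ++ "/cgroup.procs")
  let pids = parseProcs contents
  length pids `seq` pure pids

-- | Signal everything in the cgroup, re-reading the list each round.
-- Returns True once the group is empty, or False if processes remain
-- after the given number of rounds.
killCgroup :: Signal -> Int -> FilePath -> IO Bool
killCgroup sig rounds cgroupDir = do
  pids <- readPids cgroupDir
  if null pids
    then pure True
    else if rounds <= 0
      then pure False
      else do
        -- a PID may exit between the read and the signal; ignore that error
        mapM_ (\p -> try (signalProcess sig p) :: IO (Either IOException ())) pids
        threadDelay 100000 -- give exiting members a moment to leave the group
        killCgroup sig (rounds - 1) cgroupDir

-- | Example policy: SIGKILL the whole group, for at most 10 rounds.
forceStop :: FilePath -> IO Bool
forceStop = killCgroup sigKILL 10
```

Unlike the PID-tree walk, this does not depend on parent/child links, so processes that daemonize or get reparented are still found.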

Alexander
