tasks failing to stop

keyframe
Posts: 62
Joined: Sat Jan 21, 2017 9:43 pm
Location: Toronto

Re: tasks failing to stop

Post by keyframe »

I will investigate this option further, but I'm not optimistic (unless the change occurred between centos7 and centos8), since I've been doing this for a few years now and the issue only surfaced recently...

... of course, I was warned about upgrading the OS, python2.7 -> python3, and Afanasy all at the same time ... but you can't stand in the way of progress ;P

G
--
Rocky Linux 8.5, cgru 3.2.1

Re: tasks failing to stop

Post by keyframe »

Heya Timur,

Investigating this further, here's what I turned up so far:

 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
 5015 19530 19530 19530 ?           -1 SNs      0   0:00 su -m keyframe -c umask 022;redshiftCmdLine /dump/.farm/my_rs.1064.rs
19530 19531 19531 19531 ?           -1 SNs   1000   0:00 bash -c umask 022;redshiftCmdLine /dump/.farm/my_rs.1064.rs
19531 19538 19531 19531 ?           -1 SNl   1000 204:27 redshiftCmdLine /dump/.farm/my_rs.1064.rs

This seems to suggest that the 'su' command does in fact create a new PGID (19531 for bash and redshiftCmdLine, versus 19530 for su itself), which helps explain why the child processes are failing to terminate.
Unfortunately, I no longer have any centos7 machines to test on, but this DID work there, which leads me to believe that something changed in this respect between centos versions.
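
To illustrate why a separate process group matters, here is a minimal Python sketch. It assumes, as this thread implies, that the stop signal is sent to the process group of the launched command; the helper names `spawn_sleeper` and `same_group_as_parent` are hypothetical and not part of CGRU. A child started in its own session gets a PGID equal to its own PID, so a signal aimed at the launcher's group never reaches it.

```python
import os
import subprocess
import sys


def spawn_sleeper(new_session):
    """Start a child that just sleeps; optionally give it its own session."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        # True makes the child call setsid() before exec, which also
        # puts it into a brand new process group.
        start_new_session=new_session,
    )


def same_group_as_parent(proc):
    """True if a signal sent to this process's group would also reach proc."""
    return os.getpgid(proc.pid) == os.getpgrp()
```

Comparing `same_group_as_parent(spawn_sleeper(False))` with `same_group_as_parent(spawn_sleeper(True))` shows the same split that is visible in the PGID column above.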

Is there anything I can do about this?

G
--
Rocky Linux 8.5, cgru 3.2.1
timurhai
Site Admin
Posts: 911
Joined: Sun Jan 15, 2017 8:40 pm
Location: Russia, Korolev

Re: tasks failing to stop

Post by timurhai »

Hi!

Maybe there are other ways to switch the user.
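
One possible alternative, sketched under the assumption that the launcher is a Python script running as root on Python 3.9 or newer: `subprocess.Popen` accepts `user`, `group` and `umask` arguments, so a command can be started as the job's owner without `su` and without a shell. The function name `launch_as` is hypothetical, and this is not how Afanasy itself starts tasks.

```python
import subprocess


def launch_as(argv, user=None, group=None, umask=0o022):
    """Start argv directly, optionally as another user, with the given umask.

    The child stays in the caller's session and process group, because
    start_new_session is not set. Changing user or group needs sufficient
    privileges (typically root). This sketch does not touch supplementary
    groups; Popen's extra_groups argument exists for that.
    """
    return subprocess.Popen(argv, user=user, group=group, umask=umask)
```

For example, `launch_as(["redshiftCmdLine", scene_path], user="keyframe")` would run the renderer as that user, with files created under umask 022; `scene_path` here is a placeholder.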

You can try to create a smarter command wrapper. The wrapper can find out the new PGID and catch the SIGTERM signal in order to kill that PGID. I think it can be done in Python (or even bash :) if you are brave).
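
A rough sketch of such a wrapper in Python, under several assumptions: it runs on Linux (it reads `/proc`), it has permission to signal the task's processes, and it is what the render client starts instead of the raw command. All function names here are hypothetical. The idea is to collect the process groups of every descendant and signal each of them when SIGTERM arrives.

```python
import os
import signal
import subprocess
import time


def descendant_pgids(root_pid):
    """Collect the PGIDs of root_pid and all its descendants (Linux /proc)."""
    children, pgid_of = {}, {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                # After the ")" closing the command name: state, ppid, pgrp, ...
                fields = f.read().rsplit(")", 1)[1].split()
        except OSError:
            continue  # the process exited while we were scanning
        pid, ppid, pgrp = int(entry), int(fields[1]), int(fields[2])
        children.setdefault(ppid, []).append(pid)
        pgid_of[pid] = pgrp
    found, stack = set(), [root_pid]
    while stack:
        pid = stack.pop()
        if pid in pgid_of:
            found.add(pgid_of[pid])
        stack.extend(children.get(pid, []))
    return found


def signal_groups(pgids, sig):
    """Send sig to every listed process group, ignoring groups that are gone."""
    for pgid in pgids:
        try:
            os.killpg(pgid, sig)
        except (ProcessLookupError, PermissionError):
            pass


def group_alive(pgid):
    """True if at least one process still exists in the group."""
    try:
        os.killpg(pgid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def run_wrapped(argv, grace=10.0):
    """Run argv; on SIGTERM, stop every process group it has spawned."""
    stop = {"requested": False}
    signal.signal(signal.SIGTERM, lambda signum, frame: stop.update(requested=True))
    proc = subprocess.Popen(argv, start_new_session=True)
    while proc.poll() is None and not stop["requested"]:
        time.sleep(0.2)
    if proc.poll() is None:
        # Gather the groups first, before any process gets re-parented.
        pgids = descendant_pgids(proc.pid) - {os.getpgrp()}
        signal_groups(pgids, signal.SIGTERM)
        deadline = time.monotonic() + grace
        while time.monotonic() < deadline:
            proc.poll()  # reap the direct child so its group can disappear
            if not any(group_alive(g) for g in pgids):
                break
            time.sleep(0.2)
        signal_groups(pgids, signal.SIGKILL)  # no effect if nothing is left
    return proc.wait()
```

The wrapper script would call something like `sys.exit(run_wrapped(sys.argv[1:]))`. The `grace` value is a guess; it should be shorter than whatever delay the render client allows before it sends SIGKILL to the wrapper itself.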

Also, you can always create a virtual machine with any Linux distribution. If you set up a virtual CentOS 7 for testing, it can help you find the difference.

Maybe you can solve your problem without switching users at all.

--------------

Afanasy sends SIGTERM first; the process can catch it and finish in some custom way (perform cleanup).

https://github.com/CGRU/cgru/blob/maste ... asy.h#L128

Then, if the process is still running, Afanasy kills it (SIGKILL cannot be caught).

For example, if you stop a dailies task, its Python script will remove the temporary folder:
https://github.com/CGRU/cgru/blob/maste ... vie.py#L31
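
The general pattern can be sketched as follows. This is an illustration only, not the code from the linked script; `make_term_handler` and `install_cleanup` are hypothetical names.

```python
import shutil
import signal
import sys


def make_term_handler(tmpdir):
    """Build a SIGTERM handler that removes tmpdir and then exits."""
    def handler(signum, frame):
        shutil.rmtree(tmpdir, ignore_errors=True)
        sys.exit(1)  # raises SystemExit so the script stops after cleanup
    return handler


def install_cleanup(tmpdir):
    """Register the handler for SIGTERM.

    SIGKILL cannot be handled, so the cleanup has to finish before the
    follow-up kill arrives.
    """
    signal.signal(signal.SIGTERM, make_term_handler(tmpdir))
```

A task script would call `install_cleanup(tmpdir)` right after creating its temporary folder.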
Timur Hairulin
CGRU 3.3.1, Ubuntu 20.04, 22.04, MS Windows 10 (clients only).

Re: tasks failing to stop

Post by keyframe »

The plot thickens!

I found a flame machine in the building that was still running centos7, and added it to the afanasy pool. I was surprised to see it behaving the same way (the child processes fail to terminate), which suggests that whatever changed was not the centos7 -> 8 upgrade, but something between cgru2.2.3 and cgru2.3.1. Maybe.

I also launched the command directly as the 'afrender' user, instead of launching an 'su' subprocess, and THOSE jobs terminate as expected. The unwanted side effect is that the files are now owned by the 'afrender' user, and I don't know who to blame when the renders/comps look terrible. Aside from blame, there are other more practical pipeline reasons for users to retain ownership of their renders -- other users being able to overwrite them is one big one.

For completeness' sake, I also tried using 'runuser' instead of 'su', which also did not work as intended.

I will keep digging. In the meantime, is there anything you can suggest that changed between versions 2.2.3 and 2.3.1 and could have set this off?

Best,

G
--
Rocky Linux 8.5, cgru 3.2.1

Re: tasks failing to stop

Post by timurhai »

The way the task command is started and the way the task process is stopped/killed have not changed for a very long time.
Timur Hairulin
CGRU 3.3.1, Ubuntu 20.04, 22.04, MS Windows 10 (clients only).

Re: tasks failing to stop

Post by keyframe »

For posterity: it's the 'umask 022' portion of the command string. For some reason, when I do that, the process spawned AFTER the ';' does not terminate as it should.

What does seem to work is 'umask 022 && hou2rsp parms here' instead of 'umask 022; hou2rsp parms here'
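
If the command string is assembled in Python, the '&&' form can be built roughly like this. It is a sketch that assumes the task command is available as an argument list; `build_task_command` is a hypothetical helper, and `shlex.join` needs Python 3.8 or newer.

```python
import shlex


def build_task_command(argv, umask="022"):
    """Return 'umask NNN && <quoted command>' for use with a shell's -c option."""
    # shlex.join quotes each argument, so paths with spaces survive the shell.
    return f"umask {umask} && {shlex.join(argv)}"
```

With '&&', the shell only runs the command if 'umask' itself succeeded.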

Phew.

@Timur, thank you for helping me track this down.

G
--
Rocky Linux 8.5, cgru 3.2.1