tasks failing to stop

keyframe
Posts: 43
Joined: Sat Jan 21, 2017 9:43 pm

tasks failing to stop

Post by keyframe » Sun Jan 19, 2020 4:51 pm

I'm hoping for some clues on how to debug a problem I've recently started having.

Since I upgraded to 2.3.1 (from 2.2.3) and from CentOS 7 to CentOS 8, I've run into a situation where deleting a job via the web interface appears to delete the job, but the process is still running on the afrender client.

Has anyone seen this? Any thoughts on how to debug?

G
--
centOS 8.1, cgru 2.3.1

timurhai
Site Admin
Posts: 580
Joined: Sun Jan 15, 2017 8:40 pm
Location: Russia, Korolev

Re: tasks failing to stop

Post by timurhai » Mon Jan 20, 2020 9:42 am

Hi.
Try looking at the output of the afserver and afrender processes.
Timur Hairulin
CGRU 2.4.0, Ubuntu 18.04 LTS, MS Windows 10 (clients only).

Re: tasks failing to stop

Post by keyframe » Mon Jan 20, 2020 3:44 pm

Where are those logged?

Re: jobs failing to delete

Post by timurhai » Mon Jan 20, 2020 4:25 pm

If you installed the Linux packages, they run under systemd, so the output goes to the systemd journal (journalctl).

Sometimes it is easier to catch an error by launching afserver and afrender manually in a terminal.
That way you can watch the logs in real time.

Re: tasks failing to stop

Post by keyframe » Mon Jan 20, 2020 5:03 pm

Heya Timur,

Searching through the journalctl entries on the client:

Jan 20 12:00:45 tws12 _afrender.sh[3788]: INFO Finished PID=12369: Exit Code=15 Status=0 (stopped)
Jan 20 12:00:45 tws12 _afrender.sh[3788]: INFO Task terminated/killed by signal: 'Terminated'

And on the server:

Jan 20 12:00:19 tsr02 _afserver.sh[2373]: Mon 20 Jan 12:00.19: Job registered: "JOBSNAMEHERE"[6]: gene@tws12[1] - 33292 bytes.
Jan 20 12:00:45 tsr02 _afserver.sh[2373]: Mon 20 Jan 12:00.45: Deleting a job: "JOBNAMEHERE"[6]: gene@tws12[1] - 33601 bytes.
Jan 20 12:00:45 tsr02 _afserver.sh[2373]: ERROR Mon 20 Jan 12:00.45: AFCommon::writeFile: /var/tmp/afanasy/jobs/0/6.JOBNAMEHERE/data.json.tmp
Jan 20 12:00:45 tsr02 _afserver.sh[2373]: No such file or directory

However, the process is very much alive and still busy generating frames, just as before.

G

Re: tasks failing to stop

Post by keyframe » Mon Jan 20, 2020 5:07 pm

Here's something interesting.

The PID of the task running on tws12 is 12377, NOT 12369 as the afrender log seems to suggest.
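
A minimal Linux-only Python sketch for checking this. It is not part of CGRU, and the helper names proc_info, descendants and report are hypothetical. It reads /proc/<pid>/stat and prints the parent, process group and session of every process under a given PID. Afrender stops a task by signalling a process group, so any descendant whose pgid differs from the group of the process afrender started would not receive that signal.

```python
import os


def proc_info(pid):
    """Return (comm, ppid, pgid, sid) for pid, read from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' before reading the numeric fields.
    end = data.rindex(")")
    comm = data[data.index("(") + 1:end]
    fields = data[end + 2:].split()  # [0]=state, [1]=ppid, [2]=pgrp, [3]=session
    return comm, int(fields[1]), int(fields[2]), int(fields[3])


def descendants(root):
    """Yield root and every descendant pid by following ppid links in /proc."""
    children = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            ppid = proc_info(int(entry))[1]
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited while we were scanning
        children.setdefault(ppid, []).append(int(entry))
    stack = [root]
    while stack:
        pid = stack.pop()
        yield pid
        stack.extend(children.get(pid, []))


def report(root):
    """Print pid, parent, process group and session for root and all its descendants."""
    for pid in descendants(root):
        try:
            comm, ppid, pgid, sid = proc_info(pid)
        except (FileNotFoundError, ProcessLookupError):
            continue
        print(f"{pid:>7} ppid={ppid:<7} pgid={pgid:<7} sid={sid:<7} {comm}")
```

For example, calling report() with the afrender daemon's PID would show whether the renderer process ended up in a different process group or session than the process afrender started for the task.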

G

Re: tasks failing to stop

Post by timurhai » Mon Jan 20, 2020 5:23 pm

The issue is not "jobs failing to delete" but "child tasks failing to stop".
You can reproduce it not only by deleting a job but also with any stop, skip or retry; the result should be the same.

Afrender runs a command, and that command can spawn several child processes.
To be able to terminate/kill all of them, afrender calls setsid() to create a new session just before the main task process starts (before it can spawn any child process):
https://github.com/CGRU/cgru/blob/maste ... ss.cpp#L35
https://www.google.com/search?q=man+setsid
Later, to send a signal to all of these processes, afrender uses killpg(getpgid()):
https://github.com/CGRU/cgru/blob/maste ... s.cpp#L622
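
The setsid()/killpg() approach described above can be sketched in Python with the same POSIX calls. This is an illustration, not afrender's actual C++ code; start_task, stop_task and the 5-second grace period are hypothetical. Passing start_new_session=True to subprocess.Popen makes the child call setsid() before exec. Every process the task spawns afterwards then shares its process group, unless that process deliberately starts a new session of its own.

```python
import os
import signal
import subprocess


def start_task(cmd):
    """Start cmd in a new session, so it leads a fresh process group (pgid == pid)."""
    # start_new_session=True: the child calls setsid() after fork, before exec.
    return subprocess.Popen(cmd, start_new_session=True)


def stop_task(proc, grace=5.0):
    """Signal the task's whole process group: SIGTERM first, SIGKILL if it lingers."""
    pgid = os.getpgid(proc.pid)  # look this up before the leader can be reaped
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)
        proc.wait()
    return proc.returncode  # negative signal number when killed by a signal
```

A task started as start_task(["sh", "-c", "sleep 30 & sleep 30; wait"]) is fully stopped by stop_task(). Both sleep processes inherit the shell's process group, so the single killpg() reaches them.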

This works in most common cases.
Try running and stopping your task some other way (manually in a terminal) to find out how it should be stopped properly.
You can also check the command's flags; maybe there is an option that controls whether it creates a new process group (session).

Re: tasks failing to stop

Post by keyframe » Mon Jan 20, 2020 6:43 pm

Thanks for the insight. I'll debug further.

I wonder whether this is related to me switching users as part of the task.

I've added su - <submitting_user_name> -c "<command to execute>" to the command so that the resulting frames are owned by the person who submitted the render, rather than the user that runs the afrender daemon.

What's really puzzling me, though, is that it only happens some of the time. Most of the time, the child processes stop as expected.

G

Re: tasks failing to stop

Post by keyframe » Mon Jan 20, 2020 9:10 pm

The process tree looks like this. I've also noticed that su behaves a little differently between CentOS 7 and 8 regarding permissions; perhaps there's more going on there that I'm unaware of.

systemd(1)─┬─ModemManager(1097)─┬─{ModemManager}(1149)
           │                    └─{ModemManager}(1165)
           ├─NetworkManager(1123)─┬─{NetworkManager}(1161)
           │                      └─{NetworkManager}(1169)
           ├─accounts-daemon(1452)─┬─{accounts-daemon}(1454)
           │                       └─{accounts-daemon}(1457)
           ├─afrender(1968)───su(21906)───bash(21907)───hython-bin(21914)─┬─{hython-bin}(21924)
           │                                                              ├─{hython-bin}(21926)
           │                                                              ├─{hython-bin}(21927)
           │                                                              ├─{hython-bin}(21928)
           │                                                              ├─{hython-bin}(21929)

Re: tasks failing to stop

Post by timurhai » Tue Jan 21, 2020 12:21 pm

I think "su -" creates a new session and process group, so afrender's method of stopping all child tasks does not reach them.
Since you wrote a command wrapper using "su -", maybe there is a way to make that wrapper stop all of its children too.
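
Below is a small, hypothetical Python reproduction of this failure mode. It assumes, as suggested above, that "su -c" runs its command in a new session. The stand-in task launches its worker with start_new_session=True. When the task's process group is then signalled the way afrender does it, the worker survives as an orphan. The names TASK and demo are illustrative only.

```python
import os
import signal
import subprocess
import sys

# Stand-in for "su - user -c cmd": a task that starts its real worker in a NEW
# session (as the thread suspects su does), reports the worker's pid, and waits.
TASK = (
    "import subprocess\n"
    "w = subprocess.Popen(['sleep', '30'], start_new_session=True)\n"
    "print(w.pid, flush=True)\n"
    "w.wait()\n"
)


def demo():
    """Return (task_pgid, worker_pgid, worker_survived_killpg)."""
    # Start the task in its own session, mirroring afrender's setsid() step.
    task = subprocess.Popen([sys.executable, "-c", TASK], stdout=subprocess.PIPE,
                            text=True, start_new_session=True)
    worker = int(task.stdout.readline())
    task_pgid, worker_pgid = os.getpgid(task.pid), os.getpgid(worker)

    # Stop it by signalling the task's process group, mirroring killpg(getpgid()).
    os.killpg(task_pgid, signal.SIGTERM)
    task.wait()
    task.stdout.close()

    try:
        os.kill(worker, 0)  # signal 0 only checks that the pid still exists
        survived = True
    except ProcessLookupError:
        survived = False
    if survived:
        os.killpg(worker_pgid, signal.SIGKILL)  # clean up the orphaned worker
    return task_pgid, worker_pgid, survived
```

demo() should report two different process groups and a surviving worker. A wrapper that avoids this would need to keep the worker inside a process group it controls and forward the stop signal to that group. For su specifically, the util-linux su(1) man page documents --session-command as the same as -c but without creating a new session (and marks it as discouraged). Check the su on your system before relying on it.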