Af-Server crashing / hardware performance increase

Posted: Thu Aug 04, 2022 2:50 pm
by Eberrippe
Hello,

We built our whole pipeline in the Afanasy environment. We are super happy with it and it performs very well, but from time to time the server crashes and has to be restarted manually.
We have about 700 nodes and 5,000 jobs that Afanasy handles (successfully 99% of the time). We are still on version 2.3.0 and are looking forward to updating to v3.3.0 soon.

We try to monitor the crashes but can't confirm that they are related to increased service usage. It mostly crashes in the afternoon or evening (which, I admit, sounds very much like a load issue).

Watching our server in htop, it usually shows about 27 processes, with one using 30-45% CPU and the others about 3-10% each. Because of this, our server's hardware is primarily focused on single-core performance.

When a crash happens, all 27 processes use 80-95% CPU and all cores max out at 99%. Then we have to restart afserver.
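
To line the spikes up with the time of day, the per-thread CPU figures that htop displays could also be logged from /proc. This is only a rough sketch under several assumptions: the server runs on Linux, the 27 rows in htop are threads of a single afserver process, and all function names here are made up for illustration (they are not part of Afanasy).

```python
import os
import time


def parse_stat_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before splitting on whitespace.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3 in proc(5)); utime is field 14, stime is field 15.
    return int(fields[11]) + int(fields[12])


def read_thread_ticks(pid):
    """Map thread id -> CPU ticks consumed so far, for every thread of pid."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                ticks[int(tid)] = parse_stat_ticks(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited between listdir() and open()
    return ticks


def cpu_percent(before, after, interval_s, ticks_per_s):
    """Per-thread CPU usage between two samples, in percent of one core."""
    return {
        tid: 100.0 * (after[tid] - before[tid]) / (ticks_per_s * interval_s)
        for tid in after
        if tid in before
    }


def sample(pid, interval_s=5.0):
    """Take two samples interval_s apart and return per-thread CPU percent."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    before = read_thread_ticks(pid)
    time.sleep(interval_s)
    after = read_thread_ticks(pid)
    return cpu_percent(before, after, interval_s, ticks_per_s)
```

Calling `sample(pid)` in a loop and writing the result to a file with a timestamp would give a record to compare against the crash times.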

Does anyone have experience with this kind of issue? How did you fix it?
What is the right hardware for running afserver at this scale? How important is multi-core performance compared to single-core performance?
Regarding the versions: did hardware usage or efficiency improve, meaning that updating is our chance to fix this?

Thanks a lot. I would be super happy about any suggestions or ideas.

Eberrippe

Re: Af-Server crashing / hardware performance increase

Posted: Thu Aug 04, 2022 5:02 pm
by timurhai
Hi!

Unfortunately, we only have about 70 nodes and 500 jobs (ten times fewer than you).
We have not noticed afserver needing significant resources.

In version 3.3.0 one major issue has been solved: https://github.com/CGRU/cgru/issues/6
If you use the tasks depend mask and have jobs with lots of blocks that each contain lots of tasks, solving such jobs can take a lot of CPU resources in earlier versions.

Also, Web GUIs take much more resources, not only on the client side but on the server side too.
Lots of artists using the Web GUI can put a significant load on the server.
It is better to use AfWatch.
You can run some tests to find out whether the GUIs or the task distribution are consuming these resources.

P.S.
We run afserver on a virtual machine (like all our servers) with a 4 x 2400 MHz CPU and 4 GB RAM.
We have never found the hardware to be insufficient. But we are 10 times smaller.

By the way,
what do the server logs show during a crash?

Re: Af-Server crashing / hardware performance increase

Posted: Fri Aug 12, 2022 10:18 am
by Eberrippe
Thanks a lot for your answer, Timur.

It really sounds like an update could be a big game changer. We have a lot of dependencies that need to be resolved.

I checked the log for the error. Unfortunately, there isn't anything in it that helps. :(

As for the GUIs: thanks, yeah, we built our own viewer and limited how often it sends requests.
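
For anyone who wants to do something similar, limiting the request period on the client side can be as simple as the sketch below. It is not the actual code of our viewer, just a minimal illustration; the class name `PollThrottle` and its parameters are hypothetical.

```python
import time


class PollThrottle:
    """Allow a request only if at least min_interval_s has passed since the last allowed one."""

    def __init__(self, min_interval_s, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock  # injectable so the class can be tested without sleeping
        self._last = None    # time of the last allowed request, None before the first

    def ready(self):
        """Return True (and record the time) if a request may be sent now."""
        now = self._clock()
        if self._last is None or now - self._last >= self.min_interval_s:
            self._last = now
            return True
        return False
```

A viewer would call `ready()` before each refresh and skip the server request when it returns False, so a larger `min_interval_s` directly reduces the number of requests each open viewer sends.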