Af-Server crashing / hardware performance increase

Eberrippe
Posts: 4
Joined: Thu Aug 04, 2022 2:20 pm

Post by Eberrippe

Hello,

We built our whole pipeline in the Afanasy environment. We are super happy with it and it performs very well, but from time to time it crashes and has to be restarted manually.
Afanasy handles about 700 nodes and 5,000 jobs for us (successfully 99% of the time). We are still on version 2.3.0 and are looking forward to updating to v3.3.0 soon.

We try to monitor the crashes but can't confirm that they are related to increased use of a particular service. It mostly crashes in the afternoon or evening (which, I admit, sounds very much like a load issue).

Watching our server in htop, it usually runs about 27 processes, one of them using 30-45% CPU and the others about 3-10% CPU each. Because of this, our server's hardware is chosen primarily for single-core performance.

When a crash happens, all 27 processes run at 80-95% CPU and all cores max out at 99%. We then have to restart afserver.
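To pin down when such a peak starts, the load could be logged continuously and sustained spikes flagged. The following is only a minimal sketch under my own assumptions: the function names, the threshold values, and the use of the 1-minute load average per core as a stand-in for the htop readings are all hypothetical and not something Afanasy provides.

```python
import os


def sustained_overload(samples, threshold, min_consecutive):
    """Return the index of the sample that completes the first run of
    `min_consecutive` consecutive samples at or above `threshold`,
    or None if no such run exists."""
    run = 0
    for i, value in enumerate(samples):
        # Extend the current run, or reset it when the load drops again.
        run = run + 1 if value >= threshold else 0
        if run >= min_consecutive:
            return i
    return None


def sample_load_per_core():
    """1-minute load average divided by the core count (Unix only).
    A value near 1.0 means roughly all cores are busy; this is a proxy
    for the CPU percentages shown in htop, not the same number."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)
```

Calling `sample_load_per_core()` every few seconds from a small loop or a cron job, appending the values with a timestamp to a file, and running `sustained_overload` over them would give a time to compare against job submissions and GUI usage.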

Does anyone have experience with this kind of issue? How did you fix it?
What is the right hardware for running afserver at this scale? How much does multi-core performance matter compared to single-core performance?
Regarding versions: has hardware usage or efficiency improved, so that updating is our chance to fix this?

Thanks a lot. I would be very happy about any suggestions or ideas.

Eberrippe
Attachments
ScrenshotPeak.jpg (233.84 KiB): This is the peak -> crash situation
ScrenshotNormal.jpg (135.76 KiB): This is the standard situation
timurhai
Site Admin
Posts: 911
Joined: Sun Jan 15, 2017 8:40 pm
Location: Russia, Korolev

Re: Af-Server crashing / hardware performance increase

Post by timurhai

Hi!

Unfortunately, we have only about 70 nodes and 500 jobs (ten times fewer than you).
We have not noticed afserver needing significant resources.

Version 3.3.0 fixed one major issue: https://github.com/CGRU/cgru/issues/6
If you use the tasks depend mask on jobs that have many blocks with many tasks each, solving such a job can take a lot of CPU in earlier versions.
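To get a feeling for why many blocks with many tasks can become expensive, here is a toy cost model. It is not Afanasy's actual solver, only a sketch that assumes a naive resolver which, on every pass, matches each block's mask against every other block name and then checks every task of each matching block. The function name and data layout are hypothetical.

```python
import re


def count_naive_mask_checks(block_names, tasks_per_block, depend_masks):
    """Count the work of one naive resolution pass.

    block_names     -- list of block names in the job
    tasks_per_block -- number of tasks in every block (assumed equal)
    depend_masks    -- dict mapping a block name to its regex mask

    Returns (regex_matches, task_checks).
    """
    regex_matches = 0
    task_checks = 0
    for name, mask in depend_masks.items():
        pattern = re.compile(mask)
        for other in block_names:
            if other == name:
                continue  # a block does not depend on itself here
            regex_matches += 1
            if pattern.fullmatch(other):
                # Every task of a matching block has to be inspected.
                task_checks += tasks_per_block
    return regex_matches, task_checks
```

In this model the work grows roughly with blocks x blocks x tasks per pass, so a job with a few dozen blocks and hundreds of tasks each already means a very large number of checks every time the job is re-solved.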

Also, Web GUIs use far more resources than AfWatch, not only on the client side but on the server side too.
Many artists using the Web GUI at once can put a noticeable load on the server.
It is better to use AfWatch.
You can run some tests to find out whether the GUIs or the task distribution are using these resources.

P.S.
We run afserver on a virtual machine (like all our servers) with a 4 x 2400 MHz CPU and 4 GB RAM.
We have never run short of hardware, but we are ten times smaller.

By the way,
what do the server logs show at the time of a crash?
Timur Hairulin
CGRU 3.3.1, Ubuntu 20.04, 22.04, MS Windows 10 (clients only).
Eberrippe
Posts: 4
Joined: Thu Aug 04, 2022 2:20 pm

Re: Af-Server crashing / hardware performance increase

Post by Eberrippe

Thanks a lot for your answer, Timur.

It really sounds like an update could be a game changer, since we have a lot of dependencies that need to be resolved.

I checked the log around the time of the error. Unfortunately, there is nothing in it that helps. :(

And yes, we built our own viewer and limited how often it sends requests.
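For anyone else building a custom viewer, limiting the request period can be as simple as the sketch below. This is not our actual code, just a minimal illustration; the class name and the interval are made up, and the injectable clock is only there to make it testable.

```python
import time


class PollThrottle:
    """Allow at most one server request per `min_interval_s` seconds."""

    def __init__(self, min_interval_s, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock   # injectable so tests can fake the time
        self._last = None     # time of the last allowed request

    def allow(self):
        """Return True if a request may be sent now, and record it."""
        now = self._clock()
        if self._last is None or now - self._last >= self.min_interval_s:
            self._last = now
            return True
        return False
```

The viewer would call `allow()` before each refresh and skip the request when it returns False, so even many open viewers cannot poll the server faster than the chosen interval each.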