2,000,000+ frames later:

avclubvids
Posts: 22
Joined: Fri Jul 21, 2017 9:17 pm

Post by avclubvids »

We've been absolutely hammering CGRU since June with 2k and 4k stereo 360º renders from C4D, Octane, Houdini, Fusion, and After Effects, and I have a few observations on our largest pain points, as a way of identifying where CGRU could use the most improvement to become even more robust. First and foremost, yes, we actually did render over 2 million frames, so we're obviously impressed and happy. These are not complaints but rather requests to make the next jobs even better. So here are the main pain points for us so far:

1) Not being able to change the frame range of a submitted job. I know it's on the Roadmap, but I wanted to urge this item to the top: it's the single biggest time waster of all. When a long task fails there are two options right now: re-render all the frames (wasteful) or submit a new job for just the missing frames (takes time to resubmit). There are many, many times that changing the number of frames per task would have helped optimize available render resources and saved hours of time.

2) Skip existing. CGRU needs a smarter way of knowing if a frame already exists so it can skip re-rendering it. This comes up all the time when patching bad or missing frames in a render. Currently, we have to submit multiple smaller jobs to fill in missing frames. This issue is somewhat tied to issue #1, in that changing a task's frame range often leads to re-rendering frames unnecessarily.

3) UI / UX issues. We had a lot of trouble getting artists up to speed on CGRU. The main issues we ran into repeatedly were:
a) The web GUI is far more user-friendly than Watch, but lacks a lot of the controls that Watch has. So artists use the web GUI and complain about missing features that are actually already in Watch.
b) Watch does not work on a 4k monitor: the GUI is incredibly small and does not scale.
c) Using regex for whitelist/blacklist and job dependencies is a next-to-impossible ask for most artists.
d) To use CGRU you basically have to live in GodMode. We just told everyone the password and told them to use it every time.

4) Lack of NIMBY scheduling in the Watch/Web GUI means that setting a machine to NIMBY for X amount of time falls on the artist, and the NIMBY scheduler is less of a quick "pause me for 4 hours while this local task runs" and more of a weekly schedule kind of thing. We did a lot of checking machines remotely and manually un-NIMBY-ing them. It would also be great to have a "NIMBY until [CONDITION]" like CPU or RAM usage drops below a threshold etc.

5) CGRU is too fair. The default behavior wherein CGRU tries to make sure that each user gets a portion of the farm is just not in line with the way real productions work. It is never about who submitted, or when, it is always about what needs to get rendered first. We needed a better way to force priority so that we can manually push through important renders and then let CGRU automatically chew through the rest when there is more time. I cannot count the nights that we have carefully paused, added wait timers, and checked in on the queue to make sure that things were getting rendered in the order we needed them.
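
For issue #2, the workaround of submitting smaller jobs for the missing frames can at least be scripted. What follows is only a rough sketch and not part of CGRU: the function names `missing_frames` and `to_ranges`, the file naming pattern, and the rule that a zero-byte file counts as missing are all assumptions made for illustration.

```python
import re
from pathlib import Path


def missing_frames(directory, pattern, first, last):
    """Return sorted frame numbers in [first, last] that have no usable file.

    `pattern` is a regex matched against the whole file name; its first
    group must capture the frame number. Zero-byte files are treated as
    missing (an assumption, since a crashed render may leave them behind).
    """
    rx = re.compile(pattern)
    found = set()
    for path in Path(directory).iterdir():
        match = rx.fullmatch(path.name)
        if match and path.stat().st_size > 0:
            found.add(int(match.group(1)))
    return [f for f in range(first, last + 1) if f not in found]


def to_ranges(frames):
    """Collapse sorted frame numbers into inclusive (start, end) ranges."""
    ranges = []
    for f in frames:
        if ranges and f == ranges[-1][1] + 1:
            # Frame continues the current run, so extend its end.
            ranges[-1] = (ranges[-1][0], f)
        else:
            # Gap found, so start a new run.
            ranges.append((f, f))
    return ranges
```

Calling `to_ranges(missing_frames("/renders/shot010", r"shot010\.(\d+)\.exr", 1, 500))` (the path and pattern here are made up) would give a list of ranges, each of which could then be submitted as its own patch job with whatever submission script is already in use.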

On top of these really big issues, there are a bunch of little things that we'd love to see added, like better integration of the dailies system for automated dailies, GUI customization, more options in the submission scripts, and much more... but the above are the important ones that will have the biggest impact for everyone.

In closing, we have really leaned heavily on CGRU, and it stood up to a massive challenge and never really let us down. Thank you, Timur, for bringing CGRU to life! We'll push the scripts we had to update along the way to the repo, and we'll keep the feedback coming your way.
timurhai
Site Admin
Posts: 911
Joined: Sun Jan 15, 2017 8:40 pm
Location: Russia, Korolev

Re: 2,000,000+ frames later:

Post by timurhai »

Hi.
Thank You!
I also have lots of issues for this project and lots of plans.
You can contribute too, it would be great!

4) There is a Nimby schedule in the CGRU Keeper. You can also use auto-nimby by specifying busy CPU, disk and so on:
http://cgru.info/afanasy/server#farm_nimby_idle_cpu

5) A human who understands what the farm is calculating right now will always be smarter than any program )
Timur Hairulin
CGRU 3.3.1, Ubuntu 20.04, 22.04, MS Windows 10 (clients only).