Distributed simulation (Testing the new option)

sergeneren
Posts: 3
Joined: Wed Apr 19, 2017 8:13 pm

Post by sergeneren »

Hi everyone,

I was the one who suggested that Afanasy should support Houdini distributed simulations, and Timur was kind enough to take an interest in my adventurous idea :D. Not long ago this feature was implemented in CGRU.

The following ramblings document my attempts at testing and debugging the feature, and at fixing the bugs I ran into.

The first thing I did was download and unpack the latest CGRU distribution (2.2.1) and set up my own parameters. After installation I made a Houdini file for sim distribution (a very simple file built with shelf tools). You can find the file here: https://www.dropbox.com/s/92vo0xnb9xuso ... hiplc?dl=0

The created HQsim node contains all the info I need, so I copy the same values for the output driver, control node etc. and submit. And voilà, first error!

Image

It basically says that the system can't find the procedure (sorry, my system is in my native language). So I tracked it down and found that the error was in htracker.cmd in "D:\cgru-windows\software_setup\bin".

First of all, I don't have the %HFS% environment variable set anywhere, so %HFS%\python\bin\python2.7 doesn't resolve to a valid path. So I used the previously defined %HOUDINI_LOCATION% variable and corrected htracker.cmd as shown:

Code: Select all

rem set "PYTHONPATH=%HFS%\houdini\python2.7libs;%PYTHONPATH%"
set "PYTHONPATH=%HOUDINI_LOCATION%\houdini\python2.7libs;%PYTHONPATH%"

rem "%HFS%\python\bin\python2.7" "%HOUDINI_CGRU_PATH%\htracker.py" %*
"%HOUDINI_LOCATION%\bin\hython2.7" "%HOUDINI_CGRU_PATH%\htracker.py" %*

(I used hython because I ran into problems importing the _hou module with the plain Python interpreter.)
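
For anyone who would rather do this lookup in a script than in the .cmd file, here is a minimal sketch of the same fallback in Python. It assumes the Houdini install directory is given by either HFS or HOUDINI_LOCATION and that HOUDINI_CGRU_PATH points at the CGRU Houdini plugin folder, as in the batch file above. The function names are hypothetical and not part of CGRU.

```python
import ntpath  # Windows path rules, so the sketch behaves the same on any OS
import os


def find_houdini_root(env=None):
    """Return the Houdini install dir from HFS, falling back to HOUDINI_LOCATION."""
    env = os.environ if env is None else env
    for name in ("HFS", "HOUDINI_LOCATION"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("Neither HFS nor HOUDINI_LOCATION is set")


def build_tracker_command(env=None, extra_args=()):
    """Build the hython command line that htracker.cmd would run.

    Hypothetical helper: it only assembles the argument list, it does not
    start anything.
    """
    env = os.environ if env is None else env
    hython = ntpath.join(find_houdini_root(env), "bin", "hython2.7")
    script = ntpath.join(env["HOUDINI_CGRU_PATH"], "htracker.py")
    return [hython, script] + list(extra_args)
```

The returned list can be handed to subprocess.Popen as-is, which avoids the quoting problems that paths with spaces cause in batch files.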

Now when I submit the job I can see the tracker running (it doesn't print anything, but I assume it is working). Since there is no verbose output, I changed simtracker.setVerbosity(False) to simtracker.setVerbosity(True) in D:\cgru-windows\plugins\houdini\htracker.py.

I double-click the job and one of the slices, and I can see that the tracker has a host name (my own computer in this case, because no other PC is connected right now) and a port. If I open that address in a browser I can see that the tracker is serving the slices.

Image

Image

(Note that I assumed the web port is the tracker port minus 1, and I was right.)

But there is a problem: this job goes on for hours and nothing happens! So I started to wonder whether I could run the distribution on my own machine without Afanasy and see if it actually works. I know for a fact that one can start the simtracker by hand and attach slice jobs to it.

For manual testing I open three command prompts and enter the following batch commands, one set per prompt.

In the tracker prompt I enter:

Code: Select all

cd "c:\Program Files\Side Effects Software\Houdini 16.0.557\bin"
hython2.7.exe "c:\Program Files\Side Effects Software\Houdini 16.0.557\houdini\python2.7libs\simtracker.py" 8000 9000

And in the two slice prompts I enter the following, one command set per prompt:

Code: Select all

rem first slice prompt
cd "c:\Program Files\Side Effects Software\Houdini 16.0.557\bin"
hbatch -c "setenv SLICE=0; render /obj/distribute_pyro/save_slices;quit;" D:\TEMP\dist_test.hiplc

rem second slice prompt
cd "c:\Program Files\Side Effects Software\Houdini 16.0.557\bin"
hbatch -c "setenv SLICE=1; render /obj/distribute_pyro/save_slices;quit;" D:\TEMP\dist_test.hiplc

Now I can see that the simulation has started and bgeo files start coming out (note that you should change the hiplc file location to match your own).
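
The manual test above can also be scripted so that the number of slices is not hard-coded. This is only a rough sketch: the helper names are hypothetical, and I am assuming that the two numbers passed to simtracker.py are the tracker port and the web port, in that order. It only builds and prints the command lines, using the same hscript string as the hbatch calls above.

```python
def tracker_command(simtracker_py, port, web_port, hython="hython2.7.exe"):
    """Command line for starting the sim tracker by hand.

    Assumption: simtracker.py takes the tracker port first and the
    web port second, as in the manual command shown in the post.
    """
    return [hython, simtracker_py, str(port), str(web_port)]


def slice_command(slice_index, rop_path, hip_file, hbatch="hbatch"):
    """Command line that renders one slice with $SLICE set via hscript."""
    hscript = "setenv SLICE={0}; render {1}; quit;".format(slice_index, rop_path)
    return [hbatch, "-c", hscript, hip_file]


def all_slice_commands(slice_count, rop_path, hip_file):
    """One hbatch command per slice, numbered from 0."""
    return [slice_command(i, rop_path, hip_file) for i in range(slice_count)]


if __name__ == "__main__":
    # Example values taken from the post; change them for your own setup.
    rop = "/obj/distribute_pyro/save_slices"
    hip = r"D:\TEMP\dist_test.hiplc"
    for cmd in all_slice_commands(2, rop, hip):
        print(cmd)
```

Each list can be passed to subprocess.Popen from the Houdini bin directory to start the tracker and the slices as separate processes.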

Image
Image

(Quick note: your output folder should be an absolute path. I don't know why hbatch cannot resolve $HIP!)

Back to Afanasy: I try again with manual tracker selection and point the control node and the afanasy node at "ofis-pc:8000". When I watch the tracker page (and the Afanasy page, of course) I see that the slices are assigned to the manual tracker, but still no files are written (all slice renders are registered as peer #0 for some reason??).

Image

This is as far as I have got, and I have yet to find the fix for the tracker problem when the job is sent through Afanasy. My strong suspicion is the peer numbering: on the tracker page you can clearly see that all the slices are registered as peer #0. I have also tried this with more than one PC and the result is the same; no matter which machines get assigned, they always show up as peer 0. On the output tab each machine also echoes "duplicate peer 0" (this message is printed by simtracker). I don't know why separate renders are all assigned as peer 0, and I have yet to find the cause.


Thank you for reading this far. I will update this post as soon as I find the solution.
timurhai
Site Admin
Posts: 911
Joined: Sun Jan 15, 2017 8:40 pm
Location: Russia, Korolev

Re: Distributed simulation (Testing the new option)

Post by timurhai »

Hi.
Thank you for testing on MS Windows.
I used it on Linux only.
Timur Hairulin
CGRU 3.3.1, Ubuntu 20.04, 22.04, MS Windows 10 (clients only).
sergeneren
Posts: 3
Joined: Wed Apr 19, 2017 8:13 pm

Re: Distributed simulation (Testing the new option)

Post by sergeneren »

timurhai wrote: Thu Apr 20, 2017 10:32 am
Hi.
Thank you for testing on MS Windows.
I used it on Linux only.

Hi Timur, thanks for the implementation. By the way, when I send the job I don't see any hbatch processes running in the background, as I did when I ran things manually. What parses a command like "hbatch setenv SLICE 0" or something similar?
sergeneren
Posts: 3
Joined: Wed Apr 19, 2017 8:13 pm

Re: Distributed simulation (Testing the new option)

Post by sergeneren »

OK, got it working now :D

I had to modify hrender_af.py so that it no longer sets the slice parm and instead creates the $SLICE environment variable. I will send a pull request tonight.
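
To illustrate the idea (this is not the actual hrender_af.py change, just a minimal sketch under my own assumptions): instead of writing the slice number into a node parameter, the render script defines $SLICE before rendering. The function name is hypothetical. In a real session the `hou_module` argument would be Houdini's `hou` module, whose `hou.hscript()` runs an hscript command; here it is optional so the sketch also runs outside Houdini.

```python
import os


def set_slice_variable(slice_index, hou_module=None):
    """Define SLICE for the current render instead of setting a slice parm.

    Hypothetical helper, not part of CGRU. Pass Houdini's `hou` module as
    `hou_module` when running inside hython.
    """
    value = str(int(slice_index))
    # Export to the process environment so child processes inherit it.
    os.environ["SLICE"] = value
    if hou_module is not None:
        # Same hscript command as in the manual hbatch test.
        hou_module.hscript("setenv SLICE={0}".format(value))
    return value
```

Inside hython this would be called as `set_slice_variable(slice_number, hou)` before triggering the render of the output driver.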