Afserver crashes

selsner
Posts: 47
Joined: Wed Jan 25, 2017 11:20 am

Re: Afserver crashes

Post by selsner »

Or maybe you can give me a good hint on how to properly attach Valgrind, so we get more debug info.
CGRU 2.3.1 - CentOS 7.7

Sebastian Elsner - Pipeline Technical Director - RISE
www.risefx.com
timurhai
Site Admin
Posts: 911
Joined: Sun Jan 15, 2017 8:40 pm
Location: Russia, Korolev

Re: Afserver crashes

Post by timurhai »

Hi.
Since it crashes when a new thread is started, I can point to the exact SIGSEGV location in the code:
https://github.com/CGRU/cgru/blob/maste ... d.cpp#L255

Unfortunately, for now I can't say why there is some probability of it failing there,
or why this probability differs between platforms.

The pthread library uses the "clone" system call.
For now I do not know what else we need to find out, or what more Valgrind could tell us, but maybe it can help.
Timur Hairulin
CGRU 3.3.1, Ubuntu 20.04, 22.04, MS Windows 10 (clients only).

Re: Afserver crashes

Post by timurhai »

Hi.

I just wrote (and committed) a thread-raising test:
https://github.com/CGRU/cgru/blob/maste ... st.cpp#L41
It is an afcmd command that does almost the same thing as afserver does on each new connection.

Strange results:

On my laptop (Ubuntu 16, gcc 5.4) I can raise 1 000 000 threads. I tried many times: no crashes.
But afserver crashes on this platform when I spawn 200 afrenders.

At work (Ubuntu 14, gcc 4.8) this test can crash starting from 1000 threads and definitely crashes at 100 000 threads.
But I can't crash afserver on this platform, even with the 200-afrender crash tests.

I should read more about the pthread library. It is likely that Afanasy uses it somewhat incorrectly.
(Note that the DlThread class was simply taken from 3Delight, which is not a server but a render engine.)

You are welcome to test this command on your platforms.

Re: Afserver crashes

Post by selsner »

I am not able to get it to crash...
seven11
Posts: 20
Joined: Tue Feb 21, 2017 8:03 pm

Re: Afserver crashes

Post by seven11 »

@selsner,
Are you running afserver on a 32-bit or a 64-bit kernel?

Timur,
Why is the stack size for each thread so large? I have been testing an afserver with the thread stack size set to 32768 bytes, which reduces virtual memory usage. I couldn't find where the stack size was being set, so I added pthread_attr_setstacksize(&attr, 32768); to dlThread.cpp.

Have a look at the first answer of this post:
http://stackoverflow.com/questions/5585 ... -of-memory

Scott

Re: Afserver crashes

Post by seven11 »

So on my afserver machine, each pthread stack uses 10485760 bytes (10 MiB) of virtual memory with a stock 2.2.1 afserver.

Here's the patch if anyone would like to try it:

Code:

--- dlThread.cpp	2017-03-15 12:09:05.300428237 -0700
+++ dlThread.cpp.mod	2017-03-15 12:08:45.772250461 -0700
@@ -23,6 +23,7 @@
 /* The following chunk are for GetNbProcessors. */
 #ifdef LINUX
 #include	<sys/sysinfo.h>
+#include <stdio.h>
 #endif
 
 #ifdef IRIX
@@ -246,6 +247,12 @@
 #else
 	pthread_attr_t attr;
 	pthread_attr_init( &attr );
+	pthread_attr_setstacksize(&attr, 32768);
+
+//	size_t size;
+//	int ret = pthread_attr_getstacksize(&attr, &size);
+//	printf ( "Get: ret=%d,size=%zu\n" , ret , size ) ;
+
 	if( detached )
 	{
 		pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_DETACHED );

Scott

Re: Afserver crashes

Post by timurhai »

My latest post and commit had a mistake.
DlThread did not check whether the thread was actually started, and its Start() function returned void.
We can't raise 1 000 000 threads at the same time; that test simply stopped raising threads, without any error message.
Normally on Linux you can't spawn more than about 32k threads, which is more than enough for us.
I have now changed DlThread so that Start() returns the integer that pthread_create returns,
and added a check in afserver and in the afcmd test.
https://github.com/CGRU/cgru/commit/583 ... c985e005b3

I also added a 1 ms sleep to the afcmd thread-raising loop.
Each afcmd test thread sleeps (lives) for 1 second, so roughly 1 s / 1 ms = 1000 threads exist at the same time, which is far below the limit (32k).
The 1 000 000 test now takes a while, but it passed (on Ubuntu 16).

I also ran a 200-afrender test (on Ubuntu 16), but I did not get any new error message from pthread_create.
Just a segmentation fault:
Backtrace:
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_create+0x4ff)[0x7f1a1777ee8f]
afserver(_ZN8DlThread5StartEPFvPvES0_+0xc5)[0x4c96d5]
afserver(_Z16threadAcceptPortPvi+0x509)[0x45f049]

So now I am at the same point as before.

Continuing to dig into this issue...

Re: Afserver crashes

Post by timurhai »

@seven11
Great idea!
I added:

Code:

pthread_attr_setstacksize(&attr, 32768);
And it has been running for about half an hour!
It has never run that long on my test system.

Reading more about stack size...

Re: Afserver crashes

Post by seven11 »

Great! I'm going to pull down the new afserver code and test it.
Scott

Re: Afserver crashes

Post by seven11 »

Timur,
So I'm now running a Git pull of 2.2.2 with the pthread stack size set to 32K on CentOS 6.5 x86_64, 4 processors, 8 GB RAM.
I can spawn 1000 threads with the "afcmd tthr 1000" command. No issues so far.
I'll keep you posted...
Scott