Afserver crashes
Re: Afserver crashes
Or maybe you can give me a good hint on how to properly attach valgrind, so we get more debug info?
Re: Afserver crashes
Hi.
Since it crashes when a new thread starts, I can point to the exact SIGSEGV location in the code:
https://github.com/CGRU/cgru/blob/maste ... d.cpp#L255
Unfortunately, for now I can't say why there is some probability of failing there,
or why that probability differs between platforms.
The pthread library uses the "clone" system call.
For now I don't know what else we need to find out or what more valgrind can tell us, but it may help.
Timur Hairulin
CGRU 3.3.1, Ubuntu 20.04, 22.04, MS Windows 10 (clients only).
Re: Afserver crashes
Hi.
I just wrote (and committed) a thread-spawning test:
https://github.com/CGRU/cgru/blob/maste ... st.cpp#L41
It is an afcmd command that does almost the same thing afserver does on each new connection.
Strange results:
On my laptop (Ubuntu 16, gcc 5.4) I can spawn 1 000 000 threads. I tried many times with no crashes.
But afserver crashes on this platform when I spawn 200 afrenders.
At work (Ubuntu 14, gcc 4.8) this test can crash starting from 1000 threads and definitely crashes at 100 000 threads.
But I can't crash afserver on that platform, even in crash tests with 200 afrenders.
I should read more about the pthread library. It is likely that Afanasy uses it incorrectly in some way.
(Note that the DlThread class was taken from 3Delight, which is a render engine, not a server.)
You are welcome to test this command on your platforms.
Timur Hairulin
Re: Afserver crashes
I am not able to get it to crash ....
Re: Afserver crashes
@selsner,
Are you on a 32 or 64 bit kernel running the afserver?
Timur,
Why is the stack size for each thread so large? I have been testing an afserver with the thread stack size set to 32768 bytes, which reduces the virtual memory usage. I couldn't find where the stack size was being set, so I added pthread_attr_setstacksize(&attr, 32768); to dlThread.cpp.
Have a look at the first answer of this post:
http://stackoverflow.com/questions/5585 ... -of-memory
Scott
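For anyone who wants to check what stack size a default-attribute thread gets on their own machine, here is a minimal sketch. It is not taken from dlThread.cpp; the function names default_thread_stack_size and reserved_stack_bytes are made up for illustration. It relies on pthread_attr_getstacksize reporting the system default for a freshly initialized attribute object, which is how glibc behaves.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical helper: returns the stack size (in bytes) that a thread
// created with a default-initialized pthread_attr_t would get, or 0 on error.
std::size_t default_thread_stack_size()
{
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0)
        return 0;

    std::size_t size = 0;
    const int ret = pthread_attr_getstacksize(&attr, &size);
    pthread_attr_destroy(&attr);
    return ret == 0 ? size : 0;
}

// Hypothetical helper: virtual address space reserved for the stacks of
// `threads` threads that each use `stack_size` bytes.
unsigned long long reserved_stack_bytes(std::size_t stack_size, unsigned threads)
{
    return static_cast<unsigned long long>(stack_size) * threads;
}
```

Multiplying the reported size by the number of simultaneous connection threads gives a rough idea of how much virtual memory the stacks alone reserve.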
Re: Afserver crashes
So on my afserver machine, each pthread stack reserves 10485760 bytes (10 MiB) of virtual memory on a stock 2.2.1 afserver.
Here's the patch code if someone would like to try it:
Code: Select all
--- dlThread.cpp 2017-03-15 12:09:05.300428237 -0700
+++ dlThread.cpp.mod 2017-03-15 12:08:45.772250461 -0700
@@ -23,6 +23,7 @@
 /* The following chunk are for GetNbProcessors. */
 #ifdef LINUX
 #include <sys/sysinfo.h>
+#include <stdio.h>
 #endif
 #ifdef IRIX
@@ -246,6 +247,12 @@
 #else
 pthread_attr_t attr;
 pthread_attr_init( &attr );
+ pthread_attr_setstacksize(&attr, 32768);
+
+// size_t size;
+// int ret = pthread_attr_getstacksize(&attr, &size);
+// printf ( "Get: ret=%d,size=%zu\n" , ret , size ) ;
+
 if( detached )
 {
 pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_DETACHED );
Scott
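A slightly more defensive variant of the same idea, as a sketch only: it checks the return codes and never asks for less than PTHREAD_STACK_MIN, since pthread_attr_setstacksize fails with EINVAL below that limit. The helpers set_stack_size and start_with_stack are hypothetical names, not part of DlThread.

```cpp
#include <pthread.h>
#include <limits.h>   // PTHREAD_STACK_MIN
#include <cstddef>

// Hypothetical helper: request `requested` bytes of stack, raised to
// PTHREAD_STACK_MIN if the request is smaller than the platform minimum.
// Returns 0 on success or the error code from pthread_attr_setstacksize.
int set_stack_size(pthread_attr_t* attr, std::size_t requested)
{
    const std::size_t minimum = static_cast<std::size_t>(PTHREAD_STACK_MIN);
    return pthread_attr_setstacksize(attr, requested < minimum ? minimum : requested);
}

// Hypothetical helper: start a joinable thread with the given stack size.
// Returns 0 on success or the first pthread error code encountered.
int start_with_stack(void* (*fn)(void*), void* arg,
                     std::size_t stack_bytes, pthread_t* out)
{
    pthread_attr_t attr;
    int ret = pthread_attr_init(&attr);
    if (ret != 0)
        return ret;

    ret = set_stack_size(&attr, stack_bytes);
    if (ret == 0)
        ret = pthread_create(out, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return ret;
}
```

A small stack is only safe if the thread function does not use deep recursion or large local buffers, so the right value depends on what each connection thread actually does.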
Re: Afserver crashes
My latest post and commit had a mistake.
DlThread did not check whether the thread was actually started,
and its Start() function returned void.
We can't spawn 1 000 000 threads at the same time.
That test simply stopped spawning threads, without any error message.
Normally on Linux you can't spawn more than about 32k threads, and this is more than enough for us.
I have now changed DlThread to return the integer that pthread_create returns,
and added a check in afserver and in the afcmd test.
https://github.com/CGRU/cgru/commit/583 ... c985e005b3
Also, a 1 ms sleep was added to the afcmd thread-spawning loop.
The afcmd test threads sleep (live) for 1 second, so there are about 1 s / 1 ms = 1000 threads alive at the same time, which is far below the limit (32k).
Now the 1 000 000 test takes some time, but it passed (on Ubuntu 16).
I also ran the 200 afrenders test (on Ubuntu 16), but I did not get any new error message from pthread_create.
Just a segmentation fault:
Backtrace:
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_create+0x4ff)[0x7f1a1777ee8f]
afserver(_ZN8DlThread5StartEPFvPvES0_+0xc5)[0x4c96d5]
afserver(_Z16threadAcceptPortPvi+0x509)[0x45f049]
So now I am at the same point as before.
I'll keep digging into this issue...
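The two changes described above (reporting the pthread_create result and throttling the spawn loop) can be sketched roughly as follows. This is not the committed DlThread or afcmd code; start_detached, spawn_throttled, expected_live_threads and count_run are hypothetical names used only for illustration.

```cpp
#include <pthread.h>
#include <atomic>
#include <chrono>
#include <thread>   // std::this_thread::sleep_for

// Hypothetical stand-in for a Start() that reports failure: returns 0 on
// success or the error code from the pthread calls (e.g. EAGAIN).
int start_detached(void* (*fn)(void*), void* arg)
{
    pthread_attr_t attr;
    int ret = pthread_attr_init(&attr);
    if (ret != 0)
        return ret;

    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_t tid;
    ret = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return ret;
}

// Spawns up to `count` threads, sleeping `gap` after each start, and stops
// at the first failure. Returns the number of threads actually started and
// stores the failing error code (or 0) in *error.
int spawn_throttled(int count, void* (*fn)(void*), void* arg,
                    std::chrono::milliseconds gap, int* error)
{
    *error = 0;
    for (int i = 0; i < count; ++i) {
        const int ret = start_detached(fn, arg);
        if (ret != 0) {
            *error = ret;
            return i;
        }
        std::this_thread::sleep_for(gap);
    }
    return count;
}

// Steady-state estimate of simultaneously live threads when each thread
// lives for `life` and a new one starts every `gap` (gap must be non-zero).
long long expected_live_threads(std::chrono::milliseconds life,
                                std::chrono::milliseconds gap)
{
    return life.count() / gap.count();
}

// Example worker: bumps a counter so the caller can see the thread ran.
void* count_run(void* arg)
{
    static_cast<std::atomic<int>*>(arg)->fetch_add(1);
    return nullptr;
}
```

With a 1 second lifetime and a 1 ms gap, expected_live_threads gives 1000, which matches the estimate in the post.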
Timur Hairulin
Re: Afserver crashes
@seven11
Great idea!
I added:
Code: Select all
pthread_attr_setstacksize(&attr, 32768);
And it has been running for about half an hour!
It has never run this long on my testing system.
Reading more about stack size...
Timur Hairulin
Re: Afserver crashes
Great! I'm going to pull down the new afserver code and test it.
Scott
Re: Afserver crashes
Timur,
So I'm now running a Git pull of 2.2.2 with the pthread stack size set to 32K on CentOS 6.5 x86_64, 4 processors, 8 GB RAM.
I can spawn 1000 threads with the "afcmd tthr 1000" command. No issues so far.
I'll keep you posted...
Scott