Afserver crashes

selsner
Posts: 47
Joined: Wed Jan 25, 2017 11:20 am

Afserver crashes

Post by selsner »

Hello,

We are seeing afserver crash from time to time. Maybe you can help find the cause. This is the backtrace:

Code:

#0  0x0000003752432625 in raise () from /lib64/libc.so.6
#1  0x0000003752433e05 in abort () from /lib64/libc.so.6
#2  0x0000003752470537 in __libc_message () from /lib64/libc.so.6
#3  0x0000003752475e66 in malloc_printerr () from /lib64/libc.so.6
#4  0x0000003752479904 in _int_malloc () from /lib64/libc.so.6
#5  0x000000375247a6b1 in malloc () from /lib64/libc.so.6
#6  0x00000037564bd0bd in operator new(unsigned long) () from /usr/lib64/libstdc++.so.6
#7  0x00000037564bd1d9 in operator new[](unsigned long) () from /usr/lib64/libstdc++.so.6
#8  0x0000000000488157 in af::Msg::allocateBuffer (this=0x7f4aea1c0350, i_size=16384, i_copy_len=0, i_copy_offset=12) at /workspace/cgru/afanasy/src/libafanasy/msg.cpp:109
#9  0x000000000048866c in af::Msg::Msg (this=0x7f4aea1c0350, ss=0x7f4c040040b0) at /workspace/cgru/afanasy/src/libafanasy/msg.cpp:46
#10 0x000000000046192a in processMessage (i_args=0x7f4c04004080, io_prof=0x7f4ad9d2ca40) at /workspace/cgru/afanasy/src/server/threadprocessmsg.cpp:70
#11 0x0000000000461bb1 in threadProcessMsg (i_args=<value optimized out>) at /workspace/cgru/afanasy/src/server/threadprocessmsg.cpp:52
#12 0x00000000004ad2c9 in DlThread::thread_routine (i_params=0x7f4c04023a50) at /workspace/cgru/afanasy/src/libafanasy/common/dlThread.cpp:187
#13 0x00000037528079d1 in start_thread () from /lib64/libpthread.so.0
#14 0x00000037524e89dd in clone () from /lib64/libc.so.6

The crash happens at msg.cpp:109, where the code is:

Code:

	char * old_buffer = m_buffer;
	m_buffer_size = i_size;
	AFINFA("Msg::allocateBuffer(%s): trying %d bytes ( %d written at %p)", TNAMES[m_type], i_size, m_writtensize, old_buffer)
	m_buffer = new char[m_buffer_size];
	if( m_buffer == NULL )
	{
		AFERRAR("Msg::allocateBuffer: can't allocate %d bytes for buffer.", m_buffer_size)
		setInvalid();
		return false;
	}
"m_buffer = new char[m_buffer_size];" is line 109 and seems to be the problem when called with "m_buffer_size" of "16384".

What could cause the server to crash when allocating a new buffer?
CGRU 2.3.1 - CentOS 7.7

Sebastian Elsner - Pipeline Technical Director - RISE
www.risefx.com
selsner
Posts: 47
Joined: Wed Jan 25, 2017 11:20 am

Re: Afserver crashes

Post by selsner »

Another question is:

Code:

if( m_buffer == NULL )

As far as I know (and per http://www.cplusplus.com/reference/new/operator%20new[]/), the new operator never returns a NULL pointer. Why do you check it like this? Wouldn't it be better to use try/catch?
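
To make the question concrete, here is a minimal sketch of the two ways an allocation failure can actually be detected. The Buffer struct and the function names are hypothetical stand-ins, not CGRU code. Note that the backtrace above aborts inside malloc_printerr, which is glibc reporting an inconsistency in the heap rather than an out-of-memory condition, so neither form would be expected to catch that particular abort.

```cpp
#include <cstddef>
#include <new>

// Hypothetical stand-in for a message buffer; this is not the real af::Msg.
struct Buffer {
    char* data;
    std::size_t size;
};

// Option 1: plain new[] reports failure by throwing std::bad_alloc,
// so the failure path has to be a catch block, not a NULL check.
bool allocateWithCatch(Buffer& buf, std::size_t size) {
    try {
        buf.data = new char[size];
    } catch (const std::bad_alloc&) {
        buf.data = nullptr;
        buf.size = 0;
        return false;
    }
    buf.size = size;
    return true;
}

// Option 2: new (std::nothrow) returns a null pointer on failure,
// which is the only form where a NULL check is meaningful.
bool allocateNoThrow(Buffer& buf, std::size_t size) {
    buf.data = new (std::nothrow) char[size];
    buf.size = (buf.data != nullptr) ? size : 0;
    return buf.data != nullptr;
}

// Frees the buffer and resets the struct; safe to call on an empty Buffer.
void release(Buffer& buf) {
    delete[] buf.data;
    buf.data = nullptr;
    buf.size = 0;
}
```

Either function could be called with the 16384-byte size from the backtrace; both return false on failure instead of relying on a check that never fires.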

timurhai
Site Admin
Posts: 911
Joined: Sun Jan 15, 2017 8:40 pm
Location: Russia, Korolev

Re: Afserver crashes

Post by timurhai »

Hi.
I agree that the check is not needed.
But it can't cause a hang.

It is standard procedure for afserver to allocate a 16 KB buffer to read a message from a socket.
It always does this when communicating over the network.
So it is strange if it can't.
Maybe you have no free RAM?
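
The pattern described above can be sketched roughly like this. It is only an illustration: readMessage is a hypothetical helper, not the actual afserver code, and it uses a plain POSIX file descriptor with read().

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper, not CGRU code: reads at most bufferSize bytes
// from fd into a freshly allocated buffer and returns what was received.
std::string readMessage(int fd, std::size_t bufferSize = 16384) {
    // std::vector owns the memory, so it is released on every return path.
    std::vector<char> buffer(bufferSize);
    std::size_t total = 0;

    while (total < buffer.size()) {
        // Never ask for more than the space that is left in the buffer.
        ssize_t n = read(fd, buffer.data() + total, buffer.size() - total);
        if (n < 0 && errno == EINTR) {
            continue;  // interrupted by a signal, retry
        }
        if (n <= 0) {
            break;     // 0 means end of stream, negative means error
        }
        total += static_cast<std::size_t>(n);
    }
    return std::string(buffer.begin(), buffer.begin() + total);
}
```

The bound passed to read() always matches the remaining space, so the read itself cannot write past the end of the 16 KB allocation.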

By the way, please specify the CGRU version and the Windows or Linux distribution in each question like this.
You can even put it in your user signature.
Timur Hairulin
CGRU 3.3.1, Ubuntu 20.04, 22.04, MS Windows 10 (clients only).
selsner
Posts: 47
Joined: Wed Jan 25, 2017 11:20 am

Re: Afserver crashes

Post by selsner »

The server has 16 GB of RAM, and normally only 3-4 GB are actually in use. The cached/buffered part was nearly maxing out the RAM the last time I checked, but I think Linux frees the cache when memory is needed, doesn't it?
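
One way to check this is to compare the MemFree and Cached fields in /proc/meminfo: page cache is reclaimable, so a low MemFree on its own does not mean allocations will fail. A minimal sketch of reading those fields follows; meminfoValueKb is a hypothetical helper written for this post, not part of CGRU.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: scans text in /proc/meminfo format and returns the
// value in kB for the given key (e.g. "MemFree"), or -1 if it is missing.
long meminfoValueKb(std::istream& in, const std::string& key) {
    std::string line;
    while (std::getline(in, line)) {
        // Lines have the form "<Key>:   <number> kB".
        const bool keyMatches =
            line.size() > key.size() &&
            line.compare(0, key.size(), key) == 0 &&
            line[key.size()] == ':';
        if (!keyMatches) {
            continue;
        }
        std::istringstream fields(line.substr(key.size() + 1));
        long value = -1;
        if (fields >> value) {
            return value;
        }
        return -1;  // key found, but no number followed it
    }
    return -1;      // key not present
}
```

To use it on a live system, open /proc/meminfo with std::ifstream (from <fstream>) and pass the stream in, once per key, reopening or rewinding the stream between calls.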

selsner
Posts: 47
Joined: Wed Jan 25, 2017 11:20 am

Re: Afserver crashes

Post by selsner »

Btw, we are using the latest version.

Today I used Valgrind for the first time. Like this:

Code:

valgrind --tool=memcheck --leak-check=yes afserver

Is it normal for Valgrind to detect so much leaked memory even when there is no work and no render connected?

Code:

==28879== 
==28879== HEAP SUMMARY:
==28879==     in use at exit: 68,808 bytes in 48 blocks
==28879==   total heap usage: 3,186 allocs, 3,138 frees, 20,122,268 bytes allocated
==28879== 
==28879== Thread 1:
==28879== 176 bytes in 1 blocks are definitely lost in loss record 4 of 8
==28879==    at 0x4A075FC: operator new(unsigned long) (vg_replace_malloc.c:298)
==28879==    by 0x448286: threadAcceptPort(void*, int) (threadacceptclient.cpp:173)
==28879==    by 0x4AD2C8: DlThread::thread_routine(void*) (dlThread.cpp:187)
==28879==    by 0x3434C079D0: start_thread (in /lib64/libpthread-2.12.so)
==28879==    by 0x34348E89DC: clone (in /lib64/libc-2.12.so)
==28879== 
==28879== 288 bytes in 1 blocks are possibly lost in loss record 5 of 8
==28879==    at 0x4A057BB: calloc (vg_replace_malloc.c:593)
==28879==    by 0x34340118F2: _dl_allocate_tls (in /lib64/ld-2.12.so)
==28879==    by 0x3434C071E8: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.12.so)
==28879==    by 0x4AD1B2: DlThread::Start(void (*)(void*), void*) (dlThread.cpp:255)
==28879==    by 0x42F17A: main (main.cpp:267)
==28879== 
==28879== 2,624 bytes in 41 blocks are definitely lost in loss record 6 of 8
==28879==    at 0x4A075FC: operator new(unsigned long) (vg_replace_malloc.c:298)
==28879==    by 0x461B9A: threadProcessMsg(void*) (threadprocessmsg.cpp:47)
==28879==    by 0x4AD2C8: DlThread::thread_routine(void*) (dlThread.cpp:187)
==28879==    by 0x3434C079D0: start_thread (in /lib64/libpthread-2.12.so)
==28879==    by 0x34348E89DC: clone (in /lib64/libc-2.12.so)
==28879== 
==28879== 32,816 bytes in 1 blocks are definitely lost in loss record 7 of 8
==28879==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==28879==    by 0x34348A8620: __alloc_dir (in /lib64/libc-2.12.so)
==28879==    by 0x4479FC: AFCommon::getStoredFolders(std::string const&) (afcommon.cpp:155)
==28879==    by 0x42F5D6: main (main.cpp:183)
==28879== 
==28879== 32,816 bytes in 1 blocks are definitely lost in loss record 8 of 8
==28879==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==28879==    by 0x34348A8620: __alloc_dir (in /lib64/libc-2.12.so)
==28879==    by 0x4479FC: AFCommon::getStoredFolders(std::string const&) (afcommon.cpp:155)
==28879==    by 0x42F76A: main (main.cpp:207)
==28879== 
==28879== LEAK SUMMARY:
==28879==    definitely lost: 68,432 bytes in 44 blocks
==28879==    indirectly lost: 0 bytes in 0 blocks
==28879==      possibly lost: 288 bytes in 1 blocks
==28879==    still reachable: 88 bytes in 3 blocks
==28879==         suppressed: 0 bytes in 0 blocks
==28879== Reachable blocks (those to which a pointer was found) are not shown.
==28879== To see them, rerun with: --leak-check=full --show-reachable=yes
==28879== 
==28879== For counts of detected and suppressed errors, rerun with: -v
==28879== Use --track-origins=yes to see where uninitialised values come from
==28879== ERROR SUMMARY: 277 errors from 9 contexts (suppressed: 6 from 6)

timurhai
Site Admin
Posts: 911
Joined: Sun Jan 15, 2017 8:40 pm
Location: Russia, Korolev

Re: Afserver crashes

Post by timurhai »

Hi.
Can you tell me the Linux distribution and GCC version too?
I know about the leaks that Valgrind shows. They do not accumulate, so they do not really affect anything.
Does Valgrind show the same function backtrace when the server crashes? (I see only memory leak info.)
selsner
Posts: 47
Joined: Wed Jan 25, 2017 11:20 am

Re: Afserver crashes

Post by selsner »

This is a standard CentOS 6.6 with GCC 4.4.7.

The Valgrind backtrace and the crash backtrace are different. I will test more.

Signature updated.

lithorus
Posts: 28
Joined: Wed Jan 25, 2017 4:14 pm

Re: Afserver crashes

Post by lithorus »

For the record, I'm also seeing afserver (2.2.1) crash from time to time for no obvious reason, whereas 2.1.0 seemed rock solid. I don't have a stack trace though...
selsner
Posts: 47
Joined: Wed Jan 25, 2017 11:20 am

Re: Afserver crashes

Post by selsner »

It seems to happen here after a while if I add all workstations to the farm. If I add the workstations to a separate server, everything is fine with them and there are no crashes.
@lithorus: How many clients are online at your place?

lithorus
Posts: 28
Joined: Wed Jan 25, 2017 4:14 pm

Re: Afserver crashes

Post by lithorus »

selsner wrote: Fri Feb 24, 2017 8:58 am It seems it is happening here after a while if I add all workstations to the farm. If I add the workstations to a separate server all is fine with them and no crashes.
@lithorus: How many clients are online at your place?
@selsner:
Clients as in renders, or workstations running monitors?

I have 21 renders (18 online) and 19 monitors.