afserver Core Dump

Posted: Thu Jan 10, 2019 7:42 am
by dagtm
Hi!

First of all, nice work!
I installed your package and it worked fine for the first 2 days, but then suddenly, out of nowhere, the db stopped working and I was unable to connect via web to :51000.
I checked the database connection, with no luck:

Code:

: afcmd db_check
AFERROR: afanasy.cmd.Check: Database connect failed: could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?
could not create socket: Address family not supported by protocol


Database connection is NOT WORKING !
Any ideas how to debug this? Why would the database suddenly stop?

The logs do not help much either:

Code:

: cat /var/tmp/afanasy/afserver.tmserver.log
Compilation date = 'Oct 15 2018 12:49:25'
CGRU version = '2.3.0'
Afanasy build version = '2.3.0'
Afanasy build revision = 'd38556004cc99eaf91020592704443be43efa3e2'
Python version = '3.6.6'
GCC version = '8.1.1'
RLIMIT_NOFILE: Files descriptors limit: 10240
RLIMIT_NPROC: Processes (threads) limit: 95735
Config file does not exist:
/tmp/.cgru/config.json
Database connection "AFDB_update" is not working.
Wed 09 Jan 23:10.08: INFO    Reading store file: "/var/tmp/afanasy/server.json"
AFERROR: AFDB_upTables: Database connect failed: could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?
could not create socket: Address family not supported by protocol

Re: suddenly db stopped working

Posted: Thu Jan 10, 2019 9:25 am
by timurhai
Hi.

1. Database.
With these settings it can't connect to the database:
https://github.com/CGRU/cgru/blob/maste ... .json#L181
With these default settings, it should connect if the database was created by following these steps from the documentation:
http://cgru.info/afanasy/server#statistics
By the way, the SQL database is needed only to store statistics.

2. WebGUI.
First of all, no GUI depends on the SQL database.
You should be able to connect to a running Afanasy server with a web browser at [ afserver IP | afserver afname ]:51000.
For example, 127.0.0.1:51000 if afserver runs on the same host as the web browser.
If you can't:
- Check whether afwatch works; maybe the server can't serve GUIs at all.
- Check what the web browser shows; it should say something about why it can't connect.
- Check the server logs.
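
The connectivity checks above can be partly scripted. This is a minimal sketch, assuming bash and the coreutils `timeout` command are available. `port_open` and `report` are hypothetical helper names, not part of CGRU. The ports are the ones mentioned in this thread (51000 for the afserver web GUI, 5432 for PostgreSQL).

```shell
#!/usr/bin/env bash
# port_open HOST PORT: hypothetical helper, succeeds if a TCP connection can be opened.
# Uses bash's /dev/tcp pseudo-device and gives up after 2 seconds.
port_open() {
    timeout 2 bash -c ': < "/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# report HOST PORT LABEL: print one line describing the port state.
report() {
    if port_open "$1" "$2"; then
        echo "$3 ($1:$2): reachable"
    else
        echo "$3 ($1:$2): NOT reachable"
    fi
}

report 127.0.0.1 51000 "afserver web GUI"
report 127.0.0.1 5432  "PostgreSQL (statistics only)"
```

If port 51000 is not reachable on the server host itself, the problem is more likely the afserver process than the browser or the network.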

Re: suddenly db stopped working

Posted: Thu Jan 10, 2019 4:44 pm
by dagtm
Thanks @timurhai!

Yeah, it's weird that it all worked perfectly for about 2 days and then suddenly everything stopped (maybe something got updated; it's Fedora, which has lots of updates every day :) ). I'm trying to find out where to start debugging. In the logs I see a segmentation fault, but no further details :/
I guess the settings/configs were OK, as it was working without issues and I haven't changed anything since (I just came home after work and it wasn't working anymore).
I have no access to the web GUI anymore (I've also tried locally): "connection refused". I have no firewall running there at all, and it was all good the day before.

I don't see 'afwatch' running anywhere. Could you please give some more insight on this? (Is it a service? How should it be started?)


afserver is working

Code:

$ systemctl status afserver
● afserver.service - LSB: Afanasy afserver daemon
   Loaded: loaded (/etc/rc.d/init.d/afserver; generated)
   Active: active (exited) since Wed 2019-01-09 23:42:18 PST; 9h ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 0 (limit: 4915)
   Memory: 0B
   CGroup: /system.slice/afserver.service

The server log ('/var/tmp/afanasy/afserver.myserver.log') shows exactly the same as the output below (except for the 'core dumped' line; I don't see that in the server log, only when I start afserver manually):

Code:

$ source setup.sh 
CGRU_VERSION 2.3.0 : '/opt/cgru'
$ afserver 
Compilation date = 'Oct 15 2018 12:49:25'
CGRU version = '2.3.0'
Afanasy build version = '2.3.0'
Afanasy build revision = 'd38556004cc99eaf91020592704443be43efa3e2'
Python version = '3.6.6'
GCC version = '8.1.1'
RLIMIT_NOFILE: Files descriptors limit: 1024
RLIMIT_NPROC: Processes (threads) limit: 95735
Database connection "AFDB_update" is not working.
Thu 10 Jan 08:37.17: INFO    Reading store file: "/var/tmp/afanasy/server.json"
AFERROR: AFDB_upTables: Database connect failed: could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?
could not create socket: Address family not supported by protocol

Thu 10 Jan 08:37.17: INFO    Getting renders from store...
Thu 10 Jan 08:37.17: INFO    1 renders found.
Render offline registered - "mywork".
Thu 10 Jan 08:37.17: INFO    1 renders registered.
Thu 10 Jan 08:37.17: INFO    Getting users from store...
Thu 10 Jan 08:37.17: INFO    3 users found.
Thu 10 Jan 08:37.17: INFO    3 users registered from store.
Thu 10 Jan 08:37.17: INFO    Getting branches from store...
Thu 10 Jan 08:37.17: INFO    2 branches found.
Thu 10 Jan 08:37.17: INFO    Root branch created from store.
Segmentation fault (core dumped)

And there is this segmentation fault crash:

Code:

Jan 09 23:49:10 myserver audit[14401]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=2 pid=14401 comm="afserver" exe="/opt/cgru/afanasy/bin/afserver" sig=11 res=1
Jan 09 23:49:10 myserver kernel: afserver[14401]: segfault at 0 ip 00007f835b927f5e sp 00007fff75a5fc20 error 4 in libstdc++.so.6.0.25[7f835b895000+bd000]
Jan 09 23:49:10 myserver kernel: Code: 85 2d 02 00 00 48 83 c4 48 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 84 00 00 00 00 00 48 8b bd e8 00 00 00 4c 89 f2 48 8b 34 24 <48> 8b 07 ff 50 60 48 8b 13 48 8b 6a e8 48 01 dd 49 39 c6 74 9f 8b
Jan 09 23:49:10 myserver systemd[1]: Started Process Core Dump (PID 14409/UID 0).
Jan 09 23:49:10 myserver audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@1-14409-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 09 23:49:10 myserver systemd-coredump[14410]: Process 14401 (afserver) of user 1000 dumped core.
                                                  
                                                  Stack trace of thread 14401:
                                                  #0  0x00007f835b927f5e _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l (libstdc++.so.6)
                                                  #1  0x00000000004c0bb3 _ZN2af6LoggerD1Ev (afserver)
                                                  #2  0x000000000043cb0d _ZN9BranchSrv9setParentEPS_.cold.127 (afserver)
                                                  #3  0x000000000045bf00 _ZN17BranchesContainer18addBranchFromStoreEP9BranchSrv (afserver)
                                                  #4  0x000000000044b944 main (afserver)
                                                  #5  0x00007f835b4c8413 __libc_start_main (libc.so.6)
                                                  #6  0x000000000044dc1a _start (afserver)
                                                  
                                                  Stack trace of thread 14403:
                                                  #0  0x00007f835bd5d0c6 do_futex_wait.constprop.1 (libpthread.so.0)
                                                  #1  0x00007f835bd5d1b8 __new_sem_wait_slow.constprop.0 (libpthread.so.0)
                                                  #2  0x00000000004a4dc7 _ZN2af7AfQueue3popENS0_8WaitModeE (afserver)
                                                  #3  0x00000000004a4ef2 _ZN2af7AfQueue3runEv (afserver)
                                                  #4  0x00000000004ae6e9 _ZN8DlThread14thread_routineEPv (afserver)
                                                  #5  0x00007f835bd5458e start_thread (libpthread.so.0)
                                                  #6  0x00007f835b5a16a3 __clone (libc.so.6)
                                                  
                                                  Stack trace of thread 14402:
                                                  #0  0x00007f835bd5d0c6 do_futex_wait.constprop.1 (libpthread.so.0)
                                                  #1  0x00007f835bd5d1b8 __new_sem_wait_slow.constprop.0 (libpthread.so.0)
                                                  #2  0x00000000004a4dc7 _ZN2af7AfQueue3popENS0_8WaitModeE (afserver)
                                                  #3  0x00000000004a4ef2 _ZN2af7AfQueue3runEv (afserver)
                                                  #4  0x00000000004ae6e9 _ZN8DlThread14thread_routineEPv (afserver)
                                                  #5  0x00007f835bd5458e start_thread (libpthread.so.0)
                                                  #6  0x00007f835b5a16a3 __clone (libc.so.6)
                                                  
                                                  Stack trace of thread 14404:
                                                  #0  0x00007f835bd5d0c6 do_futex_wait.constprop.1 (libpthread.so.0)
                                                  #1  0x00007f835bd5d1b8 __new_sem_wait_slow.constprop.0 (libpthread.so.0)
                                                  #2  0x00000000004a4dc7 _ZN2af7AfQueue3popENS0_8WaitModeE (afserver)
                                                  #3  0x00000000004a4ef2 _ZN2af7AfQueue3runEv (afserver)
                                                  #4  0x00000000004ae6e9 _ZN8DlThread14thread_routineEPv (afserver)
                                                  #5  0x00007f835bd5458e start_thread (libpthread.so.0)
                                                  #6  0x00007f835b5a16a3 __clone (libc.so.6)
Jan 09 23:49:10 myserver audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@1-14409-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'                                             
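
The mangled C++ names in the crashing thread are easier to read once demangled. This is a sketch, assuming `c++filt` from binutils is installed. The symbols are copied from frames #1 and #3 of thread 14401 above, and `demangle` is a hypothetical wrapper name.

```shell
# demangle: hypothetical wrapper, reads mangled symbols on stdin, one per line.
demangle() {
    c++filt
}

printf '%s\n' \
    '_ZN2af6LoggerD1Ev' \
    '_ZN17BranchesContainer18addBranchFromStoreEP9BranchSrv' | demangle
# prints:
#   af::Logger::~Logger()
#   BranchesContainer::addBranchFromStore(BranchSrv*)
```

Read this way, the trace suggests the crash happens inside a logging call while a branch is being loaded from the store folder (/var/tmp/afanasy). That is an interpretation of the trace, not a confirmed diagnosis. Since systemd-coredump saved the core, `coredumpctl gdb 14401` should open it in gdb, assuming gdb is installed and the core is still retained.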

Re: suddenly db stopped working

Posted: Thu Jan 10, 2019 5:10 pm
by timurhai
Hi.
Afwatch is a Qt GUI. I asked you to try it just to check whether the problem is in the browser HTTP connection.
But now we see that the reason is that afserver crashed.

You can try "catchsegv" or another Linux utility to get more information about the segmentation fault.

You can also try to catch the bug:
Stop afserver.
Delete the store folder entirely (/var/tmp/afanasy).
Run afserver from a console to see its output in real time:

Code:

$ cd /opt/cgru
$ source setup.sh 
$ afserver
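
Deleting the store folder discards the saved renders, users, and branches, so it may be worth moving it aside instead, so that the files can be inspected later. This is a minimal sketch. `backup_store` is a hypothetical helper, not part of CGRU, and the path is the one from this thread.

```shell
# backup_store DIR: hypothetical helper that renames DIR to DIR.bak.<timestamp>
# and prints the new name. Returns non-zero if DIR does not exist.
backup_store() {
    dir=$1
    [ -d "$dir" ] || { echo "no such folder: $dir" >&2; return 1; }
    backup="$dir.bak.$(date +%Y%m%d-%H%M%S)"
    mv -- "$dir" "$backup" && echo "$backup"
}

# Usage (with afserver stopped first):
#   backup_store /var/tmp/afanasy
#   cd /opt/cgru && source setup.sh && afserver
```

If afserver starts cleanly with an empty store, the renamed folder still holds the files that were being read when it crashed.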

Re: afserver Core Dump

Posted: Fri Jan 11, 2019 7:32 am
by dagtm
I think removing '/var/tmp/afanasy' helped here. I will try catchsegv if it happens again.

Thanks!