Bug #225

A genom3-ros module linked with opencv dies with "Alarm clock" at startup.

Added by Félix Ingrand 9 months ago. Updated 8 months ago.



etdisc@etdisc-virtual-machine:~/work/bug/build$ bug-ros
bug: advertising ports
bug: initialized outport genom_state
bug: spawned task track
Alarm clock

See the example in the tar ball.

If you remove the execution task (so that no timer is needed), the problem goes away.

opencv is only linked into the codel lib, not even called... Maybe I am not using opencv properly, though: this code comes from ISAE, for a student lab session (TP).


bug.tgz (5.58 KB): A minimal module which shows the problem. Félix Ingrand, 2019-10-26 14:18

Updated by Félix Ingrand 9 months ago

Digging further, I noticed that the module works fine if I start it in daemon mode (-b).


Updated by Anthony Mallet 8 months ago

I can't reproduce this with either opencv-3.2 or opencv-3.4
(the attached test case works fine).

Do you have detailed info on the failing setup?

On the other hand, a library that uses SIGALRM will for sure
not work with periodic tasks.


Updated by Félix Ingrand 8 months ago

This is on an ubuntu 14.04 (not my choice)...
pkg-config --modversion opencv

I also had it on an ubuntu 16.04
pkg-config --modversion opencv

I confirm that the problem is not present under 18.04 with
pkg-config --modversion opencv

In any case, as the component works fine when started in the background, this is not a showstopper.


Updated by Anthony Mallet 8 months ago

This comes from a static constructor in opencv2:

(gdb) bt
#0  __pthread_create_2_1 (newthread=0x7fffe8974438, attr=0x0,
start_routine=0x7fffe876c5b0, arg=0x0) at pthread_create.c:505
#1  0x00007fffe876c7ef in ?? () from /lib/x86_64-linux-gnu/
#2  0x00007fffe876a915 in ?? () from /lib/x86_64-linux-gnu/
#3  0x00007fffe8762b1d in libusb_init ()
from /lib/x86_64-linux-gnu/
#4  0x00007ffff128c55c in ?? () from /usr/lib/x86_64-linux-gnu/
#5  0x00007ffff1278b69 in dc1394_new ()
from /usr/lib/x86_64-linux-gnu/
#6  0x00007ffff5bfbb19 in CvDC1394::CvDC1394() ()
from /usr/lib/x86_64-linux-gnu/
#7  0x00007ffff5be6540 in ?? ()
from /usr/lib/x86_64-linux-gnu/
#8  0x00007ffff7de76ca in call_init (l=<optimized out>, argc=argc@entry=1,
argv=argv@entry=0x7fffffffdff8, env=env@entry=0x7fffffffe008)
at dl-init.c:72
#9  0x00007ffff7de77db in call_init (env=0x7fffffffe008, argv=0x7fffffffdff8,
argc=1, l=<optimized out>) at dl-init.c:30
#10 _dl_init (main_map=0x7ffff7ffe168, argc=1, argv=0x7fffffffdff8,
env=0x7fffffffe008) at dl-init.c:120
#11 0x00007ffff7dd7c6a in _dl_start_user () from /lib64/

The problem is that libusb_init() creates a thread. As this is invoked
from a static variable constructor, it is invoked before main(), so
before main() can change the default signal mask of all threads via
pthread_sigmask(). Thus, as soon as the periodic task timer is started,
the thread from libusb that does not have SIGALRM blocked gets the
signal and terminates the program (default SIGALRM action).

I'll think about a possible clean solution.


Updated by Anthony Mallet 8 months ago

Also: it's probably not working (well) when daemonizing the component.

When you fork(2), the static constructors are not called again. And
since the libusb thread is not cloned with the fork(2) call either,
libusb might not be well initialized. It acts as if it was working,
because there is no extraneous thread to get the SIGALRM, but it works
by chance: this is "undefined behaviour".


Updated by Anthony Mallet 8 months ago

FWIW, the defect is still present in opencv3. It's just that the
DC1394 stuff was moved to the videoio library. As soon as it is linked
in the component, it fails:

--- codels/       2019-10-26 13:33:23.000000000 +0200
+++ codels/        2019-10-28 13:28:23.269172000 +0100
@@ -1,5 +1,7 @@
 #include "acbug.h" 

+#include "opencv2/videoio.hpp" 
 #include "bug_c_types.h" 

@@ -12,6 +14,6 @@
Nothing(const genom_context self)
-  /* skeleton sample: insert your code */
-  /* skeleton sample */ return genom_ok;
+  auto x = new cv::VideoCapture;
+  return genom_ok;

This should be reported upstream I guess.


Updated by Anthony Mallet 8 months ago

No need to report it, it's already been fixed in 3.4.5:
