Bug #182

Trajectory bug after aggressive velocity command

Added by Pol MORDEL 5 months ago. Updated 5 months ago.

Status: New
Priority: High

Description

Hi all,

While doing a visual servoing task on an AprilTag with 2 UAVs running pocolibs, we suffered a crash when the slave robot had to respond to an aggressive modification of the velocity command. We then ran numerous tests with multiple quadrotors and were able to reproduce this crash every time we tried to mess around with the velocity command. We were even able to reproduce it in simulation using mrsim. To illustrate our issue, you can download videos, figures and logs of one simulation here: https://filesender.renater.fr/?s=download&token=900aba4b-3d28-14fd-bd45-89add8c4e8d7


Files

cmdVel20Hz.png (24.2 KB) cmdVel20Hz.png test velocity command Pol MORDEL, 2018-10-30 15:33
genom3-pocolibs.patch (1.15 KB) genom3-pocolibs.patch Anthony Mallet, 2018-10-30 18:56

Associated revisions

Revision 1084832b (diff)
Added by Anthony Mallet 5 months ago

Don't stop simulink simulation upon transaction error in genomix blocks

When sending a request or reading a port, genomix can report regular/expected
component errors. In this case, don't stop the simulation and simply report the
error through two additional output ports: errno and errmsg. Those ports can be
checked in order to take appropriate decisions, depending on the error.

errno is the http status code in case of error, or 0 on success.
errmsg is a uint8 array that can be converted to a string to get the
json-encoded error message from genomix.

Other errors, like invalid port name, request name or communication issues are
still reported as fatal errors in simulink.

This should help with issue #182
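
For readers who consume these two ports from C code, the handling described above could be sketched as follows. This is only an illustration: the names `errmsg_to_string` and `is_error_named` are hypothetical and are not part of genomix or simulink, and the sketch assumes that the caller knows how many bytes of the uint8 array are meaningful.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a uint8 array (as delivered by the errmsg port) into a
 * NUL-terminated string, truncating if `out` is too small.
 * Returns the number of bytes copied, not counting the final NUL.
 * Hypothetical helper, not part of genomix. */
static size_t
errmsg_to_string(const uint8_t *errmsg, size_t len, char *out, size_t outsz)
{
  size_t n;

  if (outsz == 0) return 0;
  n = len < outsz - 1 ? len : outsz - 1;
  memcpy(out, errmsg, n);
  out[n] = '\0';
  return n;
}

/* `status` is the value of the errno port: 0 on success, the http status
 * code otherwise. Returns non-zero if an error was reported and the
 * json-encoded message mentions `name`. Hypothetical helper. */
static int
is_error_named(int status, const char *msg, const char *name)
{
  return status != 0 && strstr(msg, name) != NULL;
}
```

A model could then, for instance, ignore an error whose message contains S_csLib_TOO_MANY_RQST_IDS and simply resend the request at the next step, while still stopping on any other error.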

Revision 139c0f68 (diff)
Added by Anthony Mallet 5 months ago

Mitigate the S_csLib_TOO_MANY_RQST_IDS error in client/c

When sending a request with the *_rqst() client functions, it may happen that
the csLib client mailbox is full of pending requests, preventing the new
request from being sent.

This might in particular happen if the doevents() function has not been
called recently/frequently enough. To mitigate this, the *_rqst() functions now
try to empty the csLib mailbox themselves and try one more time to send the
request. If this still fails with a S_csLib_TOO_MANY_RQST_IDS, the error is
reported. This is because the error might actually be a legitimate error
due to a large number of unterminated requests.

In order to implement this properly, the doevents() function is split into a
recv() function that processes the csLib mailbox and decodes the results, and
the doevents() function itself that invokes the callbacks associated to
requests replies (if any) that have been updated by recv(). The _rqst()
function only invokes the recv() part upon a S_csLib_TOO_MANY_RQST_IDS, so as
to not recursively process any callbacks.

While here, the error S_csLib_TOO_MANY_RQST_IDS is now reported as a
"genom::too_many_activities", instead of the previous "genom::mwerr"
exception. This may make it easier for clients to detect this situation.

This addresses issue #182
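
The retry logic described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual genom3-pocolibs code: `struct client`, `MAX_RQST_IDS`, `raw_send()` and the `client_*` functions are hypothetical stand-ins for the csLib mailbox and the generated client functions.

```c
#define MAX_RQST_IDS 4          /* hypothetical mailbox capacity */

enum rqst_status { RQST_OK, RQST_TOO_MANY };

/* Toy client state: plain counters instead of a real mailbox */
struct client {
  int in_flight;   /* request ids in use (reply not decoded yet) */
  int replied;     /* among those, replies waiting in the mailbox */
  int decoded;     /* replies decoded by recv(), callback not yet invoked */
  int callbacks;   /* number of callbacks invoked so far */
};

/* recv() part: process the mailbox and decode the results. This frees
 * request ids but never invokes a callback. */
static void
client_recv(struct client *c)
{
  c->in_flight -= c->replied;
  c->decoded += c->replied;
  c->replied = 0;
}

/* doevents(): recv(), then invoke the callbacks of the updated requests */
static void
client_doevents(struct client *c)
{
  client_recv(c);
  c->callbacks += c->decoded;
  c->decoded = 0;
}

/* Low-level send: fails when no request id is available */
static enum rqst_status
raw_send(struct client *c)
{
  if (c->in_flight >= MAX_RQST_IDS) return RQST_TOO_MANY;
  c->in_flight++;
  return RQST_OK;
}

/* *_rqst(): on a full mailbox, drain it with recv() only and retry once.
 * A second failure is reported as is, since it may be legitimate. */
static enum rqst_status
client_rqst(struct client *c)
{
  enum rqst_status s = raw_send(c);

  if (s != RQST_TOO_MANY) return s;
  client_recv(c);
  return raw_send(c);
}
```

Calling only the recv() part from the request function is what keeps callbacks from running recursively in the context of sending a new request; they still run later, from doevents().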

History

#1

Updated by Pol MORDEL 5 months ago

We looked into those crashes in detail. We are now fairly sure that they come from maneuver and the kdtp trajectory generation, which cannot follow multiple sudden velocity variations.

#2

Updated by Pol MORDEL 5 months ago

As shown in the figure angular_velocity.fig, we also noticed a discontinuity in the yaw velocity conversion, caused by:

    if (fabs(dyaw) < 0.25) {
      dqw = 1 - dyaw2/8;              /* cos(dyaw/2) ± 1e-5 */
      dqz = (0.5 - dyaw2/48) * dyaw;  /* sin(dyaw/2) ± 1e-6 */
    }

Although we do not think this is the cause of our main issue, we did not understand what is done here.

#3

Updated by Pol MORDEL 5 months ago

I know that you do not use maneuver::velocity at LAAS, but could you please look into this and give your opinion on this issue?

Thank you very much

#4

Updated by Anthony Mallet 5 months ago

On Thursday 25 Oct 2018, at 19:15, Pol MORDEL wrote:

As shown in the figure angular_velocity.fig, we noticed as well a
non-continuation into the yaw velocity conversion caused by

    if (fabs(dyaw) < 0.25) {
      dqw = 1 - dyaw2/8;              /* cos(dyaw/2) ± 1e-5 */
      dqz = (0.5 - dyaw2/48) * dyaw;  /* sin(dyaw/2) ± 1e-6 */
    }

Though we do not think it is the cause of our main issue, we
did not understand what is done here ?

I don't see a discontinuity in the logs, the yaw, wz, dwz and ddwz
seem consistent to me at first sight.
What are you sending for wz and duration during this phase?

What is done in the code excerpt above is the computation of the Δq
induced by the wz, dwz and ddwz of the trajectory, using a small-angle
approximation. The Δq is then added to the current yaw. As the comments
suggest, this is precise up to 1e-5 and should not introduce big
errors compared to using the proper trigonometric functions. It can be
confirmed by looking at the resulting yaw in the plot, which is
visually not "crazy".

#5

Updated by Anthony Mallet 5 months ago

On Thursday 25 Oct 2018, at 19:12, Pol MORDEL wrote:

While we looked into detail into those crash. We are now pretty sure
that it comes from maneuver and the kdtp trajectory generation that
cannot follow multiple velocity sudden variation.

Do you have a log file of the velocity commands you send ?
Or a small sequence that triggers the crash?

#6

Updated by Pol MORDEL 5 months ago

Anthony Mallet wrote:

On Thursday 25 Oct 2018, at 19:15, Pol MORDEL wrote:

As shown in the figure angular_velocity.fig, we noticed as well a
non-continuation into the yaw velocity conversion caused by

    if (fabs(dyaw) < 0.25) {
      dqw = 1 - dyaw2/8;              /* cos(dyaw/2) ± 1e-5 */
      dqz = (0.5 - dyaw2/48) * dyaw;  /* sin(dyaw/2) ± 1e-6 */
    }

Though we do not think it is the cause of our main issue, we
did not understand what is done here ?

I don't see a discontinuity in the logs, the yaw, wz, dwz and ddwz
seem consistent to me at first sight.
What are you sending for wz and duration during this phase?

We send wz a value between -0.62 and 0.62 (duration is 1).

What prompted our question was the comment: we read it as cos(dyaw/2) = ± 1e-5, in which case sin(dyaw/2) should be close to 1.

What is done in the code excerpt above is the computation of the Δq
induced by the wz, dwz and ddwz of the trajectory, for small angle
approximation. The Δq is then added to the current yaw. As the comment
suggest, this is precise up to 1e-5 and should not introduce big
errors compared to using the proper trigonometric functions. It can be
confirmed by looking at the resulting yaw in the plot, which is
visually not "crazy".

OK, my bad: I checked the math and there is indeed no discontinuity. Thank you for your explanation.

#7

Updated by Pol MORDEL 5 months ago

Anthony Mallet wrote:

On Thursday 25 Oct 2018, at 19:12, Pol MORDEL wrote:

While we looked into detail into those crash. We are now pretty sure
that it comes from maneuver and the kdtp trajectory generation that
cannot follow multiple velocity sudden variation.

Do you have a log file of the velocity commands you send ?
Or a small sequence that triggers the crash?

Here is a second set of data with the log and figure of the velocity command: https://filesender.renater.fr/?s=download&token=a7846323-cb15-eb00-3651-d97283601f14 (do not use the mk2 data; I copied those values by mistake from a previous test). The command remains within the bounds (vx, vy, vz range between -0.45 and +0.45). To reproduce the crash, we then switch the command repeatedly between cmd_vel_max and cmd_vel_min until the drone starts to diverge. After that, maneuver no longer responds to the command 0,0,0,0 (I can also send you the video of the simulation if you want).

#8

Updated by Anthony Mallet 5 months ago

On Friday 26 Oct 2018, at 15:18, Pol MORDEL wrote:

The command remains between the bounds (vx, vy, vz goes between
-0.45 and +0.45). In order to repeat the crash we then switch the
command repeatedly between cmd_vel_max and cmd_vel_min until the
drone start to diverge. Then maneuver does not respond to the
command that is 0,0,0,0 (I can send you the video of the simulation
as well if you want)

So far I could not reproduce any similar crash (in simulation),
sending alternating -0.5 and +0.5 x,y velocities at various
frequencies, up to 500Hz.

So I have a few additional questions/remarks:

  • When you say "command that is 0,0,0,0", what does it mean?
    maneuver::velocity expects vx, vy, vz, wz, ax, ay, az, duration.
    I imagine that 0,0,0,0 means 0 velocity, but what about acceleration
    and duration?
  • In general, when you use a duration > 0, maneuver::velocity may
    reply with an error that this is not feasible given current state
    and velocity (and acceleration, jerk etc.) limits.
    Using a duration of 0 is safer from this point of view, and also
    gives you the "minimum time trajectory" to reach the desired
    velocity/acceleration.

In your last logs, the last maneuver acceleration is not 0. This
means that a maneuver::velocity command with 0 is not taken into
account. Did you check that the 'duration' is OK with the limits (or
0) ?

  • What do you mean by "maneuver does not respond"? Does it crash or
    block? Any error message?
  • Do you have nhfc error messages about "emergency descent" ?

#9

Updated by Pol MORDEL 5 months ago

What do you mean by "maneuver does not respond"? Does it crash or
block? Any error message?
  • I just meant that it does not respond to the input of maneuver::velocity anymore. maneuver does not crash or give any error message.

In general, when you use a duration > 0, maneuver::velocity may
reply with an error that this is not feasible given current state
and velocity (and acceleration, jerk etc.) limits.
Using a duration of 0 is safer from this point of view, and also
gives you the "minimum time trajectory" to reach the desired
velocity/acceleration.

  • I used to set the duration to 1. Setting this parameter to 0 did indeed improve the behavior, and I was no longer able to reproduce the crash in simulation.
  • However, while testing this setup using three quadrotors (and duration set to 0) in the same simulation, the simulation stopped after about one minute and I got the following error message:
...
samples 202
8.7e-05
samples 202
0.000219
16:39:17.118333:genomixd: sock17cbc90: reply status 400: {"ex":"::genom::mwerr","detail":{"what":"S_csLib_TOO_MANY_RQST_IDS"}}
16:39:17.804482:genomixd: sock1819cb0: reply status 400: {"ex":"::genom::mwerr","detail":{"what":"S_csLib_TOO_MANY_RQST_IDS"}}
samples 56
7.5e-05
samples 56
6.9e-05

"samples" comes from a printf in maneuver (maybe we could remove it, since I guess it was meant for debugging?), but the TOO_MANY_RQST_IDS error is new.

The TOO_MANY_RQST_IDS error is produced by the genomix request of maneuver/velocity :

An error occurred while running the simulation and the simulation was terminated
Caused by:
Error reported by S-function 'genomix_block' in 'velocity_maneuver_2017a/mkQuadro1/genomix request':
{"ex":"::genom::mwerr","detail":{"what":"S_csLib_TOO_MANY_RQST_IDS"}}
  • In one of the tests, before getting the previous error, I also received the following message, but the program was able to recover:
    maneuver-pocolibs: kdtp::Spline::a_b: unknown case 2200 (vC = -nan)

And thank you very much for your time!

#10

Updated by Pol MORDEL 5 months ago

I noticed that the TOO_MANY_RQST_IDS error occurs every time I send a yaw velocity (wz) input (I am trying to understand what is going on).

--> after about 10 tests using 1 UAV, I can reproduce the bug every time I send wz a value of 0.62 for more than 1-2 seconds
--> I also tested it with a maximum wz input of 0.3: same issue
--> the behavior is that the quadrotor spins well around the z axis, then stops suddenly, then spins fast around the z axis
--> I need to restart the maneuver-pocolibs component every time

#11

Updated by Anthony Mallet 5 months ago

On Monday 29 Oct 2018, at 16:51, Pol MORDEL wrote:

{"ex":"::genom::mwerr","detail":{"what":"S_csLib_TOO_MANY_RQST_IDS"}}
16:39:17.804482:genomixd: sock1819cb0: reply status 400: {"ex":"::genom::mwerr","detail":{"what":"S_csLib_TOO_MANY_RQST_IDS"}}
samples 56 7.5e-05 samples 56 6.9e-05

This is a completely different issue: this is because genomixd does
not process the replies to previous requests quickly enough to empty
its mailbox.

[ in a few words: the requests sent via simulink are sent in "oneway"
mode, meaning that the request reply is not sent back to simulink, to
save time - however, genomixd still receives and drops it, not quickly
enough in your case ]

There is no easy workaround right now. Your best bet would be to run
different genomixd for different quads and/or (if it makes sense)
lower the frequency at which you send velocity requests to
maneuver. I will try to think of something more clever.

"samples" comes from printf in maneuver (maybe we could remove it
since it was meant for debug I guess ?) but the TOO_MANY_RQST_IDS
error is new.

Yes, you can definitely remove the printf, it slipped in by mistake,
it was just for debug/development.

maneuver-pocolibs: kdtp::Spline::a_b: unknown case 2200 (vC = -nan)

That is a bug in libkdtp, then. If it happens only once in a while,
it's not too bad I guess, but this should be investigated. It's most
probably due to numerical rounding when your new target velocity is
too close to the current one or something like this.

#12

Updated by Pol MORDEL 5 months ago

Your best bet would be to run
different genomixd for different quads and/or (if it makes sense)
lower the frequency at which you send velocity requests to
maneuver. I will try to think of something more clever.

Yes, we already use one genomix process per quad. I tried lowering the input velocity command rate from 50 Hz to 20 Hz. I was then able to fly 3 quads while repeatedly sending some really steep velocity commands, and it worked well for vx, vy, vz. The only case in which I was able to reproduce the TOO_MANY_RQST_IDS error was when I switched wz from max to min repeatedly, but it worked much better for a one-way spin in yaw (cf. attached figure). We should be able to make it work for a visual servoing application from there. Thank you for all your explanations, they are really useful.
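
For anyone applying the same workaround, capping the command rate on the sender side can be done with a simple throttle. The following is a minimal sketch, assuming timestamps in microseconds; `struct throttle` and its functions are hypothetical names and are not part of genomix or maneuver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Forwards a command only if enough time has elapsed since the last
 * forwarded one. Hypothetical helper, for illustration only. */
struct throttle {
  int64_t period_us;  /* minimum time between two forwarded commands */
  int64_t last_us;    /* timestamp of the last forwarded command */
  bool started;       /* false until the first command is forwarded */
};

static struct throttle
throttle_init(int64_t max_rate_hz)
{
  struct throttle t = { 1000000 / max_rate_hz, 0, false };
  return t;
}

/* Return true if the command produced at time now_us should be sent */
static bool
throttle_accept(struct throttle *t, int64_t now_us)
{
  if (t->started && now_us - t->last_us < t->period_us) return false;
  t->started = true;
  t->last_us = now_us;
  return true;
}
```

Note that this only guarantees an upper bound: with a 50 Hz source and a 20 Hz cap, every third command is forwarded, so the effective rate is about 16.7 Hz. Dropped commands are simply discarded, which is acceptable here because each velocity command supersedes the previous one.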

#13

Updated by Anthony Mallet 5 months ago

On Tuesday 30 Oct 2018, at 15:35, Pol MORDEL wrote:

The only case I was able to reproduce the "TOO_MANY_RQST_IDS error"
was when I switch wz from max to min repeatedly

The TOO_MANY_RQST_IDS is not related to a particular service or
anything else. It will just randomly pop up when
- either the Linux scheduler decides to not schedule the specific
thread in genomixd marking events ready for the main genomix thread.
- or the Tcl event loop (genomixd is partially written in Tcl) delays
the processing of the request replies too much

I came up with a tentative patch, that explicitly calls the requests
replies processing when there is no room for new requests (this is
normally called when genomixd is otherwise idle).

I'm not too sure that I want to commit it yet, because it can
introduce recursion in the event processing by calling callbacks
associated to requests replies in the context of sending a new
request. This is normally OK, but I have to think more carefully about
this and consider all kind of genom clients we have here.

In the meantime, I would be interested to see if it helps in your
case (I tried it on a toy example and it was actually able to cancel
the TOO_MANY_RQST_IDS errors). To try it, you have to manually apply
the attached patch to the genom3-pocolibs source code, and then
recompile your components with the updated genom3-pocolibs (only those
components that have the error, so mostly maneuver I guess).

Depending on how you manage this with your setup, make really sure
that maneuver is using the new genom3-pocolibs generated code (you can
check by looking at the generated source code in
maneuver-genom3/build/pocolibs/client/c/src/maneuver_client.c
and checking that there are calls to
genom_maneuver_client_doevents(h); ).
