-
Notifications
You must be signed in to change notification settings - Fork 19.7k
AP_HAL_ChibiOS: Enable tx FIFO for multiframe bxcan messages #31393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
I'm not sure the central thesis of this PR is correct. We have logic here to ensure that we don't put more than one message with the same ID into the hardware buffers, thereby avoiding the reordering issue while more closely preserving CAN priority semantics. I don't doubt that enabling the FIFO changes things up and may smooth out bus traffic, perhaps even because priority is hurting in your application (maybe timeouts are not set up correctly for some messages). There's also some chance the software logic has broken. But in any case it's not a complete fix because the software logic will still avoid putting more than one part of a multi-part message in the hardware buffers at once, thereby negating much of the benefit of FIFO mode. On the other hand, FDCAN using FIFO mode successfully for a long time is an indication that priority might not matter as much as the bxCAN driver author thought, so we could just turn on FIFO mode and remove the priority preserving logic. It's worth remembering that on average FDCAN is used on the FC side and bxCAN is used on the periph side, and the usage patterns are generally different. This will require careful thought and testing. I would love to know more about your bus setup and testing conditions to replicate the failure and success cases here, it would be good info to understand what's really going on at the bus level and get data points to make the decision of which CAN driver is right. |
|
Adding the link to the discord discussion that is also ongoing: https://discord.com/channels/674039678562861068/800819275069390920/1432584750920503429 |
andyp1per
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't see any downsides to this. We already use FIFO in other areas of serial comms. I agree it may not fully solve the problem, but seems like a step in the right direction.
|
The downside is that it breaks CAN priority semantics. A high priority message can get stuck if the next message to transmit is low priority and the bus is busy with medium priority messages. Just turning on the hardware FIFO will also not allow any more slots to be used due to the priority-based scheduling logic. If we want to go FIFO we should remove that too. |
|
From Discord: So a setup of:
We're monitoring some inertial data in the AP and passing MAVLink traffic in and out of the companion computer. When the patch is applied, these Wrong Toggle and Bad CRCs are gone completely. The best I can tell is the INS is running in a separate thread, and so the FIFO enabling is at least ensuring that all the packets are sent in the order they are intended. The MAVLink isn't high rate data, we're only talking a handful of HEARTBEATS, MCU_STATUS, etc. It doesn't appear to be timeouts occurring on the Tx side (We've got Tx Timeouts on the AP side we've observed when sending amounts of data that end up causing a timeout to occur). |
|
Have you enabled and checked the can_stats.txt on the autopilot? What does that show? |
|
Thanks @tpwrules, I've dug into this further today. The priority logic you mentioned doesn't get called in the processTx periph logic. The select function used here isn't added to processTx for for the periph. I'm not sure if there was a reason for omitting it. I did enable can_stats.txt and it didn't show anything more than the results I saw through the DroneCan GUI stats panel so there's nothing significant to add there. Today I've experimented adding the select function in and removing this PR. I need to investigate this option further. |
|
If it's true the priority logic never ran on periph then it's probably the correct move to just delete the logic and turn on the FIFO in hardware. However, this will affect bxCAN-based flight controllers. I have had trouble sourcing one to test with. What F4 Periph-with-IMU are you using? |
|
I've done a bit more experimenting with adding in the select function just out of curiosity. Yes it does work however throughput is reduced because we end up using only one mailbox mostly. I know I've hit the throughput limit because my tx_errors and rx_error_oom climb due to hitting the CAN pool limits as the Tx queue grows. In contrast, enabling FIFO in hardware I use all 3 mailboxes allowing the queue to processed faster. |
|
I'm working again on getting hardware to test and going through the code myself. I'm more on the side of just using FIFO mode now but this patch should be augmented to pull out the unused priority logic. Thanks for digging deeper! I also still need to talk to @tridge . |
Multi-frame DroneCAN messages with a periph in the loop is experiencing CRC and wrong toggle errors. I believe this is related to #28545. According to DroneCAN and STM32 documentation we should be enabling tx FIFO to maintain the correct order of multi-frame messages.
I can see in the bootloader the FIFO flag is being set. The equivalent is also being set for CANFD. FIFO is not being set though for standard CAN, in my case with an STM32F4xx periph.
From the DroneCAN documentation:
https://dronecan.github.io/Specification/4.1_CAN_bus_transport_layer/
stm32f4xx 34.7.1 docs:
So we must set the TXFP flag to transmit by request order.
In my testing watching the stats panel in DroneCAN GUI I can see a drop in "Rx Error Wrong Toggle" and "RX Error Bad CRC" on the Autopilot Node when I've applied this patch.