-
Notifications
You must be signed in to change notification settings - Fork 367
SPI slave support #580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPI slave support #580
Conversation
|
Looking at the xtensa failures ... Ah, at least part of this is the duplex mode generic arg, and some if the rest is the SPI3 peripheral. Sorting those out. |
|
I gave this a quick try and with the bit-banged master I didn't got it to work. I got this which looks quite good (despite the 0 reading in the first iteration) On the ESP32 side I got That also looks almost good beside the read on first iteration - the slave should have set the iteration count as the first and last element. It looks like the master is picking up the data starting at the 8th / 9th byte (I changed the slave example to fill the send buffer with bytes counting up) I will be travelling for business next week so I probably won't be able to do testing next week |
|
Turns out I haven't been able to get much time for this either lately. :-/ Maybe rewriting what's going on will help sort things out. On the first transfer, the first b9 = 185 bytes are missing -- or actually, it's 185+256*N for some integer N. So 185, 441, 697, 953, 1209, 1465, 1721, 1977, 2233, ... None of those look like particularly memorable numbers to me except possibly 1465, but even that is only because of MTU values on an old PPP encapsulated network link, so it has no relevance here. Can you change the master side code to hunt down where the incrementing bytes stop, and print out how many extra 0x75 bytes are at the end? Something like an rposition call on the slice's iterator, maybe. Not sure if this number will make anything obvious, but maybe. Further transfers always lose 8 bytes, as you said. I wonder where the 0x75 is coming from. Is that transmitted on the SPI bus, or are those bytes untouched by the DMA on the master side? What happens if you fill the RX buffer with some other fixed pattern before the transfer starts, on the master side? One other possibility is that the DMA on the slave side is getting triggered a bunch of times before the master sends clock pulses ... I wonder if the FIFO resets are happening at the right time. That wouldn't necessarily explain the first transfer (nor the zeros printed in the slave side), but of course there might be more than one issue. (Or actually, if it's timing related, maybe the first transfer takes longer to set up so it loses more into the FIFO?) I'll have to re check all of that with the TRM at some point soon. Or, I wonder if the slave hardware is looking for an 8 byte segmented transfer setup packet - the place that the TRM says to disable segmented transfers seems a bit late (it's after DMA is already started). That feels unlikely, but might explain something. I'll have to try to find some time to carefully reread the DMA chapter as well as the SPI one, with that in mind. But one other question I do have when you get time again - does the logic analyzer show the earlier bytes getting sent out at all, possibly not synchronized to a clock pulse? I don't think there could be spurious pulses (and I think a FIFO reset is more likely to be the cause either way), but I wonder what shows up on the bus. Thank you! |
|
... Yeah, section 26.5.7.3 of the -c3 TRM has a diagram. The various FIFOs involved in the send and receive paths are all 8*8 bits long. That's super suspicious. I think section 26.5.9.4 looks odd. Nowhere does it specifically say to turn on the DMA start bit, but the code does it at the "start the GDMA tx/rx engine" step, before it resets the three different afifos, and even before it configures the SPI peripheral for single transfer mode. I am guessing DMA needs to be started in between step 7 and 8, or maybe in the middle of step 8. Let's see if I can get that code together any time soon. |
|
Ok, let's see if that makes any difference. The last commit before rustfmt (74e14ef) makes some changes to the DMA code to let the SPI code start both tx and rx after the afifos get reset, but it also keeps compatibility for other peripherals, like i2s and the SPI master. That compatibility isn't strictly needed, but it's less to write to get this testable. I'll sort it out later, if it works. (I partly wonder if SPI master code needs to do something similar ... It looks like the send path is working, at least on the esp32, though. So probably not.) If this doesn't fix the first missing chunk, I would still be interested to find out how many 0x75 bytes are at the end of that one. |
|
Unfortunately looks worse now (again using spi_loopback_dma on the other side since the bitbanging doesn't seem to work for me) Slave Master |
|
I'm not sure what's going on with the bit banging, but I think since you have an esp32 that can plug in here, that's fine for testing for now - we can fix the bit banged master later. We'll need the logic analyzer (that you were using in the LEDC fading PR) for that anyway, I think. But for what's there now, both sides did eventually start producing the right numbers, on transmission number 0xb (so either 11 or 12; I can't remember for sure if that number starts at 0 or 1). So I think this is actually better in one way and worse in another: it's better in that it stops losing the first eight bytes, but it's worse in that it takes a lot more transmissions to get to that point. The earlier code got to that point after only one transmission. So I think the DMA change is needed, to only set the "start" bit after resetting the afifos. It's also odd that the master side shows a continuous sequence of numbers even across transmissions - the first row ends with 5, 4, 3, and the second row starts with 2, 1, ff. It looks like something is out of sync between the two - and here, again, I think the logic analyzer would be most useful. Not sure how many bits it can capture in a row, but if you can get it to sample all four lines (cs, miso, mosi, and sck) until the two sides get in sync, that would be best. If it can do that, can you post the capture file? The other question is, how long are the slices that the code is printing chunks of? They're supposed to be 32000 bytes, but I'd like to actually check. Can you change the code to print out the length on both sides before the first and last bytes, as well? Thank you very much! |
|
(There, let's get rid of ten of those commits, and squash them into wherever is appropriate. The one remaining change to fix the build for non -c3 devices can be squashed as well, but it's different enough for now that I don't think I will yet.) |
|
Output by slave Output by master Using the unmodified spi_loopback_dma.rs as master. I guess there won't be much progress in this working mode so it would be probably good to get (ideally) two dev-boards |
|
Yeah, maybe sorting it out on my own would be faster in terms of turnaround time. There will definitely be some downtime before I can start that, though, as while I can mess with software here, I can't mess with hardware (even if I did have two of the dev boards). But, weirdly, what I see from the logic analyzer capture looks perfect. The MISO line is registering bytes that go down starting at 0xfe, while the MOSI line is registering bytes that go up starting at 1. That tells me that the problem is somewhere on the DMA side, or the SPI peripheral isn't interpreting some of the registers the way I expected. But while I could try to describe the changes I'd like to use to narrow things down more, I don't think the changes I'd be describing will be the only ones. So I can see why you might not want to do a long remote debugging session -- so I'll definitely wait. I might look through the code a couple more times to make sure I'm not missing something obvious. |
|
So part of the rebase here was making changes to the Drop impl on the various transfer structs, so they checked both the DMA channel status as well as the SPI peripheral's idea of done. On transmit, especially, these are different - DMA is done when the 8-byte afifo gets filled, while we can't actually do anything with the peripheral until trans_done is set when the data is shifted all the way out onto the bus. I'm not sure if that will help though. The transmit side isn't the issue anyway, or at least not in the LA sample from before - the bus is doing the right thing, just the data reception isn't working right. ... Hmm, there's an idea. Maybe clearing the afifos needs to be split as well, around the start bit being set. Maybe the one direction afifo needs to be cleared before starting, as today, but the other needs to be cleared after, as before the "split" commits I added. Maybe DMA is reading old data out of those afifos when it first starts up. I'll see what I can do here - not sure if you were planning to test anything at all here, but don't do it yet if so. Thank you very much again :) |
|
Ok, so in the "received by master" in the most recent run, I see a pattern that I missed before. Maybe the issue isn't in the afifo resets, although I still wonder. Because the code in the slave is filling the tx buffer with 255..1 sequences (not 255..0), I should not expect the end of the 32000 byte buffer to be on a byte rollover value (either 255 or 0). And indeed, in the very first transmission, I see the buffer ends with 87, 86, 85, 1, fe (where the 1 looks like the transmission index from the second iteration). So if the last two bytes had been transmitted (or received?), that buffer would end in 85, 84, 83. And ff to 83, inclusive, is 125 - matching, exactly, the value of 32000 mod 255. So that first transmission is fine, other than missing the last two bytes. I'm not entirely sure why the code logged the first two bytes of the next transmission, but the DMA engine on the master side will save unused chunks of rx buffer and fill them the next time around, so maybe that's happening. Well, except no, because the master side logs before it starts to fill the rx buffer again. So maybe that's from the slave side; maybe it moved on to the next descriptor in the DMA setup somehow, or maybe it managed to reset the DMA engine and re register the buffer in time for the master to get those two bytes when it sent the last 16 clocks. The other thing is, it's not always two bytes. It's two the first time around, but it's those two plus five more, then plus five more, etc., each subsequent time around. Hmm. On the slave side, the first transmission is all wonky, but the second starts with the three expected closing bytes of the first transmission - 7b, 7c, 7d (but 7d got overwritten with the transmission index 0). That's the same 125 bytes as on the send side, but we're off by three this time instead of two. Future transmissions have the same off by five pattern going on. I don't see anything obvious in the code that would generate an off by five error, but maybe it's a consistent timing issue or something. Maybe there's something wrong in how the code decides the transfers are done, still. |
|
Hmm, at least some of that last comment is nonsense - there's a 250ms delay between when the transfer says it's done, and when the code sets up another transfer. So it's not likely the case that the slave side is setting up another transfer before the master side sends the last 16 clocks. So then where are those bytes coming from... I might just have to dump out the DMA descriptors to see what's happening... That will of course require access to the hardware, so not until at least next week. Or maybe I could have it flip gpio lines when certain events happen, and see about that timing. It's still odd that the SPI bus has all the right bits being sent in the right order too, though. Hopefully this all makes sense soon. |
|
I still haven't been able to make much progress here. I did manage to get two dev boards connected together, and the right data is definitely being sent on both the miso and mosi lines. So I think something is off on the receive side only, but I still haven't been able to put a lot of effort into figuring out what yet. |
|
There we go, that seems helpful. On the first transmission, everything is fine except that the receive side logs a bunch of zeros before the data starts. After adding a ton more logging, and trying several .fill() calls on the rx buffer before starting DMA, I finally realized today that there are unchanged bytes in there - and the total number of them is exactly 4092*3, i.e. the first three descriptors are not being filled with SPI data. But the first byte of each of these chunks was being set to 0 -- even if I change the master side to never send a 0 byte. I also tried changing the CHUNK_SIZE const to 124, and adding a bunch more descriptor u32s to match. It's always the first two or three descriptors that have a single zero byte added, regardless of the byte count. I wonder if this is related to the glitches I see on the chip select and clock lines when setting up the master ... perhaps the zero byte is some kind of timeout or error result, after only one bit gets shifted into the SPI peripheral's buffer. Or after no bits get shifted in, if the issue is the high->low->high glitch on the chip select line. And I'm surprised the recent change to report a Result<> from the wait method didn't show anything either ... if this is some kind of error, maybe there are more bits to check or something. That will need another read through the TRM. Not sure on the best way to fix it either, but I think I see what's going on now. Two or three descriptors just aren't getting used the first time around. |
|
Hmm. According to the esp32c3 datasheet, pages 20-21, in table 6 the default pins for SPI2 clock and chip select (which the SPI loopback example also configures) are gpio6 (mtck) and 10. And on page 22, in table 7, both of those lines are documented to have a 5 microsecond low level glitch at startup. When the master device resets and this glitch hits the listening slave device, I bet it's causing at least some of the issues. Fortunately SPI2 can run through the gpio matrix to any data lines. Next week when I'm back in front of the hardware I'll try to find time to rewire the bus so that so least cs is on a line that doesn't glitch at startup, and let's see if that makes any difference. This probably doesn't affect the single device example because it doesn't initialize the receive channel before reset is done so the glitches don't affect anything. Or actually, because its receive channel is not the same as an SPI slave (it's just doing a full duplex transfer), so nothing looks at chip select at all. But it would really be nice if there was a way to ignore zero byte transfers in the SPI DMA handling, to let it ignore these on its own. Because really, it should be robust to something like this. Sadly I don't see any way to do that, so if this is the problem, then changing the wiring might be the only fix |
|
I can't find any gpio that doesn't have those glitches at startup, even the ones that Espressif documents as not having the "G" note. All of them glitch low when the bootloader is running, before the SPI master code even starts. So I went back to the bit bang master, because within a single MCU it's possible to avoid starting the slave until after the glitches are done. (And in fact it's impossible to start it before the glitches, because they're at bootloader time.) This fixes all the problems with missing bytes and with the slave receive side losing bytes. But I had to fix the bit bang master, obviously. Turns out it had two bugs, one where it was iterating over a list of arrays instead of a list of integers (so it only spit out one bit per byte), and one where I got the default bit order of SPI backwards (so it went in lsb first mode instead of msb first). Fixing those bugs makes it work perfectly, with only one device. As the checks are all passing as well, I think this is ready for review again. If you think it looks reasonable, I'll see if all the intermediate commits are still needed, and drop the ones that aren't, then squash everything and rebase one more time. Thank you for waiting! |
|
Ok, I'm pretty sure the intermediate commits are needed (though most of them can be squashed). And I think this is ready for another review; I just rebased to the new head. |
| let transfer = spi.dma_transfer(send, receive).unwrap(); | ||
| // here we could do something else while DMA transfer is in progress | ||
| let mut i = 0; | ||
| let mut n = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO we should rename this variable in this example for all other chips, too if we do it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - that was the plan when I copied the new example to the other chips' directories. In this function, i is being used for three different things now - this change at least drops that to 2.
|
Thank you! I might not get to review this today since because of the changes to DMA I should test all DMA related examples on all chips I understand you only have access to ESP32-C3 but it seems this is also intended for other targets - would be nice to have the example adapted for them - but you can wait doing that after the first tests on ESP32-C3 certainly |
|
BTW if anyone else wants to take a look at this ... go ahead - I haven't started reviewing this |
|
I was able to successfully test this on ESP32-C3 🚀 There are some minor nitpicks but overall looks really good to me! Also briefly testing other DMA related examples was successful |
|
Addressed your comments. Time to copy the example to all the other chips' directories? |
Nice! Yes - let's see if this will work for all targets. Would be great |
|
Ok, that passes the automatic build for everything - it doesn't include an example for the -c6-lp variant, though, since that looks very new in the codebase, and only has one blinky example so far. Hopefully it works on all of them. :) There might be register differences I'll need to fix up, but let's see. Thanks again! |
|
Unfortunately, no difference |
|
Still trying to figure out what else it might be... It seems likely to be related to the pdma (instead of gdma) peripheral in the esp32 and s2 ... but it's not clear where the difference is. Here's a question: Does the SPI loopback DMA example still work on the s2 chip with this change? I wonder if I broke pdma somehow. If it doesn't work here, I assume it does work at head, right? If the master does work, I'll start digging through the differences between this code and the master side. Maybe I'll start that anyway, actually... |
|
e.g. here vs |
|
Sigh, that's a pretty obvious thinko -- I'm incrementing i in the inner loop instead of n. It's also a pretty easy fix. I haven't had any time to dig into this recently; going to try a few more things here shortly though. I think the rebase above ought to fix the spi_loopback_dma example again though. |
|
What's the status on this? I'd really like to get it merged if possible, this PR has been open for ~4 months now. If you're not able to complete the work here I'm sure we can find somebody else to bring it across the finish line. |
|
I've been super distracted with a lot of other stuff lately. :( I just managed to rebase it tonight, but yeah, it's been several weeks since I really had time to dig into the -s2 device's issues. (And it seems the rebase is off somehow ... oh, right, I dropped the frequency arguments; at least that's relatively easy to fix.) What do you think about disabling this module on esp32 and esp32s2, dropping the conditionals for those chips in spi_slave.rs, and merging it as-is? Then, yeah, others can sort out the support for those two chips. That would work for me. |
I would be fine with that - we did similar things before |
|
OK, I'm making slow progress on that -- still need to go through spi_slave.rs and drop cfg checks. ...Or actually, maybe not -- leaving it as-is means whoever follows up has less to do. :) I'll still take another look through before another review, though. (I think maybe the DMA prepare_transfer split could be used by other peripherals too, and drop the compat fn that's still there now.) Then I'll still have to squash down to 3 commits before merging. |
|
That, I think, should be all - it gets everyone off the prepare_transfer fn, and deletes it, in favor of a prepare_transfer_without_start. I could go back and rename the fn again, but my concern is that leaving it as-is wouldn't let others know the contract had changed. I'll need to test it again, I think, and will only get a chance in a couple days. But let's see what people think before too much more time passes. Thank you! |
|
The example still works, so the changes I made don't seem to affect functionality for the -c3 device. :) Is there anything else missing? |
|
I was able to test on all supported targets. It worked fine Generally, looks good to me. We don't have it in the code base and in optlevel 0 it produces slightly more code than calling the two functions like @jessebraham what do you think about that? |
This setup allows registering buffers for future transactions the master does (lowering cs, toggling sclk, and raising cs again). The transfer struct returned from the registration API will complete its wait() or return true from is_done() after cs has been raised. Copied from spi.rs, so most of the changes are deleting code that handles e.g. segmented transfers or synchronous operations. Fix non-c3 devices' builds
Ensure the API "feels" right. Since there's no way to route GPIOs to other peripherals, we choose four other wires and bit-bang SPI for the master side, relying on the person running the example to connect the bus. This way we ensure the slave code works, since we created the master ourselves. Also, it's not really possible to use a second ESP device as the master anyway: all the digital lines have glitches on startup, and those glitches cause the slave's DMA engine to skip descriptors (it thinks they're intended CS indicators); this causes it to lose data. Then, fix the bitbang master (recording the progression here) - When bitbanging, iterate the bits by "for _ in 0..8", instead of the broken "for _ in [0..8]". The latter only runs the iteration once, since there's only one list given ... and because the code uses _ instead of a real loop variable, type checking didn't save us. - When bitbanging, send the bits out (and read them in) MSB first, since that's actually how we have the slave configured.
The first does everything but write to the start bit and check for an error. The second does those. We need 2 fns because the SPI slave needs to start the transfer only after resetting the various afifo hardware components (if it starts the transfer before, the first 8 bytes will be lost when that reset happens). Use the split fns everywhere. Also split flush(). It needs to be pollable, so split it into one fn that polls and one that waits until the poll returns clear. Also call the poll fn from the is_done() fn, so we don't trample in-progress transfers.
This way we can tell if it's ever touching certain bytes - 0xff is never added to the master transmit buffer. While I'm changing this, make the slave tx buffer never contain 0xff either (go from 254 to 0).
|
Rebased onto head, and fixed the pins. I think the and_then construction reads better than the double question mark ("do this and then do that" makes sense to me) - but if opt-level zero is important, let me know and I'll switch it back. I'm sure the difference is just that the callback isn't being inlined. Thanks! |
|
I guess opt-level 0 is not really important - you can see the difference in https://godbolt.org/ ... the difference is also not huge. Thanks |
|
Totally fair. I'll change it back if desired; let me know. |
|
@jessebraham @MabezDev any opinions on the usage of the |
jessebraham
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have really not put any energy into code size or consistency, so no reason to start now. Makes no difference to me, so LGTM.
bjoernQ
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Duplicate spi to spi_slave * Restore spi * Add barebones SPI slave mode, DMA only. This setup allows registering buffers for future transactions the master does (lowering cs, toggling sclk, and raising cs again). The transfer struct returned from the registration API will complete its wait() or return true from is_done() after cs has been raised. Copied from spi.rs, so most of the changes are deleting code that handles e.g. segmented transfers or synchronous operations. Fix non-c3 devices' builds * Limit spi_slave to non-pdma devices * SPI slave DMA example Ensure the API "feels" right. Since there's no way to route GPIOs to other peripherals, we choose four other wires and bit-bang SPI for the master side, relying on the person running the example to connect the bus. This way we ensure the slave code works, since we created the master ourselves. Also, it's not really possible to use a second ESP device as the master anyway: all the digital lines have glitches on startup, and those glitches cause the slave's DMA engine to skip descriptors (it thinks they're intended CS indicators); this causes it to lose data. Then, fix the bitbang master (recording the progression here) - When bitbanging, iterate the bits by "for _ in 0..8", instead of the broken "for _ in [0..8]". The latter only runs the iteration once, since there's only one list given ... and because the code uses _ instead of a real loop variable, type checking didn't save us. - When bitbanging, send the bits out (and read them in) MSB first, since that's actually how we have the slave configured. * Add changelog entry * Split DMA prepare_transfer into two fns. The first does everything but write to the start bit and check for an error. The second does those. We need 2 fns because the SPI slave needs to start the transfer only after resetting the various afifo hardware components (if it starts the transfer before, the first 8 bytes will be lost when that reset happens). Use the split fns everywhere. Also split flush(). It needs to be pollable, so split it into one fn that polls and one that waits until the poll returns clear. Also call the poll fn from the is_done() fn, so we don't trample in-progress transfers. * Make example code fill rx buffer before transfer This way we can tell if it's ever touching certain bytes - 0xff is never added to the master transmit buffer. While I'm changing this, make the slave tx buffer never contain 0xff either (go from 254 to 0). --------- Co-authored-by: Jesse Braham <[email protected]>
Basic SPI slave support, starting to implement #469.
DMA only, as in slave mode the clock is controlled externally, so there's no way to force a send or receive. So the API shouldn't really be "send this set of bytes", but rather "here's a set of bytes to send whenever the master wants to get around to asking for them". DMA fits the latter quite well.
Thank you!
Thank you for your contribution.
Please make sure that your submission includes the following:
Must
errorsorwarnings.cargo fmtwas run.CHANGELOG.mdin the proper section.Nice to have
Notes
I don't know for sure if the example works. It should, but it requires hardware mods to the board holding the -c3 chip in order to separate the SPI slave and master. If it's ok to try a remote debugging sequence like I did in PR #475, that works for me too, but if nobody wants to try that, i can mark this PR private and wait until I have access to the -c3 hardware again. That being said, I can't test on any other variant of the hardware at all, even after I wait.
I also don't think this is fully ready yet - its feature flags support non -c3 chips, but I haven't read through their TRMs to see if the code is doing the right thing (or even the documented thing). Inverting the SPI mode bits on the original ESP32 is the only thing I've seen so far, and that was a happy accident. I need to read the SPI chapters of several other reference manuals before it's truly ready - but the main structure is here, and that's the big thing I'd like to get reviewed at this point.
Thanks!