| Thanks for looking at that. I counted cycles using the instruction timings in the family manual. The frustrating thing is that I'm finding a lot of pipeline stalls, and I haven't figured out how to get the algorithm to work with more parallel moves and no stalls. I probably just have to work away at it to see if there's a better way. I can see shortening it slightly by block processing the allpass, but I don't see that saving many cycles. I'm still in a sample-by-sample way of thinking, but I don't see more than a cycle or two of savings from eliminating the reloading of coefficients, because from the looks of it I'd just get more ALU stalls. If I were using SDRAM and caches it would certainly be a lot different, as we've discussed before. Perhaps processing two allpasses in parallel (interleaved), using more instructions but doing two at once, might be more efficient.
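To make the interleaving idea concrete, here is a rough C sketch rather than 56k code. It assumes a Schroeder-style allpass, y[n] = d[n-D] - g*x[n] with d[n] = x[n] + g*y[n], and two allpasses that are independent of each other (say, one per channel). The names `allpass_t`, `allpass_tick` and `allpass_pair_tick` are made up for illustration. The paired version issues each step for both filters back to back, so there is always an independent operation available to sit in the slot where a single filter would stall waiting for its own result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for one Schroeder allpass (names are illustrative). */
typedef struct {
    float *buf;  /* circular delay line, caller-allocated and zeroed */
    size_t len;  /* delay length D in samples */
    size_t idx;  /* current read/write position */
    float g;     /* allpass coefficient */
} allpass_t;

/* One sample through one allpass: each step depends on the previous one. */
static float allpass_tick(allpass_t *ap, float x)
{
    assert(ap->len > 0);
    float delayed = ap->buf[ap->idx];      /* d[n-D] */
    float y = delayed - ap->g * x;         /* y[n] = d[n-D] - g*x[n] */
    ap->buf[ap->idx] = x + ap->g * y;      /* d[n] = x[n] + g*y[n] */
    if (++ap->idx == ap->len) ap->idx = 0; /* wrap the circular index */
    return y;
}

/* Two independent allpasses, one sample each, with the steps interleaved:
 * every operation on filter b is independent of the one just issued for
 * filter a, which is what gives a pipeline something useful to do. */
static void allpass_pair_tick(allpass_t *a, allpass_t *b,
                              float xa, float xb, float *ya, float *yb)
{
    assert(a->len > 0 && b->len > 0);
    float da = a->buf[a->idx];             /* load a */
    float db = b->buf[b->idx];             /* load b */
    float oa = da - a->g * xa;             /* output a */
    float ob = db - b->g * xb;             /* output b */
    a->buf[a->idx] = xa + a->g * oa;       /* store a */
    b->buf[b->idx] = xb + b->g * ob;       /* store b */
    if (++a->idx == a->len) a->idx = 0;
    if (++b->idx == b->len) b->idx = 0;
    *ya = oa;
    *yb = ob;
}
```

The arithmetic per filter is unchanged, so both versions should produce the same samples. Whether the interleaved ordering actually removes stalls on the 56k depends on register pressure and on how the parallel moves line up, which this sketch does not model.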
I can see an advantage of a VLIW architecture in terms of optimizing data flows through the pipeline registers. I could actually put a register read during the pipeline stall period to good use, but the 56k doesn't allow that. On the other hand, I've only spent a few days writing 56k code and I already have the audio and host interfaces, memory management, loop gains, and multitap summers tested and working; the allpass is the one I'm battling now. The learning curve of the 56k is quite low: I've programmed in assembler on a 68k, and the 56k follows the same line of thinking from a programmer's point of view.
I might be able to speed up the processor clock another 20 MHz - I'm at 98 MHz and it's a 120 MHz part - but then I'd have to go to two wait states on the SRAM, and with four accesses here it would be faster to stay at one wait state (the minimum) and run at 98 MHz. I am actually quite surprised that the data bus running at 100 MHz doesn't appear to get thrashy at all, even on a two-layer PC board. But the bottom is uninterrupted copper, and that was a bear to lay out. Basically, the whole data and address bus - 19 address, 3 control, and 24 data bus lines - is all on one side of the board, with a ground plane on the back. Perhaps I should try it at 120 MHz to check my timing margins.
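The wait-state trade-off above can be put into rough numbers. The sketch below is in C and rests on a simplifying assumption that is not taken from the manual: an external SRAM access with w wait states costs 1 + w clock cycles, and every other cycle in the loop is unaffected by wait states. `pass_time_ns` and its parameters are hypothetical names.

```c
#include <assert.h>

/* Estimated time in nanoseconds for one pass through a loop.
 * ASSUMPTION: each external access costs (1 + wait_states) clock cycles,
 * and core_cycles are cycles that do not touch external memory. */
static double pass_time_ns(double clock_mhz, int core_cycles,
                           int ext_accesses, int wait_states)
{
    assert(clock_mhz > 0.0);
    double cycles = core_cycles + ext_accesses * (1.0 + wait_states);
    return cycles * 1000.0 / clock_mhz;   /* cycles / MHz gives us; x1000 -> ns */
}
```

Under that assumption, the four accesses alone take `pass_time_ns(98.0, 0, 4, 1)`, about 81.6 ns, against 100 ns for `pass_time_ns(120.0, 0, 4, 2)`. The faster clock only pulls ahead once the loop has roughly ten or more cycles that stay off the external bus, so for a memory-bound allpass, staying at 98 MHz with one wait state looks like the better choice.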
-Dale |