Author |
Message |
blue hell
Site Admin
Joined: Apr 03, 2004 Posts: 24079 Location: The Netherlands, Enschede
Audio files: 278
G2 patch files: 320
|
Posted: Sun Dec 10, 2017 5:22 am Post subject:
|
|
|
Delete, why?
They still seem like interesting examples of what can be done.
But it is all up to you :-) _________________ Jan
also .. could someone please turn down the thermostat a bit.
|
|
|
|
|
JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
|
Posted: Wed Dec 13, 2017 7:55 am Post subject:
|
|
|
My MIDI Karplus-Strong synth operates at 16 voices with the following features:
- 16 Voices of polyphony EDIT: now 32 voices
- Loop filter control
- Precision tuning across its 8 octaves of range (8 octaves when the bender is used at +12 / -12 semitones).
- Automatic sustain extension for higher notes
- Sustain extension for all notes controlled with a CC
- Pluck pulse filter control
- Pluck pulse length control as a ratio of wave length from 0.2 to 0.5
- Pitch bend up to +12 / -12 semitones; asymmetrical settings such as up 3 and down 2 are allowed.
- Mix the raw KS signal with a frequency-doubled copy produced by a wave shaper. This adds an octave to the range, making it a 9 octave instrument. The frequency-doubled signal may be used alone; its waveform is very similar to the source signal, so it still sounds like your timbre settings.
- Three "voice" sine LFO modulated chorus effect with speed and depth controls
- Voice stealing: the age (time since pluck) and pitch of each playing voice give a rough estimate of its amplitude. The largest number determines the voice that is stolen, and that will most likely be the voice with the least actual amplitude. This hides a lot of voice theft.
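The voice-stealing estimate in the last item can be sketched roughly as follows. This is only an illustration, not the synth's actual code: the `ks_voice` struct, its fields, and the choice of score (age divided by delay-line length, that is, the number of trips around the Karplus-Strong loop) are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-voice state; the field names are illustrative only. */
typedef struct {
    int      active;       /* nonzero while the voice is sounding */
    unsigned age_samples;  /* samples elapsed since the pluck */
    unsigned period;       /* delay-line length in samples (sets the pitch) */
} ks_voice;

/* Rough "quietness" score: trips around the delay loop so far. Each trip
 * passes through the loop filter once, so more trips suggests a lower
 * remaining amplitude. */
float ks_quietness(const ks_voice *v)
{
    assert(v != NULL && v->period > 0);
    return (float)v->age_samples / (float)v->period;
}

/* Pick a voice for a new note: a free voice if there is one, otherwise
 * the active voice with the largest score. */
size_t ks_pick_voice(const ks_voice *voices, size_t count)
{
    assert(voices != NULL && count > 0);
    size_t best = 0;
    float best_score = -1.0f;
    for (size_t i = 0; i < count; i++) {
        if (!voices[i].active)
            return i;
        float s = ks_quietness(&voices[i]);
        if (s > best_score) {
            best_score = s;
            best = i;
        }
    }
    return best;
}
```

A real implementation would also have to account for the sustain extension mentioned above, which this score ignores.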
I will be posting some samples later. _________________ FPGA, dsPIC and Fatman Synth Stuff
Time flies like a banana. Fruit flies when you're having fun. BTW, Do these genes make my ass look fat? corruptio optimi pessima
Last edited by JovianPyx on Tue Dec 19, 2017 2:43 pm; edited 1 time in total |
|
|
|
|
JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
|
Posted: Tue Dec 19, 2017 2:39 pm Post subject:
|
|
|
The synth now operates with 32 voices configured.
I'd had some problems with optimization flags and just now got it working with -O3. The previous best was -Og; -O1, -O2 and -O3 didn't work (that is, no sound, no MIDI LED).
Once I found the problem in my C code, -O3 began to work and I was then able to increase the voice count to 36. I backed off to 32 so that the MIDI controller can provide lower latency.
The .bin file is 56548 bytes.
|
|
|
|
|
JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
|
Posted: Wed Jan 03, 2018 8:22 am Post subject:
|
|
|
I am back on the warpath regarding the use of ARM Cortex-M7 DSP instructions. I was digging through the ST stuff I downloaded and noticed HTML documentation of CMSIS drivers. There are library files (.lib) there as well. Looking at 06_DSP_theory.pdf, there are references to DSP instructions such as the MAC (multiply-accumulate). I'm getting the impression that by using the proper library and its functions, many DSP operations can be accelerated.
I will do some experiments with different simple DSP functions like linear interpolation or simple IIR filters and then examine the binary file to see if MAC instructions are actually found.
Is anyone else using these libraries - such as arm_cortexM7lfsp_math.lib (Little endian and Single Precision Floating Point Unit on Cortex-M7)?
I'll post back here with results of this.
|
|
|
|
|
JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
|
Posted: Wed Jan 03, 2018 11:21 am Post subject:
|
|
|
I couldn't figure out how to use the .lib file with gcc, but I found libarm_cortexM7lfsp_math.a and was able to get a simple benchmark working to compare the performance of the standard sqrt function and the one supplied by the above library file. The standard sqrt is 19.5 times slower (optimization set to -O3), so there is a definite advantage to using this library. In a disassembly, I do find the vsqrt.f32 instruction. Next I will try a filter to see if MACs are used. This is again with a simple makefile project, so only what is supplied free by ST is needed.
|
|
|
|
|
JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
|
Posted: Thu Jan 04, 2018 4:30 pm Post subject:
|
|
|
In looking through the supplied CMSIS DSP library, I didn't find much that was useful besides the fast square root. Either the vector dot product function or the linear interpolator function appeared to be able to implement a single-pole low-pass IIR filter. I tried the vector dot product for this, and it turns out that writing the filter in C is about 4 times faster than using the vector dot product DSP function. These functions are all written to process blocks of data (where I would like to process one sample at a time), so there is likely looping overhead bogging them down. Unfortunately, the source code doesn't appear to be available for the library.
Given this, I've taken the plunge and started writing inline assembly language for a single-pole low-pass IIR filter. I've been able to do some simple things (change the value of two C variables), so I'd like to try the filter and see how that goes. If it works, I'll benchmark it against a C version and report back to this thread.
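To make the dot-product idea concrete: a single-pole low-pass step is a two-element dot product of the coefficients with the current input and the previous output. The sketch below uses a plain C stand-in (`dot_f32`, a hypothetical helper) rather than the CMSIS function, so it only shows the equivalence and says nothing about the library's performance. All names here are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Direct form: z = a0 * in + b1 * z, one sample at a time. */
float lp_step(float in, float *z, float a0, float b1)
{
    assert(z != NULL);
    *z = a0 * in + b1 * (*z);
    return *z;
}

/* Plain C stand-in for a block-oriented dot product routine. */
float dot_f32(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* The same filter step expressed as a 2-element dot product. */
float lp_step_dot(float in, float *z, float a0, float b1)
{
    assert(z != NULL);
    const float coeff[2] = { a0, b1 };
    const float data[2]  = { in, *z };
    *z = dot_f32(coeff, data, 2);
    return *z;
}
```

The dot-product version has to build two small arrays and run a loop for every sample, which is the kind of overhead that a block-oriented routine only amortizes over long vectors.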
|
|
|
|
|
JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
|
Posted: Mon Jan 08, 2018 6:57 am Post subject:
|
|
|
I've managed to get an inline assembly language filter working, but it didn't go as I had expected. First, the assembly code is significantly slower than the C code (about double the time). Second, the output value doesn't exactly track the C version. If I run the filter 10,000 times in a row (re-filtering without clearing the filter memory z), it starts out giving identical results, but eventually a small difference crops up. I tried 4 methods and all do the same thing.
1) use vmul and vmla
2) use vmul and vfma
3) use two vmul and a vadd
4) use vmul and vmla, but get C variable values with mov and vmov
This is the vmul and vmla code:
Code: |
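asm
(
    /* s0 = z * b1 */
    "vmul.f32 s0, %[__z], %[_b1]" "\n\t"
    /* s0 = s0 + in * a0 (multiply-accumulate) */
    "vmla.f32 s0, %[_in], %[_a0]" "\n\t"
    /* copy the result into the output register */
    "vmov.f32 %[_z], s0" "\n\t"
    /* output: z, held in a single-precision FPU register */
    : [_z] "=t" (z)
    /* inputs: filter state, coefficients and input sample */
    : [__z] "t" (z), [_a0] "t" (a0) , [_b1] "t" (b1), [_in] "t" (in)
    /* s0 is used as scratch */
    : "s0"
);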
|
I tried vfma because it is fused and rounds only once, but the results were the same. All the methods I tried were slower than the straight C, which I believe is caused by register loads that may not be needed. I tried all levels of optimization.
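One possible reason for the small drift is that the variants do not round in the same places. A compiler is also free to contract `a0 * in + b1 * z` into a fused operation in either order (GCC is understood to allow this by default), so the C build may not match any one hand-written sequence. The sketch below is an illustration in portable C under those assumptions, not a model of the exact instructions: `lp_run_separate` rounds both products and the sum to float, while `lp_run_widened` keeps the products exact by widening to double, which only approximates fused behaviour. Both function names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Three roundings per sample: each product and the sum are rounded to
 * float, like two vmul.f32 followed by a vadd.f32. */
float lp_run_separate(float in, float a0, float b1, size_t n)
{
    assert(n > 0);
    float z = 0.0f;
    for (size_t i = 0; i < n; i++) {
        volatile float p = a0 * in;  /* rounded product */
        volatile float q = b1 * z;   /* rounded product */
        z = p + q;                   /* rounded sum */
    }
    return z;
}

/* Fewer roundings: a product of two floats is exact in double, so only
 * the sum and the final narrowing to float round. This is a stand-in for
 * fused behaviour, not a bit-exact model of vfma.f32. */
float lp_run_widened(float in, float a0, float b1, size_t n)
{
    assert(n > 0);
    float z = 0.0f;
    for (size_t i = 0; i < n; i++)
        z = (float)((double)a0 * (double)in + (double)b1 * (double)z);
    return z;
}
```

Running both for the same number of iterations and comparing the outputs bit for bit is a quick way to see whether rounding placement alone can produce a difference of the size observed.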
I am probably doing some wrong/bad newbie things, but at this point, these simple things seem better done in C because they are faster and C is a lot easier to read than assembly. I found that in-depth information (such as complete examples) is rather scarce. I found no examples of using floating point except for a single-instruction sqrt function in arm_math.h. From that example, I was able to create the above code.
I'm sure I will look at this again, but I'm getting satisfactory performance now, so that will be more of a leisure activity. One thing I will do is look carefully at the code that gets generated for the C statement Code: | z = a0 * in + b1 * z; |
|
|
|
|
|
emeb
Joined: Dec 16, 2008 Posts: 35 Location: Arizona
|
Posted: Mon Jan 08, 2018 8:51 am Post subject:
|
|
|
JovianPyx wrote: | First, the assembly code is slower than the C code by a significant amount |
Yeah - I had the same experience when trying to do assembly on Cortex-M. I suspect that the compiler knows things about pipelining that I don't, so when I tried to add stuff it screwed up the flow.
About the only thing I do with assembly is force the use of the ssat and usat instructions - those are quite handy and the compiler doesn't seem to know about them (or else I don't know the proper inference syntax).
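For reference, the result that ssat and usat produce can be written in plain C as a clamp. A minimal sketch, with hypothetical helper names `sat_signed` and `sat_unsigned`; whether a given compiler turns this into the single instruction is not something to count on. CMSIS-Core also provides `__SSAT` and `__USAT` intrinsics, which avoid hand-written assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp x to the signed range of a `bits`-bit integer (bits = 1..32),
 * e.g. bits = 16 gives -32768..32767. */
int32_t sat_signed(int32_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    const int32_t hi = (int32_t)((UINT32_C(1) << (bits - 1)) - 1);
    const int32_t lo = -hi - 1;
    if (x > hi) return hi;
    if (x < lo) return lo;
    return x;
}

/* Clamp x to the unsigned range 0..2^bits - 1 (bits = 0..31),
 * e.g. bits = 8 gives 0..255. */
uint32_t sat_unsigned(int32_t x, unsigned bits)
{
    assert(bits <= 31);
    const int32_t hi = (int32_t)((UINT32_C(1) << bits) - 1);
    if (x < 0) return 0;
    if (x > hi) return (uint32_t)hi;
    return (uint32_t)x;
}
```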
These days I mostly just relax and trust the compiler. |
|
|
|
|
JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
|
Posted: Mon Jan 08, 2018 9:14 am Post subject:
|
|
|
emeb wrote: | These days I mostly just relax and trust the compiler. |
Well, that is the plan going forward as I discovered something today that eliminates the need for inline assembly anyway.
Some experiments I did required that I dump a disassembly and check the generated code to see what is in it. To my surprise, I found that a simple filter like:
Code: | z = a0 * in + b1 * z; |
can cause gcc to use DSP instructions like vfma.f32, but only under certain conditions:
1) The filter must be inside of a loop
2) Optimization must be set to either -O2 or -O3.
There may be other conditions as well, but once I did that, I started seeing vfma.f32 in the test code. I then went back and disassembled my KS32 synthesizer code (which contains many such filters) and I see vfma.f32 everywhere in it.
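As a concrete test case, a minimal sketch of such a loop is shown below. The function name `lp_block` and its parameters are made up for the example. Building it with something like `arm-none-eabi-gcc -O2 -mcpu=cortex-m7 -mfpu=fpv5-sp-d16 -mfloat-abi=hard -c` and dumping the object with `arm-none-eabi-objdump -d` is one way to look for vfma.f32; the exact flags depend on the toolchain and are an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Single-pole low-pass filter over a block of samples. The filter memory
 * is kept in a local variable inside the loop and written back at the
 * end, so the state survives between calls. */
void lp_block(const float *in, float *out, size_t n,
              float a0, float b1, float *state)
{
    assert(in != NULL && out != NULL && state != NULL);
    float z = *state;
    for (size_t i = 0; i < n; i++) {
        z = a0 * in[i] + b1 * z;  /* candidate for a fused multiply-add */
        out[i] = z;
    }
    *state = z;
}
```

If the fused instruction is not wanted, for example to compare results against a non-fused version, GCC's `-ffp-contract=off` option should keep the multiply and the add separate.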
|
|
|
|
|
JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
|
Posted: Sat Mar 17, 2018 8:15 am Post subject:
|
|
|
Yesterday I was able to write an SDRAM diagnostic. It was easier than I expected. The diagnostic uses random data from a 32-bit LFSR (because the C library's random functions return signed values that are never negative, so bit 31 is never set), and access to the SDRAM is done with a pointer. All 8 megabytes are accessible. Rough timing tests show that the SDRAM can do about 16 million mixed read/write transfers per second. The only small drawback is that to use SDRAM, the system clock must be set at 200 MHz (I believe slower will also work, but I don't know the lower limit). It seems it might be fast enough to work as a reverb.
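A minimal sketch of this kind of diagnostic follows, assuming a Galois LFSR with the tap mask 0x80200003 (a commonly published 32-bit choice, not necessarily the one used here) and a simple write-then-verify pass. The function names are hypothetical. On the board, the base pointer would be the SDRAM's mapped address, which should be 0xC0000000 for FMC SDRAM bank 1 on the STM32F7 but is worth checking against the reference manual; on a PC any buffer will do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One step of a 32-bit Galois LFSR. The state must never be zero.
 * Unlike rand(), the output can have bit 31 set. */
uint32_t lfsr32_next(uint32_t s)
{
    uint32_t lsb = s & 1u;
    s >>= 1;
    if (lsb)
        s ^= UINT32_C(0x80200003);  /* assumed tap mask */
    return s;
}

/* Fill `words` 32-bit cells with LFSR data, then regenerate the same
 * sequence from the same seed and compare. Returns the number of cells
 * that read back wrong. */
size_t mem_test(volatile uint32_t *base, size_t words, uint32_t seed)
{
    assert(base != NULL && seed != 0);
    uint32_t s = seed;
    for (size_t i = 0; i < words; i++) {
        s = lfsr32_next(s);
        base[i] = s;
    }
    size_t errors = 0;
    s = seed;
    for (size_t i = 0; i < words; i++) {
        s = lfsr32_next(s);
        if (base[i] != s)
            errors++;
    }
    return errors;
}
```

For 8 megabytes, `words` would be 8 * 1024 * 1024 / 4. Writing the whole region before reading any of it back also catches address-line faults that a write-then-immediately-read test can miss.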
|
|
|
|
|
JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
|
Posted: Tue May 29, 2018 8:12 am Post subject:
|
|
|
I learned something major yesterday that may be of interest to others who are using this dev board.
On the board is a rather nice-looking large color touch screen display. Most or all of the demo programs that come as free downloads use the display, but there is a caveat: the LCD touch screen does not have its own pixel RAM. This is very important because it requires the CPU to be involved in supporting the display (not just setting pixels; the display doesn't work unless the CPU is constantly pumping data into it).
The display driver needs a screen buffer which for this board is in SDRAM. Note that SDRAM is slow compared to SRAM. The driver supplies new data to the display as is needed.
I discovered this while trying to work out how to create a special effect with the board, that is, line-in ---> DSP process ---> line-out. Looking at the driver code, I noticed that there is an initialize function that sets the CODEC for simultaneous input and output. However, where the code selects an input, there is a comment saying that only MEMS microphone 2 is supported. This appears to be the work of either lazy or overworked engineers who had trouble doing useful things with line-in and just whacked that in there to get the code out the door. In fact, none of the projects use line-in as an audio source; all of them use the MEMS digital microphones (which use sample rates far below 44.1 kHz).
In fact, there is no such hardware limitation. On GitHub, I found a project that uses the Discovery board with line-in and line-out as a 700 Hz bandpass filter (something about Morse code). I was excited about it, downloaded it and, after a lot of head scratching, made it compile. Once I ran it, I found that it works. There is a "pass-through" selection so that no filter is involved, and it does pass audio from line-in to line-out. Within the GitHub code, I found modifications to the CODEC driver that allow line-in to be used.
And then the other shoe drops: I wanted to know at what sample rate it runs, and it turns out to be only 16 kHz. Pretty poor for an audio effect. I modified it to test 96 kHz (my target sample rate) and got a DMA error. Same with 48 kHz. I could probably make it work without those errors by using a larger buffer, but that would be pointless, as my effect needs to be real time with low latency. I did wonder why it was so slow, and I realized that there's no connection between the SDRAM where the pixels are stored and the LCD display. This means that the CPU is shuttling data from SDRAM to the display, and that takes up a significant amount of the CPU's time.
At the same time I wondered how I'm able to get 32 voices of Karplus-Strong with advanced features. The answer is that I'm not using the display. It may be possible to use the display, but not while the audio is working. So a button could be used to switch it on and off: on to select stuff, off to make music. Or a display with its own internal RAM could be substituted.
I hope this helps someone.
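The buffer-size trade-off mentioned above is simple arithmetic: a DMA buffer of N frames at sample rate fs holds N / fs seconds of audio, which is both the delay it adds and the time available to process it. A small sketch, with a hypothetical helper name and example frame counts that are not taken from the project:

```c
#include <assert.h>

/* Duration of an audio buffer in milliseconds. */
double buffer_latency_ms(unsigned frames, unsigned sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return 1000.0 * (double)frames / (double)sample_rate_hz;
}
```

For example, a 256-frame buffer lasts 16 ms at 16 kHz but only about 2.67 ms at 96 kHz, so the same buffer leaves one sixth of the time to get the processing done. Making the buffer larger buys that time back only by adding delay.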
|
|
|
|
|
BobTheDog
Joined: Feb 28, 2005 Posts: 4044 Location: England
Audio files: 32
G2 patch files: 15
|
Posted: Wed May 30, 2018 7:34 am Post subject:
|
|
|
Doesn't the screen stuff use DMA?
I have been messing around with an i.MX RT1050 (M7 600 MHz) development board that has a screen like this and everything happens in the background using DMA. |
|
|
|
|
JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
|
Posted: Wed May 30, 2018 7:48 am Post subject:
|
|
|
Yes, AFAIK, there is a DMA transfer from SDRAM to the CPU SRAM, but there is no path directly from SDRAM to the LCD, so the CPU initiates the DMA transfer to get a certain amount of SDRAM data into SRAM and then has to stuff it into the LCD registers (not sure if that's also a DMA operation). I haven't looked that hard at the LCD code yet, but something about using the display is eating the CPU. I noticed that all of the demos that need audio in take it from the MEMS mics at low sample rates, but they all also use the display. This board also does the LCD work "in the background", but 600 MHz is higher than 200 MHz at least. The answer is likely in the LCD driver code, but it's not what I'd call "bed time reading". A similar display is one thing, but what is unknown is whether the way it's wired makes it more efficient. Apparently, in the Discovery design, there was a pin deficit, which is why the Discovery SDRAM allows access to only half of it (due to a design decision to use only 16 of the 32 data lines).
EDIT ADD: Some perusing of the driver tells me that the LCD has a line buffer which must be updated/refreshed regularly from SDRAM.
|
|
|
|
|
BobTheDog
Joined: Feb 28, 2005 Posts: 4044 Location: England
Audio files: 32
G2 patch files: 15
|
Posted: Wed May 30, 2018 11:15 am Post subject:
|
|
|
I must admit I thought the LCD controller did all the heavy lifting and the only cpu hit was writing to the frame buffer in ram.
Is that where you are seeing the cpu hit or are you seeing a general cpu hit even when not updating the frame buffer? |
|
|
|
|
JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
|
Posted: Wed May 30, 2018 11:26 am Post subject:
|
|
|
The line buffer updates (from SDRAM to CPU to LCD line buffer) seem to be the bottleneck.
I get great DSP performance when I don't enable the LCD at all.
Not sure of the exact frame rate, but I'd bet it's about 60 Hz.
The board has a demo that can play videos, so the frame rate isn't slow.
Not sure about this either, but perhaps it's possible to run the LCD at a lower frame rate.
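To put a number on the frame-rate idea: however the pixels get moved, the traffic out of the frame buffer scales with resolution, pixel size and refresh rate. The sketch below assumes a 480x272 panel (believed to be what this Discovery board carries, but not confirmed in this thread) and treats bytes per pixel and frame rate as free parameters; the helper name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that must be read from the frame buffer every second to keep
 * the panel refreshed. */
uint64_t lcd_bytes_per_second(uint32_t width, uint32_t height,
                              uint32_t bytes_per_pixel,
                              uint32_t frames_per_second)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    return (uint64_t)width * height * bytes_per_pixel * frames_per_second;
}
```

With those assumptions, 4 bytes per pixel at 60 Hz comes to 31,334,400 bytes per second, while 2 bytes per pixel at 30 Hz comes to 7,833,600, so halving both the pixel size and the frame rate cuts the traffic to a quarter.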
|
|
|
|
|
BobTheDog
Joined: Feb 28, 2005 Posts: 4044 Location: England
Audio files: 32
G2 patch files: 15
|
Posted: Wed May 30, 2018 11:31 am Post subject:
|
|
|
So there isn't an LCD controller? |
|
|
|
|
JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
|
Posted: Wed May 30, 2018 11:43 am Post subject:
|
|
|
There is some sort of controller inside the STM32F7.
These are the signals that run between the LCD and the CPU's LCD interface:
LCD_R[0..7]
LCD_BL_CTRL
LCD_B[0..7]
LCD_G[0..7]
LCD_HSYNC
LCD_CLK
LCD_VSYNC
LCD_DISP
LCD_DE
LCD_RST
LCD_INT
LCD_SDA
LCD_SCL
The fact that Hsync and Vsync are there indicates to me that the CPU has to actively support the LCD.
|
|
|
|
|
BobTheDog
Joined: Feb 28, 2005 Posts: 4044 Location: England
Audio files: 32
G2 patch files: 15
|
Posted: Wed May 30, 2018 12:13 pm Post subject:
|
|
|
The LCD controller will most likely be in the cpu and has access to these pins.
All I am wondering is if the driver/sample code you are using is to blame for the cpu hit rather than the actual cpu infrastructure?
My limited experience of the high level drivers/sample code on these sort of boards is that they seem in the main to be coded by people who may not know what they are doing!
I just went through this a bit with the 1050: the NXP USB audio device driver wouldn’t work with in and out streams concurrently. When I delved into the code, you could see that whoever wrote it never thought that you might want to do this, even though it is fairly standard fare. The lower level USB stack was totally different, written by people that knew exactly what they were doing. Once I changed the USB audio device code it all worked fine. |
|
|
|
|
JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
|
Posted: Wed May 30, 2018 12:35 pm Post subject:
|
|
|
I haven't been able to drill down all the way to the bottom. If you look at the code, it's an obfuscated system of calls to calls to calls to calls, and finding the code that actually moves data is elusive. So sure, it's possible it wasn't written as well as it could have been, but the fact remains that the CPU has to do the work of getting the data into the pixel line buffer, which happens a few hundred times per frame, and frames likely run at 60 Hz. It doesn't help that there is code in files like camera.c that the LCD needs even if the camera isn't used. Deeper digging shows that the DMA operation happens and the driver polls for it to complete.
So yeah, it's not written in the best way to take full advantage of DMA; however, the LCD's working paradigm is still inefficient. At the very least, it burdens the CPU with thousands of additional interrupts and the actual data transfer. The capabilities of the LCD display are beyond what my application needs, and I'd prefer not to waste *any* clocks on it when I can get a display that owns the frame buffer and needs updates only for things that change, not the entire screen buffer.
|
|
|
|
|
|