Particle Photon (STM32F205) DMA Control of GPIO pins

Introduction

For my light installations I work with different addressable LED strips like Dotstar (APA102) and WS2812 (Neopixels). You can control them by sending serial data (a bunch of 0’s and 1’s in a certain structure) to the data input. In Tim’s blog you can read for example how APA102 pixels are controlled.

The “problem” with sending that data is that it keeps the microcontroller’s CPU busy. You can’t do anything else in the meantime, and the longer your LED pixel strip is, the longer the wait. Besides that, different chipsets have different data rates, as you can read in the FastLED chipset reference. The APA102s can be updated super fast (although you need two wires, one for clock and one for data), but WS2812 chips are much slower.

One of the nice things about the faster STM32 processors is that they have a DMA (Direct Memory Access) feature. This makes it possible to move data from memory to the output pins without involving the CPU. So while the DMA controller transfers data to your LED strip, the CPU is free to render, for example, your next effect. Keep in mind that for most applications this is overkill; you won’t need it.

There are some great DMA implementations for LED strips already; the best example is the OctoWS2811 library by Paul Stoffregen (Teensy). For the Particle Photon I’ve released a modified version of the Adafruit DotStar library to control APA102 strips with DMA (thanks to the tips of Louis Beaudoin of the SmartMatrix library). This is possible because DMA is already implemented in the firmware for SPI transfers. The limitation is that you have to use the SPI pins (A3/A5 and D2/D4) to connect your APA102/DotStar strips.

For a recent project I’m interested in controlling lights with DMX. Because DMX has a slow data rate (compared with LED pixel strips), being able to send data in the background is pretty much a necessity. The SPI speed cannot be set precisely enough, so I had to come up with another solution. I found several DMA examples, but without some knowledge of their inner workings it’s nearly impossible to make them work. There is not a lot of explanation for people who don’t have an engineering background (like me), so it cost me a lot of time to understand this stuff. However, I finally got a small prototype (just some blinking LEDs) to work, so I thought I’d share what I learned for other engineering “noobs”.

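In this proof of concept I transfer data from an array to a GPIO port. In this case that means just two blinking LEDs on pins A3 and A5 (the listing further down blinks the on-board D7 LED; a comment in its setup() shows the bit pattern for A3 and A5). Because of a circular double buffer, this continues without any CPU intervention.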

DMA output concept on GIST

GPIO registers and other stuff

If you work with Arduino you are probably used to setting pins HIGH and LOW with functions like:

digitalWrite(13, HIGH);
digitalWrite(13, LOW);

If you need to send serial data (bit banging), these functions are pretty slow (we are talking about microseconds here), because they have to work on a lot of different devices (Wiring, the Arduino programming framework, is supported by many platforms). To manipulate a pin faster we have to work at a lower level. Particle offers faster functions like pinSetFast() and pinResetFast(). However, these won’t work for DMA; for that you need to access the GPIO register directly through its port address.

In embedded programming a pin is called a GPIO (General Purpose Input Output) pin. Those pins are connected to a port.

Particle Photon pinout

As you can see in this pinout diagram from the Particle Photon datasheet, pin A0 is connected to port C, pin 5, and pin A3 is connected to port A, pin 5. Internally a pin map makes sure that A0 actually controls pin 5 on port C (which is of course still labeled A0 on the physical device). This pin map code is also a great resource for understanding how GPIO ports and pins are used.
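To make the idea of a pin map a bit more concrete, here is a minimal sketch in plain desktop C++ (not Photon firmware, and not how the Particle firmware actually stores its map). It only contains the three pins named in this post; PinMapping, kPins, findPin and pinMaskFor are names made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// One row of a (hypothetical) pin map: board label -> GPIO port and pin.
struct PinMapping {
    const char* label;  // name printed on the board
    char port;          // GPIO port letter
    uint8_t pin;        // pin number within the port (0..15)
};

// Only the three pins named in this post; not a full Photon pin map.
const PinMapping kPins[] = {
    {"A0", 'C', 5},
    {"A3", 'A', 5},
    {"D7", 'A', 13},
};

// Look up a board label; returns nullptr when the label is not in the table.
const PinMapping* findPin(const char* label) {
    for (const PinMapping& p : kPins) {
        if (std::strcmp(p.label, label) == 0) return &p;
    }
    return nullptr;
}

// Bit mask for this pin inside its 16-bit port.
uint16_t pinMaskFor(const PinMapping& p) {
    assert(p.pin < 16);
    return static_cast<uint16_t>(1u << p.pin);
}
```

Note that A0 and A3 end up with the same bit mask (pin 5); they only differ in the port they live on, which is why you always need both the port and the pin.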

In order to set certain GPIO pins to HIGH and LOW you can write to the pin registers. ScruffR explains that in the Particle forum.

And for direct port writes you can use this:

    GPIOA->ODR = 0xAAAA;    // directly setting the respective pins HIGH or LOW
    GPIOA->BSRRL = 0xAAAA;  // only setting the pins with a set bit to HIGH (counter intuitive tho')
    GPIOA->BSRRH = 0x5555;  // only resetting the pins with a set bit to LOW (counter intuitive tho')

There would also be an atomic instruction to have BSRRH & BSRRL set at once, but _IO uint32_t BSRR is not declared for some reason, but this workaround might work (not yet tested tho’)

Each port has 16 pins (16 bit), so each bit sets or resets a certain pin.

We don’t want to use ODR, because a write to it affects all the pins on the port: every pin whose bit is 0 in the written value is driven LOW. Since I’d like other pins (SPI, D7, etc.) to keep functioning as normal, this is not an option.

Julien Vanier came up with the suggestion to use the BSRRL and BSRRH registers and control them with two DMA channels. However, BSRRL and BSRRH together actually form the 32-bit set/reset register named BSRR. So the “left” (upper) 16 bits of the number reset pins, and the “right” (lower) 16 bits set pins. Bits that are 0 leave their pin untouched.

Using the BSRR register will save us a DMA channel (and Timers that have to be synced). One small problem is that BSRR is not declared in the Particle firmware. However the address of BSRRL points to the same location, so the workaround of ScruffR works.
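The difference between writing ODR and writing BSRR is easier to see with a small model. The sketch below is plain desktop C++ that treats the output register as an ordinary variable, so it only imitates the hardware. The function names (pinToMask, makeBsrrWord, applyBsrr) are my own for illustration, and unlike the formula in my sketch further down it also masks the values so that bits outside the pin mask can never set a pin.

```cpp
#include <cassert>
#include <cstdint>

// Bit mask for one pin (0..15) of a 16-bit port.
uint16_t pinToMask(unsigned pin) {
    assert(pin < 16);
    return static_cast<uint16_t>(1u << pin);
}

// Build a 32-bit BSRR word for the pins selected by pinMask.
// Low half: masked pins to set HIGH. High half: masked pins to reset LOW.
uint32_t makeBsrrWord(uint16_t values, uint16_t pinMask) {
    uint16_t setBits   = static_cast<uint16_t>(values & pinMask);
    uint16_t resetBits = static_cast<uint16_t>((values ^ pinMask) & pinMask);
    return static_cast<uint32_t>(setBits) | (static_cast<uint32_t>(resetBits) << 16);
}

// Desktop model of what a BSRR write does to the output data register.
// Pins whose bit is 0 in both halves keep their current state; if a pin
// is named in both halves, the set bit wins.
uint16_t applyBsrr(uint16_t odr, uint32_t bsrrWord) {
    uint16_t setBits   = static_cast<uint16_t>(bsrrWord & 0xFFFFu);
    uint16_t resetBits = static_cast<uint16_t>(bsrrWord >> 16);
    return static_cast<uint16_t>((odr & ~resetBits) | setBits);
}
```

Starting from an output value of 0x00A0 (two other pins already HIGH), applying makeBsrrWord(mask, mask) for pin 13 gives 0x20A0, and applying makeBsrrWord(0, mask) brings it back to 0x00A0. The other pins never change, which is exactly what a direct ODR write cannot guarantee.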

Below (or in the Particle IDE here) an example of blinking the D7 LED with GPIO manipulation of BSRRL, BSRRH and through BSRR. 


// D7 is GPIO pin 13 on GPIOA. 
// https://github.com/particle-iot/firmware/blob/develop/hal/src/stm32f2xx/pinmap_hal.c#L80

void setup() {
    pinMode(D7, OUTPUT);
}

void loop() {
    
    //GPIOA->BSRR  = 0b00000000000000000010000000000000; // HIGH
    GPIOA->BSRRL = 0b0010000000000000; // HIGH
    
    delay(1000);
    
    //GPIOA->BSRR = 0b00100000000000000000000000000000; // LOW
    GPIOA->BSRRH = 0b0010000000000000; // LOW
    
    delay(1000);
    
    // actually we can make the "left" part from the "right" BSRRL part by using
    // an XOR with a pinMask and a bit shift of <<16
    
    // 0b0010000000000000 HIGH
    // 0b0010000000000000 pinMask
    // ------------------ XOR
    // 0b0000000000000000
    
    // 0b0000000000000000 LOW
    // 0b0010000000000000 pinMask
    // ------------------ XOR
    // 0b0010000000000000
    
    // if we use BSRR we can use a 1 for HIGH and a 0 for LOW by using XOR and a pinMask. 
    uint16_t pinMaskD7 = 1<<13; // pin 13 on GPIOA (same as 0b0010000000000000)
    
    uint16_t valueHigh = 0b0010000000000000;
    uint16_t valueLow  = 0;
    
    uint32_t* GPIOA_BSRR = (uint32_t*)&GPIOA->BSRRL;
    
    //GPIOA->BSRR = valueHigh + ((valueHigh ^ pinMaskD7) << 16);
    *GPIOA_BSRR = valueHigh + ((valueHigh ^ pinMaskD7) << 16);
    
    delay(1000);
    
    //GPIOA->BSRR = valueLow + ((valueLow ^ pinMaskD7) << 16);
    *GPIOA_BSRR = valueLow + ((valueLow ^ pinMaskD7) << 16);
    
    delay(1000);
    
}

This GPIO tutorial for the STM32 Discovery board helped me understand how the GPIO registers and BSRR work.

Transfer data to output pins with DMA

As mentioned, we would like to modify the GPIO registers with DMA. Each processor has its own architecture of timers, GPIOs and DMA channels/streams. I found different examples that used DMA to control pins or peripherals. A great working example for the Particle Photon is the Particle Speaker library by Julien Vanier. It uses DMA to drive the DAC pin (A6 on the Photon) to play sine waves. However, my idea was not to be limited to that pin: I wanted to be able to control any pin with DMA.

I started reverse engineering the code to understand what the different lines do. The biggest question was why DMA1 Stream 5, channel 7 is used, because changing those numbers would break the code. Well, the answer can be found in Table 22 of the reference manual for the STM32F20x range (RM0033).

As that table shows, channel 7 of DMA1 Stream 5 is mapped to DAC1. You can also see that certain channels/streams are connected to a timer. In other examples I found that the timer has to match the DMA stream/channel it triggers.

Another important insight came from this post on StackOverflow:

There is a problem though, that DMA1 cannot access the AHB bus at all (see Fig. 1 or 2 in the Reference Manual), to which the GPIO registers are connected. Therefore we must use DMA2, and that leaves us with the advanced timers TIM1 or TIM8.

Since TIM1 is used on the Particle for other functions, only TIM8 can be used to control the DMA transfer.
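To get a feeling for the numbers: the timer fires an update (and with it one DMA transfer) every (prescaler + 1) × (period + 1) timer clock ticks. Below is a small desktop C++ sketch of that arithmetic, not Photon firmware. It assumes a 120 MHz timer clock (the figure from the comment in my listing) and uses DMX’s 250 kbit/s bit rate as the target. TimerSettings, updateRateHz and settingsForRate are names I made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// The two values that end up in TIM_Prescaler and TIM_Period.
struct TimerSettings {
    uint16_t prescaler;  // counter clock = timerClock / (prescaler + 1)
    uint16_t period;     // one update event every (period + 1) counts
};

// Update events per second produced by the given settings.
double updateRateHz(uint32_t timerClockHz, TimerSettings s) {
    return static_cast<double>(timerClockHz) /
           ((static_cast<double>(s.prescaler) + 1.0) *
            (static_cast<double>(s.period) + 1.0));
}

// Pick settings for a target rate with prescaler 0. This simple approach
// only works when the number of ticks per update fits the 16-bit counter.
TimerSettings settingsForRate(uint32_t timerClockHz, uint32_t targetHz) {
    assert(targetHz > 0 && timerClockHz >= targetHz);
    uint32_t counts = timerClockHz / targetHz;  // timer ticks per update
    assert(counts >= 1 && counts <= 65536);
    return TimerSettings{0, static_cast<uint16_t>(counts - 1)};
}
```

With these assumptions settingsForRate(120000000, 250000) gives prescaler 0 and period 479, so one update every 480 ticks, or 4 µs per bit. For slow rates (like the blinking LED) you need a prescaler as well, because the counter itself is only 16 bits wide.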

Just as in the Speaker library, a double circular buffer is used. This means that you can update one buffer while the other buffer is being transferred. This makes it possible to transmit DMX at a fixed frame rate in the background.
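If the double buffer idea is new to you, this little desktop C++ model may help. It is only a simulation of the behaviour described here (the real switching is done by the DMA hardware), and DoubleBufferModel with its methods is a name invented for this sketch. On the Photon you would ask the DMA stream which buffer it is currently reading; I believe the Standard Peripheral Library has DMA_GetCurrentMemoryTarget() for that, but treat that as something to verify.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Desktop model of a circular, double-buffered stream with two memory areas.
template <std::size_t N>
class DoubleBufferModel {
public:
    // Index (0 or 1) of the buffer the "DMA" is currently reading from.
    std::size_t activeIndex() const { return active_; }

    // The buffer the CPU may safely rewrite while the other one is being sent.
    std::array<uint32_t, N>& idleBuffer() { return buffers_[1 - active_]; }

    // One timer update: hand out the next word and switch buffers
    // after the last word of the active buffer has been read.
    uint32_t nextWord() {
        uint32_t word = buffers_[active_][position_];
        if (++position_ == N) {
            position_ = 0;
            active_ = 1 - active_;
        }
        return word;
    }

private:
    std::array<std::array<uint32_t, N>, 2> buffers_{};  // both start zeroed
    std::size_t position_ = 0;
    std::size_t active_ = 0;
};
```

The point of the design is that the CPU only ever writes to idleBuffer(), so a frame is never changed halfway through being sent, and the output keeps running at the timer’s pace whether or not new data arrived in time.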

As mentioned at the beginning of this post, you can check the proof of concept on GIST. It works with GPIOA (A3, A4, A5, A6, A7, D5, D6, D7), but you can also select GPIOC (A0, A1, A2) or GPIOB (D0, D1, D2, D3, D4).

DMA output concept on GIST

#include "Particle.h"

// RM0033 MANUAL - Table 23 / Figure 1 System architecture
// Photon is STM32F205
// Only DMA2 is connected with GPIO ports https://stackoverflow.com/questions/46613053/pwm-dma-to-a-whole-gpio
// GPIO BSRRL/BSRRH/BSRR http://hertaville.com/stm32f0-gpio-tutorial-part-1.html
// DMA_Mode_Circular https://github.com/monkbroc/particle-speaker
// Ulrich Radig OctoArtnetNode https://www.ulrichradig.de/home/index.php/dmx/8-kanal-art-net
// Thanks to Julien Vanier for the idea of BSRR manipulation.

// The timers connected to APB2 are clocked from TIMxCLK up to 120 MHz
// In case of the Photon: TIM1, TIM8.

//SYSTEM_MODE(MANUAL); // only use this when you build local

// D7 is GPIO pin 13 on GPIOA.
uint16_t pinMask = 1<<13; // pin 13 on GPIOA (same as 0b0010000000000000)

uint16_t bufferSize = 4;
uint16_t blink_turnhigh_buffer[4];
uint32_t blink_bsrr_buffer0[4];
uint32_t blink_bsrr_buffer1[4]; // double buffer

void timerInit();
void dmaInit();

void setup() { // Put setup code here to run once

    delay(2000);
    Serial.begin(57600);

    timerInit();
    dmaInit();

    pinMode(D7, OUTPUT);

    //WiFi.off();

    // D7 is GPIO pin 13 on GPIOA.
    // of course you can modify multiple pins if you want
    // for example 0b0000000010100000 would turn on A3 and A5 (don't forget to modify the pinMask);
    blink_turnhigh_buffer[0] = 1<<13; //(same as 0b0010000000000000)
    blink_turnhigh_buffer[1] = 0;
    blink_turnhigh_buffer[2] = 1<<13;
    blink_turnhigh_buffer[3] = 0;

    // Build the 32-bit BSRR word: the low half sets the pins that should go HIGH.
    // XOR with pinMask puts a 1 in the high (reset) half for masked pins that should go LOW.
    for(int i = 0; i < 4; i++) {

      blink_bsrr_buffer0[i] = blink_turnhigh_buffer[i] + ((blink_turnhigh_buffer[i] ^ pinMask) << 16);
      blink_bsrr_buffer1[i] = blink_bsrr_buffer0[i]; // double buffer

    }
}

void loop() {

}

void timerInit (void) {

  // https://github.com/pkourany/SparkIntervalTimer/blob/master/src/SparkIntervalTimer.cpp
  // tryout with a slow timer: 2 kHz counter clock (so each count is 0.5 ms, assuming TIM8 runs at SystemCoreClock)
  //const uint16_t SIT_PRESCALERu = (uint16_t)(SYSCORECLOCK / 1000000UL) - 1;	//To get TIM counter clock = 1MHz
  const uint16_t SIT_PRESCALERm = (uint16_t)(SystemCoreClock / 2000UL) - 1;	  //To get TIM counter clock = 2KHz

  TIM_TimeBaseInitTypeDef	TIM_TimeBaseStructure;

  TIM_DeInit(TIM8);

  RCC_APB2PeriphClockCmd(RCC_APB2Periph_TIM8, ENABLE);

  TIM_TimeBaseStructure.TIM_Prescaler = SIT_PRESCALERm;
  //TIM_TimeBaseStructure.TIM_Prescaler = SIT_PRESCALERu;
  TIM_TimeBaseStructure.TIM_CounterMode = TIM_CounterMode_Up;
  TIM_TimeBaseStructure.TIM_Period =  2000; // update event (one DMA transfer) every 2001 counts, about once per second
  TIM_TimeBaseStructure.TIM_ClockDivision = TIM_CKD_DIV1;
  TIM_TimeBaseStructure.TIM_RepetitionCounter = 0;

  TIM_TimeBaseInit(TIM8,&TIM_TimeBaseStructure);
  TIM_ClearFlag(TIM8,TIM_FLAG_Update);

  TIM_Cmd(TIM8, ENABLE);

}

void dmaInit(void) {

  // Only DMA2 can reach the GPIO ports...
  // DMA2 channel 7 stream 1 connects to TIM8_UP
  DMA_InitTypeDef DMA_InitStructure;

  // Clock enable
  RCC_AHB1PeriphClockCmd(RCC_AHB1Periph_DMA2, ENABLE);

  DMA_Cmd(DMA2_Stream1, DISABLE);
  DMA_DeInit(DMA2_Stream1);

  DMA_StructInit(&DMA_InitStructure);

  DMA_InitStructure.DMA_Channel = DMA_Channel_7;
  DMA_InitStructure.DMA_PeripheralBaseAddr = ((uint32_t)&(GPIOA->BSRRL));
  DMA_InitStructure.DMA_Memory0BaseAddr = (uint32_t) blink_bsrr_buffer0;
  DMA_InitStructure.DMA_BufferSize = bufferSize;

  DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Word;
  DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Word;

  DMA_InitStructure.DMA_DIR = DMA_DIR_MemoryToPeripheral;
  DMA_InitStructure.DMA_PeripheralInc = DMA_PeripheralInc_Disable;
  DMA_InitStructure.DMA_MemoryInc = DMA_MemoryInc_Enable;

  DMA_InitStructure.DMA_Mode = DMA_Mode_Circular;
  DMA_InitStructure.DMA_Priority = DMA_Priority_VeryHigh;

  /* Configure double buffering */
  DMA_DoubleBufferModeConfig(DMA2_Stream1, (uint32_t) blink_bsrr_buffer1, DMA_Memory_1);
  DMA_DoubleBufferModeCmd(DMA2_Stream1, ENABLE);

  DMA_Init(DMA2_Stream1, &DMA_InitStructure);

  DMA_Cmd(DMA2_Stream1, ENABLE);

  // DMA-Timer8 enable
  TIM_DMACmd(TIM8,TIM_DMA_Update,ENABLE);

}
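The buffers in this proof of concept hold a fixed blink pattern. To send real data you would fill them with one BSRR word per bit. Here is a minimal desktop C++ sketch of that step, under a few loud assumptions: one output pin, least significant bit first, and no start/stop bits or other framing (a real DMX or LED protocol needs more than this). bsrrForLevel and bytesToBsrrWords are hypothetical helper names, and on the Photon a fixed-size array would fit better than a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// BSRR word that drives one pin HIGH (low half) or LOW (high half).
uint32_t bsrrForLevel(unsigned pin, bool high) {
    assert(pin < 16);
    uint32_t mask = 1u << pin;
    return high ? mask : (mask << 16);
}

// Expand bytes into one BSRR word per bit, least significant bit first.
std::vector<uint32_t> bytesToBsrrWords(const uint8_t* data, std::size_t length,
                                       unsigned pin) {
    std::vector<uint32_t> words;
    words.reserve(length * 8);
    for (std::size_t i = 0; i < length; ++i) {
        for (unsigned bit = 0; bit < 8; ++bit) {
            bool high = ((data[i] >> bit) & 1u) != 0;
            words.push_back(bsrrForLevel(pin, high));
        }
    }
    return words;
}
```

For the byte 0x05 on pin 13 this produces eight words, starting with 0x00002000 (HIGH), 0x20000000 (LOW), 0x00002000 (HIGH), followed by five LOW words. Each word takes one timer update to go out, so the timer rate sets the bit rate.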