Microboot - A Simple Bootloader

Monday, April 14 2014 @ 08:13 PM

In my previous post I alluded to some work I was doing on a bootloader for the ATtiny (and other chips). This has been progressing well so it seemed like a good time to start providing more details. This project is still a work in progress so nothing is completely finalised yet but any changes are going to be fairly minimal.

Overview

This work came out of two sets of requirements that wound up aligning with each other - the first was my desire for a bootloader to ease the development of some more ATtiny based projects, the second was to develop a bootloader for the prototyping boards I've been working on.

The Microboard system is designed to support a range of different MCUs - some of which have built-in bootloaders (like the LPC1114), some that have well defined bootloaders and support tools available for them (like the ATmega series) and others that have no standardised bootloader protocol (like the PIC16F1827 and the PIC32MX series). I had already decided to write my own bootloader for Microchip PIC devices and, after some thought, it seemed like a good idea to implement this on the AVR chips as well rather than wind up with a different bootloader and set of tools for each chip. The end result will be a common bootloader protocol and tools for all chips except the LPC1114 (the built-in bootloader is fine - implementing my own would just be redundant).

Because the AVR chips are so widely used there is no shortage of example code to refer to - it was the best target to start with. With a bit of work the same bootloader could be made to work on the ATtiny chips as well so it solves two problems at the same time. This is the route I have been heading down over the past few weeks.

One major goal I have is to keep the bootloader as simple and as straightforward as possible - it's not meant to be the fastest or smallest available, just easy to use and easy to understand. The implementation comes in at less than 1K on any target platform and will communicate at 57.6 KBaud - fast enough for daily use without trying to break any speed records. It is not required to modify the EEPROM or fuse bytes on chips that have them; its only purpose is to modify the program flash.

The remainder of this post documents the behaviour of the bootloader, the protocol used for communication and describes some of the utilities I've developed to work with it. Implementation details for specific processors will be described in future posts.

Entering the Bootloader

The bootloader runs whenever the MCU resets. At this stage it determines if it should enter bootloader mode or continue to the application program currently on the flash. The first versions I worked on initialised the serial port of the MCU and waited for a certain period for any communications; if nothing arrived the application was started. This approach turned out to be very annoying - it caused an unnecessary delay on each power cycle and when you did want to transfer code it could be very difficult to start the communications within the window of opportunity available. It also put restrictions on what hardware you could attach to the serial pins - if it generated any input to the MCU the bootloader mode was entered, leaving you wondering why the application wasn't doing what you expected.

The current model is based on the behaviour of the LPC bootloader - a bootloader entry pin is checked, and if it is low the serial port is initialised and the bootloader code started. If it is not low the application is started immediately. Where possible the implementation uses an IO pin that has an internal weak pull-up resistor - eliminating the need for any external circuitry. There are still some restrictions on the use of that pin from the application code (it can't be low at reset for example) but they can be worked around very easily.

Once the bootloader is started it will configure the serial port for a baud rate of 57.6 KBaud, 8 bits per character, 1 stop bit and no parity (8N1). It will then listen on the serial port for incoming commands.

Bootloader Protocol

Debugging the bootloader

I've deliberately kept the protocol very simple to make it easier to process on the chip and easy to write host side tools to work with it. All information is transferred in printable ASCII characters with binary data in hexadecimal. This means that more data is transferred but makes it a lot simpler to debug. It also means that you can interact with the bootloader manually, dynamically writing data to arbitrary locations in memory.

Each packet starts with a command (when sent from the PC to the MCU) or a status indicator (from the MCU to the PC) which is optionally followed by a sequence of data bytes and finally terminated with the 'line feed' character. When data is sent it is a sequence of byte values in hexadecimal format. The data is verified with a 16 bit checksum which is the sum of all the individual bytes added to the seed value 0x5050 with overflow ignored. The Python code to generate the checksum for a list of values looks like this:

checksum = 0x5050
for val in data:
  checksum = (checksum + (val & 0xFF)) & 0xFFFF

There are four supported commands sent from the PC to the MCU. Commands can be sent in any order, no command depends on the result of another. Responses are always prefixed with the '+' character (to indicate success) or '-' to indicate failure. The description of each command is as follows ...

Query Command

This is generally the first command sent, it reports various information about the bootloader and can be used to determine that you are in fact talking to the bootloader. The format is simply the '?' character immediately followed by the new line character. The response is a list of byte values followed by a checksum. A typical transaction looks like:

> ?
< +101001015072

The response consists of four data bytes and a two byte checksum. The data bytes are:

Index  Description
0      Protocol version. The current version is 1.0 and represented as 0x10 hex.
1      The size of each data line. This tells the host how many bytes of data should be placed in each write command (and how many bytes will be returned for a read command).
2      This value indicates the type of CPU - AVR, PIC or PIC32.
3      The specific model of the CPU - ATmega8, ATtiny85, etc.
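As a rough illustration of how a host tool might decode this response, here is a minimal Python sketch. The function names (checksum, parse_query) and the dictionary keys are my own choices for the example and are not taken from microboot.py; the field order, the big-endian checksum and the 0x5050 seed follow the description and sample transaction above.

```python
def checksum(data, seed=0x5050):
    """16 bit checksum: seed plus the sum of all bytes, overflow ignored."""
    total = seed
    for val in data:
        total = (total + (val & 0xFF)) & 0xFFFF
    return total


def parse_query(line):
    """Decode a query response such as '+101001015072'.

    Returns a dict with the four data bytes, or raises ValueError.
    The key names are illustrative only.
    """
    line = line.strip()
    if not line.startswith("+"):
        raise ValueError("bootloader reported failure")
    raw = bytes.fromhex(line[1:])
    if len(raw) != 6:
        raise ValueError("expected 4 data bytes and a 2 byte checksum")
    data, received = raw[:4], int.from_bytes(raw[4:], "big")
    if checksum(data) != received:
        raise ValueError("checksum mismatch")
    return {
        "protocol": data[0],   # 0x10 means version 1.0
        "line_size": data[1],  # bytes per read/write line
        "cpu_type": data[2],
        "cpu_model": data[3],
    }


info = parse_query("+101001015072")
print(info)
```

For the sample response the line size decodes to 0x10, so each read or write line carries 16 data bytes.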

Read Command

The read command is used to read the contents of the flash memory on the MCU and is expressed as the letter 'R', a 16 bit address in 4 hex digits and a 16 bit checksum. If the read is successful the return value will be the '+' character, a 16 bit address as 4 hex digits, a sequence of bytes where each byte is a two character hex value, the 16 bit checksum and finally the terminating new line character.

A typical sequence looks like the following:

> R00105060
< +00100EC00DC00CC00BC00AC009C008C011245622

The number of data bytes returned will be the value indicated by the query command.
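The read transaction can be sketched in Python as follows. This is only an illustration: build_read and parse_read are hypothetical names, and the assumption that the checksum covers the address bytes as well as the data is inferred from the sample transaction above (the numbers do work out that way) rather than from the bootloader source.

```python
def checksum(data, seed=0x5050):
    """16 bit checksum: seed plus the sum of all bytes, overflow ignored."""
    total = seed
    for val in data:
        total = (total + (val & 0xFF)) & 0xFFFF
    return total


def build_read(address):
    """Build a read command for a 16 bit address, e.g. 'R00105060'."""
    addr = address.to_bytes(2, "big")
    return "R%s%04X" % (addr.hex().upper(), checksum(addr))


def parse_read(line, line_size):
    """Split a read response into (address, data bytes).

    line_size is the value reported by the query command.
    """
    line = line.strip()
    if not line.startswith("+"):
        raise ValueError("read failed")
    raw = bytes.fromhex(line[1:])
    if len(raw) != 2 + line_size + 2:
        raise ValueError("unexpected response length")
    # Assumption: the checksum covers the address and the data bytes
    body, received = raw[:-2], int.from_bytes(raw[-2:], "big")
    if checksum(body) != received:
        raise ValueError("checksum mismatch")
    return int.from_bytes(body[:2], "big"), body[2:]


command = build_read(0x0010)
address, data = parse_read("+00100EC00DC00CC00BC00AC009C008C011245622", 16)
print(command, hex(address), data.hex())
```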

Write Command

The write command is used to change the contents of the flash memory on the MCU and is expressed as the letter 'W', a 16 bit address in 4 hex digits, a sequence of bytes where each byte is a two character hex value, the 16 bit checksum and finally the terminating new line character. The number of data bytes in the command must be equal to the value indicated by the query command. If there is too much or too little data the command will fail.

A typical sequence looks like the following:

> W0000FFCD15C014C013C012C011C010C00FC057DA
< +

The response to this command is always a simple success ('+') or fail ('-') acknowledgement followed by the new line character with no additional data.
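A write command can be assembled in much the same way. The sketch below uses a hypothetical build_write helper and defaults the line size to 16 only because that is what the sample query response reports; a real tool would take the value from the query command. As with reads, the checksum is assumed to cover the address and the data, which matches the sample packet above.

```python
def checksum(data, seed=0x5050):
    """16 bit checksum: seed plus the sum of all bytes, overflow ignored."""
    total = seed
    for val in data:
        total = (total + (val & 0xFF)) & 0xFFFF
    return total


def build_write(address, data, line_size=16):
    """Build a write command for one line of flash data."""
    if len(data) != line_size:
        raise ValueError("a write must carry exactly %d bytes" % line_size)
    # Assumption: the checksum covers the address and the data bytes
    body = address.to_bytes(2, "big") + bytes(data)
    return "W%s%04X" % (body.hex().upper(), checksum(body))


line = bytes.fromhex("FFCD15C014C013C012C011C010C00FC0")
packet = build_write(0x0000, line)
print(packet)
```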

Execute Command

The execute command causes the application program to be started. This can be used directly after uploading to start running the new application.

The command is represented by the exclamation mark character ('!') followed by a new line character. There is no response to this command as the bootloader will transfer control to the application and not return.

Utilities

A bootloader by itself is not very useful without the corresponding utilities to run on the PC to use it. For this bootloader I have written some basic utilities in Python to transfer data to and from the device. A basic description of these tools follows.

Reading the Flash - mbdump.py

The mbdump.py utility will read the entire contents of the flash memory from the chip and save it to an Intel HEX format file. The usage information for this utility is ...

mbdump.py - Microboot/Microboard Firmware Dump Utility
Copyright (c) 2014, The Garage Lab. All Rights Reserved.

Usage:

  mbdump.py options [filename]

Options:
  -d,--device name  Specify the expected device, eg: attiny85,atmega8. This
                    is required.
  -p,--port name    Specify the name of the serial port to use for communication.
                    If not specified the port /dev/ttyUSB0 will be used.
  --log             Log all communications to the file 'transfer.log'

If a filename is not specified the output will be saved in the file 'device'.hex,
eg atmega8.hex if the device is an atmega8.
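For readers unfamiliar with the Intel HEX format that mbdump.py writes, here is a small Python sketch of how a single record is encoded. It shows the standard record layout (byte count, 16 bit address, record type, data, and a checksum that is the two's complement of the sum of the preceding bytes); the hex_record name is made up for this example and says nothing about how mbdump.py is actually structured.

```python
def hex_record(address, data, record_type=0x00):
    """Encode one Intel HEX record (type 0x00 is data, 0x01 is end of file)."""
    body = (
        bytes([len(data)])
        + address.to_bytes(2, "big")
        + bytes([record_type])
        + bytes(data)
    )
    # The record checksum is the two's complement of the sum of all bytes
    check = (-sum(body)) & 0xFF
    return ":" + body.hex().upper() + "%02X" % check


# A data record holding 16 bytes at address 0x0100, then the end-of-file record
print(hex_record(0x0100, bytes.fromhex("214601360121470136007EFE09D21901")))
print(hex_record(0x0000, b"", 0x01))
```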

Writing Code - mbflash.py

The mbflash.py utility will read an Intel HEX file and program the flash on the MCU with its contents. The usage information for this utility is ...

mbflash.py - Microboot/Microboard System Flashing Utility
Copyright (c) 2014, The Garage Lab. All Rights Reserved.

Usage:

  mbflash.py options filename

Options:
  -d,--device name  Specify the expected device, eg: attiny85,atmega8. This
                    is required.
  -p,--port name    Specify the name of the serial port to use for communication.
                    If not specified the port /dev/ttyUSB0 will be used.
  --log             Log all communications to the file 'transfer.log'

A simple example would be ...

$ ./mbflash.py -d attiny85 blinky.hex

This command would program an ATtiny85 connected on port /dev/ttyUSB0 with the contents of the file blinky.hex.
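Because every write command must carry exactly one line of data, a flashing tool has to cut the firmware image into line-sized pieces first. The Python sketch below shows one way that could be done. It is not taken from mbflash.py: the split_into_lines name, the assumption that the image starts at address 0, and the choice to pad a short final line with 0xFF (the value of erased flash) are all mine.

```python
def split_into_lines(image, line_size=16, fill=0xFF):
    """Yield (address, chunk) pairs covering a firmware image.

    Assumes the image starts at address 0. A short final chunk is
    padded with the fill byte so every chunk is exactly line_size long.
    """
    for address in range(0, len(image), line_size):
        chunk = bytes(image[address:address + line_size])
        chunk += bytes([fill]) * (line_size - len(chunk))
        yield address, chunk


# A 20 byte image needs two 16 byte lines, the second mostly padding
for address, chunk in split_into_lines(bytes(range(20))):
    print("%04X %s" % (address, chunk.hex().upper()))
```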

The Support Library - microboot.py

Both of the utilities above use a common support library contained in the file microboot.py. I've designed this to be easily integrated into other applications if custom firmware updating is required (for example - combining a fixed firmware blob with some user generated binary data).

What's Next?

Breakout Boards

I've just completed the first version of the bootloader running on the ATtiny85 using the same single pin serial interface I used in the safety light project and I've built a small breakout board with the additional circuitry required for the serial interface. I'm going to spend a few more days testing the implementation before I push the repository to GitHub for public access. There will be a few more posts over the next few days giving more details of that specific implementation.


Bootloaders and Bricked AVRs

Wednesday, April 02 2014 @ 07:45 PM

The last project based around an ATtiny85 was pretty successful, I'm impressed with what you can squeeze out of the chips and I have a few more smaller projects that they would be perfect for as well.

One of the more frustrating aspects was having to physically move the chip from the circuit to the programmer every time I wanted to update the firmware - by the fourth iteration I was wishing very hard for some sort of serial bootloader. The ATtiny doesn't have a UART on board but the functionality can be implemented in software (and, with the help of a little bit of hardware, can be done on a single pin).

I thought that taking that idea, and going over the source code for some existing bootloaders for the ATtiny and ATmega, would give me enough information to write my own. Although it would probably be smarter to use an existing bootloader (or at least implement an existing bootloader protocol) I wanted to use the task as an opportunity to get more in depth with the AVR architecture as well as learn more about the boot-loading process. To make things a little more tricky I would like to implement the bootloader protocol I come up with on the PIC16F1827 and PIC32MX series of chips - all CPUs used in my Microboard form factor.

While I'm busy mucking around with bootloaders and fuse bits I run a fairly high risk of bricking the AVR - putting it into a state that you can only get out of by using High Voltage Programming or HVP - something that my little USBasp programmer can't do. Luckily, I found this tool which will automatically fix the fuses on a range of AVR chips to allow ISP programming to work again. I'm going to build one up over the next few days before I get too heavily into testing the bootloader - an image of what the finished product should look like is on the left, I doubt mine is going to look that neat though.

I'll post an update on how well the device works once I have one built and running (I intend to deliberately brick some chips just to make sure it can clear them so I'm really hoping it works as advertised). The bootloader project will get its own write up as well once I've tested it enough to be confident of its reliability.


An ATtiny85 Based Safety Light

Sunday, March 30 2014 @ 05:32 PM

Finished Product

This project is a simple presence sensing night light for doors, stairs and other areas which could be dangerous or difficult to navigate in low lighting conditions. It is built around an ATtiny85 microcontroller and uses the head from a cheap LED torch as the lighting element. The ambient lighting is detected with a simple LDR and presence with a PIR motion sensor.

Using a microcontroller is probably overkill for this project and the firmware may seem a bit large for what it does (around 1.5K). I wanted to use the project to experiment with some other features as well as making a useful utility for the lab so it has been a little over-engineered as a result.

Some of the more interesting features of the project include:

  • Combining the motion and light sensing inputs on a single pin.
  • A single pin half duplex serial interface to communicate with MCU.
  • A very simple communications protocol which allows changing configuration values without needing to re-flash the chip.

Schematic

As well as those items it was a good way to teach myself to work with the timers, PWM outputs, analog inputs and EEPROM on the ATtiny directly from C or assembly code. All the files for the project (firmware source, schematic and 3D models for the casing) are available on GitHub under a Creative Commons Attribution-ShareAlike 4.0 International License, feel free to use and abuse it as you see fit.

The remainder of this article details the various features of the project.

Power Supply

To power the device I'm using a set of 4 AAA batteries and a 3.3V MCP1700 LDO linear regulator to provide power for the digital circuitry. The voltage level of the battery pack is monitored through a voltage divider consisting of two 10K resistors. The idea is to provide some sort of indication as the batteries go flat so you know when to change them before they fall below a suitable operating voltage.

It would probably be possible to remove the regulator completely, the ATtiny will run with anything from 2.7V to 6.0V without problems. It would make calculating the current limiting resistor for the output LED a little more problematic (it would simply fade as the voltage, and therefore the driving current, dropped) and you would need a simple 3.3V zener diode on the TX pin of the serial interface to keep that voltage level down as well.

Motion and Light Sensing

To detect motion I'm using a fairly standard PIR (passive infrared) module that I picked up from eBay. This particular one does not work at 3.3V unfortunately and the voltage level of the pulse output is the same as the power voltage. I wound up driving it from the battery rather than from the 3.3V regulator and added a diode so the forward voltage drop will bring the voltage closer to 5V for a fully charged battery pack.

For sensing the light intensity I'm using a simple LDR (I got some from Jaycar) which forms one half of a voltage divider. This model has a resistance range of 48K to 148K in light with a 10M resistance in total darkness.

I put the LDR in parallel with a 10K resistor to limit the range of resistance and configured a voltage divider with another 10K resistor such that the voltage will range from 0 to a maximum of half the input voltage. Higher voltages represent darker conditions.
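The numbers can be checked with a few lines of Python. This sketch assumes a particular arrangement that the text implies but does not spell out: the LDR and its parallel 10K resistor form the lower leg of the divider, with the other 10K resistor on top and the output taken across the lower leg. The function names are only for illustration.

```python
def parallel(r1, r2):
    """Combined resistance of two resistors in parallel."""
    return (r1 * r2) / (r1 + r2)


def divider_fraction(r_ldr, r_parallel=10e3, r_top=10e3):
    """Fraction of the input voltage seen at the divider output.

    Assumed topology: r_top on the supply side, LDR || r_parallel to ground.
    """
    r_bottom = parallel(r_ldr, r_parallel)
    return r_bottom / (r_top + r_bottom)


for label, r_ldr in (("light (48K)", 48e3), ("light (148K)", 148e3), ("dark (10M)", 10e6)):
    print("%-13s %.3f of the input voltage" % (label, divider_fraction(r_ldr)))
```

Under these assumptions the output can never exceed half the input (the lower leg is at most 10K), and it creeps towards that limit as the LDR resistance rises in the dark.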

Motion Sensor

Because we are not concerned about the light conditions if there is no warm body present I use the output of the PIR as the input voltage for the divider. This means I can detect both conditions with a single analog input and simply check for a value greater than a certain threshold to determine whether to turn the light on or not.

One problem with this approach is that the measured values will trend downwards as the battery discharges. This can be handled in software though, because we also measure the battery voltage we know what level it is at and can adjust the reading from the motion sensor accordingly. The corresponding code in the firmware that does that is as follows:

/** Base value for full voltage
 *
 * This is the value expected for a fully charged 6V battery pack.
 */
#define BASE_LEVEL 0xE8

/** Read the analog inputs
 *
 * This function reads the analog inputs (battery voltage and motion sensor).
 * Because the PIR and LDR are being driven directly from the battery we adjust
 * that input to the current battery voltage (so readings remain consistent).
 */
static void readSensors() {
  uint8_t power = adcVoltage();
  uint8_t motion = adcMotion();
  // Adjust motion value according to current power level
  if(power<BASE_LEVEL)
    motion = motion + (BASE_LEVEL - power);
  else if(power>BASE_LEVEL)
    motion = motion - (power - BASE_LEVEL);
  // Update sensor values
  configWrite(STATE_POWER, power);
  configWrite(STATE_MOTION, motion);
  }
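To see what the adjustment does to actual readings, here is a small Python model of the same arithmetic. It is only a model: adjust_motion is a made-up name, and the mask to 8 bits stands in for the result being stored back into a uint8_t in the C code. Both branches of the C code reduce to the same expression, which is what the model uses.

```python
BASE_LEVEL = 0xE8  # reading expected for a fully charged battery pack


def adjust_motion(motion, power):
    """Model of the firmware adjustment using 8 bit arithmetic."""
    # Storing the result in a uint8_t keeps only the low 8 bits
    return (motion + BASE_LEVEL - power) & 0xFF


print(hex(adjust_motion(0x60, 0xE8)))  # battery at the base level: unchanged
print(hex(adjust_motion(0x60, 0xD0)))  # battery lower: reading is raised
print(hex(adjust_motion(0x60, 0xF0)))  # battery higher: reading is lowered
print(hex(adjust_motion(0xF0, 0xC0)))  # large reading plus correction wraps
```

The last case shows that a high reading combined with a large correction wraps around rather than saturating, which may be worth keeping in mind when choosing trigger levels.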

We now know if someone is sneaking around in the dark, the next step is to cast some light on them.

Illumination Control

For illumination I'm using the head assembly of a cheap LED torch. These have the current limiting resistors built in and give you a concave reflective mirror and protective lens as well - much easier than assembling your own array of super bright LEDs. The one I'm using is available in most supermarkets in Australia for around $AU 5 - a price well worth paying for the benefits you get above building up something yourself.

This module is driven directly from the battery (through the same diode used for the PIR) and controlled through an NPN transistor. I measured the current flowing through the LED assembly while driven directly from its original 4.5V battery pack at 60mA so it fits neatly under the 100mA limit of a BC547 signal transistor.

Half Duplex Serial (Single Pin)

This is one of the more interesting aspects of the design. I came across this implementation of a software UART running at half-duplex that only uses a single IO pin. The implementation is in assembly and only consumes 62 bytes of flash (and no RAM at all). Speeds of up to 115200 baud are supported.

Serial Interface

There are limitations of course - there is no buffering and you cannot receive data while you are sending (and vice-versa). Some supporting external circuitry is required which uses an NPN transistor to ensure data being sent only goes to the Tx line. Anything that comes in the Rx line will be echoed on the Tx though so client software will have to take this into account.

In this project I'm running the serial line at 57600 baud and I've exposed the Tx and Rx lines to a 6 pin header that can be used with a 3.3V FTDI cable to communicate with the device.

I've found that transmits (from the ATtiny to the host) are very reliable but receives (from the host to the ATtiny) can be a little unreliable. Part of this is due to the way the firmware checks for activity on the serial port which can miss part of the initial start bit for a character. A longer term solution might be to use an 'on-change' interrupt on the pin so the device can start processing the incoming byte as soon as the start bit is detected. In the meantime this can easily be worked around in software on the client side of the connection.

Configuration Settings

Apart from providing a useful debugging tool during development the ability to communicate over a serial port allows the device to be configured without having to modify and re-flash the firmware. To achieve this I moved all of the values that control the behaviour into an array of bytes and treat it as a virtual bank of registers.

This includes all the configuration values (trigger levels for low battery and light activation, flash rates for the LED and other properties) as well as state information (current readings for the voltage and motion sensor for example). The configuration values can be saved to EEPROM and will be loaded when the device powers up. If there are no values in the EEPROM a set of suitable defaults will be used instead.

The full set of available registers is described in the following table.

Register            Index  Description
CONFIG_FIRMWARE     0      Firmware version (read only).
CONFIG_TRIGGER      1      Trigger value for motion/light sensor.
CONFIG_COUNT        2      Number of sequential trigger readings required.
CONFIG_LOW_POWER    3      Low power detection level.
CONFIG_LIGHT_ON     4      Time to keep the light on (in seconds).
CONFIG_LIGHT_START  5      Starting PWM value for turning on light.
CONFIG_LIGHT_STEP   6      Step value for turning on/off light.
CONFIG_LED_ON       7      Duration (in 1/10th sec) to keep LED on.
CONFIG_LED_OFF      8      Duration (in 1/10th sec) to keep LED off.
CONFIG_LED_ON_LOW   9      Duration (in 1/10th sec) to keep LED on (LP).
CONFIG_LED_OFF_LOW  10     Duration (in 1/10th sec) to keep LED off (LP).
STATE_POWER         11     Current battery power reading.
STATE_MOTION        12     Current motion/brightness reading.
STATE_LIGHT         13     Seconds remaining before the light goes off.
STATE_COUNT         14     Current trigger level count.

Each of these registers can be read over the serial connection and configuration values can be set. This allows the behaviour of the device to be easily customised for the environment it is in and provides some basic monitoring. All that is needed is a simple communications protocol to support this and a client side tool to provide monitoring and configuration capabilities.

Communications Protocol

The protocol used to communicate with the device is deliberately simple, each packet contains a single 16 bit value which is used to describe a command (or response) with parameters. The packet format consists of a start character (!), a sequence of 4 printable hex characters for the value, a single printable hex character as a checksum and is terminated by a newline character. On the wire this comes to a total of 7 ASCII characters and is very easy to verify on the ATtiny without needing a lot of memory or code to do so.

The protocol is implemented in a ping/pong method - for each request sent to the device a single response will be returned. Failure to receive a response indicates a communication error or an invalid request. Here is what the code looks like to send a packet to the device and read a response:

from time import sleep  # belongs with the imports at the top of the module

def __send(self, value):
  """ Send a value as a packet and return the value from the response
  """
  if self.serial is None:
    raise Exception("Attempting to send on an unopen port")
  # Build the packet: start character, 4 hex digits, checksum digit, end character
  packet = "%c%04X%1X%c" % (
    CHAR_START,
    value,
    self.__checksum(value),
    CHAR_END
    )
  # Retry until we get a response
  retries = RETRY_COUNT
  while retries != 0:
    # The serial port carries bytes, so encode the text packet (needed on Python 3)
    self.serial.write(packet.encode("ascii"))
    # Read back what we just sent (side effect of the half-duplex UART)
    self.serial.read(PACKET_LENGTH)
    # Read the return value and convert it
    response = self.serial.read(PACKET_LENGTH)
    if len(response) == PACKET_LENGTH:
      return int(response[1:5], 16)
    # Wait and try again
    sleep(0.1)
    retries = retries - 1
  # Failed, raise an exception
  raise Exception("No response from device.")

As I mentioned earlier anything sent to the device will be echoed back to the client so the code above immediately reads back the command it just sent and discards it before looking for any response from the ATtiny. There is no error checking or validation of the response in the sample above, that needs to be added sometime in the future.
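As a hedged sketch of what that missing validation could look like, the Python function below checks the framing of a received packet before trusting its value. The constant values are inferred from the packet description ('!' start character, newline terminator, 7 characters in total) and parse_packet is a hypothetical helper. The checksum algorithm is not described in this post, so it is passed in as an optional function rather than guessed at.

```python
import string

CHAR_START = "!"    # assumed from the packet description
CHAR_END = "\n"     # assumed from the packet description
PACKET_LENGTH = 7   # start + 4 value digits + 1 checksum digit + end


def parse_packet(packet, checksum_fn=None):
    """Validate a received packet and return its 16 bit value.

    checksum_fn, if given, maps a value to its expected checksum digit.
    """
    if len(packet) != PACKET_LENGTH:
        raise ValueError("wrong packet length")
    if packet[0] != CHAR_START or packet[-1] != CHAR_END:
        raise ValueError("bad framing characters")
    digits = packet[1:6]
    if not all(c in string.hexdigits for c in digits):
        raise ValueError("non-hex characters in packet")
    value = int(digits[:4], 16)
    received = int(digits[4], 16)
    if checksum_fn is not None and checksum_fn(value) != received:
        raise ValueError("checksum mismatch")
    return value


print(hex(parse_packet("!1C5A0\n")))
```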

The supported commands that can be sent to the device are:

Command   Hex   Description
CMD_GET   1R00  Get the current value of register R.
CMD_SET   2RNN  Set the value of register R to NN.
CMD_SAVE  3000  Save the configuration registers to EEPROM.

The response codes that the device can return are:

Response    Hex   Description
STATUS_OK   0000  The operation was successful.
STATUS_INF  1RNN  The value of register R is NN.
STATUS_ERR  FFFF  An error occurred during the operation.

The STATUS_INF response is sent in reply to CMD_GET and CMD_SET commands; the STATUS_OK and STATUS_ERR responses will be sent in reply to CMD_SAVE.
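The 16 bit values in the two tables can be packed and unpacked with a little bit shifting. The Python sketch below assumes the hex patterns are to be read as nibble fields (command code in the top nibble, register index in the next, value in the low byte), which is how the tables read but is not confirmed against the firmware. The helper names are made up for the example.

```python
CMD_GET, CMD_SET, CMD_SAVE = 0x1, 0x2, 0x3
STATUS_INF = 0x1
STATUS_ERR = 0xFFFF


def make_get(register):
    """CMD_GET: 1R00"""
    return (CMD_GET << 12) | ((register & 0x0F) << 8)


def make_set(register, value):
    """CMD_SET: 2RNN"""
    return (CMD_SET << 12) | ((register & 0x0F) << 8) | (value & 0xFF)


def make_save():
    """CMD_SAVE: 3000"""
    return CMD_SAVE << 12


def decode_info(response):
    """Split a STATUS_INF response (1RNN) into (register, value)."""
    if response == STATUS_ERR:
        raise ValueError("device reported an error")
    if (response >> 12) != STATUS_INF:
        raise ValueError("not a STATUS_INF response")
    return (response >> 8) & 0x0F, response & 0xFF


STATE_MOTION = 12
print("%04X" % make_get(STATE_MOTION))
print(decode_info(0x1C5A))
```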

Configuration Tool

I wrote a small Python class to handle communication with the device using an FTDI serial cable. This allows me to quickly develop useful tools to help with debugging and configuration. One of the first tools I wrote was a simple monitoring program that dumps the sensor data to a comma-delimited text file for later analysis with a spreadsheet.

Configuration

I wrote a small GUI in Python (using GTK) to allow you to change the configuration values and keep an eye on the current state. I've found that the LDRs vary in behaviour so it is necessary to adjust the trigger level to match the behaviour of the component you are using.

Board Layout and Casing

The circuit is very simple so a simple small home made PCB will do the trick. I managed to come up with a single sided layout that only uses a single jumper wire. I only need three devices in total so it doesn't seem worth doing a two layer board and having it made up at a PCB fabrication service - etching that many boards by hand is not too onerous a task.

Casing

The casing was a little more problematic - I wanted to utilise my 3D printer to create a custom case rather than jury rig a standard project box for the purpose, it's one of the more complex objects I've designed in OpenSCAD. It was a bit fiddly to design but I came up with a working design after only a single prototype print. The design files are in the GitHub repository as well so you can have a look at them for yourself.

Summary

This has been an interesting project to work on (and immediately useful as well as being fun to do). The ATtiny chips are very capable devices and make for very simple, small circuit designs. Keeping the IO requirements down by sharing pin functionality or looking for hardware and software tricks to multiplex them is a great learning experience. An additional bonus is that it's very easy to move up to an ATmega chip if you do run out of IO.

The development cycle isn't as quick as using the Arduino environment (without a bootloader you have to manually pull the chip out of the circuit and into the programmer for each code update) but it's not overly complex either. I managed to put everything I need in the Makefile so it was a painless process to compile and flash the code when I made changes.

I have a few other projects in mind that would work well with an ATtiny as the main CPU so you'll definitely see some more articles on the site about them.

I hope you enjoyed this project write up, if you wind up building one for yourself or using the design for other purposes I'd love to hear about it. Once again all the design files for this project are available on GitHub under a Creative Commons Attribution-ShareAlike 4.0 International License so you are welcome to take them and use them as you see fit.


Conserving Memory on an AVR

Friday, March 21 2014 @ 06:23 PM

Both the ATtiny and ATmega CPUs have a very limited amount of RAM (512 bytes for the ATtiny85 and 1K for the ATmega8/168) and it's easy to hit the limit without some careful programming. I've been working on a project based around an ATtiny85 that barely fits into the 512 bytes available. While analysing the code I came across a few common (and not so common) tricks to help reduce the amount of RAM being used by your program so I thought I'd take the opportunity to share them here.

To determine how much memory you are using you can use the avr-size command (part of the avr-gcc suite) as follows:

avr-size --mcu=attiny85 --format=avr myfile.elf

The output of the command will not only tell you the amount of flash and RAM that is being used by your program it will also tell you what that is as a percentage of the available memory on the specified MCU. The output looks like this:

AVR Memory Usage
----------------
Device: attiny85

Program:    3644 bytes (44.5% Full)
(.text + .data + .bootloader)

Data:        372 bytes (72.7% Full)
(.data + .bss + .noinit)

I put an avr-size command in the link steps of the Makefile for my projects; this makes it easy to check if I am approaching the memory limits for the chip I'm using.

The remainder of this post looks at what things are placed into RAM and what options you have to minimise that memory usage.

Uninitialised global or static variables.

This is the most obvious. Any global variable (or static variable declared inside a function) will have space allocated for it in the RAM. A declaration like the following ...

uint8_t myArray[12]; // 12 bytes of uninitialised memory

... will set aside 12 bytes of memory.

There is not a lot you can do to avoid this apart from minimising your globals and statics. Reusing variables is a possible option (at the cost of making your code harder to understand) or re-evaluating if the variable really needs to be global.

If you are using large arrays (input buffers for example or as a space to build strings) you need to evaluate how big they really need to be and trim them down to the minimum required size.

To see what is being allocated in the uninitialised data segment (the .bss section) you can use the avr-objdump command:

avr-objdump -t myfile.elf | grep "\.bss"

For my sample program the output looks something like this:

0080009c l    d  .bss   00000000 .bss
0080019c l     O .bss   00000038 g_state
0080009c l     O .bss   00000100 g_framebuffer
008001d4 g       .bss   00000000 __bss_end
0080009c g       .bss   00000000 __bss_start

The first column gives the address; we can calculate the total size by looking at the difference in addresses for __bss_start and __bss_end. In this case it's 0x80009c to 0x8001d4 or 312 bytes. The second numerical column gives you the size of each variable - 0x100 (256) for g_framebuffer and 0x38 (56) for g_state. Note that this does not match up to the total RAM usage reported above - there is another segment (the initialised data segment or .data section) that gets added to it as well.
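If you would rather not do the hex arithmetic by hand, the listing can be parsed with a few lines of Python. This is just a sketch working on the sample output above pasted in as a string; parse_symbols is a hypothetical helper, and it relies only on the first field being the address and the last two fields being the size and symbol name.

```python
SYMBOLS = """\
0080009c l    d  .bss   00000000 .bss
0080019c l     O .bss   00000038 g_state
0080009c l     O .bss   00000100 g_framebuffer
008001d4 g       .bss   00000000 __bss_end
0080009c g       .bss   00000000 __bss_start
"""


def parse_symbols(text):
    """Map each symbol name to an (address, size) tuple."""
    symbols = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        symbols[fields[-1]] = (int(fields[0], 16), int(fields[-2], 16))
    return symbols


symbols = parse_symbols(SYMBOLS)
bss_size = symbols["__bss_end"][0] - symbols["__bss_start"][0]
print("bss total:", bss_size)
for name in ("g_framebuffer", "g_state"):
    print(name, symbols[name][1])
```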

Initialised variables (including constant strings)

These are variables that are initialised with a value at compile time.

const char *cszMessage = "This is my message";
uint8_t myArray[] = { 0x01, 0x02, 0x03 };

A difference between these and uninitialised variables is that they take up space in both flash and RAM (before your program starts the values are copied from flash to RAM to ensure they are initialised with the right values).

This applies to const (read only) data as well. The AVR family is based on the Harvard Architecture which means that code memory and data memory are completely separate. To access data stored in code memory you need to use special instructions.

Luckily avr-libc (the C library used with avr-gcc) provides a set of macros to help deal with this situation; these are defined in the 'avr/pgmspace.h' header file. Essentially you need to declare your variables as being in program memory and then use specific macros to access them. A lookup table might be implemented as follows:

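#include <stdint.h>
#include <avr/pgmspace.h>

// The table is kept in flash only, no RAM copy is made at startup
const uint8_t lookup[] PROGMEM = { 1, 2, 3, 4, 5, 7 };

static inline uint8_t getEntry(int index) {
  return (uint8_t)pgm_read_byte_near(lookup + index);
  }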

The macros are documented in the avr-libc manual and a Google search will lead to plenty of examples. The most common use of the pgmspace.h utilities is to keep constant strings in flash only; the runtime library provides an alternative set of string manipulation functions that will work with such strings.
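As a rough sketch of how a flash-only string might be used, the example below copies one into a RAM buffer with strlen_P() and strcpy_P() from 'avr/pgmspace.h'. The names g_banner and loadBanner are hypothetical, and the #else branch is only a desktop stand-in (an assumption so the code can be compiled off-target), not something you would ship.

```c
#include <stdint.h>
#include <string.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Desktop fallback (assumption, for testing only): there is no separate
   program memory, so the flash-aware helpers map to the normal ones. */
#define PROGMEM
#define strlen_P(s)    strlen(s)
#define strcpy_P(d, s) strcpy((d), (s))
#endif

/* On AVR this string is kept in flash; no RAM copy is made at startup. */
static const char g_banner[] PROGMEM = "This is my message";

/* Copy the flash string into a caller supplied RAM buffer.
   Returns the number of characters copied, or 0 if it would not fit. */
uint8_t loadBanner(char *buffer, uint8_t size) {
  size_t length = strlen_P(g_banner);
  if (length + 1 > size)
    return 0;
  strcpy_P(buffer, g_banner);
  return (uint8_t)length;
  }
```

The RAM cost is now limited to the buffer you pass in, and that buffer can be shared between several messages rather than each string holding its own permanent copy.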

You shouldn't limit yourself to strings though - any lookup tables or blocks of constant data are valuable candidates for optimisation.

'switch()' Statements

This one was a bit of a surprise to me and I only came across it while trying to track down a 'missing' block of 60 bytes in the initialised data segment that I couldn't match to anything I had declared.

To see what is being placed in the data segment you can use the avr-objdump command as described earlier. Here is an example that shows only the items in the .data section:

avr-objdump -t myfile.elf | grep "\.data"

The output will look something like this:

00800060 l    d  .data  00000000 .data
00800060 l     O .data  0000003c CSWTCH.6
0080009c g       .data  00000000 __data_end
0080009c g       .data  00000000 _edata
00800060 g       .data  00000000 __data_start

You can see the CSWTCH.6 reference taking up 0x3c (60) bytes. This doesn't match up to any of my declared variables or constants so I was at a bit of a loss as to where it was coming from.

It turns out that when avr-gcc generates code for a switch() statement it creates a lookup table to optimise the jumps it needs to make to get to the appropriate code for the value being passed to the switch(). A lookup table is initialised data and therefore gets copied into RAM prior to running the program.

I would have expected the compiler to generate code to access this directly from flash (and I'm not alone, there is a bug report against this behaviour). If you really need to save the memory you will have to change the code from something like this ...

switch(c) {
  case 0:
    // Do something
    break;
  case 1:
    // Do something
    break;
  case 2:
    // Do something
    break;
  default:
    // Do something else
  }

... to something like this ...

if(c==0) {
  // Do something
  }
else if(c==1) {
  // Do something
  }
else if(c==2) {
  // Do something
  }
else {
  // Do something else
  }

That will avoid having a lookup table generated and free up some precious RAM. In this case the code is still very readable (even if it may be a little slower to execute) so it's not a huge trade off.
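To make this more concrete, here is a small sketch of the two forms side by side. As far as I can tell, a switch in which every case only selects a constant value is the kind of code that tends to end up with a CSWTCH table, so that is what the example uses. The function names and values are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical example: map a command code to a delay value. Every case
   only selects a constant, which is the pattern the compiler may turn
   into a lookup table held in RAM. */
uint16_t delayForSwitch(uint8_t c) {
  switch(c) {
    case 0:
      return 10;
    case 1:
      return 50;
    case 2:
      return 250;
    case 3:
      return 1000;
    default:
      return 0;
    }
  }

/* The same behaviour written as an if/else chain. The constants are now
   encoded in the instructions themselves, so no table is required. */
uint16_t delayForIfElse(uint8_t c) {
  if(c==0) {
    return 10;
    }
  else if(c==1) {
    return 50;
    }
  else if(c==2) {
    return 250;
    }
  else if(c==3) {
    return 1000;
    }
  return 0;
  }
```

If you would rather keep the switch(), GCC has a -fno-tree-switch-conversion option that disables this kind of transformation. Treat that as something to verify with avr-objdump on your own build rather than a guarantee.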

The Stack

The stack is dynamically allocated and lives in the unused portion of RAM reported by avr-size. In the example output above I am explicitly allocating 372 bytes out of an available 512 which leaves 140 bytes for the stack.

Every time you call a function the return address will be pushed to the stack and occupy space in RAM until the called function is finished. If you have deeply nested function calls (or a recursive function) this can take up a lot of space.

The stack is also used for local variables and parameters in functions. When your function is called, space is allocated for any local variables it declares; when the function returns, this memory is released again. From what I have seen the avr-gcc compiler does a good job of allocating registers for storing local variables and passing parameters though, so this may not be a significant problem.

It is difficult to determine the maximum amount of stack space your program is going to require so you need to make sure there is a reasonable amount of unused RAM available. For an average program I would be dubious about anything less than 64 bytes.

Some steps you can take to minimise your stack usage include minimising the number of local variables in functions (reuse variables where possible rather than declaring a new one) and reducing the call depth (you could declare smaller functions as inline for example).

Stack overflow can be very difficult to diagnose. There is no stack checking done at runtime, and using more memory than is available will corrupt the values of other variables, so the symptoms may seem to be random failures. If you are having trouble, this article describes a useful technique to see how much stack is actually being used.
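One common technique of this kind (not necessarily the one used in that article) is often called stack painting: fill the unused RAM with a known byte at startup, let the program run, then count how many of those bytes are still intact. The sketch below shows the idea on an ordinary buffer so it can be tried on a desktop. The names paintRegion and countUntouched and the 0xC5 marker value are my own choices; on a real AVR the region would be the memory between the end of .bss and the stack pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_MARKER 0xC5  /* arbitrary fill value (an assumption) */

/* Fill a region of memory with the marker byte. */
void paintRegion(uint8_t *region, size_t size) {
  memset(region, STACK_MARKER, size);
  }

/* Count the marker bytes still intact, starting from the low end of the
   region. The AVR stack grows downwards, so the bytes it never reached
   are the ones at the lowest addresses. */
size_t countUntouched(const uint8_t *region, size_t size) {
  size_t count = 0;
  while (count < size && region[count] == STACK_MARKER)
    count++;
  return count;
  }
```

With the 140 spare bytes from the example above, a result of 100 would suggest the stack never grew beyond 40 bytes during that run. It only tells you about the code paths that were actually exercised, so leave some margin.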

Summary

So there you have it - a nice collection of simple tricks to optimise your memory usage, as well as some useful tools to help you determine what is being used where and by whom. Be careful to avoid premature optimisation though - save your memory optimisation until you need it.

In the case of this particular project I am trying to squeeze as much as I can into an ATtiny85. If you find yourself in a similar situation, or need to add features to an existing project, these tips might come in very useful.


The Lab has Moved!

Wednesday, March 19 2014 @ 08:48 PM

The Garage Lab has now settled in to its new location so I can get back to projects again. Hopefully the drought of updates is finally broken.

The new location has a lot more space and lets me separate the different phases and types of projects more easily - I tend to work on multiple projects at once and now I can do that without them impacting on each other. I've already built a few things at the new premises (mostly space improvement projects) and the additional space has made it a lot easier.

I've also taken the opportunity provided by the move to clean up my internal network and the servers I run (I have a number of micro-PCs running Xubuntu in headless mode). The 'Garage' part of the lab is detached and is set lower than the main building, leading to very poor WiFi reception - I used a pair of powerline adaptors from Netcomm which have solved the problem. I highly recommend them if you are facing similar issues - they provide a fast and reliable link between rooms without needing additional cabling.

One of the services I run is GitLab, a complete solution for hosting Git repositories on an internal network. It provides a web based UI that is very similar to GitHub complete with multiple users, project creation, issue tracking, a wiki per project and a complete set of graphs and other statistical information. It also provides you with full ssh and https access to your repositories.

Although sharing on GitHub is a better solution for open code there is a need for private repositories as well - I have started keeping a number of things (such as network configuration files and notes, small internal projects and custom single purpose projects) that would either be a security risk to make public or would simply be clutter on the public GitHub service. Another reason to keep a project private, at least for a while, is to allow an incubation period while it is being developed and worked on - allowing at least a relatively stable version to be pushed to GitHub at the appropriate time.

If you have a spare machine to use as a server (an older laptop will do, even a Raspberry Pi if you don't have a lot of projects and don't expect top speed out of it) it is well worth setting up a GitLab service for your own use.

Another service I set up on all the machines was a CPU mining program for Dogecoin. I wanted to start experimenting with crypto-currencies just to get a feel for what they were like and how to work with the various software services surrounding them. Bitcoin is hard to acquire without a significant cost involved (and the mining process requires some specialised, and expensive, hardware to be efficient now). I chose Dogecoin because of its less serious nature and because I could actually mine some without spending a fortune on high end graphics cards or ASIC based mining solutions.

If you like, feel free to send some Doge my way - my wallet address is DCb85eXgP4J2r4F6R2JwkcyZr2YuLZUWBK. I'm not sure what to use Doge for yet, apart from tipping others, so any suggestions would be welcome.

Anyway, enough is set up that I can start catching up with my work queue so I have a busy few weeks ahead of me.

