Why Erlang is Relevant to the Internet of Things

In my previous blog post, I showed how to run Micropython on an ESP8266. Those instructions showed how to load code on the ESP8266 device, and how to connect the device to other services on the internet or on your local area network.

Python is an incredibly popular and productive programming language and development environment. As a replacement for the UNIX shell, it provides a comfortable and familiar environment for building structured programs that are easy to read and maintain, a welcome alternative to the popular shell scripting environments on the UNIX platform. (I’m looking at you, bash.) Micropython builds on that tradition with a simple, easy-to-use interface for getting up to speed quickly with development on small, cheap micro-controllers like the ESP8266. And what’s not to love about a read-eval-print loop for quick and dirty hacking, without the long, tedious build/flash/test rinse-and-repeat cycle of development in C on these devices?

While Python is a fun environment for gluing programs together, and while Micropython is a compelling environment for rapidly prototyping ideas on embedded devices like the ESP8266 or the ESP32, its support for concurrency is less than stellar. Python programs on the ESP platform must make extensive use of co-routines in order to perform more than one task “at a time”, for example servicing a web request while taking a reading from a GPIO pin.

I put “at a time” in quotes because the kinds of devices on which Micropython runs typically have only one physical CPU. So in order for a program on one of these controllers to do more than one thing at a time, the underlying runtime (I hesitate to call it an “operating system”) must somehow interleave the instructions being executed, so that no one task runs indefinitely while other tasks never get executed at all.

The Micropython community has come together to port the asyncio framework, for co-operative multitasking using co-routines. A very helpful, detailed, and well-written tutorial can be found here, for anyone interested, and there may be still other development efforts out there to provide similar functionality for the Micropython platform.

While co-routines buy us the ability for our programs to multi-task, they come at a cost: they require that the developer actually follow the rules and not, say, implement a task that takes too much time. In other words, this kind of multi-tasking is co-operative; developers need to play nice with the runtime, in order not to starve other tasks of execution time. I personally think this is a recipe for disaster, and it is one of the reasons that systems like MS-DOS and Mac OS (prior to OS X) were summarily executed; they simply did not scale as the number of tasks grew on the machines.
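To make the starvation problem concrete, here is a minimal sketch using the asyncio module from standard (desktop) Python rather than the Micropython port, which I am assuming behaves similarly in this respect. The task names `polite` and `greedy` and the helper `run_pair` are made up for illustration. The greedy task never awaits inside its loop, so the event loop has no opportunity to run anything else until it finishes.

```python
import asyncio
import time


async def polite(log, name):
    # Yields control back to the event loop after every step.
    for i in range(3):
        log.append((name, i))
        await asyncio.sleep(0)


async def greedy(log, name):
    # Never awaits inside the loop, so nothing else runs until it is done.
    for i in range(3):
        log.append((name, i))
        time.sleep(0.01)  # blocking call, standing in for a long computation


async def run_pair(first, second):
    # Run two tasks "at the same time" and record the order of their steps.
    log = []
    await asyncio.gather(first(log, "a"), second(log, "b"))
    return log


fair = asyncio.run(run_pair(polite, polite))
starved = asyncio.run(run_pair(greedy, polite))
print(fair)
print(starved)
```

In the first run the two tasks take turns. In the second, task b does not get to take a single step until task a has completely finished, and nothing in the runtime can intervene.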

I will also say that co-routines in Python are just plain hard to understand and reason about. While the rules for when to use keywords like async and yield might be straightforward (after you have read the documentation -- numerous times), I found that when writing a fairly substantial Micropython application to control a set of Neopixel LEDs, while also serving a REST API, synchronizing clocks with an NTP server, and performing other background tasks, I had to bury these keywords in a generic framework to insulate the rest of my code from them. A small bug in any of these tasks (or in their behavior in the face of the unpredictable environment in which they run) can potentially lock up the CPU, rendering the device inoperative and unresponsive. And for a device that may require service in the field, that kind of error can be catastrophic.

There does seem to be some ongoing work to support threading abstractions in Micropython, but the documentation states this feature is currently experimental and subject to change, so it’s not clear when, if ever, an alternative to co-operative multi-tasking via asyncio will be available.

Even if it does become available, I would argue that the concurrency model behind threading abstractions (with shared memory, mutexes, semaphores) and the problems it begets (race conditions, deadlocks) are simply not worth the trouble. The problem isn’t really the underlying runtime or the hardware on which it runs – it’s the programming and execution model that leads to unnecessary complexity, confusion, and ultimately bugs that are hard or impossible to debug. Having done this professionally for 20 years or more, I have made just about every mistake you can possibly make with this programming model. And I have little confidence that I won’t continue to make the same mistakes.
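For readers who have not lived with this model, here is a small sketch of what it asks of you, written with the threading module from standard Python (not the experimental Micropython support mentioned above). The `SharedCounter` class and the `work` function are hypothetical names of my own. The point is that every single path that touches the shared value has to remember to take the lock.

```python
import threading


class SharedCounter:
    """State shared between threads; every access must go through the lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could interleave their
        # read-modify-write steps and lose updates.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def work(counter, n):
    for _ in range(n):
        counter.increment()


counter = SharedCounter()
threads = [threading.Thread(target=work, args=(counter, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_value = counter.value
print(final_value)
```

This version is correct, but only because the discipline is applied everywhere. Forget the lock in one method, or take two locks in a different order in two places, and you are back to race conditions and deadlocks.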

Fortunately, there is an alternative. Instead of programming with threads, protecting sections of shared memory with mutexes, crossing your fingers, and hoping for the best, the “actor model” provides an alternative paradigm for concurrent programming. Under this model, program state is held by a collection of lightweight, independent “actors” which operate “in parallel” (again, scare quotes, to account for the possibility that actor execution may be interleaved on the same CPU), which communicate with one another via message passing, and which, most critically, are single threaded. This means that object state can only be mutated in a single-threaded context.

That latter point is important, and is often overlooked. Actors, like people (despite their best efforts), do one thing at a time. If my wife asks me a question while I am coding, I do the polite thing and stop coding, to answer her question. Trying to do both at the same time will lead to domestic strife, on the one hand, and poor code, on the other.

The same is true for actors. They are the ultimate single-threaded entities in a system. There may be one, two, or 200,000 actors in your system, running in parallel. But no one actor is executing on more than one CPU at a time.
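Here is a rough sketch of the idea in Python, for readers who have not seen it before. It is only an approximation, under my own assumptions: a real BEAM process is far lighter than an operating system thread, and the `CounterActor` class, its message names, and the `hammer` function are invented for illustration. What it does show is the shape of the model: private state, a mailbox, and a single thread of control that handles one message at a time.

```python
import queue
import threading


class CounterActor:
    """A toy actor: private state, a mailbox, one message handled at a time."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # only ever touched by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # The only way for the outside world to interact with the actor.
        self._mailbox.put(message)

    def _run(self):
        while True:
            message = self._mailbox.get()
            kind = message[0]
            if kind == "increment":
                self._count += 1
            elif kind == "get":
                reply_to = message[1]
                reply_to.put(self._count)  # reply with a copy of the state
            elif kind == "stop":
                break


def hammer(actor, n):
    for _ in range(n):
        actor.send(("increment",))


actor = CounterActor()
senders = [threading.Thread(target=hammer, args=(actor, 1000)) for _ in range(4)]
for s in senders:
    s.start()
for s in senders:
    s.join()

reply = queue.Queue()
actor.send(("get", reply))
total = reply.get(timeout=5)
actor.send(("stop",))
print(total)
```

Four threads send messages concurrently, yet the counter itself needs no lock, because only the actor ever reads or writes it.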

I have been extremely fortunate in the past few years to be able to work in Erlang, a programming language that runs on the BEAM abstract machine (or virtual machine, for those of you who were introduced to virtual machines when Java came along). Erlang is a (mostly) faithful implementation of the actor model (though the designers of Erlang and the BEAM arrived at the model out of necessity and by accident), as well as a (mostly) faithful implementation of the functional programming style, a programming style that both complements and is complemented by the actor model.

Note. Elixir is a Ruby-inspired programming language that also targets the BEAM, and I lump Erlang and Elixir into the same basic category; they are both languages that run on the BEAM. The choice of language is largely an aesthetic one; what is really relevant is the underlying concurrency model, which they both share.

Now I am not going to say that using the actor concurrency model, or Erlang or Elixir on the BEAM, is going to solve all of your concurrency problems. Race conditions and deadlocks are as much of a problem in Erlang/Elixir as they are in Python, Java, or C++. What is different about these platforms is that because actors are single threaded, and because memory is not shared between actors, you never need to worry about concurrent access to your data structures. A single actor instance is the only entity that can interact with a given structure in memory. If you are more comfortable thinking in object oriented programming idioms, it is the ultimate form of privacy protection for your objects; it’s analogous to making any functions that interact with the state of the object (i.e., member variables) private and synchronized, so that no two threads can operate on the same object instance at any moment. The only way to interact with an object’s state is to send it a message (typically a copy of a portion of your own state), and to let the object handle the message in the way it sees fit. This was the original idea of object oriented platforms, such as Smalltalk, and early texts on OOP often used this message-passing terminology, which is misleading, since C++, Java, and their descendants never adopted this paradigm.

So what does any of this have to do with programming on micro-controllers? I mean, Erlang/OTP and the BEAM ecosystem are known for robust, scalable server applications, the kind of software that runs on large multicore machines that reside in data centers or in the cloud. Micro-controllers are a long way away from that!

I think the BEAM is relevant to these IoT devices in two principal ways. For one, the BEAM is a pre-emptive multi-tasking runtime. When an “actor” is running on the BEAM, it is allowed to run a pre-defined number of “reductions” (think of them roughly as units of work, such as function calls). If the actor reaches that limit, the underlying BEAM will pre-empt the actor, and allow another actor to proceed on the CPU. In this way, multiple actors can run on a single CPU, without requiring intervention on the part of the programmer. As Joe Armstrong famously said, “multi-tasking on a single CPU is just a form of scheduling”.
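The scheduling idea can be simulated in a few lines of Python. This is a toy model and not how the BEAM is actually implemented: the budget of 4 is an arbitrary number I picked for illustration, the names `worker` and `schedule` are mine, and each `yield` stands in for one reduction being counted. On the real BEAM that counting is done by the virtual machine itself, so the author of a task writes nothing special.

```python
from collections import deque

REDUCTION_BUDGET = 4  # arbitrary illustrative value, not the BEAM's real budget


def worker(name, steps, log):
    # Each yield marks one "reduction" consumed by this process.
    for i in range(steps):
        log.append((name, i))
        yield


def schedule(processes, budget=REDUCTION_BUDGET):
    run_queue = deque(processes)
    while run_queue:
        proc = run_queue.popleft()
        for _ in range(budget):
            try:
                next(proc)
            except StopIteration:
                break  # process finished; do not requeue it
        else:
            # Budget exhausted: pre-empt and send to the back of the queue.
            run_queue.append(proc)


log = []
schedule([worker("long", 6, log), worker("short", 2, log)])
print(log)
```

The long-running process is cut off after four steps, the short one runs to completion, and only then does the long one resume. No task can hog the CPU, however it is written.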

What makes Erlang/Elixir and the BEAM compelling is that code that is targeted for the BEAM can run on processors with one, two, or n-many cores without modification, and, if designed properly (which is not hard to do), can potentially scale linearly with the addition of CPUs to the application. (In fact, many Erlang/Elixir applications can run on multiple machines, making use of CPU cores on the network, not just on the local machine.)

This becomes even more interesting in the case of the ESP32 micro-controller, which has two execution cores. (It also has a third, ultra-low-power co-processor, but that is used in low-power mode.) As micro-controllers become more sophisticated, their ability to make use of cores becomes more important to developers, and if an application can benefit from additional CPUs without any change to the binary or source code, then that makes the platform very compelling for anyone who wants to develop software for these devices with fewer bugs, with less headache, and ultimately, for less money.

Another way in which the BEAM is relevant to IoT is that many of these devices, such as the ESP8266 and ESP32, are implemented with a native wireless networking stack, making them easy (and cheap) to network together. And what better platform than OTP, with built-in support for distribution, and where the programming model on a single device extends naturally to multiple devices on a network? The BEAM is almost a natural fit for these kinds of devices, and it opens up possibilities and use cases only dreamed of in other runtime environments.

Yes, I do think Erlang is still relevant to IoT in the data center. There is no shortage of Erlang applications that provide messaging (e.g., RabbitMQ) and storage (e.g., Riak) for IoT devices bumping around on the back of a tractor in Iowa. But I also think Erlang, Elixir, and the rest of the BEAM ecosystem can have a seat at the IoT device table, for implementing robust and scalable applications that run on the devices themselves.

In playing with Micropython, I had always thought that while Micropython is fun, wouldn’t it be cool if there were a port of the BEAM to a platform like the various ESP devices? Fortunately, the AtomVM GitHub project is just that; AtomVM is a port of the BEAM runtime, designed to run on embedded devices, like the ESP32. Currently, it is under active development, but it can already run a subset of the BEAM instruction set, allowing developers to get started with writing Erlang or Elixir programs on the ESP32.

I personally find this project extremely exciting, and I plan to contribute to it in whatever way I can, as I think bringing the BEAM ecosystem into the IoT world provides a compelling alternative to other languages and platforms on these kinds of devices, while also providing interesting use-cases for the Erlang/Elixir community.

In my next post, I will show you how to set up a basic development environment on a Mac (or Linux machine), so that you can run the “Hello World” of the IoT world, blinky.

Copyright (c) 2018 dushin.net This work is licensed under a Creative Commons Attribution 4.0 International License.
