AtomVM Internals

What is an Abstract Machine?

AtomVM is an “abstract” or “virtual” machine, in the sense that it simulates, in software, what a physical machine would do when executing machine instructions. In a normal computing machine (e.g., a desktop computer), machine code instructions are generated by a tool called a compiler, which lets an application developer write software in a high-level language (such as C). (In rare cases, application developers write instructions in assembly code, which is closer to the actual machine instructions but still requires a translation step, called “assembly”, to turn the assembly code into actual machine code.) Machine code instructions are executed in hardware by the machine’s Central Processing Unit (CPU), which is specifically designed to efficiently execute machine instructions targeted for a specific machine architecture (e.g., Intel x86, ARM, Apple M-series, etc.). As a result, machine code instructions are typically tightly packed, encoded instructions that require minimal effort (on the part of the machine) to unpack and interpret. These are low-level instructions unsuited for human interpretation, or at least for most humans.

AtomVM and virtual machines generally (including, for example, the Java Virtual Machine) perform a similar task, except that i) the instructions are not machine code instructions, but rather what are typically called “bytecode” or sometimes “opcode” instructions; and ii) the generated instructions are themselves executed by a runtime execution engine written in software, a so-called “virtual” or sometimes “abstract” machine. These bytecode instructions are generated by a compiler tailored specifically for the virtual machine. For example, the javac compiler is used to translate Java source code into Java VM bytecode, and the erlc compiler is used to translate Erlang source code into BEAM opcodes.
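To make the idea of a software execution engine concrete, here is a toy fetch-decode-dispatch loop in C. It is a minimal sketch under stated assumptions: the opcodes (OP_LOAD_INT, OP_ADD, OP_RETURN), the one-word-per-operand encoding, and the function toy_run are all hypothetical and bear no relation to the real BEAM instruction set or to AtomVM's own execution loop. It only shows the general shape of a bytecode interpreter, in which a C switch statement plays the role of the CPU's instruction decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes for illustration only; these are NOT BEAM opcodes. */
enum ToyOpcode {
    OP_LOAD_INT = 0, /* OP_LOAD_INT reg value : x[reg] = value       */
    OP_ADD = 1,      /* OP_ADD dst a b        : x[dst] = x[a] + x[b] */
    OP_RETURN = 2    /* OP_RETURN             : result = x[0]        */
};

#define TOY_NUM_REGS 4

/* Returns nonzero if r names one of the toy machine's registers. */
static int toy_valid_reg(int64_t r)
{
    return r >= 0 && r < TOY_NUM_REGS;
}

/*
 * A software "CPU": fetch an opcode, decode it, execute it, repeat.
 * Returns 0 and stores x[0] in *result on OP_RETURN; returns -1 on
 * malformed code (unknown opcode, bad register, truncated operands).
 * Integer overflow in OP_ADD is not checked in this sketch.
 */
int toy_run(const int64_t *code, size_t code_len, int64_t *result)
{
    assert(code != NULL && result != NULL);
    int64_t x[TOY_NUM_REGS] = { 0 }; /* register file */
    size_t pc = 0;                   /* program counter: next word to fetch */

    while (pc < code_len) {
        int64_t opcode = code[pc++]; /* fetch */
        switch (opcode) {            /* decode and dispatch */
            case OP_LOAD_INT:
                if (code_len - pc < 2 || !toy_valid_reg(code[pc])) {
                    return -1;
                }
                x[code[pc]] = code[pc + 1];
                pc += 2;
                break;
            case OP_ADD:
                if (code_len - pc < 3 || !toy_valid_reg(code[pc])
                        || !toy_valid_reg(code[pc + 1]) || !toy_valid_reg(code[pc + 2])) {
                    return -1;
                }
                x[code[pc]] = x[code[pc + 1]] + x[code[pc + 2]];
                pc += 3;
                break;
            case OP_RETURN:
                *result = x[0];
                return 0;
            default:
                return -1; /* unknown opcode */
        }
    }
    return -1; /* fell off the end of the code without OP_RETURN */
}
```

For example, the instruction words { OP_LOAD_INT, 1, 40, OP_LOAD_INT, 2, 2, OP_ADD, 0, 1, 2, OP_RETURN } load 40 and 2 into registers 1 and 2, add them into register 0, and return 42. A real virtual machine adds far more (a loader, a much larger instruction set, memory management, scheduling), but the core idea of a loop that decodes and executes one instruction at a time is the same.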

AtomVM is an abstract machine designed to implement the BEAM instruction set, the 170+ (and growing) set of virtual machine instructions implemented in the Erlang/OTP BEAM.

Note that there is no abstract specification of the BEAM abstract machine and instruction set. Instead, the BEAM implementation by the Erlang/OTP team is the definitive specification of its behavior.

At a high level, the AtomVM abstract machine is responsible for:

  • Loading and execution of the BEAM opcodes encoded in one or more BEAM files;

  • Managing calls to internal and external functions, handling return values, exceptions, and crashes;

  • Creation and destruction of Erlang “processes” within the AtomVM memory space, and communication between processes via message passing;

  • Management (allocation and reclamation) of memory associated with Erlang “processes”;

  • Pre-emptive scheduling and interruption of Erlang “processes”;

  • Execution of user-defined native code (NIFs and Ports);

  • Interfacing with the host operating system (or facsimile).

This document provides a description of the AtomVM abstract machine, including its architecture and the major components and data structures that form the system. It is intended for developers who want to get involved in bug fixing or implementing features for the VM, as well as for anyone interested in virtual machine internals targeted for BEAM-based languages, such as Erlang or Elixir.

AtomVM Data Structures

This section describes the AtomVM internal data structures used to manage the load and runtime state of the virtual machine. Since AtomVM is written in C, this discussion will largely be in the context of native C data structures (i.e., structs). The descriptions start at a fairly high level but drill down into some detail about the data structures themselves. This level of detail matters because memory is limited on AtomVM's target architectures (e.g., microcontrollers), so it is important to always be aware of how memory is organized and to use it as space-efficiently as possible.

The GlobalContext

We start with the top-level data structure, the GlobalContext struct. This object is a singleton (currently, and for the foreseeable future) and represents the root of all data structures in the virtual machine. It is, in essence, in one-to-one correspondence with instances of the virtual machine.

Note. Given the design of the system, it is theoretically possible to run multiple instances of AtomVM in a single process space. However, no current deployments make use of this capability.

In order to simplify the exposition of this structure, we break the fields of the structure into manageable subsets:

  • Process management – fields associated with the management of Erlang (lightweight) “processes”

  • Atoms management – fields associated with the storage of atoms

  • Module management – fields associated with the loading of BEAM modules

  • Reference-counted binaries – fields associated with the storage of binary data shared between processes

  • Other data structures

These subsets are described in more detail below.

Note. Not all fields of the GlobalContext structure are described in this document.

Process Management

As a BEAM implementation, AtomVM must be capable of spawning and managing the lifecycle of Erlang lightweight processes. Each of these processes is encapsulated in the Context structure, described in more detail in subsequent sections.

The GlobalContext structure maintains a list of running processes and contains the following fields for managing the running Erlang processes in the VM:

  • processes_table – the list of all processes running in the system

  • waiting_processes – the subset of processes that are not ready to run because they are waiting on some condition (e.g., a message or a timeout). Within processes_table, this set is the complement of the set of ready processes.

  • ready_processes – the subset of processes that are ready to run. Within processes_table, this set is the complement of the set of waiting processes.

Each of these fields is a doubly-linked list (ring) structure, i.e., a struct containing prev and next pointer fields. The Context data structure begins with two such structures: the first links the Context struct into the processes_table list, and the second links it into either the waiting_processes or the ready_processes list.
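The arrangement described above is an intrusive ring list: the link fields live inside each Context, and the GlobalContext holds only the ring heads. The following C sketch is one way to implement that layout; it is not AtomVM's actual code, and every name in it (ListHead, list_append, list_remove, LIST_ENTRY, the trimmed Context and GlobalContext structs, and the link field names processes_table_link and scheduler_link) is a hypothetical stand-in. It shows a process being registered in processes_table and the ready ring, then moved to the waiting ring while staying in processes_table.

```c
#include <assert.h>
#include <stddef.h>

/* A node in a circular doubly-linked list; an empty ring points to itself. */
struct ListHead {
    struct ListHead *prev;
    struct ListHead *next;
};

static void list_init(struct ListHead *head)
{
    head->prev = head;
    head->next = head;
}

/* Insert node at the tail of the ring, i.e., just before head. */
static void list_append(struct ListHead *head, struct ListHead *node)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Unlink node from its ring and leave it as a ring of one. */
static void list_remove(struct ListHead *node)
{
    assert(node->prev != NULL && node->next != NULL); /* must be linked */
    node->prev->next = node->next;
    node->next->prev = node->prev;
    list_init(node);
}

static int list_is_empty(const struct ListHead *head)
{
    return head->next == head;
}

static size_t list_length(const struct ListHead *head)
{
    size_t n = 0;
    for (const struct ListHead *p = head->next; p != head; p = p->next) {
        n++;
    }
    return n;
}

/* Recover the enclosing struct from a pointer to one of its ListHead members. */
#define LIST_ENTRY(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

/* Hypothetical, heavily trimmed stand-ins for the real structures. */
struct Context {
    struct ListHead processes_table_link; /* ring headed by GlobalContext.processes_table */
    struct ListHead scheduler_link;       /* ring: ready_processes or waiting_processes */
    int process_id;                       /* illustrative payload only */
};

struct GlobalContext {
    struct ListHead processes_table;
    struct ListHead waiting_processes;
    struct ListHead ready_processes;
};

static void globalcontext_init(struct GlobalContext *glb)
{
    list_init(&glb->processes_table);
    list_init(&glb->waiting_processes);
    list_init(&glb->ready_processes);
}

/* Register a new process in the table; it starts out ready to run. */
static void globalcontext_add_process(struct GlobalContext *glb, struct Context *ctx)
{
    list_append(&glb->processes_table, &ctx->processes_table_link);
    list_append(&glb->ready_processes, &ctx->scheduler_link);
}

/* Move a process onto the waiting ring (from whichever ring it was on);
 * it stays in processes_table. */
static void globalcontext_make_waiting(struct GlobalContext *glb, struct Context *ctx)
{
    list_remove(&ctx->scheduler_link);
    list_append(&glb->waiting_processes, &ctx->scheduler_link);
}
```

Because each Context carries its own links, adding, removing, or moving a process in this sketch needs no memory allocation, which suits memory-constrained targets. The LIST_ENTRY macro recovers the enclosing Context from a pointer to either of its link fields by subtracting that field's offsetof value.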

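Note. The C programming language lays out structures in memory as sequences of fields of given types, in declaration order. Structures have no hidden preamble data, such as the virtual table pointer a C++ compiler adds to classes with virtual functions, or the object headers used by many higher-level languages. The size of a struct is therefore determined by the sizes of its fields, plus any padding the compiler inserts between or after them to satisfy alignment requirements; there is never padding before the first field.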
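The following C sketch illustrates these layout rules: the first member always sits at offset 0 (there is no hidden preamble), but the compiler may insert padding between or after later members to satisfy alignment, so the order of fields can change a struct's size. The struct and function names are hypothetical, chosen only for this illustration, and the exact sizes depend on the target ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small, large, small: on ABIs that align uint64_t, padding typically
 * follows both tag and flags. */
struct Padded {
    uint8_t tag;
    uint64_t value;
    uint8_t flags;
};

/* The same fields, largest first: typically less padding overall. */
struct Reordered {
    uint64_t value;
    uint8_t tag;
    uint8_t flags;
};

/* Bytes occupied by the three fields themselves, ignoring any padding. */
#define PAYLOAD_BYTES (sizeof(uint8_t) + sizeof(uint64_t) + sizeof(uint8_t))

/* The C standard forbids padding before the first member (C11 _Static_assert). */
_Static_assert(offsetof(struct Padded, tag) == 0, "no hidden preamble before the first field");
_Static_assert(offsetof(struct Reordered, value) == 0, "no hidden preamble before the first field");

/* Padding bytes the compiler added to each layout. */
static size_t padded_overhead(void)
{
    return sizeof(struct Padded) - PAYLOAD_BYTES;
}

static size_t reordered_overhead(void)
{
    return sizeof(struct Reordered) - PAYLOAD_BYTES;
}

/* Because there is no preamble, a struct's address is also its first field's address. */
static int first_field_at_struct_address(const struct Padded *p)
{
    return (const void *) p == (const void *) &p->tag;
}
```

On common 64-bit targets such as x86-64 and AArch64, struct Padded occupies 24 bytes and struct Reordered 16, even though both hold the same 10 bytes of data. The same first-member rule means that a pointer to the first link field of a Context has the same address as the Context itself.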

The relationship between the GlobalContext fields that manage BEAM processes and the Context data structures that represent the processes, themselves, is illustrated in the following diagram:

[Diagram: GlobalContext processes]

Note. The Context data structure is described in more detail below.

Module Management

An Aside: What’s in a HashTable?

Modules

Contexts

Runtime Execution Loop

Module Loading

Function Calls and Return Values

Exception Handling

The Scheduler