AtomVM Internals

What is an Abstract Machine?

AtomVM is an “abstract” or “virtual” machine, in the sense that it simulates, in software, what a physical machine would do when executing machine instructions. In a normal computing machine (e.g., a desktop computer), machine code instructions are generated by a tool called a compiler, allowing an application developer to write software in a high-level language (such as C). (In rare cases, application developers will write instructions in assembly code, which is closer to the actual machine instructions, but which still requires a translation step, called “assembly”, to translate the assembly code into actual machine code.) Machine code instructions are executed in hardware using the machine’s Central Processing Unit (CPU), which is specifically designed to efficiently execute machine instructions targeted for the specific machine architecture (e.g., Intel x86, ARM, Apple M-series, etc.) As a result, machine code instructions are typically tightly packed, encoded instructions that require minimum effort (on the part of the machine) to unpack an interpret. These a low level instructions unsuited for human interpretation, or at least for most humans.

AtomVM and virtual machines generally (including, for example, the Java Virtual Machine) perform a similar task, except that i) the instructions are not machine code instructions, but rather what are typically called “bytecode” or sometimes “opcode” instructions; and ii) the generated instructions are themselves executed by a runtime execution engine written in software, a so-called “virtual” or sometimes “abstract” machine. These bytecode instructions are generated by a compiler tailored specifically for the virtual machine. For example, the javac compiler is used to translate Java source code into Java VM bytecode, and the erlc compiler is used to translate Erlang source code into BEAM opcodes.

AtomVM is an abstract machine designed to implement the BEAM instruction set, the 170+ (and growing) set of virtual machine instructions implemented in the Erlang/OTP BEAM.

Note that there is no abstract specification of the BEAM abstract machine and instruction set. Instead, the BEAM implementation by the Erlang/OTP team is the definitive specification of its behavior.

At a high level, the AtomVM abstract machine is responsible for:

  • Loading and execution of the BEAM opcodes encoded in one or more BEAM files;

  • Managing calls to internal and external functions, handling return values, exceptions, and crashes;

  • Creation and destruction of Erlang “processes” within the AtomVM memory space, and communication between processes via message passing;

  • Memory management (allocation and reclamation) of memory associated with Erlang “processes”

  • Pre-emptive scheduling and interruption of Erlang “processes”

  • Execution of user-defined native code (Nifs and Ports)

  • Interfacing with the host operating system (or facsimile)

This document provides a description of the AtomVM abstract machine, including its architecture and the major components and data structures that form the system. It is intended for developers who want to get involved in bug fixing or implementing features for the VM, as well as for anyone interested in virtual machine internals targeted for BEAM-based languages, such as Erlang or Elixir.

AtomVM Data Structures

This section describes AtomVM internal data structures that are used to manage the load and runtime state of the virtual machine. Since AtomVM is written in C, this discussion will largely be in the context of native C data structures (i.e., structs). The descriptions will start at a fairly high level but drill down to some detail about the data structures, themselves. This narrative is important, because memory is limited on the target architectures for AtomVM (i.e., micro-controllers), and it is important to always be aware of how memory is organized and used in a way that is as space-efficient as possible.

The GlobalContext

We start with the top level data structure, the GlobalContext struct. This object is a singleton object (currently, and for the foreseeable future), and represents the root of all data structures in the virtual machine. It is in essence in 1..1 correspondence with instances of the virtual machine.

Note. Given the design of the system, it is theoretically possible to run multiple instances of the AtomVM in one process space. However, no current deployments make use of this capability.

In order to simplify the exposition of this structure, we break the fields of the structure into manageable subsets:

  • Process management – fields associated with the management of Erlang (lightweight) “processes”

  • Atoms management – fields associated with the storage of atoms

  • Module Management – fields associated with the loading of BEAM modules

  • Reference Counted Binaries – fields associated with the storage of binary data shared between processes

  • Other data structures

These subsets are described in more detail below.

Note. Not all fields of the GlobalContext structure are described in this document.

Process Management

As a BEAM implementation, AtomVM must be capable of spawning and managing the lifecycle of Erlang lightweight processes. Each of these processes is encapsulated in the Context structure, described in more detail in subsequent sections.

The GlobalContext structure maintains a list of running processes and contains the following fields for managing the running Erlang processes in the VM:

  • processes_table the list of all processes running in the system

  • waiting_processes the subset of processes that are waiting to run (e.g., waiting for a message or timeout condition).

  • running_processes the subset of processes that are currently running.

  • ready_processes the subset of processes that are ready to run.

Processes are in either waiting_processes, running_processes or ready_processes. A running process can technically be moved to the ready list while running to signify that if it yields, it will be eligible for being run again, typically if it receives a message. Also, native handlers (ports) are never moved to the running_processes list but are in the waiting_processes list when they run (and can be moved to ready_processes list if they are made ready while running).

Each of these fields are doubly-linked list (ring) structures, i.e, structs containing a prev and next pointer field. The Context data structure begins with two such structures, the first of which links the Context struct in the processes_table field, and the second of which is used for either the waiting_processes, the ready_processes or the running_processes field.

Note. The C programming language treats structures in memory as contiguous sequences of fields of given types. Structures have no hidden pramble data, such as you might find in C++ or who knows what in even higher level languages. The size of a struct, therefore, is determined simply by the size of the component fields.

The relationship between the GlobalContext fields that manage BEAM processes and the Context data structures that represent the processes, themselves, is illustrated in the following diagram:

GlobalContext Processes

Note. The Context data structure is described in more detail below.

Module Management

An Aside: What’s in a HashTable?

Modules

Contexts

Runtime Execution Loop

Module Loading

Function Calls and Return Values

Exception Handling

The Scheduler

In SMP builds, AtomVM runs one scheduler thread per core. Scheduler threads are actually started on demand. The number of scheduler threads can be queried with erlang:system_info/1 and be modified with erlang:system_flag/2. All scheduler threads are considered equal and there is no notion of main thread except when shutting down (main thread is shut down last).

Each scheduler thread picks a ready process and execute it until it yields. Erlang processes yield when they are waiting (for a message) and after a number of reductions elapsed. Native processes yield when they are done consuming messages (when the handler returns).

Once a scheduler thread is done executing a process, if no other thread is waiting into sys_poll_events, it calls sys_poll_events with a timeout that correspond to the time to wait for next execution. If there are ready processes, the timeout is 0. If there is no ready process, this scheduler thread will wait into sys_poll_event and depending on the platform implementation, the CPU usage can drop.

If there already is one thread in sys_poll_events, other scheduler threads pick the next ready process and if there is none, wait. Other scheduler threads can also interrupt the wait in sys_poll_events if a process is made ready to run. They do so using platform function sys_signal.

Mailboxes and signals

Erlang processes receive messages in a mailbox. The mailbox is the interface with other processes.

When a sender process sends a message to a recipient process, the message is first enqueued into an outer mailbox. The recipient process eventually moves all messages from the outer mailbox to the inner mailbox. The reason for the inner and outer mailbox is to use lock-free data structures using atomic CAS operations.

Sometimes, Erlang processes need to query information from other processes but without sending a regular message, for example when using process_info/1,2 nif. This is handled by signals. Signals are special messages that are enqueued in the outer mailbox of a process. Signals are processed by the recipient process when regular messages from the outer mailbox are moved to the inner mailbox. Signal processing code is part of the main loop and transparent to recipient processes. Both native handlers and erlang processes can receive signals. Signals are also used to run specific operation on other processes that cannot be done from another thread. For example, signals are used to perform garbage collection on another process.

When an Erlang process calls a nif that requires such an information from another process such as process_info/1,2, the nif returns a special value and set the Trap flag on the calling process. The calling process is effectively blocked until the other process is scheduled and the information is sent back using another signal message. This mechanism can also be used by nifs that want to block until a condition is true.

Stacktraces

Stacktraces are computed from information gathered at load time from BEAM modules loaded into the application, together with information in the runtime stack that is maintained during the execution of a program. In addition, if a BEAM file contains a Line chunk, additional information is added to stack traces, including the file name (as defined at compile time), as well as the line number of a function call.

Note. Adding line information to a BEAM file adds non-trivial memory overhead to applications and should only be used when necessary (e.g., during the development process). For applications to make the best use of memory in tightly constrained environments, packagers should consider removing line information all together from BEAM files and rely instead on logging or other mechanisms for diagnosing problems in the field.

Newcomers to Erlang may find stacktraces slightly confusing, because some optimizations taken by the Erlang compiler and runtime can result in stack frames “missing” from stack traces. For example, tail-recursive function calls, as well as function calls that occur as the last expression in a function clause, don’t involve the creation of frames in the runtime stack, and consequently will not appear in a stacktrace.

Line Numbers

Including file and line number information in stacktraces adds considerable overhead to both the BEAM file data, as well as the memory consumed at module load time. The data structures used to track line numbers and file names are described below and are only created if the associated BEAM file contains a Line chunk.

The line-refs table

The line-refs table is an array of 16-bit integers, mapping line references (as they occur in BEAM instructions) to the actual line numbers in a file. (Internally, BEAM instructions do not reference line numbers directly, but instead are indirected through a line index). This table is stored on the Module structure.

This table is populated when the BEAM file is loaded. The table is created from information in the Line chunk in the BEAM file, if it exists. Note that if there is no Line chunk in a BEAM file, this table is not created.

The memory cost of this table is num_line_refs * 2 bytes, for each loaded module, or 0, if there is no Line chunk in the associated BEAM file.

The filenames table

The filenames table is a table of (usually only 1?) file name. This table maps filename indices to ModuleFilename structures, which is essentially a pointer and a length (of type size_t). This table generally only contains 1 entry, the file name of the Erlang source code module from which the BEAM file was generated. This table is stored on the Module structure.

Note that a ModuleFilename structure points to data directly in the Line chunk of the BEAM file. Therefore, for ports of AtomVM that memory-map BEAM file data (e.g., ESP32), the actual file name data does not consume any memory.

The memory cost of this table is num_filenames * sizeof(struct ModuleFilename), where struct ModuleFilename is a pointer and length, for each loaded module, or 0, if there is no Line chunk in the associated BEAM file.

The line-ref-offsets list

The line-ref-offsets list is a sequence of LineRefOffset structures, where each structure contains a ListHead (for list book-keeping), a 16-bit line-ref, and an unsigned integer value designating the code offset at which the line reference occurs in the code chunk of the BEAM file. This list is stored on the Module structure.

This list is populated at code load time. When a line reference is encountered during code loading, a LineRefOffset structure is allocated and added to the line-ref-offsets list. This list is used at a later time to find the line number at which a stack frame is called, in a manner described below.

The memory cost of this list is num_line_refs * sizeof(struct LineRefOffset), for each loaded module, or 0, if there is no Line chunk in the associated BEAM file.

Raw Stacktraces

AtomVM WebAssembly port

WebAssembly or Wasm port of AtomVM relies on Emscripten SDK and library. Even when SMP is disabled (with -DAVM_DISABLE_SMP=On), it uses pthread library to sleep when Erlang processes are not running (to not waste CPU cycles).

NodeJS environment build

The NodeJS environment build of this port is relatively straightforward, featuring NODERAWFS which means it can access files directly like node does.

Web environment build

The Web environment build of this port is slightly more complex.

Regarding files, main function can load modules (beam or AVM packages) using FetchAPI, which means they can be served by the same HTTP server. This is a fallback and users can preload files using Emscripten file_packager tool.

The port also uses Emscripten’s proxy-to-pthread feature which means AtomVM’s main function is run in a web worker. The rationale is the browser thread (or main thread) with WebAssembly cannot run a loop such as AtomVM’s schedulers. Web workers typically cannot manipulate the DOM and do other things that only the browser’s main thread can do. For this purpose, Erlang processes can call emscripten:run_script/2 function which dispatches the Javascript to execute to the main thread, waiting for completion (with [main_thread]) or not waiting for completion (with [main_thread, async]). Waiting for completion of a script on the main thread does not block the Erlang scheduler, other Erlang processes can be scheduled. Execution of Javascript on the worker thread, however, does block the scheduler.

Javascript code can also send messages to Erlang processes using call and cast functions from main.c. These functions are actually wrapped in atomvm.pre.js. Usage is demonstrated by call_cast.html example.

Cast is straightforward: the message is enqueued and picked up by the scheduler. It is freed when it is processed.

Call allows Javascript code to wait for the result and is based on Javascript promises (related to async/await syntax).

  • A promise is created (in the browser’s main thread) in a map to prevent Javascript garbage collection (this is done by Emscripten’s promise glue code).

  • An Erlang resource is created to encapsulate the promise so it is properly destroyed when garbage collected

  • A message is enqueued with the resource as well as the registered name of the target process and the content of the message

  • C code returns the handle of the promise (actually the index in the map) to Javascript Module.call wrapper.

  • The Module.call wrapper converts the handle into a Promise object and returns it, so Javascript code can await on the promise.

  • A scheduler dequeues the message with the resource, looks up the target process and sends it the resource as a term

  • The target process eventually calls emscripten:promise_resolve/1,2 or emscripten:promise_reject/1,2 to resolve or reject the promise.

  • The emscripten:promise_resolve/1,2 and emscripten:promise_reject/1,2 nifs dispatch a message in the browser’s main thread.

  • The dispatched function retrieves the promise from its index, resolves or rejects it, with the value passed to emscripten:promise_resolve/2 or emscripten:promise_reject/2 and destroys it.

Values currently can only be integers or strings.

If the scheduler cannot find the target process, the promise is rejected with “noproc” as a value. As the promise is encapsulated into an Erlang resource, if the resource object’s reference count reaches 0, the promise is rejected with “noproc” as the value.