Elixir is an extensible programming language, which means that, by using metaprogramming, we can enhance the way it works. The tool that plays a pivotal role in this is the Elixir parser, which creates an AST from source code files. This post explains what the AST is and how it can be explored more easily.

💡 This article is based on my talk “The Elixir Parser under the Microscope”, that I gave at Elixirconf EU, on April 9th, 2019.

What is the AST?

An AST, acronym for Abstact Syntax Tree, is a representation of code as a data structure. This data structure is used by Elixir to run Elixir code: either by interpreting it directly: executing the commands in the AST by recursively walking it; or by compiling it: translating the AST into another format, namely BEAM bytecode instructions, which then are saved to disk as .beam files. On runtime, these .beam files are then loaded and executed more efficiently than the interpreter can.

The following picture shows how the Elixir compiler uses the AST.

Elixir using the AST

You can think of the AST as a structured version of your source code. And what’s so nice about Elixir’s exensiblity: You can even write computer programs to modify this code! These are called macros. So when Elixir takes your source code file, it transforms it to an AST, and then it runs macros on them. And only after that it gets compiled. This macro expansion stage is called the metaprogramming aspect about it.

Where in other languages you are writing files back to disk, whenever you need to generate code (e.g. wrapper classes for an ORM); with Elixir, this is never really needed, as you can do everything with macros. If you want to know more about metaprogramming, I would really advise you to read the book Metaprogramming Elixir.

What does the AST look like?

With Elixir’s iex, it is straightforward to inspect the AST. The quote function takes regular Elixir code and returns the AST of it. If you want to convert a string into an AST, use Code.string_to_quoted/2.

By using quote() in iex, we can see what the resulting AST looks like.

Most basic types stay like they are: integers, atoms, strings, lists, pids, etc.:

iex> quote do: :hello
iex> quote do: [1, 2, "a string", false, nil]
[1, 2, "a string", false, nil]

However, tuples (and maps as well) are represented differently:

iex> quote do: {1, 2, "hello"}
{:{}, [], [1, 2, "hello"]}

This is because the AST uses a tuple to represent the basic building building block, the AST node. An AST node always has the following format:

{marker, metadata, children}

The marker is some kind of atom, the metadata is a keyword list which contains annotations about the metadata node (line number, file, optional column number, et cetera). The children is either a list of child AST nodes or it’s an atom. These children are what makes the AST tree-shaped.

So for instance the expression 2 * x looks like this:

{:*, [], [2, {:x, [], nil}]}

Where that inner {:x, [], nil} is the variable reference. Notice that operators (in this case *), are just regular function calls in Elixir.

As said, the AST is just a data structure, that you can even use for something completely different that Elixir code. At Botsquad, we have used the AST to create Bubblescript: our conversational scripting language that looks like Elixir (and uses its parser!) but has very different execution semantics.

The AST Ninja to the rescue!

The AST is well documented in Elixir’s documentation, but still, if you are looking at a large “raw” AST structure, it can get overwhelming pretty quickly.

To demonstrate, if you would have a function call inside an if block, the AST would look something like this:

iex> quote do
...>   if moon() == :full do
...>     IO.puts("Full moon")
...>   end
...> end
{:if, [context: Elixir, import: Kernel],
   {:==, [context: Elixir, import: Kernel], [{:moon, [], []}, :full]},
     do: {{:., [], [{:__aliases__, [alias: false], [:IO]}, :puts]}, [],
      ["Full moon"]}

When I was investigating Elixir’s parser as research for my talk, I needed very regularly to call Elixir’s functions which relate to parsing and macros: quote(), Code.string_to_quoted(), Macro.to_string(), et cetera.

I wanted a way to explore these function calls interactively, in a way that their behaviour could be easily understood and quickly experimented with. That’s why I created a web-based AST exploration tool. First I called it the “AST Explorer”, but when I was looking for a domain, it suggested the .ninja TLD, so I renamed the project to AST Ninja.

AST ninja screenshot

đź’ˇ Visit the AST ninja here: https://ast.ninja/

The left side of the screen allows you to enter any Elixir code, or even any code that the parser can parse. If you just randomly try text, you will be amazed at what the parser still accepts.

The right side initially shows the AST panel, which contains the parsed AST, indented and colorized. The parsing is done in real time, so as soon as you type, it tries to parse it. That alone already saves a lot of keystrokes :-)

The “Add…” button in the top right allows you to add more panels to the UI. Each of those panels contain a different kind of parser or another way to display the code:

  • AST Uses Code.string_to_quoted/2 to parse and pretty-print the AST. Exposes a few options that influence the way the parser works (formatter metadata, atom creation allowed).

  • Tokenizer Uses Elixir’s internal :elixir_tokenizer.tokenize/2 function to generate the token stream, which is the first stage of to getting to the AST.

  • AST to String Demonstrates that an AST can be converted back to a string. In my talk I used this to show that Macro.to_string/1 currently can not be used to create a refactoring tool, due to that the parser discards information about whitespace and code comments.

  • Code.Formatter.to_algebra/2 Shows how the Elixir formatter creates an algebra document. JosĂ© Valim explained about the formatter in his talk at Elixirconf EU 2018.

…and then there are a few more panels containing a few experiments; for instance, as an experiment I created an AST-to-json converter, for the presentation I used leex and yecc to create a simple expression language.

After the talk, somebody came up to me and asked if I would like to add some more parsers, for instance, the Erlang parser and tokenizer. And I’m considering that, of course. PR’s welcome :-)

You can try the AST ninja for yourself, at https://ast.ninja/. Hopefully you will find it useful, and feedback is very welcome. You can find the source code on github, or just shoot me an email.

Arjan Scherpenisse

Arjan Scherpenisse

Co-founder Botsquad

email me