<h1>Testing your README.md</h1>
<p>Chris Keathley · 2021-12-11</p>
<p>I’ve wanted a good way to test READMEs in Elixir. I started building something myself
before my good friend <a href="https://twitter.com/wojtekmach">Wojtek</a> pointed out that there was a simple solution.</p>
<p>I like to use the README or at least a section of the README as the module doc
for the main module in my libraries. Typically that looks something like this:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">Vapor</span> <span class="k">do</span>
<span class="nv">@moduledoc</span> <span class="s2">"README.md"</span>
<span class="o">|></span> <span class="no">File</span><span class="o">.</span><span class="n">read!</span><span class="p">()</span>
<span class="o">|></span> <span class="no">String</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"&lt;!-- MDOC !--&gt;"</span><span class="p">)</span>
<span class="o">|></span> <span class="no">Enum</span><span class="o">.</span><span class="n">fetch!</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>I use <code class="language-plaintext highlighter-rouge">&lt;!-- MDOC !--&gt;</code> to mark the section of the README that I want to use as module doc. This way I can remove the installation instructions or other material that doesn’t serve any purpose in the module doc.</p>
<p>The trick that Wojtek clued me into is that if you write your examples using the doctest syntax, they can be run automatically as doctests.</p>
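<p>As a minimal sketch (the <code class="language-plaintext highlighter-rouge">Demo</code> module and <code class="language-plaintext highlighter-rouge">double/1</code> are hypothetical names, not from the post), any <code class="language-plaintext highlighter-rouge">iex></code> examples in the moduledoc, whether written by hand or pulled in from the README, become executable once a test file calls <code class="language-plaintext highlighter-rouge">doctest</code>:</p>

```elixir
defmodule Demo do
  @moduledoc """
  Module docs pulled from a README work the same way. Any `iex>`
  examples in here become executable doctests:

      iex> Demo.double(2)
      4
  """
  def double(n), do: n * 2
end

# In test/demo_test.exs, a single line turns the examples into tests:
#
#   defmodule DemoTest do
#     use ExUnit.Case, async: true
#     doctest Demo
#   end
```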
<p>I’ve slowly started adding this to all new modules that I write. It’s an easy way to ensure that your README doesn’t drift out of date with reality. If you want to look at a complete example of this, you can check out how I’ve structured <a href="https://github.com/elixir-toniq/norm">Norm</a>.</p>
<h1>Good and Bad Elixir</h1>
<p>Chris Keathley · 2021-06-02</p>
<p>I’ve seen a lot of Elixir at this point, both good and bad. Across all of that code, I’ve seen recurring patterns that tend to lead to worse code, so I thought I would document some of them along with better alternatives.</p>
<h2 id="mapget2-and-keywordget2-vs-access">Map.get/2 and Keyword.get/2 vs. Access</h2>
<p><code class="language-plaintext highlighter-rouge">Map.get/2</code> and <code class="language-plaintext highlighter-rouge">Keyword.get/2</code> lock you into using a specific data structure. This means that if you want to change the type of structure, you now need to update all of the call sites. Instead of these functions, you should prefer using Access:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Don't do</span>
<span class="n">opts</span> <span class="o">=</span> <span class="p">%{</span><span class="ss">foo:</span> <span class="ss">:bar</span><span class="p">}</span>
<span class="no">Map</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="ss">:foo</span><span class="p">)</span>
<span class="c1"># Do</span>
<span class="n">opts</span><span class="p">[</span><span class="ss">:foo</span><span class="p">]</span>
</code></pre></div></div>
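<p>A quick sketch of why this matters: the bracket syntax goes through the <code class="language-plaintext highlighter-rouge">Access</code> behaviour, so the same call site works for maps and keyword lists alike, and missing keys come back as <code class="language-plaintext highlighter-rouge">nil</code> instead of raising:</p>

```elixir
map_opts = %{foo: :bar}
kw_opts = [foo: :bar]

# The same call-site syntax works for both structures:
map_opts[:foo] # => :bar
kw_opts[:foo]  # => :bar

# Missing keys return nil rather than raising:
map_opts[:baz] # => nil
```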
<h2 id="dont-pipe-results-into-the-following-function">Don’t pipe results into the following function</h2>
<p>Side-effecting functions tend to return “results” like <code class="language-plaintext highlighter-rouge">{:ok, term()} | {:error, term()}</code>. If you’re dealing with side-effecting functions, don’t pipe the results into the next function. It’s always better to deal with the results directly using either <code class="language-plaintext highlighter-rouge">case</code> or <code class="language-plaintext highlighter-rouge">with</code>.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Don't do...</span>
<span class="k">def</span> <span class="n">main</span> <span class="k">do</span>
<span class="n">data</span>
<span class="o">|></span> <span class="n">call_service</span>
<span class="o">|></span> <span class="n">parse_response</span>
<span class="o">|></span> <span class="n">handle_result</span>
<span class="k">end</span>
<span class="k">defp</span> <span class="n">call_service</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># ...</span>
<span class="k">end</span>
<span class="k">defp</span> <span class="n">parse_response</span><span class="p">({</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">result</span><span class="p">}),</span> <span class="k">do</span><span class="p">:</span> <span class="no">Jason</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
<span class="k">defp</span> <span class="n">parse_response</span><span class="p">(</span><span class="n">error</span><span class="p">),</span> <span class="k">do</span><span class="p">:</span> <span class="n">error</span>
<span class="k">defp</span> <span class="n">handle_result</span><span class="p">({</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">decoded</span><span class="p">}),</span> <span class="k">do</span><span class="p">:</span> <span class="n">decoded</span>
<span class="k">defp</span> <span class="n">handle_result</span><span class="p">({</span><span class="ss">:error</span><span class="p">,</span> <span class="n">error</span><span class="p">}),</span> <span class="k">do</span><span class="p">:</span> <span class="k">raise</span> <span class="n">error</span>
<span class="c1"># Do...</span>
<span class="k">def</span> <span class="n">main</span> <span class="k">do</span>
<span class="n">with</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">response</span><span class="p">}</span> <span class="o"><-</span> <span class="n">call_service</span><span class="p">(</span><span class="n">data</span><span class="p">),</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">decoded</span><span class="p">}</span> <span class="o"><-</span> <span class="n">parse_response</span><span class="p">(</span><span class="n">response</span><span class="p">)</span> <span class="k">do</span>
<span class="n">decoded</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Using pipes forces our functions to handle the previous function’s results, spreading error handling throughout our various function calls. The core problem here is subtle, but it’s essential to internalize. Each of these functions has to know too much information about how it is being called. Good software design is mainly about building reusable bits that can be arbitrarily composed. In the pipeline example, the functions know how they’re used, how they’re called, and what order they’re composed.</p>
<p>Another problem with the pipeline approach is that it tends to assume that errors can be handled generically. This assumption is often incorrect.</p>
<p>When dealing with side-effects, the only function with enough information to decide what to do with an error is the calling function. In many systems, the error cases are just as important - if not more important - than the “happy path” case. The error cases are where you’re going to have to perform fallbacks or graceful degradation.</p>
<p>If you’re in a situation where errors are a vital part of your function’s control flow, then it’s best to keep all of the error handling in the calling function using <code class="language-plaintext highlighter-rouge">case</code> statements.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Do...</span>
<span class="k">def</span> <span class="n">main</span><span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">do</span>
<span class="k">case</span> <span class="ss">:fuse</span><span class="o">.</span><span class="n">check</span><span class="p">(</span><span class="ss">:service</span><span class="p">)</span> <span class="k">do</span>
<span class="ss">:ok</span> <span class="o">-></span>
<span class="k">case</span> <span class="n">call_service</span><span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">result</span><span class="p">}</span> <span class="o">-></span>
<span class="ss">:ok</span> <span class="o">=</span> <span class="no">Cache</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">result</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">result</span><span class="p">}</span>
<span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="n">error</span><span class="p">}</span> <span class="o">-></span>
<span class="ss">:fuse</span><span class="o">.</span><span class="n">melt</span><span class="p">(</span><span class="ss">:service</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="n">error</span><span class="p">}</span>
<span class="k">end</span>
<span class="ss">:blown</span> <span class="o">-></span>
<span class="n">cached</span> <span class="o">=</span> <span class="no">Cache</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="k">if</span> <span class="n">cached</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">cached</span><span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="ss">:not_found</span><span class="p">}</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>This increases the size of the calling function, but the benefit is that you can read the entire function and understand every control-flow path.</p>
<h2 id="dont-pipe-into-case-statements">Don’t pipe into case statements</h2>
<p>I used to be on the fence about piping into case statements, but I’ve seen this pattern abused too many times. Seriously y’all, put down the pipe operator and show a little restraint. If you find yourself piping into <code class="language-plaintext highlighter-rouge">case</code>, it’s almost always better to assign intermediate steps to a variable instead.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Don't do...</span>
<span class="n">build_post</span><span class="p">(</span><span class="n">attrs</span><span class="p">)</span>
<span class="o">|></span> <span class="n">store_post</span><span class="p">()</span>
<span class="o">|></span> <span class="k">case</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">post</span><span class="p">}</span> <span class="o">-></span>
<span class="c1"># ...</span>
<span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="n">_</span><span class="p">}</span> <span class="o">-></span>
<span class="c1"># ...</span>
<span class="k">end</span>
<span class="c1"># Do...</span>
<span class="n">changeset</span> <span class="o">=</span> <span class="n">build_post</span><span class="p">(</span><span class="n">attrs</span><span class="p">)</span>
<span class="k">case</span> <span class="n">store_post</span><span class="p">(</span><span class="n">changeset</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">post</span><span class="p">}</span> <span class="o">-></span>
<span class="c1"># ...</span>
<span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="n">_</span><span class="p">}</span> <span class="o">-></span>
<span class="c1"># ...</span>
<span class="k">end</span>
</code></pre></div></div>
<h2 id="dont-hide-higher-order-functions">Don’t hide higher-order functions</h2>
<p>Higher-order functions are great, so try not to hide them away. If you’re working with collections, you should prefer to write functions that operate on a single entity rather than the collection itself. Then you can use higher-order functions directly in your pipeline.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Don't do...</span>
<span class="k">def</span> <span class="n">main</span> <span class="k">do</span>
<span class="n">collection</span>
<span class="o">|></span> <span class="n">parse_items</span>
<span class="o">|></span> <span class="n">add_items</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">parse_items</span><span class="p">(</span><span class="n">list</span><span class="p">)</span> <span class="k">do</span>
<span class="no">Enum</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">list</span><span class="p">,</span> <span class="o">&</span><span class="no">String</span><span class="o">.</span><span class="n">to_integer</span><span class="o">/</span><span class="mi">1</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">add_items</span><span class="p">(</span><span class="n">list</span><span class="p">)</span> <span class="k">do</span>
<span class="no">Enum</span><span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="n">list</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">&</span> <span class="nv">&1</span> <span class="o">+</span> <span class="nv">&2</span><span class="p">)</span>
<span class="k">end</span>
<span class="c1"># Do...</span>
<span class="k">def</span> <span class="n">main</span> <span class="k">do</span>
<span class="n">collection</span>
<span class="o">|></span> <span class="no">Enum</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="o">&</span><span class="n">parse_item</span><span class="o">/</span><span class="mi">1</span><span class="p">)</span>
<span class="o">|></span> <span class="no">Enum</span><span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="o">&</span><span class="n">add_item</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">defp</span> <span class="n">parse_item</span><span class="p">(</span><span class="n">item</span><span class="p">),</span> <span class="k">do</span><span class="p">:</span> <span class="no">String</span><span class="o">.</span><span class="n">to_integer</span><span class="p">(</span><span class="n">item</span><span class="p">)</span>
<span class="k">defp</span> <span class="n">add_item</span><span class="p">(</span><span class="n">num</span><span class="p">,</span> <span class="n">acc</span><span class="p">),</span> <span class="k">do</span><span class="p">:</span> <span class="n">num</span> <span class="o">+</span> <span class="n">acc</span>
</code></pre></div></div>
<p>With this change, our <code class="language-plaintext highlighter-rouge">parse_item</code> and <code class="language-plaintext highlighter-rouge">add_item</code> functions become reusable in a broader set of contexts. These functions can now be used on a single item or can be lifted into the context of <code class="language-plaintext highlighter-rouge">Stream</code>, <code class="language-plaintext highlighter-rouge">Enum</code>, <code class="language-plaintext highlighter-rouge">Task</code>, or any number of other uses. Hiding this logic away from the caller is a worse design because it couples the function to its call site and makes it less reusable. Ideally, our APIs are reusable in a wide range of contexts.</p>
<p>Another benefit of this change is that better solutions may reveal themselves. In this case, we may decide that we don’t need the named functions and can use anonymous functions instead. We might then realize that we don’t need the <code class="language-plaintext highlighter-rouge">reduce</code> at all and can use <code class="language-plaintext highlighter-rouge">Enum.sum/1</code>.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="n">main</span> <span class="k">do</span>
<span class="n">collection</span>
<span class="o">|></span> <span class="no">Enum</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="o">&</span><span class="no">String</span><span class="o">.</span><span class="n">to_integer</span><span class="o">/</span><span class="mi">1</span><span class="p">)</span>
<span class="o">|></span> <span class="no">Enum</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
<span class="k">end</span>
</code></pre></div></div>
<p>This final step may not always be the right choice. It depends on how much work your functions are doing. But, as a general rule, you should strive to eliminate functions that only have a single call site. Even though there are no dedicated names for these functions, the final version is no less “readable” than when we started. An Elixir programmer can still look at this series of steps and understand that the goal is to convert a collection of strings into integers and then sum those integers. And, they can realize this without needing to read any other functions along the way.</p>
<h2 id="avoid-else-in-with-blocks">Avoid <code class="language-plaintext highlighter-rouge">else</code> in <code class="language-plaintext highlighter-rouge">with</code> blocks</h2>
<p><code class="language-plaintext highlighter-rouge">else</code> can be helpful if you need to perform an operation that is generic across <em>all</em> error values being returned. You should not use <code class="language-plaintext highlighter-rouge">else</code> to handle all potential errors (or even a large number of errors).</p>
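<p>Before the counter-examples, here’s a sketch of the legitimate case (the <code class="language-plaintext highlighter-rouge">Pipeline</code> module is hypothetical): the <code class="language-plaintext highlighter-rouge">else</code> is generic over every error value, wrapping whatever failed without matching on specific errors:</p>

```elixir
defmodule Pipeline do
  def run(input) do
    with {:ok, n} <- parse(input),
         {:ok, doubled} <- double(n) do
      {:ok, doubled}
    else
      # One clause, generic across every error the steps can return.
      {:error, reason} -> {:error, {:pipeline_failed, reason}}
    end
  end

  defp parse(s) do
    case Integer.parse(s) do
      {n, ""} -> {:ok, n}
      _ -> {:error, :bad_input}
    end
  end

  defp double(n), do: {:ok, n * 2}
end

Pipeline.run("21")   # => {:ok, 42}
Pipeline.run("nope") # => {:error, {:pipeline_failed, :bad_input}}
```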
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Don't do...</span>
<span class="n">with</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">response</span><span class="p">}</span> <span class="o"><-</span> <span class="n">call_service</span><span class="p">(</span><span class="n">data</span><span class="p">),</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">decoded</span><span class="p">}</span> <span class="o"><-</span> <span class="no">Jason</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">response</span><span class="p">),</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">result</span><span class="p">}</span> <span class="o"><-</span> <span class="n">store_in_db</span><span class="p">(</span><span class="n">decoded</span><span class="p">)</span> <span class="k">do</span>
<span class="ss">:ok</span>
<span class="k">else</span>
<span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="p">%</span><span class="no">Jason</span><span class="o">.</span><span class="no">Error</span><span class="p">{}</span><span class="o">=</span><span class="n">error</span><span class="p">}</span> <span class="o">-></span>
<span class="c1"># Do something with json error</span>
<span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="p">%</span><span class="no">ServiceError</span><span class="p">{}</span><span class="o">=</span><span class="n">error</span><span class="p">}</span> <span class="o">-></span>
<span class="c1"># Do something with service error</span>
<span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="p">%</span><span class="no">DBError</span><span class="p">{}}</span> <span class="o">-></span>
<span class="c1"># Do something with db error</span>
<span class="k">end</span>
</code></pre></div></div>
<p>For the same reason, under no circumstances should you annotate your function calls with a name just so you can differentiate between them.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">with</span> <span class="p">{</span><span class="ss">:service</span><span class="p">,</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">resp</span><span class="p">}}</span> <span class="o"><-</span> <span class="p">{</span><span class="ss">:service</span><span class="p">,</span> <span class="n">call_service</span><span class="p">(</span><span class="n">data</span><span class="p">)},</span>
<span class="p">{</span><span class="ss">:decode</span><span class="p">,</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">decoded</span><span class="p">}}</span> <span class="o"><-</span> <span class="p">{</span><span class="ss">:decode</span><span class="p">,</span> <span class="no">Jason</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">resp</span><span class="p">)},</span>
<span class="p">{</span><span class="ss">:db</span><span class="p">,</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">result</span><span class="p">}}</span> <span class="o"><-</span> <span class="p">{</span><span class="ss">:db</span><span class="p">,</span> <span class="n">store_in_db</span><span class="p">(</span><span class="n">decoded</span><span class="p">)}</span> <span class="k">do</span>
<span class="ss">:ok</span>
<span class="k">else</span>
<span class="p">{</span><span class="ss">:service</span><span class="p">,</span> <span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="n">error</span><span class="p">}}</span> <span class="o">-></span>
<span class="c1"># Do something with service error</span>
<span class="p">{</span><span class="ss">:decode</span><span class="p">,</span> <span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="n">error</span><span class="p">}}</span> <span class="o">-></span>
<span class="c1"># Do something with json error</span>
<span class="p">{</span><span class="ss">:db</span><span class="p">,</span> <span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="n">error</span><span class="p">}}</span> <span class="o">-></span>
<span class="c1"># Do something with db error</span>
<span class="k">end</span>
</code></pre></div></div>
<p>If you find yourself doing this, it means that the error conditions matter. Which means that you don’t want <code class="language-plaintext highlighter-rouge">with</code> at all. You want <code class="language-plaintext highlighter-rouge">case</code>.</p>
<p><code class="language-plaintext highlighter-rouge">with</code> is best used when you can fall through at any point without worrying about the specific error or contrary pattern. A good way to unify error handling is to build a common error type like so:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">MyApp</span><span class="o">.</span><span class="no">Error</span> <span class="k">do</span>
<span class="k">defexception</span> <span class="p">[</span><span class="ss">:code</span><span class="p">,</span> <span class="ss">:msg</span><span class="p">,</span> <span class="ss">:meta</span><span class="p">]</span>
<span class="k">def</span> <span class="n">new</span><span class="p">(</span><span class="n">code</span><span class="p">,</span> <span class="n">msg</span><span class="p">,</span> <span class="n">meta</span><span class="p">)</span> <span class="ow">when</span> <span class="n">is_binary</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span> <span class="k">do</span>
<span class="p">%</span><span class="bp">__MODULE__</span><span class="p">{</span><span class="ss">code:</span> <span class="n">code</span><span class="p">,</span> <span class="ss">msg:</span> <span class="n">msg</span><span class="p">,</span> <span class="ss">meta:</span> <span class="no">Map</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">meta</span><span class="p">)}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">not_found</span><span class="p">(</span><span class="n">msg</span><span class="p">,</span> <span class="n">meta</span> <span class="p">\\</span> <span class="p">%{})</span> <span class="k">do</span>
<span class="n">new</span><span class="p">(</span><span class="ss">:not_found</span><span class="p">,</span> <span class="n">msg</span><span class="p">,</span> <span class="n">meta</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">internal</span><span class="p">(</span><span class="n">msg</span><span class="p">,</span> <span class="n">meta</span> <span class="p">\\</span> <span class="p">%{})</span> <span class="k">do</span>
<span class="n">new</span><span class="p">(</span><span class="ss">:internal</span><span class="p">,</span> <span class="n">msg</span><span class="p">,</span> <span class="n">meta</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">main</span> <span class="k">do</span>
<span class="n">with</span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">response</span><span class="p">}</span> <span class="o"><-</span> <span class="n">call_service</span><span class="p">(</span><span class="n">data</span><span class="p">),</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">decoded</span><span class="p">}</span> <span class="o"><-</span> <span class="n">decode</span><span class="p">(</span><span class="n">response</span><span class="p">),</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">result</span><span class="p">}</span> <span class="o"><-</span> <span class="n">store_in_db</span><span class="p">(</span><span class="n">decoded</span><span class="p">)</span> <span class="k">do</span>
<span class="ss">:ok</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="c1"># We wrap the result of Jason.decode in our own custom error type</span>
<span class="k">defp</span> <span class="n">decode</span><span class="p">(</span><span class="n">resp</span><span class="p">)</span> <span class="k">do</span>
<span class="n">with</span> <span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="n">e</span><span class="p">}</span> <span class="o"><-</span> <span class="no">Jason</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">resp</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="no">Error</span><span class="o">.</span><span class="n">internal</span><span class="p">(</span><span class="s2">"could not decode: </span><span class="si">#{</span><span class="n">inspect</span> <span class="n">resp</span><span class="si">}</span><span class="s2">"</span><span class="p">)}</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>This error struct provides a unified way to surface all of the errors in your application. The struct can render errors in a Phoenix controller or be returned from an RPC handler. Because the struct you’re using is an exception, the caller can also choose to raise the error, and you’ll get well-formatted error messages.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">case</span> <span class="n">main</span><span class="p">()</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">_</span><span class="p">}</span> <span class="o">-></span> <span class="ss">:ok</span>
<span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="n">e</span><span class="p">}</span> <span class="o">-></span> <span class="k">raise</span> <span class="n">e</span>
<span class="k">end</span>
</code></pre></div></div>
<h2 id="state-what-you-want-not-what-you-dont">State what you want, not what you don’t</h2>
<p>You should be intentional about your function’s requirements. Don’t bother checking that a value is not <code class="language-plaintext highlighter-rouge">nil</code> if what you expect it to be is a string:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Don't do...</span>
<span class="k">def</span> <span class="n">call_service</span><span class="p">(%{</span><span class="ss">req:</span> <span class="n">req</span><span class="p">})</span> <span class="ow">when</span> <span class="ow">not</span> <span class="n">is_nil</span><span class="p">(</span><span class="n">req</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># ...</span>
<span class="k">end</span>
<span class="c1"># Do...</span>
<span class="k">def</span> <span class="n">call_service</span><span class="p">(%{</span><span class="ss">req:</span> <span class="n">req</span><span class="p">})</span> <span class="ow">when</span> <span class="n">is_binary</span><span class="p">(</span><span class="n">req</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># ...</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The same is true for <code class="language-plaintext highlighter-rouge">case</code> statements and <code class="language-plaintext highlighter-rouge">if</code> statements. Be more explicit about what it is you expect. It’s better to raise or crash if you receive arguments that violate your expectations.</p>
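<p>As a hypothetical sketch (<code class="language-plaintext highlighter-rouge">Responses.status/1</code> is not from the post), a function head that states the expected shape crashes loudly on anything else instead of limping past a <code class="language-plaintext highlighter-rouge">nil</code> check:</p>

```elixir
defmodule Responses do
  # State the shape you want: a map carrying an integer status.
  # Any other argument fails to match and raises FunctionClauseError.
  def status(%{status: status}) when is_integer(status), do: status
end

Responses.status(%{status: 200}) # => 200
# Responses.status(nil)          # raises FunctionClauseError
```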
<h2 id="only-return-error-tuples-when-the-caller-can-do-something-about-it">Only return error tuples when the caller can do something about it.</h2>
<p>You should only force your user to deal with errors that they can do something about. If your API can error, and there’s nothing the caller can do about it, then raise an exception or throw. Don’t bother making your callers deal with result tuples when there’s nothing they can do.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Don't do...</span>
<span class="k">def</span> <span class="n">get</span><span class="p">(</span><span class="n">table</span> <span class="p">\\</span> <span class="bp">__MODULE__</span><span class="p">,</span> <span class="n">id</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># If the table doesn't exist ets will throw an error. Catch that and return</span>
<span class="c1"># an error tuple</span>
<span class="k">try</span> <span class="k">do</span>
<span class="ss">:ets</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">table</span><span class="p">,</span> <span class="n">id</span><span class="p">)</span>
<span class="k">catch</span>
<span class="n">_</span><span class="p">,</span> <span class="n">_</span> <span class="o">-></span>
<span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="s2">"Table is not available"</span><span class="p">}</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="c1"># Do...</span>
<span class="k">def</span> <span class="n">get</span><span class="p">(</span><span class="n">table</span> <span class="p">\\</span> <span class="bp">__MODULE__</span><span class="p">,</span> <span class="n">id</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># If the table doesn't exist, there's nothing the caller can do</span>
<span class="c1"># about it, so just throw.</span>
<span class="ss">:ets</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">table</span><span class="p">,</span> <span class="n">id</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
<h2 id="raise-exceptions-if-you-receive-invalid-data">Raise exceptions if you receive invalid data.</h2>
<p>You should not be afraid of just raising exceptions if a return value or piece of data has violated your expectations.
If you’re calling a downstream service that should always return JSON, use <code class="language-plaintext highlighter-rouge">Jason.decode!</code> and avoid writing additional error handling logic.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Don't do...</span>
<span class="k">def</span> <span class="n">main</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">resp</span><span class="p">}</span> <span class="o">=</span> <span class="n">call_service</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="k">case</span> <span class="no">Jason</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">resp</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">decoded</span><span class="p">}</span> <span class="o">-></span>
<span class="n">decoded</span>
<span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="n">e</span><span class="p">}</span> <span class="o">-></span>
<span class="c1"># Now what?...</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="c1"># Do...</span>
<span class="k">def</span> <span class="n">main</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">resp</span><span class="p">}</span> <span class="o">=</span> <span class="n">call_service</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="n">decoded</span> <span class="o">=</span> <span class="no">Jason</span><span class="o">.</span><span class="n">decode!</span><span class="p">(</span><span class="n">resp</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>This allows us to crash the process (which is good) and removes the useless error handling logic from the function.</p>
<h2 id="use-for-when-checking-collections-in-tests">Use <code class="language-plaintext highlighter-rouge">for</code> when checking collections in tests</h2>
<p>This is a quick one, but it makes your test failures much more helpful.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Don't do...</span>
<span class="n">assert</span> <span class="no">Enum</span><span class="o">.</span><span class="n">all?</span><span class="p">(</span><span class="n">posts</span><span class="p">,</span> <span class="k">fn</span> <span class="n">post</span> <span class="o">-></span> <span class="n">match?</span><span class="p">(%</span><span class="no">Post</span><span class="p">{},</span> <span class="n">post</span><span class="p">)</span> <span class="k">end</span><span class="p">)</span>
<span class="c1"># Do...</span>
<span class="n">for</span> <span class="n">post</span> <span class="o"><-</span> <span class="n">posts</span><span class="p">,</span> <span class="k">do</span><span class="p">:</span> <span class="n">assert</span> <span class="p">%</span><span class="no">Post</span><span class="p">{}</span> <span class="o">=</span> <span class="n">post</span>
</code></pre></div></div>Chris KeathleyI’ve seen a lot of elixir at this point, both good and bad. Through all of that code, I’ve seen similar patterns that tend to lead to worse code. So I thought I would document some of them as well as better alternatives to these patterns.Using Regulator2021-02-26T13:52:00+00:002021-02-26T13:52:00+00:00http://keathley.github.io/blog/regulator<blockquote>
<p>I originally wrote this for the backend engineers at Bleacher Report. I thought
that it might be useful to others to repost it here. I’ve obfuscated the names
of the specific services but otherwise left it as is.</p>
</blockquote>
<p><a href="https://github.com/keathley/regulator">Regulator</a> is our service’s first line of defense.</p>
<p>It protects each service by dynamically limiting the number of concurrent <em>things</em> that can take place at any given time. I’ll try to provide some context on the problem that we’re trying to solve, a little bit of queueing theory, what Regulator is doing under the hood, and how to configure Regulator itself to provide the most benefit.</p>
<h2 id="dammit-keathley-just-tell-me-how-to-configure-your-stupid-library">Dammit Keathley, just tell me how to configure your stupid library</h2>
<p>Ok. TL;DR - If you’re configuring a “client” (someone making calls to a service) use the AIMD regulator like so:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Regulator</span><span class="o">.</span><span class="n">install</span><span class="p">(</span><span class="ss">:downstream_service</span><span class="p">,</span> <span class="p">{</span><span class="no">Regulator</span><span class="o">.</span><span class="no">Limit</span><span class="o">.</span><span class="no">AIMD</span><span class="p">,</span> <span class="p">[</span><span class="ss">timeout:</span> <span class="mi">15</span><span class="p">]})</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">:timeout</code> value should be set to the maximum average latency you expect from the service. If the average latency drifts above this value, Regulator will treat it as an error and begin to limit the number of requests your service can make to the downstream service.</p>
<p>If you’re adding a “service” regulator (a service protecting itself from overload) you should use the Gradient regulator with the default values:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Regulator</span><span class="o">.</span><span class="n">install</span><span class="p">(</span><span class="ss">:myself</span><span class="p">,</span> <span class="p">{</span><span class="no">Regulator</span><span class="o">.</span><span class="no">Limit</span><span class="o">.</span><span class="no">Gradient</span><span class="p">,</span> <span class="p">[]})</span>
</code></pre></div></div>
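<p>Once a regulator is installed, you wrap the protected work in a call to it. Here’s a minimal sketch using <code class="language-plaintext highlighter-rouge">Regulator.ask/2</code> (the exact return shapes for the dropped case may differ, so check the Regulator docs; <code class="language-plaintext highlighter-rouge">call_athena/0</code> and <code class="language-plaintext highlighter-rouge">cached_response/0</code> are hypothetical):</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Regulator.install(:downstream_service, {Regulator.Limit.AIMD, [timeout: 15]})

case Regulator.ask(:downstream_service, fn -> {:ok, call_athena()} end) do
  {:ok, response} ->
    response

  :dropped ->
    # The regulator is at its limit. Shed load by returning an error
    # or a cached value instead of queueing the request.
    cached_response()
end
</code></pre></div></div>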
<p>If you want to learn about <em>why</em> this works, please read on.</p>
<h2 id="why-do-you-keep-saying-queueing-theory-in-slack-all-the-time">Why do you keep saying ‘queueing theory’ in slack all the time?</h2>
<p>You know how people say “it’s turtles all the way down” and you never understood why or what that even means?</p>
<p>Same.</p>
<p>Anyway, that example is only relevant because it’s also true about queues and queueing theory. Computer systems, it turns out, almost always boil down to queues, and understanding them is one of the best uses of your time if you want to make “performance” part of your career.</p>
<p>The good news: “queueing theory” is really just applied probability and statistics.
The bad news: no one remembers probability and statistics.</p>
<p>Which makes sense. Who would have thought that a junior year, blow-off math class was going to be the fucking cornerstone of performance optimizations for working-class software wranglers?</p>
<p>The really good news: The math isn’t that bad. In fact, most of the time you don’t need math as much as you need an <em>intuition</em> about the math. When
you need actual math, it’s typically not worse than algebra. Let’s look at an example to see how this works out.</p>
<p>Let’s imagine that Zeus receives 1 request per second. To serve that request, Zeus needs to call Athena, and Athena takes 100ms, on average, to process it. In technical terms, we’d say that Zeus’s <em>arrival rate</em> is 1 RPS and its <em>work time</em> is 0.1 seconds. Because Athena’s work time is so much lower than the time between arrivals, Zeus will only ever be processing 1 request at a time. In queueing terms, we’d say that the number of items in the queue is 0.</p>
<p>But, let’s see what happens if the arrival rate increases to 10 RPS. We don’t know <em>how</em> these requests arrive yet - all at once, one at a time, spread out over a Poisson distribution (spoilers: it’s this one) - but we’re not worried about that.</p>
<p>Question: On <em>average</em>, given an arrival rate of 10 RPS and a work time of 0.1 seconds, how many requests are waiting in Zeus to be fulfilled over a 1-second window?</p>
<p>Think about it for a bit…</p>
<p>Ok, if you answered “1 request” you have an intuition for what’s going on here. If you got something else, that’s fine. I’m about to show you the math that allows you to calculate this yourself.</p>
<p>In math terms, we can say that the average number of items in a queue is equal to the average arrival rate multiplied by the average work time:</p>
<p><code class="language-plaintext highlighter-rouge">avg. items in queue = avg. arrival rate * avg. work time</code>.</p>
<p>You’ll typically see this equation written out with fancy-pants math terms:</p>
<p><code class="language-plaintext highlighter-rouge">L = λ × W</code></p>
<p>It turns out that this little equation (giggle) is actually quite famous. It’s
known as Little’s Law, and it’s one of the most ubiquitous pieces of mathematics. Little’s Law is important to internalize because it
works for <em>all</em> queueing systems, including the queues inside your queues.</p>
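<p>The arithmetic really is that simple. Plugging our numbers in:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code># L = λ × W
avg_arrival_rate = 10    # requests per second
avg_work_time    = 0.1   # seconds

avg_items_in_queue = avg_arrival_rate * avg_work_time
# => 1.0 request, on average, waiting in Zeus
</code></pre></div></div>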
<p>In our example above, there are really 3 queues that we care about (and there are more internally that we’re ignoring). There’s Zeus’s internal queue, Athena’s internal queue, and the queue that encapsulates both of them, which is what the client experiences.</p>
<p>Remember when I said that it’s queues all the way down? This is what I meant.</p>
<p>Now that we can understand a bit more about queues, we can talk about overload and what causes systems to fail.</p>
<h2 id="overload">Overload</h2>
<p>When one of our services fails, it’s almost always due to a condition called “overload”. We tend to use the term overload for a wide class of errors, but it actually has a technical definition. A queue is considered “overloaded” when the queue’s arrival rate is higher than its departure rate. If your queue is accepting things faster than it can process them, you have overload, and you’re going to have a bad time.</p>
<p>Now that we understand Little’s Law we can talk more formally about what happens when a queue becomes overloaded, and we can see why when Athena gets slow, Zeus’s CPU starts to spike.</p>
<p>Back to our initial example, if Zeus’s arrival rate is 10 RPS and Athena’s work time is 100ms, that means that we have 1 request waiting in Zeus to be processed. What happens if Athena’s work time increases to 200ms?</p>
<p>To understand this we need to first answer the question, “What is Athena’s
arrival rate?”. What’s your intuition?</p>
<p>If you said “it’s determined by Athena’s work time” then you are correct. In our example, Zeus functions as a serial queue: it can only send requests to Athena as fast as Athena can process them. So, if Athena’s work time increases to 200ms, where does the queueing occur? Zeus, of course. But Zeus’s arrival rate is based on client demand, so it’s still receiving 10 RPS, while its effective work time has increased to 200ms. So, we bust out Little’s Law real quick and figure out how many
requests are queueing in Zeus and get 2. That’s not that bad.</p>
<p>But, what happens if Zeus’s arrival rate jumps to 100 RPS? Well, Athena’s work time is going to stay fixed at 200ms. This means we have 100 × 0.2, which gives us 20 requests.</p>
<p>So, Zeus is now going to have 20 requests piling up and waiting to be fulfilled and that pain isn’t going to be felt by Athena directly.</p>
<p>This is another important intuition to develop. A downstream system may never detect that queueing is happening upstream. Athena’s database might get a little backed up, and Athena won’t consider that to be an error. But that small hiccup results in increased CPU pressure in Zeus because now Zeus has requests piling up. Incidentally, this
is why rate-limiting doesn’t prevent overload.</p>
<p>What happens to Zeus in this scenario? Well, on a long enough time frame, it’ll collapse. Once a queue has been built up in this way, it’s very hard to eliminate that queue, even if Zeus’s arrival rate goes back to 10 RPS. In fact, to eliminate the queue in Zeus, Athena’s work time would need to drop <em>below</em> its original 100ms. While queued, those requests are holding open expensive resources like HTTP connections, using memory, and adding additional pressure to the BEAM schedulers. When situations like this occur, you have exactly 2 options:</p>
<ul>
<li>Do nothing and let the entire system crash.</li>
<li>Drop your work time as low as possible, preferably to 0.</li>
</ul>
<p>Ideally, we should be choosing the second option. And remember: not explicitly choosing the second option is implicitly choosing the first.</p>
<p>Dropping work like this is referred to as “load shedding”. Which really means dropping the work time so low that it avoids overload. These days, our tool of choice for load shedding is Regulator.</p>
<h2 id="a-quick-aside-on-why-autoscaling-is-not-a-solution">A quick aside on why autoscaling is not a solution</h2>
<p>Typically this is around the time that people start claiming that these problems go away if you just use The Cloud and The Magic of Autoscaling.</p>
<p>Firstly, if anyone ever uses words like “you should just” or “why can’t you just”
it typically indicates that the person in question doesn’t really understand the
problem domain.</p>
<p>Secondly, autoscaling is not a solution to these problems because it’s
predicated on the idea that the bottleneck in question can
be scaled horizontally without limit. Which isn’t true. In single-threaded
runtimes such as Ruby and Python, adding more application
instances does tend to be a winning strategy, since it widens the largest bottleneck: the
application runtime that is running all of your most complex code.</p>
<p>But let’s say that the latency is increasing in a database. As we’ve already seen,
this increased latency translates directly into queueing in application code. That queueing
will tend to cause increased sojourn time (the time it takes to get through the queue),
increased CPU, and increased memory, all of which are typically used as signals to scale
up the application servers. Adding more application servers typically results in
<em>even more pressure on the database</em>. Meaning, you just made your problem <strong>worse</strong>.</p>
<p>At this point, the naysayers will make an argument for using dynamo, claiming it’s “infinitely
scalable” or some such. My response to them: no, it isn’t. Any of these things can fail.
Your network might introduce lag. Maybe your client disconnects from your dynamo table
and is continually trying to reconnect causing timeouts. Maybe you’re misconfigured.
Maybe someone at AWS symlinked the wrong directory again.</p>
<p>We’re talking about building resilient systems here. You can’t count on
a vendor to provide that resilience.</p>
<p>Autoscaling is a cost-saving measure only. Always keep that in mind and you’ll
be better off.</p>
<h2 id="how-does-regulator-do">How does Regulator work?</h2>
<p>Regulator is based on the same techniques that power the internet’s networks. These techniques are often referred to as “adaptive concurrency” or “adaptive capacity” in the literature. It’s “adaptive” because the system regularly makes changes based on observed values.</p>
<p>Regulator only allows a certain number of <em>things</em> to occur in a system concurrently. For instance, Regulator may have determined that
Athena can handle 5 requests concurrently. If a 6th request is made before any of the original 5 have been completed, the 6th request will be rejected.</p>
<p>Periodically, Regulator examines the requests it’s seen and decides whether it should allow more or fewer things to happen concurrently. Typically, Regulator is examining signals such as changes in latency, whether an error occurred, how many requests were in-flight at the same time, etc. It does all of this to avoid any requests queueing in the system.</p>
<p>So, in our initial example, if Zeus used a regulator around its calls to Athena, the regulator would see the increase in work time and arrival rate and would begin to reject excess requests to avoid queueing. Because our work time has now dropped by so many orders of magnitude, we can return a 500, or we can return a cached value from memory. Either of these techniques is appropriate and has practically no difference in response time. We’ve eliminated the central bottleneck and that tends to be Good Enough for eliminating overload.</p>
<h2 id="isnt-dropping-traffic-kinda-bad">Isn’t dropping traffic kinda bad?</h2>
<p>Dropping traffic isn’t ideal. That’s why we only drop as much traffic as required to keep
the services healthy. Keep in mind, the alternative to dropping <em>some</em> traffic
is that we drop <em>all</em> traffic. Furthermore, “dropping traffic” in this case doesn’t
have to mean we return a 500. We’re just trying to get the work time as low as
we can. Sometimes it’ll be appropriate to return an error. But a lot of the time
we can respond with cached or stale values. Perhaps we allow part of the request
to fail and just return a degraded response. Having tools like Regulator provides
us with options. We can make conscious choices about how we allow the system to degrade.</p>
<h2 id="where-should-i-use-regulators">Where should I use regulators?</h2>
<p>The short answer here is:</p>
<ol>
  <li>On any outbound API calls</li>
  <li>In front of every service</li>
</ol>
<p>BRPC provides examples of how to do both of these. Each service should protect itself with a regulator to avoid unbounded queue growth. Likewise, upstream services should avoid calling downstream services if they become overloaded and thus should fail as quickly as possible and avoid doing useless work.</p>
<p>Empirically, we’ve found that AIMD works well for outbound calls and Gradient works well for services. But, I think it’s worth questioning these ideas and fiddling around with them. It might be that we should be using AIMD everywhere, for instance. Both of these algorithms can be tweaked and tuned in various ways (refer to the regulator docs for details) but they both work on similar principles. They look at samples to determine the overall health of the system. Once they’ve made that determination they adjust how much work the system is allowed to do to avoid overload.</p>
<h2 id="so-what-did-we-freaking-learn-here">So what did we freaking learn here?</h2>
<p>I hope that this gives you some intuition about queues and how to protect your system from overload. The Regulator docs provide more specifics on how each limiter works and the various ways that it can be tuned.</p>
<p>If you want to learn more about this here’s a list of stuff to check out:</p>
<ul>
<li>I gave a talk on Regulator specifically at CodeBeam: <a href="https://youtu.be/-oQl1xv0hDk">https://youtu.be/-oQl1xv0hDk</a></li>
<li>Stop Rate Limiting! <a href="https://www.youtube.com/watch?v=m64SWl9bfvk">https://www.youtube.com/watch?v=m64SWl9bfvk</a></li>
<li>Netflix: Performance Under Load - <a href="https://netflixtechblog.medium.com/performance-under-load-3e6fa9a60581">https://netflixtechblog.medium.com/performance-under-load-3e6fa9a60581</a></li>
</ul>Chris KeathleyI originally wrote this for the backend engineers at Bleacher Report. I thought that it might be useful to others to repost it here. I’ve obfuscated the names of the specific services but otherwise left it as is.Telemetry Conventions2020-07-20T10:00:00+00:002020-07-20T10:00:00+00:00http://keathley.github.io/blog/telemetry-conventions<p>I’m a big fan of telemetry. It’s arguably the most important elixir
project released in the past few years. Most of the mainstream libraries
have started to adopt it, and that’s a good thing. But, there’s still
a lot of inconsistency in how telemetry is used across projects. I thought
it would be good to write up some of the conventions that I’ve been using.</p>
<p>I’m treating this as a living document. I expect that things may change
and I’ll try to capture those changes here.</p>
<h2 id="keep-your-names-consistent">Keep your names consistent</h2>
<p>Your events should all follow a naming scheme like: <code class="language-plaintext highlighter-rouge">[:my_lib, :function_call, ...]</code>.
Do not allow users to customize the event names in any way and don’t change them based
on whatever module is <code class="language-plaintext highlighter-rouge">use</code>-ing your library. If you need to differentiate
between multiple instances of your library, you should provide that
information in the event’s metadata.</p>
<p>Keeping your event names consistent makes it trivial for monitoring tools to
start capturing your events and exporting them as time-series, logs, APM, or
whatever else.</p>
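<p>Concretely: if a user runs two pools through your library, the event name stays fixed and the pool is identified in the metadata (the names here are illustrative):</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Don't do... (the event name changes per instance)
:telemetry.execute([:my_lib, :pool_a, :checkout, :stop], %{duration: duration}, %{})

# Do... (fixed event name, instance in the metadata)
:telemetry.execute([:my_lib, :checkout, :stop], %{duration: duration}, %{pool: :pool_a})
</code></pre></div></div>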
<h2 id="use-spans">Use spans</h2>
<p>The telemetry events you produce should be usable in several contexts. One
of the best ways to do this is to use “spans”. The notion of a span is
straightforward. When you start a function call you execute a <code class="language-plaintext highlighter-rouge">[:lib,
:function, :start]</code> event. When you finish the function call you execute
a <code class="language-plaintext highlighter-rouge">[:lib, :function, :stop]</code> event. If something inside the function call
raises or throws you execute a <code class="language-plaintext highlighter-rouge">[:lib, :function, :exception]</code> event.</p>
<p>These 3 events will cover at least 90% of your user’s needs. If your
consumer wants to support APM or tracing they can do that by listening to
all events. If they just want to emit time series, they only need to
listen to the stop and exception events.</p>
<p>There are times when spans won’t be enough, and when that happens feel
free to execute a one-off event. Otherwise, just use spans.</p>
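<p>The telemetry library ships with a helper, <code class="language-plaintext highlighter-rouge">:telemetry.span/3</code>, that emits exactly this start/stop/exception triple for you. The function you pass it must return a <code class="language-plaintext highlighter-rouge">{result, stop_metadata}</code> tuple (<code class="language-plaintext highlighter-rouge">do_call/1</code> below is a hypothetical internal function):</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def call_service(req) do
  :telemetry.span([:my_lib, :call_service], %{req: req}, fn ->
    result = do_call(req)
    # The second element becomes the stop event's metadata
    {result, %{req: req}}
  end)
end
</code></pre></div></div>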
<h2 id="add-errors-to-your-stop-events">Add errors to your stop events</h2>
<p>Sometimes things go poorly inside of a function call that doesn’t lead
to an exception. When this happens, you should include the error in your
event’s metadata under an optional <code class="language-plaintext highlighter-rouge">:error</code> key. Consumers can use this to
add labels to their time-series or add errors to their traces.</p>
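<p>As a sketch of what that can look like on the library side (assume <code class="language-plaintext highlighter-rouge">duration</code> and <code class="language-plaintext highlighter-rouge">meta</code> are already in scope, and <code class="language-plaintext highlighter-rouge">do_call/1</code> is a hypothetical internal function):</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code>case do_call(req) do
  {:ok, resp} ->
    :telemetry.execute([:my_lib, :call, :stop], %{duration: duration}, meta)
    {:ok, resp}

  {:error, reason} = error ->
    # Same event name; the error rides along under the optional :error key
    :telemetry.execute([:my_lib, :call, :stop], %{duration: duration}, Map.put(meta, :error, reason))
    error
end
</code></pre></div></div>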
<h2 id="give-me-all-your-metadata">Give me all your metadata</h2>
<p>You don’t know what people are going to do with the events that you’re
executing. You want to support as many use cases as you can (including all
of the use cases you haven’t thought of yet). So lean towards providing
more metadata in your events than you think you need to.</p>
<h2 id="allow-users-to-add-more-metadata">Allow users to add more metadata</h2>
<p>Speaking of metadata, it’s totally reasonable for you to allow users to add
additional information to an event’s metadata. This is often useful for users who
want to add additional context or business-related metrics to each event.</p>
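<p>One way to support this is to accept a metadata option and merge it into every event (the <code class="language-plaintext highlighter-rouge">:telemetry_metadata</code> option name here is made up; pick whatever fits your API):</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def request(req, opts \\ []) do
  user_meta = Keyword.get(opts, :telemetry_metadata, %{})
  meta = Map.merge(%{req: req}, user_meta)

  :telemetry.execute([:my_lib, :request, :start], %{system_time: System.system_time()}, meta)
  # ...
end
</code></pre></div></div>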
<h2 id="dont-rely-on-middleware-to-emit-your-events">Don’t rely on middleware to emit your events</h2>
<p>A lot of libraries use middleware to emit telemetry events. I have some feedback on this pattern:</p>
<p>Stop it.</p>
<p>Unless there’s no other way to provide telemetry, you should be executing
your events from your core library code. The only reasons to use
middleware would be for users to opt-in to telemetry or for customization
of your telemetry event names. But, telemetry is <em>already</em> opt-in. Users have
to attach handlers to your events and executing an event with no handler is low-cost.</p>
<p>Allowing users to customize the event names isn’t something you should do,
as we’ve already discussed.</p>
<p>A major problem with emitting telemetry in middleware is that it necessarily means
that you aren’t getting the full trace. You won’t be capturing the time
between your library being called and your library calling the telemetry middleware.
This problem gets worse if the user happens to place the middleware after
other, potentially expensive, middleware. The end result is a solution that is
less precise and more error-prone. If you have no other solution, by all means, provide
a middleware. But otherwise, avoid it.</p>
<h2 id="durations-should-be-in-native-units-or-explicitly-stated">Durations should be in native units (or explicitly stated)</h2>
<p>You should default to native units for all of your duration measurements.
If you <em>really</em> don’t want to use native units, then return a tuple
stating exactly what units you’re using like: <code class="language-plaintext highlighter-rouge">{100, :microseconds}</code>.</p>
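<p>Native units are cheap for your library to record, and consumers can convert them to whatever they need with <code class="language-plaintext highlighter-rouge">System.convert_time_unit/3</code>:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Both timestamps come from System.monotonic_time/0, so the difference
# is a duration in native units
duration = stop_time - start_time

milliseconds = System.convert_time_unit(duration, :native, :millisecond)
</code></pre></div></div>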
<h2 id="use-a-single-module-for-telemetry-and-include-all-of-your-context-and-docs-in-that-module">Use a single module for telemetry and include all of your context and docs in that module</h2>
<p>All of my projects include a module called <code class="language-plaintext highlighter-rouge">Lib.Telemetry</code>, and they all follow the same pattern:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">LibName</span><span class="o">.</span><span class="no">Telemetry</span> <span class="k">do</span>
<span class="nv">@moduledoc</span> <span class="sd">"""
Description of all events
"""</span>
<span class="nv">@doc</span> <span class="no">false</span>
<span class="k">def</span> <span class="n">start</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">meta</span><span class="p">,</span> <span class="n">measurements</span> <span class="p">\\</span> <span class="p">%{})</span> <span class="k">do</span>
<span class="n">time</span> <span class="o">=</span> <span class="no">System</span><span class="o">.</span><span class="n">monotonic_time</span><span class="p">()</span>
<span class="n">measures</span> <span class="o">=</span> <span class="no">Map</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="n">measurements</span><span class="p">,</span> <span class="ss">:system_time</span><span class="p">,</span> <span class="n">time</span><span class="p">)</span>
<span class="ss">:telemetry</span><span class="o">.</span><span class="n">execute</span><span class="p">([</span><span class="ss">:app_name</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="ss">:start</span><span class="p">],</span> <span class="n">measures</span><span class="p">,</span> <span class="n">meta</span><span class="p">)</span>
<span class="n">time</span>
<span class="k">end</span>
<span class="nv">@doc</span> <span class="no">false</span>
<span class="k">def</span> <span class="n">stop</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">start_time</span><span class="p">,</span> <span class="n">meta</span><span class="p">,</span> <span class="n">measurements</span> <span class="p">\\</span> <span class="p">%{})</span> <span class="k">do</span>
<span class="n">end_time</span> <span class="o">=</span> <span class="no">System</span><span class="o">.</span><span class="n">monotonic_time</span><span class="p">()</span>
<span class="n">measurements</span> <span class="o">=</span> <span class="no">Map</span><span class="o">.</span><span class="n">merge</span><span class="p">(</span><span class="n">measurements</span><span class="p">,</span> <span class="p">%{</span><span class="ss">duration:</span> <span class="n">end_time</span> <span class="o">-</span> <span class="n">start_time</span><span class="p">})</span>
<span class="ss">:telemetry</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span>
<span class="p">[</span><span class="ss">:app_name</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="ss">:stop</span><span class="p">],</span>
<span class="n">measurements</span><span class="p">,</span>
<span class="n">meta</span>
<span class="p">)</span>
<span class="k">end</span>
<span class="nv">@doc</span> <span class="no">false</span>
<span class="k">def</span> <span class="n">exception</span><span class="p">(</span><span class="n">event</span><span class="p">,</span> <span class="n">start_time</span><span class="p">,</span> <span class="n">kind</span><span class="p">,</span> <span class="n">reason</span><span class="p">,</span> <span class="n">stack</span><span class="p">,</span> <span class="n">meta</span> <span class="p">\\</span> <span class="p">%{},</span> <span class="n">extra_measurements</span> <span class="p">\\</span> <span class="p">%{})</span> <span class="k">do</span>
<span class="n">end_time</span> <span class="o">=</span> <span class="no">System</span><span class="o">.</span><span class="n">monotonic_time</span><span class="p">()</span>
<span class="n">measurements</span> <span class="o">=</span> <span class="no">Map</span><span class="o">.</span><span class="n">merge</span><span class="p">(</span><span class="n">extra_measurements</span><span class="p">,</span> <span class="p">%{</span><span class="ss">duration:</span> <span class="n">end_time</span> <span class="o">-</span> <span class="n">start_time</span><span class="p">})</span>
<span class="n">meta</span> <span class="o">=</span>
<span class="n">meta</span>
<span class="o">|></span> <span class="no">Map</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="ss">:kind</span><span class="p">,</span> <span class="n">kind</span><span class="p">)</span>
<span class="o">|></span> <span class="no">Map</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="ss">:error</span><span class="p">,</span> <span class="n">reason</span><span class="p">)</span>
<span class="o">|></span> <span class="no">Map</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="ss">:stacktrace</span><span class="p">,</span> <span class="n">stack</span><span class="p">)</span>
<span class="ss">:telemetry</span><span class="o">.</span><span class="n">execute</span><span class="p">([</span><span class="ss">:app_name</span><span class="p">,</span> <span class="n">event</span><span class="p">,</span> <span class="ss">:exception</span><span class="p">],</span> <span class="n">measurements</span><span class="p">,</span> <span class="n">meta</span><span class="p">)</span>
<span class="k">end</span>
<span class="nv">@doc</span> <span class="no">false</span>
<span class="k">def</span> <span class="n">event</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">metrics</span><span class="p">,</span> <span class="n">meta</span><span class="p">)</span> <span class="k">do</span>
<span class="ss">:telemetry</span><span class="o">.</span><span class="n">execute</span><span class="p">([</span><span class="ss">:app_name</span><span class="p">,</span> <span class="n">name</span><span class="p">],</span> <span class="n">metrics</span><span class="p">,</span> <span class="n">meta</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>This keeps all of my other code relatively easy to read and provides
a module where I can add docs for all of the events I’m going to emit.</p>
<p>Speaking of docs, you need to write some. For each event, explain what
measurements you’re going to emit, what metadata you’re going to include,
and in what context the event fires.</p>
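<p>As a sketch (the event names and fields below are illustrative, not from any real library), that documentation might live in the telemetry module’s <code class="language-plaintext highlighter-rouge">@moduledoc</code>:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@moduledoc """
Telemetry events emitted by this application:

* `[:app_name, :query, :start]` - emitted before a query runs.
  * Measurements: `:system_time`
  * Metadata: `:query`

* `[:app_name, :query, :stop]` - emitted after a query finishes.
  * Measurements: `:duration`
  * Metadata: `:query`

* `[:app_name, :query, :exception]` - emitted when a query raises.
  * Measurements: `:duration`
  * Metadata: `:kind`, `:error`, `:stacktrace`
"""
</code></pre></div></div>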
<h2 id="test-your-events">Test your events</h2>
<p>Your telemetry is an API, and breaking it is probably more costly than
breaking a functional interface. If you break your functions, the user of
your library is likely to notice before they deploy to production. If you
make backward-incompatible changes to your telemetry events, the user
probably has no clue and won’t discover the breakage until they’ve deployed
to production and realize that their monitors and dashboards no longer
work.</p>
<p>Luckily, it’s pretty straightforward to test your telemetry events. I typically
do something like this (which, if I recall correctly, is a pattern I stole from Redix’s test suite).</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">test</span> <span class="s2">"telemetry events"</span> <span class="k">do</span>
<span class="p">{</span><span class="n">test_name</span><span class="p">,</span> <span class="n">_arity</span><span class="p">}</span> <span class="o">=</span> <span class="n">__ENV__</span><span class="o">.</span><span class="n">function</span>
<span class="n">parent</span> <span class="o">=</span> <span class="n">self</span><span class="p">()</span>
<span class="n">ref</span> <span class="o">=</span> <span class="n">make_ref</span><span class="p">()</span>
<span class="n">handler</span> <span class="o">=</span> <span class="k">fn</span> <span class="n">event</span><span class="p">,</span> <span class="n">measurements</span><span class="p">,</span> <span class="n">_meta</span><span class="p">,</span> <span class="n">_config</span> <span class="o">-></span>
<span class="k">case</span> <span class="n">event</span> <span class="k">do</span>
<span class="p">[</span><span class="ss">:your_app</span><span class="p">,</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">:start</span><span class="p">]</span> <span class="o">-></span>
<span class="n">assert</span> <span class="n">is_integer</span><span class="p">(</span><span class="n">measurements</span><span class="o">.</span><span class="n">system_time</span><span class="p">)</span>
<span class="n">send</span><span class="p">(</span><span class="n">parent</span><span class="p">,</span> <span class="p">{</span><span class="n">ref</span><span class="p">,</span> <span class="ss">:start</span><span class="p">})</span>
<span class="p">[</span><span class="ss">:your_app</span><span class="p">,</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">:stop</span><span class="p">]</span> <span class="o">-></span>
<span class="n">assert</span> <span class="n">is_integer</span><span class="p">(</span><span class="n">measurements</span><span class="o">.</span><span class="n">duration</span><span class="p">)</span>
<span class="n">send</span><span class="p">(</span><span class="n">parent</span><span class="p">,</span> <span class="p">{</span><span class="n">ref</span><span class="p">,</span> <span class="ss">:stop</span><span class="p">})</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="ss">:telemetry</span><span class="o">.</span><span class="n">attach_many</span><span class="p">(</span><span class="n">to_string</span><span class="p">(</span><span class="n">test_name</span><span class="p">),</span>
<span class="p">[</span>
<span class="p">[</span><span class="ss">:your_app</span><span class="p">,</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">:start</span><span class="p">],</span>
<span class="p">[</span><span class="ss">:your_app</span><span class="p">,</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">:stop</span><span class="p">],</span>
<span class="p">],</span>
<span class="n">handler</span><span class="p">,</span>
<span class="no">nil</span>
<span class="p">)</span>
<span class="c1"># some function call...</span>
<span class="n">assert_receive</span> <span class="p">{</span><span class="o">^</span><span class="n">ref</span><span class="p">,</span> <span class="ss">:start</span><span class="p">}</span>
<span class="n">assert_receive</span> <span class="p">{</span><span class="o">^</span><span class="n">ref</span><span class="p">,</span> <span class="ss">:stop</span><span class="p">}</span>
<span class="k">end</span>
</code></pre></div></div>
<h2 id="conclusion">Conclusion</h2>
<p>I hope that this provides a good framework for anyone who wants to add
telemetry to their libraries or applications. As library authors, we need to
view good telemetry the same way we view good docs or good tests. These things
matter and they can dramatically enhance the experience of using your library.
Hopefully, we can spread these ideas across the ecosystem.</p>Chris KeathleyI’m a big fan of telemetry. It’s arguably the most important elixir project released in the past few years. Most of the mainstream libraries have started to adopt it, and that’s a good thing. But, there’s still a lot of inconsistency in how telemetry is used across projects. I thought it would be good to write up some of the conventions that I’ve been using.Reusable Elixir Libraries2020-02-08T10:00:00+00:002020-02-08T10:00:00+00:00http://keathley.github.io/blog/reusable-libraries<p>One of my new goals is to try to make my elixir libraries more reusable. It’s an easy mark to hit if you only use modules and functions. But once you start adding processes, ETS tables, and other stateful constructs, the solutions get murky.</p>
<p>I thought it would be good to write out my thoughts and explain some of the patterns that I’ve been using. There are probably other, better solutions. But these are the ones that I use. I’m going to use the term “library” throughout this post, but none of these techniques are limited to libraries in the traditional sense. I use all of these methods when building components or subsystems at work.</p>
<h2 id="otp-applications">OTP Applications</h2>
<p>This solution is the easiest but also the most limiting. If you only provide an OTP application, then your users don’t have to worry about configuring
anything, and the API is typically more straightforward. But OTP Apps are singletons. Configuration becomes much more complicated, the user has limited control, and you risk colliding with other libraries who are also dependent on your app. But an OTP app’s most significant drawbacks are also its biggest strengths. There might not be any <em>need</em> for the user to provide configuration. Maybe the supervision strategy is complex, and it would be error-prone to ask the user to manage it themselves. You need to look at your objectives and decide the best approach.</p>
<p>Anecdotally, the majority of times that I’ve built a library that only provided an OTP app, I’ve ended up changing it. But that probably says more about me than it says anything about OTP apps.</p>
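<p>For illustration (all module names here are hypothetical), an OTP-app-only library wires its entire tree into its own application callback, so it boots with the release and the user never starts anything:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code>defmodule MyLib.Application do
  @moduledoc false
  use Application

  def start(_type, _args) do
    # These children start as soon as the :my_lib app boots.
    # The user can't rename them, restart them differently,
    # or run two isolated copies.
    children = [
      MyLib.Store,
      MyLib.Janitor,
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyLib.Supervisor)
  end
end
</code></pre></div></div>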
<h2 id="starting-with-a-single-process">Starting with a single process</h2>
<p>My typical approach is to provide processes that the user can start in their supervision tree. This pattern takes more work, but it isolates the component from the rest of the system and gives more control to the user of the library.</p>
<p>To make this concrete, we can look at an example. Let’s say that we want to provide a small cache that users can include in their supervision tree. A naive implementation might look like this:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">Cache</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">GenServer</span>
<span class="k">def</span> <span class="n">child_spec</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="p">%{</span>
<span class="ss">id:</span> <span class="n">opts</span><span class="p">[</span><span class="ss">:name</span><span class="p">]</span> <span class="o">||</span> <span class="bp">__MODULE__</span><span class="p">,</span>
<span class="ss">start:</span> <span class="p">{</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="ss">:start_link</span><span class="p">,</span> <span class="p">[</span><span class="n">opts</span><span class="p">]},</span>
<span class="p">}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">start_link</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">server_opts</span> <span class="o">=</span> <span class="no">Keyword</span><span class="o">.</span><span class="n">take</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="p">[</span><span class="ss">:name</span><span class="p">])</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="n">opts</span><span class="p">,</span> <span class="n">server_opts</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">get</span><span class="p">(</span><span class="n">server</span><span class="p">,</span> <span class="n">key</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">server</span><span class="p">,</span> <span class="p">{</span><span class="ss">:get</span><span class="p">,</span> <span class="n">key</span><span class="p">})</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">put</span><span class="p">(</span><span class="n">server</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">server</span><span class="p">,</span> <span class="p">{</span><span class="ss">:put</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">})</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="p">%{</span><span class="ss">kvs:</span> <span class="p">%{},</span> <span class="ss">opts:</span> <span class="n">opts</span><span class="p">}}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_call</span><span class="p">({</span><span class="ss">:get</span><span class="p">,</span> <span class="n">key</span><span class="p">},</span> <span class="n">_from</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">kvs</span><span class="p">[</span><span class="n">key</span><span class="p">],</span> <span class="n">data</span><span class="p">}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_call</span><span class="p">({</span><span class="ss">:put</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">val</span><span class="p">},</span> <span class="n">_from</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="ss">:ok</span><span class="p">,</span> <span class="n">put_in</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="p">[</span><span class="ss">:kvs</span><span class="p">,</span> <span class="n">key</span><span class="p">],</span> <span class="n">val</span><span class="p">)}</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>That’s it! That’s the entire trick. We simply rely on the name
registration rules that other OTP processes use. Our users are now free
to start a cache however they want.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Access with pid</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">pid</span><span class="p">}</span> <span class="o">=</span> <span class="no">Cache</span><span class="o">.</span><span class="n">start_link</span><span class="p">([])</span>
<span class="no">Cache</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="n">pid</span><span class="p">,</span> <span class="ss">:foo</span><span class="p">,</span> <span class="s2">"foo"</span><span class="p">)</span>
<span class="s2">"foo"</span> <span class="o">=</span> <span class="no">Cache</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">pid</span><span class="p">,</span> <span class="ss">:foo</span><span class="p">)</span>
<span class="c1"># Start a process with a name.</span>
<span class="no">Cache</span><span class="o">.</span><span class="n">start_link</span><span class="p">([</span><span class="ss">name:</span> <span class="no">MyCache</span><span class="p">])</span>
<span class="no">Cache</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="no">MyCache</span><span class="p">,</span> <span class="ss">:foo</span><span class="p">,</span> <span class="s2">"foo"</span><span class="p">)</span>
<span class="s2">"foo"</span> <span class="o">=</span> <span class="no">Cache</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="no">MyCache</span><span class="p">,</span> <span class="ss">:foo</span><span class="p">)</span>
</code></pre></div></div>
<p>Unit testing is simple and isolated.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">CacheTest</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">ExUnit</span><span class="o">.</span><span class="no">Case</span><span class="p">,</span> <span class="ss">async:</span> <span class="no">true</span>
<span class="n">setup</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">cache</span><span class="p">}</span> <span class="o">=</span> <span class="no">Cache</span><span class="o">.</span><span class="n">start_link</span><span class="p">([])</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="ss">cache:</span> <span class="n">cache</span><span class="p">}</span>
<span class="k">end</span>
<span class="n">test</span> <span class="s2">"it stores values"</span><span class="p">,</span> <span class="p">%{</span><span class="ss">cache:</span> <span class="n">cache</span><span class="p">}</span> <span class="k">do</span>
<span class="n">assert</span> <span class="no">Cache</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">cache</span><span class="p">,</span> <span class="ss">:key</span><span class="p">)</span> <span class="o">==</span> <span class="no">nil</span>
<span class="n">assert</span> <span class="no">Cache</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="n">cache</span><span class="p">,</span> <span class="ss">:key</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">)</span> <span class="o">==</span> <span class="ss">:ok</span>
<span class="n">assert</span> <span class="no">Cache</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">cache</span><span class="p">,</span> <span class="ss">:key</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"value"</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>And if the user wants to start multiple instances of the cache, they’re free to do so.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">CacheExample</span><span class="o">.</span><span class="no">Application</span> <span class="k">do</span>
<span class="nv">@moduledoc</span> <span class="no">false</span>
<span class="kn">use</span> <span class="no">Application</span>
<span class="k">def</span> <span class="n">start</span><span class="p">(</span><span class="n">_type</span><span class="p">,</span> <span class="n">_args</span><span class="p">)</span> <span class="k">do</span>
<span class="n">children</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span><span class="no">Cache</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">PrimaryCache</span><span class="p">,</span> <span class="ss">ttl:</span> <span class="mi">500</span><span class="p">},</span>
<span class="p">{</span><span class="no">Cache</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">BackupCache</span><span class="p">,</span> <span class="ss">ttl:</span> <span class="mi">5_000</span><span class="p">},</span>
<span class="p">]</span>
<span class="n">opts</span> <span class="o">=</span> <span class="p">[</span><span class="ss">strategy:</span> <span class="ss">:one_for_one</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">CacheExample</span><span class="o">.</span><span class="no">Supervisor</span><span class="p">]</span>
<span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="n">children</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<h2 id="providing-a-supervision-tree">Providing a supervision tree</h2>
<p>This strategy is obvious when you only need to provide a single process.
But if you need to provide a set of processes with a supervisor, then
things get more complicated.</p>
<p>For instance, if we wanted to provide a more robust cache, then we’d
want to use an ETS table. We could start the ETS table inside of our cache process, but if the cache process crashes, we’ll also lose the ETS table. A better approach would be to start both the ETS table and the writing process underneath a supervisor like so.</p>
<p><a href="/assets/images/reusablelibs/supervision_tree.jpg">
<img src="/assets/images/reusablelibs/supervision_tree.jpg" alt="supervision tree" />
</a></p>
<p>The problem with this approach is that it’s difficult for the
supervisor’s children to identify and communicate with one another. There
are some smart ways we could solve the problem, but my preference is to
do something dumb and easy.</p>
<p>We’re going to require that users pass in a <code class="language-plaintext highlighter-rouge">:name</code> when they start a cache. We’ll then use the passed-in name to derive names for the supervisor’s children. By naming all of the processes this way, the siblings will be able to find one another. This requirement reduces our flexibility, but in my experience, it’s a reasonable tradeoff to make.</p>
<p>We’ll start by converting our API to a supervisor.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">Cache</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">Supervisor</span>
<span class="k">def</span> <span class="n">child_spec</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="p">%{</span>
<span class="ss">id:</span> <span class="p">(</span><span class="n">opts</span><span class="p">[</span><span class="ss">:name</span><span class="p">]</span> <span class="o">||</span> <span class="k">raise</span> <span class="no">ArgumentError</span><span class="p">,</span> <span class="s2">"Cache name is required"</span><span class="p">),</span>
<span class="ss">start:</span> <span class="p">{</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="ss">:start_link</span><span class="p">,</span> <span class="p">[</span><span class="n">opts</span><span class="p">]},</span>
<span class="p">}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">start_link</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">opts</span><span class="p">[</span><span class="ss">:name</span><span class="p">]</span> <span class="o">||</span> <span class="k">raise</span> <span class="no">ArgumentError</span><span class="p">,</span> <span class="s2">"Cache name is required"</span>
<span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="n">opts</span><span class="p">,</span> <span class="ss">name:</span> <span class="n">name</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">get</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">key</span><span class="p">)</span> <span class="k">do</span>
<span class="no">Cache</span><span class="o">.</span><span class="no">Storage</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">storage_name</span><span class="p">(</span><span class="n">name</span><span class="p">),</span> <span class="n">key</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">put</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">val</span><span class="p">)</span> <span class="k">do</span>
<span class="no">Cache</span><span class="o">.</span><span class="no">Storage</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="n">storage_name</span><span class="p">(</span><span class="n">name</span><span class="p">),</span> <span class="n">key</span><span class="p">,</span> <span class="n">val</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">opts</span><span class="p">[</span><span class="ss">:name</span><span class="p">]</span>
<span class="ss">:ets</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">storage_name</span><span class="p">(</span><span class="n">name</span><span class="p">),</span> <span class="p">[</span><span class="ss">:named_table</span><span class="p">,</span> <span class="ss">:public</span><span class="p">,</span> <span class="ss">:set</span><span class="p">])</span>
<span class="n">children</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span><span class="no">Cache</span><span class="o">.</span><span class="no">Storage</span><span class="p">,</span> <span class="p">[</span><span class="ss">name:</span> <span class="n">storage_name</span><span class="p">(</span><span class="n">name</span><span class="p">)]},</span>
<span class="p">]</span>
<span class="no">Supervisor</span><span class="o">.</span><span class="n">init</span><span class="p">(</span><span class="n">children</span><span class="p">,</span> <span class="ss">strategy:</span> <span class="ss">:one_for_one</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">defp</span> <span class="n">storage_name</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">do</span>
<span class="ss">:"</span><span class="si">#{</span><span class="n">name</span><span class="si">}</span><span class="ss">.Storage"</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The supervisor ensures that the user has provided a name; if they haven’t, it raises an error. It then creates an ETS table and starts a <code class="language-plaintext highlighter-rouge">Storage</code> process as a worker. Both the <code class="language-plaintext highlighter-rouge">Storage</code> worker and the ETS table are given the same name. This symmetry reduces complexity in the storage worker and keeps all of the naming logic inside the supervisor.</p>
<p>We can move all of our old Cache logic into the <code class="language-plaintext highlighter-rouge">Storage</code> module and make a few tweaks.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">Cache</span><span class="o">.</span><span class="no">Storage</span> <span class="k">do</span>
<span class="nv">@moduledoc</span> <span class="no">false</span>
<span class="kn">use</span> <span class="no">GenServer</span>
<span class="k">def</span> <span class="n">child_spec</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="p">%{</span>
<span class="ss">id:</span> <span class="n">opts</span><span class="p">[</span><span class="ss">:name</span><span class="p">],</span>
<span class="ss">start:</span> <span class="p">{</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="ss">:start_link</span><span class="p">,</span> <span class="p">[</span><span class="n">opts</span><span class="p">]},</span>
<span class="p">}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">start_link</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="n">opts</span><span class="p">,</span> <span class="ss">name:</span> <span class="n">opts</span><span class="p">[</span><span class="ss">:name</span><span class="p">])</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">get</span><span class="p">(</span><span class="n">server</span><span class="p">,</span> <span class="n">key</span><span class="p">)</span> <span class="k">do</span>
<span class="k">case</span> <span class="ss">:ets</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">server</span><span class="p">,</span> <span class="n">key</span><span class="p">)</span> <span class="k">do</span>
<span class="p">[{</span><span class="o">^</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">}]</span> <span class="o">-></span>
<span class="n">value</span>
<span class="p">[]</span> <span class="o">-></span>
<span class="no">nil</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">put</span><span class="p">(</span><span class="n">server</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">server</span><span class="p">,</span> <span class="p">{</span><span class="ss">:put</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">})</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="p">%{</span><span class="ss">table:</span> <span class="n">opts</span><span class="p">[</span><span class="ss">:name</span><span class="p">]}}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_call</span><span class="p">({</span><span class="ss">:put</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">val</span><span class="p">},</span> <span class="n">_from</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span> <span class="k">do</span>
<span class="no">true</span> <span class="o">=</span> <span class="ss">:ets</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">table</span><span class="p">,</span> <span class="p">{</span><span class="n">key</span><span class="p">,</span> <span class="n">val</span><span class="p">})</span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="ss">:ok</span><span class="p">,</span> <span class="n">data</span><span class="p">}</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>These changes aren’t too dramatic. All “gets” go directly to the ETS
table and “puts” go to the storage process. It may seem awkward to split reads and writes this way. In some cases, it might make more sense to have the client send writes and reads directly to ETS and skip the process. Or invert the logic and have everything go through a process. I use the split approach for read-heavy workloads because it makes it easier to implement logic like key eviction or CAS operations.</p>
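<p>For comparison, the direct-to-ETS variant mentioned above would skip the write process entirely. This is a sketch of that tradeoff, not the shape the rest of this post uses:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Writes go straight to the public ETS table. This is fast, but
# gives up the serialization point that makes eviction and
# compare-and-swap logic easy to implement.
def put(table, key, value) do
  true = :ets.insert(table, {key, value})
  :ok
end
</code></pre></div></div>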
<p>With those changes done, we’ve successfully isolated our errors. If the storage process crashes, we won’t lose the values in our ETS table.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">iex</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span><span class="o">></span> <span class="no">Cache</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="no">PrimaryCache</span><span class="p">,</span> <span class="ss">:foo</span><span class="p">,</span> <span class="s2">"bar"</span><span class="p">)</span>
<span class="ss">:ok</span>
<span class="n">iex</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span><span class="o">></span> <span class="no">Cache</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="no">PrimaryCache</span><span class="p">,</span> <span class="ss">:foo</span><span class="p">)</span>
<span class="s2">"bar"</span>
<span class="n">iex</span><span class="p">(</span><span class="mi">7</span><span class="p">)</span><span class="o">></span> <span class="no">Process</span><span class="o">.</span><span class="n">whereis</span><span class="p">(</span><span class="no">PrimaryCache</span><span class="o">.</span><span class="no">Storage</span><span class="p">)</span> <span class="o">|></span> <span class="no">Process</span><span class="o">.</span><span class="k">exit</span><span class="p">(</span><span class="ss">:brutal_kill</span><span class="p">)</span>
<span class="no">true</span>
<span class="n">iex</span><span class="p">(</span><span class="mi">8</span><span class="p">)</span><span class="o">></span> <span class="no">Cache</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="no">PrimaryCache</span><span class="p">,</span> <span class="ss">:foo</span><span class="p">)</span>
<span class="s2">"bar"</span>
</code></pre></div></div>
<p>This pattern also makes it simple to extend the system in the future. For instance, if we wanted to create a process to clean up old keys, we could add it to our existing supervision tree and derive its name from the cache’s name, just like the storage process.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">opts</span><span class="p">[</span><span class="ss">:name</span><span class="p">]</span>
<span class="n">table</span> <span class="o">=</span> <span class="ss">:ets</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">storage_name</span><span class="p">(</span><span class="n">name</span><span class="p">),</span> <span class="p">[</span><span class="ss">:named_table</span><span class="p">,</span> <span class="ss">:public</span><span class="p">,</span> <span class="ss">:set</span><span class="p">])</span>
<span class="n">children</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span><span class="no">Cache</span><span class="o">.</span><span class="no">Storage</span><span class="p">,</span> <span class="p">[</span><span class="ss">name:</span> <span class="n">storage_name</span><span class="p">(</span><span class="n">name</span><span class="p">)]},</span>
<span class="p">{</span><span class="no">Cache</span><span class="o">.</span><span class="no">Cleaner</span><span class="p">,</span> <span class="p">[</span><span class="ss">name:</span> <span class="n">cleaner_name</span><span class="p">(</span><span class="n">name</span><span class="p">),</span> <span class="ss">table:</span> <span class="n">table</span><span class="p">,</span> <span class="ss">ttl:</span> <span class="n">opts</span><span class="p">[</span><span class="ss">:ttl</span><span class="p">]]}</span>
<span class="p">]</span>
<span class="no">Supervisor</span><span class="o">.</span><span class="n">init</span><span class="p">(</span><span class="n">children</span><span class="p">,</span> <span class="ss">strategy:</span> <span class="ss">:one_for_one</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">defp</span> <span class="n">storage_name</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">do</span>
<span class="ss">:"</span><span class="si">#{</span><span class="n">name</span><span class="si">}</span><span class="ss">.Storage"</span>
<span class="k">end</span>
<span class="k">defp</span> <span class="n">cleaner_name</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">do</span>
<span class="ss">:"</span><span class="si">#{</span><span class="n">name</span><span class="si">}</span><span class="ss">.Cleaner"</span>
<span class="k">end</span>
</code></pre></div></div>
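<p>The <code class="language-plaintext highlighter-rouge">Cache.Cleaner</code> referenced above isn’t shown; one possible minimal version (an assumption on my part, keyed off the <code class="language-plaintext highlighter-rouge">ttl</code> option, not the post’s actual code) is a periodic full-table sweep:</p>

```elixir
defmodule Cache.Cleaner do
  # Hypothetical minimal cleaner: a crude full-table sweep every
  # :ttl milliseconds. Not the post's original implementation.
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: opts[:name])
  end

  def init(opts) do
    state = %{table: opts[:table], ttl: opts[:ttl]}
    schedule_sweep(state.ttl)
    {:ok, state}
  end

  def handle_info(:sweep, state) do
    # Evict everything once per TTL window, then re-arm the timer.
    :ets.delete_all_objects(state.table)
    schedule_sweep(state.ttl)
    {:noreply, state}
  end

  defp schedule_sweep(ttl), do: Process.send_after(self(), :sweep, ttl)
end
```

<p>A real eviction strategy would track per-key insert times, but the shape - a supervised process that owns the sweep timer - stays the same.</p>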
<p>At this point, we’ve built a stateful library that is re-usable in multiple contexts. Users can choose to configure it and supervise it however they see fit. I would usually stop here. But there’s one more step we can take to make our API more pleasant to use.</p>
<h2 id="improving-the-user-experience">Improving the user experience</h2>
<p>Every time a user calls our cache, they have to pass the name of the cache as the first argument, which can quickly become tedious. A lot of people find <code class="language-plaintext highlighter-rouge">PrimaryCache.get(:foo)</code> more appealing than <code class="language-plaintext highlighter-rouge">Cache.get(PrimaryCache, :foo)</code>, and who am I to tell them they’re wrong?</p>
<p>Fortunately, our design makes this easy to add. We just need a little help from our venerable friend, the <code class="language-plaintext highlighter-rouge">__using__</code> macro.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">Cache</span> <span class="k">do</span>
<span class="k">defmacro</span> <span class="n">__using__</span><span class="p">(</span><span class="n">_opts</span><span class="p">)</span> <span class="k">do</span>
<span class="kn">quote</span> <span class="k">do</span>
<span class="k">def</span> <span class="n">child_spec</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">opts</span> <span class="o">=</span> <span class="no">Keyword</span><span class="o">.</span><span class="n">put_new</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="ss">:name</span><span class="p">,</span> <span class="bp">__MODULE__</span><span class="p">)</span>
<span class="no">Cache</span><span class="o">.</span><span class="n">child_spec</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">start_link</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">opts</span> <span class="o">=</span> <span class="no">Keyword</span><span class="o">.</span><span class="n">put_new</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="ss">:name</span><span class="p">,</span> <span class="bp">__MODULE__</span><span class="p">)</span>
<span class="no">Cache</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">get</span><span class="p">(</span><span class="n">key</span><span class="p">),</span> <span class="k">do</span><span class="p">:</span> <span class="no">Cache</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="n">key</span><span class="p">)</span>
<span class="k">def</span> <span class="n">put</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">),</span> <span class="k">do</span><span class="p">:</span> <span class="no">Cache</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="c1"># The functions we already wrote...</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The macro defines some default functions that start and access a cache based on the name of the module. The user can then add their cache module to their supervision tree much as before.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">CacheExample</span><span class="o">.</span><span class="no">PrimaryCache</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">Cache</span>
<span class="k">end</span>
<span class="k">defmodule</span> <span class="no">CacheExample</span><span class="o">.</span><span class="no">Application</span> <span class="k">do</span>
<span class="k">def</span> <span class="n">start</span><span class="p">(</span><span class="n">_type</span><span class="p">,</span> <span class="n">_args</span><span class="p">)</span> <span class="k">do</span>
<span class="n">children</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span><span class="no">CacheExample</span><span class="o">.</span><span class="no">PrimaryCache</span><span class="p">,</span> <span class="ss">ttl:</span> <span class="mi">500</span><span class="p">}</span>
<span class="p">]</span>
<span class="n">opts</span> <span class="o">=</span> <span class="p">[</span><span class="ss">strategy:</span> <span class="ss">:one_for_one</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">CacheExample</span><span class="o">.</span><span class="no">Supervisor</span><span class="p">]</span>
<span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="n">children</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
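<p>To see what the macro buys you in isolation, here is a stripped-down, Agent-backed stand-in for <code class="language-plaintext highlighter-rouge">Cache</code> (hypothetical: no supervision tree or ETS) that demonstrates just the name-defaulting delegation:</p>

```elixir
defmodule MiniCache do
  # Agent-backed stand-in for the post's Cache module, showing only
  # how __using__ injects wrappers that default the name to the
  # using module.
  defmacro __using__(_opts) do
    quote do
      def start_link(opts \\ []) do
        opts = Keyword.put_new(opts, :name, __MODULE__)
        MiniCache.start_link(opts)
      end

      def get(key), do: MiniCache.get(__MODULE__, key)
      def put(key, value), do: MiniCache.put(__MODULE__, key, value)
    end
  end

  def start_link(opts), do: Agent.start_link(fn -> %{} end, name: opts[:name])
  def get(name, key), do: Agent.get(name, &Map.get(&1, key))
  def put(name, key, value), do: Agent.update(name, &Map.put(&1, key, value))
end

defmodule DemoCache do
  use MiniCache
end
```

<p>With that in place, <code class="language-plaintext highlighter-rouge">DemoCache.start_link()</code> registers the process under <code class="language-plaintext highlighter-rouge">DemoCache</code>, and <code class="language-plaintext highlighter-rouge">DemoCache.put(:foo, "bar")</code> delegates to <code class="language-plaintext highlighter-rouge">MiniCache.put(DemoCache, :foo, "bar")</code>.</p>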
<h2 id="conclusion">Conclusion</h2>
<p>I hope that this has given you some ideas about how to build more reusable, stateful libraries. There are plenty of other approaches out there, all with different tradeoffs, but most of the time these simple ones are all you need. Regardless of which solution you choose, I hope this demonstrates that you can provide a friendly API that gives users more control without giving up isolation. If you follow these patterns
your APIs are going to be more reusable and will provide a better
foundation for others to build on.</p>Chris KeathleyOne of my new goals is to try to make my elixir libraries more reusable. It’s an easy mark to hit if you only use modules and functions. But once you start adding processes, ETS tables, and other stateful constructs, the solutions get murky.Open Source is Not About You2020-01-18T13:32:00+00:002020-01-18T13:32:00+00:00http://keathley.github.io/blog/open-source-is-not-about-you<p>Rich Hickey posted <a href="https://gist.github.com/richhickey/1563cddea1002958f96e7ba9519972d9">this gist</a>
back in 2018 about the entitlement of people who use open-source software. I’m not going to reiterate his points; instead, I suggest you give it a read. But I’ll leave you with one of my favorite quotes:</p>
<blockquote>
<p>I encourage everyone gnashing their teeth with negativity at what they
think they can’t do instead pick something positive they can do and do it.</p>
</blockquote>Chris KeathleyRich Hickey posted this gist back in 2018 about the entitlement of people who use open-source software. I’m not going to re-iterate his points and instead suggest that you give it a read. But, I’ll leave you with one of my favorite quotes:Going back to RSS2020-01-17T09:09:00+00:002020-01-17T09:09:00+00:00http://keathley.github.io/blog/rss-is-still-great<p>In the middle of 2019, I rediscovered RSS. I see you rolling your eyes; how could you possibly forget about RSS? I suppose I’d just gotten lazy. I’d allowed Twitter or some crappy news aggregator to dictate what I was reading. But, considering how I could become a more discerning consumer, it occurred to me that RSS hadn’t gone anywhere, and I should start using it again.</p>
<p>I’m aware of the power of nostalgia, and I’m sure it’s at play here. But I don’t care. I miss the weird internet before Google Reader ruined everything. And, most importantly, reading my favorite blogs makes me happy.</p>Chris KeathleyIn the middle of 2019, I rediscovered RSS. I see you rolling your eyes; how could you possibly forget about RSS? I suppose I’d just gotten lazy. I’d allowed Twitter or some crappy news aggregator to dictate what I was reading. But, considering how I could become a more discerning consumer, it occurred to me that RSS hadn’t gone anywhere, and I should start using it again.Runtime Configuration in Elixir Apps2020-01-09T08:53:00+00:002020-01-09T08:53:00+00:00http://keathley.github.io/blog/vapor-and-configuration<p>I gave a <a href="https://keathley.io/talks/stacking.html">talk last year</a> about
how to properly boot elixir applications. In the talk, I showed how to load configuration values into an ETS table on boot, and this was the same pattern that I used initially in Vapor. I now think that this is a bad idea.</p>
<p>The ideal way to configure all of your child processes looks like this:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">MyApp</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">Application</span>
<span class="k">def</span> <span class="n">start</span><span class="p">(</span><span class="n">_type</span><span class="p">,</span> <span class="n">_args</span><span class="p">)</span> <span class="k">do</span>
<span class="n">children</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span><span class="no">Database</span><span class="p">,</span> <span class="p">[</span><span class="ss">db_host:</span> <span class="s2">"host"</span><span class="p">,</span> <span class="ss">db_name:</span> <span class="s2">"blog_posts"</span><span class="p">]},</span>
<span class="p">{</span><span class="no">Api</span><span class="p">,</span> <span class="ss">port:</span> <span class="mi">4000</span><span class="p">},</span>
<span class="p">{</span><span class="no">Cache</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">MyCache</span><span class="p">}</span>
<span class="p">]</span>
<span class="n">opts</span> <span class="o">=</span> <span class="p">[</span><span class="ss">strategy:</span> <span class="ss">:one_for_one</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">MyApp</span><span class="o">.</span><span class="no">Supervisor</span><span class="p">]</span>
<span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="n">children</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The user passes all of the configuration values to each child process as arguments. The child processes are completely re-usable; I can choose to start as many of them as I want in whatever way I want.</p>
<p>If you followed my (bad) advice in the talk, then you would have ended up in a situation like this:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="n">start</span><span class="p">(</span><span class="n">_type</span><span class="p">,</span> <span class="n">_args</span><span class="p">)</span> <span class="k">do</span>
<span class="n">children</span> <span class="o">=</span> <span class="p">[</span>
<span class="no">ConfigStore</span><span class="p">,</span>
<span class="no">Database</span><span class="p">,</span>
<span class="no">Api</span><span class="p">,</span>
<span class="p">{</span><span class="no">Cache</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">MyCache</span><span class="p">}</span>
<span class="p">]</span>
<span class="n">opts</span> <span class="o">=</span> <span class="p">[</span><span class="ss">strategy:</span> <span class="ss">:one_for_one</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">MyApp</span><span class="o">.</span><span class="no">Supervisor</span><span class="p">]</span>
<span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="n">children</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The config won’t be loaded until the <code class="language-plaintext highlighter-rouge">ConfigStore</code> process has started.
This delay means it’s not possible to pass configuration down as arguments, and each child needs to fetch configuration when it starts like so:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">Database</span> <span class="k">do</span>
<span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">args</span><span class="p">)</span> <span class="k">do</span>
<span class="n">db_port</span> <span class="o">=</span> <span class="no">ConfigStore</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="ss">:db_port</span><span class="p">)</span>
<span class="n">db_name</span> <span class="o">=</span> <span class="no">ConfigStore</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="ss">:db_name</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="p">[</span><span class="ss">db_port:</span> <span class="n">db_port</span><span class="p">,</span> <span class="ss">db_name:</span> <span class="n">db_name</span><span class="p">]}</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Fetching config in <code class="language-plaintext highlighter-rouge">init</code> only works if you have control over the module’s <code class="language-plaintext highlighter-rouge">init</code> callback. If you wrote the module, then you can do what you want. If the process came from a library, then you’re probably limited. But even if you could override <code class="language-plaintext highlighter-rouge">init</code>, you shouldn’t. Fetching config in the <code class="language-plaintext highlighter-rouge">init</code> callback couples the process to the configuration provider, which, in effect, couples the process to how you boot your application. None of this is good.</p>
<p>You could start your <code class="language-plaintext highlighter-rouge">ConfigStore</code> in the application’s <code class="language-plaintext highlighter-rouge">start</code> callback, which would allow you to pass arguments again.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="n">start</span><span class="p">(</span><span class="n">_type</span><span class="p">,</span> <span class="n">_args</span><span class="p">)</span> <span class="k">do</span>
<span class="no">ConfigStore</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
<span class="n">children</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span><span class="no">Database</span><span class="p">,</span> <span class="p">[</span>
<span class="ss">db_host:</span> <span class="no">ConfigStore</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="ss">:db_host</span><span class="p">),</span>
<span class="ss">db_name:</span> <span class="no">ConfigStore</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="ss">:db_name</span><span class="p">)]},</span>
<span class="p">{</span><span class="no">Api</span><span class="p">,</span> <span class="ss">port:</span> <span class="no">ConfigStore</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="ss">:web_port</span><span class="p">)},</span>
<span class="p">{</span><span class="no">Cache</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">MyCache</span><span class="p">}</span>
<span class="p">]</span>
<span class="n">opts</span> <span class="o">=</span> <span class="p">[</span><span class="ss">strategy:</span> <span class="ss">:one_for_one</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">MyApp</span><span class="o">.</span><span class="no">Supervisor</span><span class="p">]</span>
<span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="n">children</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>But now we’ve lost the ability to control the lifecycle of the config process. We’ve lost our ability to restart or recover from exceptions. If we make the mistake of linking the <code class="language-plaintext highlighter-rouge">ConfigStore</code> to our application process, then we could crash the entire app.</p>
<h2 id="whats-the-goal">What’s the goal?</h2>
<p>All of our child processes should be configurable by passing arguments to them. We shouldn’t couple them to any global configuration system (this includes Application env).</p>
<p>When loading configuration, we need to enforce that all of the required
config values are present. If anything is missing or malformed, the
user should be free to halt the boot process or trigger an
alarm. Those are the goals.</p>
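<p>The enforcement half can be illustrated without any library. This is a sketch of the idea only, not Vapor’s implementation: resolve every binding, and fail loudly the moment one is missing.</p>

```elixir
defmodule ConfigSketch do
  # Sketch of "load and enforce": resolve each {key, env_var} binding,
  # raising if any required variable is absent. Illustrative only,
  # not Vapor's actual code.
  def load!(bindings) do
    Map.new(bindings, fn {key, var} ->
      case System.fetch_env(var) do
        {:ok, value} -> {key, value}
        :error -> raise "missing required environment variable: #{var}"
      end
    end)
  end
end
```

<p>Calling <code class="language-plaintext highlighter-rouge">ConfigSketch.load!(db_host: "DB_HOST")</code> either returns a plain map you can pass down as arguments, or raises before any child process starts.</p>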
<h2 id="vapor">Vapor</h2>
<p><a href="https://github.com/keathley/vapor">Vapor</a> is a library that I’ve been
toying with to try to encapsulate these patterns. The latest version includes some breaking changes, but I think they’re for the better. Like most design problems, the real solution was to do fewer things. With the latest version of Vapor you’ll be able to do this:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">MyApp</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">Application</span>
<span class="k">def</span> <span class="n">config!</span><span class="p">()</span> <span class="k">do</span>
<span class="n">providers</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">%</span><span class="no">Env</span><span class="p">{</span><span class="ss">bindings:</span> <span class="p">[</span>
<span class="ss">db_host:</span> <span class="s2">"DB_HOST"</span><span class="p">,</span>
<span class="ss">db_name:</span> <span class="s2">"DB_NAME"</span><span class="p">,</span>
<span class="ss">web_port:</span> <span class="s2">"PORT"</span>
<span class="p">]}</span>
<span class="p">]</span>
<span class="no">Vapor</span><span class="o">.</span><span class="n">load!</span><span class="p">(</span><span class="n">providers</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">start</span><span class="p">(</span><span class="n">_type</span><span class="p">,</span> <span class="n">_args</span><span class="p">)</span> <span class="k">do</span>
<span class="n">config</span> <span class="o">=</span> <span class="n">config!</span><span class="p">()</span>
<span class="n">children</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span><span class="no">Database</span><span class="p">,</span> <span class="p">[</span><span class="ss">db_host:</span> <span class="n">config</span><span class="o">.</span><span class="n">db_host</span><span class="p">,</span> <span class="ss">db_name:</span> <span class="n">config</span><span class="o">.</span><span class="n">db_name</span><span class="p">]},</span>
<span class="p">{</span><span class="no">Api</span><span class="p">,</span> <span class="ss">port:</span> <span class="n">config</span><span class="o">.</span><span class="n">web_port</span><span class="p">},</span>
<span class="p">{</span><span class="no">Cache</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">MyCache</span><span class="p">}</span>
<span class="p">]</span>
<span class="n">opts</span> <span class="o">=</span> <span class="p">[</span><span class="ss">strategy:</span> <span class="ss">:one_for_one</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">MyApp</span><span class="o">.</span><span class="no">Supervisor</span><span class="p">]</span>
<span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="n">children</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Vapor allows users to specify a list of providers, and the configuration
values the provider should return. It then “loads” the configuration from
each provider and returns a map. In the example above, if
any of the values are missing, an exception is raised. If raising
exceptions isn’t your jam, there is also <code class="language-plaintext highlighter-rouge">load/2</code>, which returns the
standard ok-error-tuple. The user is free to do whatever they want
with the map. They can configure their processes once and throw the map away, store it in ETS, stash it with <code class="language-plaintext highlighter-rouge">Application.put_env</code>, or whatever else.</p>
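<p>For example, copying a loaded map into the Application env takes only a couple of lines (here <code class="language-plaintext highlighter-rouge">:my_app</code> is a placeholder application name and the map is hard-coded for illustration):</p>

```elixir
# Copy a loaded config map into the Application env so existing
# Application.get_env call sites keep working (:my_app is a placeholder).
config = %{db_host: "localhost", db_name: "blog_posts", web_port: 4000}

for {key, value} <- config do
  Application.put_env(:my_app, key, value)
end

Application.get_env(:my_app, :db_host)
# => "localhost"
```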
<p>There are a bunch of other features in Vapor so <a href="https://github.com/keathley/vapor">check it
out</a> if it seems interesting to you.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Even if you don’t want to use Vapor, I hope that this at least showcases
some useful patterns. Avoid coupling your processes to any global configuration. If you’re going to fetch application configuration at
runtime, then it should enforce the values that you’ve specified. Finally,
library authors: If you’re going to spawn processes, let me pass arguments to you. Don’t use application config unless there is no other option (side note: there’s always another option).</p>
<p>I have more to say about patterns for avoiding <code class="language-plaintext highlighter-rouge">Application.get_env</code> but
that deserves a separate post.</p>Chris KeathleyI gave a talk last year about how to properly boot elixir applications. In the talk, I showed how to load configuration values into an ETS table on boot, and this was the same pattern that I used initially in Vapor. I now think that this is a bad idea.The dangers of the Single Global Process2019-08-12T07:53:00+00:002019-08-12T07:53:00+00:00http://keathley.github.io/blog/sgp<p>There are a few things in the Elixir/Erlang ecosystem that I consider required reading. <a href="https://www.theerlangelist.com/article/spawn_or_not">To spawn, or not to spawn?</a> by Saša Jurić is definitely one of them. If you haven’t read it, you need to. It’ll change the way you think about building elixir applications.</p>
<p>Seriously go read it.</p>
<p>That post flipped the Elixir community’s idea of good design on its head, and for good reason. Modeling the domain with pure functions is a powerful approach and one that we should strive for when we can.</p>
<p>But there was one pattern that emerged that I think has been misapplied as a universal solution. That pattern is what I’ve been calling - for lack of a better name - the “single global process” pattern, or SGP for short. You’ve probably seen this pattern. It’s the one where you do this elegant, functional domain modeling and then put it in a long-running process somewhere, effectively turning the process into a write-through cache. In Saša’s post, the single process is the RoundServer.</p>
<p>I don’t think Saša intended to promote this pattern. I wasn’t there when he was writing it, but I always thought that using a single RoundServer to manage a round was somewhat incidental. The unique process was an implementation detail, and the critical point was to model your domain with functions.</p>
<p>Because here’s the thing. The SGP pattern introduces a <em>ton</em> of problems. Problems that you, dear reader, are going to need to solve. In my experience, the SGP is one of the most intricate patterns you can introduce to your system despite being one of the easiest to build. I’m going to do my best to convince you of this by enumerating several of the problems that you’ll face as well as some potential solutions.</p>
<h2 id="the-setup">The Setup</h2>
<p>To drive these points home, we need a motivating example. I want to focus on the runtime concerns, so I’m going to reduce our problem domain to a simple counter. Here’s our functional core:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">Counter</span> <span class="k">do</span>
<span class="k">def</span> <span class="n">new</span><span class="p">(</span><span class="n">initial</span> <span class="p">\\</span> <span class="mi">0</span><span class="p">)</span> <span class="k">do</span>
<span class="p">%{</span><span class="ss">ops:</span> <span class="p">[],</span> <span class="ss">initial:</span> <span class="n">initial</span><span class="p">}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">incr</span><span class="p">(%{</span><span class="ss">ops:</span> <span class="n">ops</span><span class="p">}</span><span class="o">=</span><span class="n">counter</span><span class="p">)</span> <span class="k">do</span>
<span class="p">%{</span><span class="n">counter</span> <span class="o">|</span> <span class="ss">ops:</span> <span class="p">[{</span><span class="ss">:incr</span><span class="p">,</span> <span class="mi">1</span><span class="p">}</span> <span class="o">|</span> <span class="n">ops</span><span class="p">]}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">decr</span><span class="p">(%{</span><span class="ss">ops:</span> <span class="n">ops</span><span class="p">}</span><span class="o">=</span><span class="n">counter</span><span class="p">)</span> <span class="k">do</span>
<span class="p">%{</span><span class="n">counter</span> <span class="o">|</span> <span class="ss">ops:</span> <span class="p">[{</span><span class="ss">:decr</span><span class="p">,</span> <span class="mi">1</span><span class="p">}</span> <span class="o">|</span> <span class="n">ops</span><span class="p">]}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">count</span><span class="p">(%{</span><span class="ss">ops:</span> <span class="n">ops</span><span class="p">,</span> <span class="ss">initial:</span> <span class="n">init</span><span class="p">})</span> <span class="k">do</span>
<span class="no">Enum</span><span class="o">.</span><span class="n">reduce</span> <span class="n">ops</span><span class="p">,</span> <span class="n">init</span><span class="p">,</span> <span class="k">fn</span> <span class="n">op</span><span class="p">,</span> <span class="n">count</span> <span class="o">-></span>
<span class="k">case</span> <span class="n">op</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:incr</span><span class="p">,</span> <span class="n">val</span><span class="p">}</span> <span class="o">-></span>
<span class="n">count</span> <span class="o">+</span> <span class="n">val</span>
<span class="p">{</span><span class="ss">:decr</span><span class="p">,</span> <span class="n">val</span><span class="p">}</span> <span class="o">-></span>
<span class="n">count</span> <span class="o">-</span> <span class="n">val</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>To update a count, we “cons” increment and decrement operations onto a growing list. When we want the actual count, we fold over the list, incrementing or decrementing as we go, starting from the initial value.</p>
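<p>In use, the core just accumulates operations and only pays the fold cost when you ask for the count. The module here is condensed from the listing above so the snippet runs standalone:</p>

```elixir
# Condensed copy of the Counter module above, plus a usage example.
defmodule Counter do
  def new(initial \\ 0), do: %{ops: [], initial: initial}
  def incr(%{ops: ops} = c), do: %{c | ops: [{:incr, 1} | ops]}
  def decr(%{ops: ops} = c), do: %{c | ops: [{:decr, 1} | ops]}

  def count(%{ops: ops, initial: init}) do
    Enum.reduce(ops, init, fn
      {:incr, val}, count -> count + val
      {:decr, val}, count -> count - val
    end)
  end
end

count =
  Counter.new()
  |> Counter.incr()
  |> Counter.incr()
  |> Counter.decr()
  |> Counter.count()
# => 1
```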
<p>For the server, we’ll use a <code class="language-plaintext highlighter-rouge">GenServer</code>.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">CounterServer</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">GenServer</span>
<span class="k">def</span> <span class="n">start_link</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="n">opts</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">Keyword</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="ss">:name</span><span class="p">))</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">increment</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="ss">:incr</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">decrement</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="ss">:decr</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">count</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="ss">:get_count</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">_opts</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="no">Counter</span><span class="o">.</span><span class="n">new</span><span class="p">()}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_call</span><span class="p">(</span><span class="ss">:incr</span><span class="p">,</span> <span class="n">_from</span><span class="p">,</span> <span class="n">counter</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="ss">:ok</span><span class="p">,</span> <span class="no">Counter</span><span class="o">.</span><span class="n">incr</span><span class="p">(</span><span class="n">counter</span><span class="p">)}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_call</span><span class="p">(</span><span class="ss">:decr</span><span class="p">,</span> <span class="n">_from</span><span class="p">,</span> <span class="n">counter</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="ss">:ok</span><span class="p">,</span> <span class="no">Counter</span><span class="o">.</span><span class="n">decr</span><span class="p">(</span><span class="n">counter</span><span class="p">)}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_call</span><span class="p">(</span><span class="ss">:get_count</span><span class="p">,</span> <span class="n">_from</span><span class="p">,</span> <span class="n">counter</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="no">Counter</span><span class="o">.</span><span class="n">count</span><span class="p">(</span><span class="n">counter</span><span class="p">),</span> <span class="n">counter</span><span class="p">}</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The server is equally trivial. It manages the lifecycle of a counter in response to calls. Unique counter servers can be started like so:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">CounterServer</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="ss">name:</span> <span class="ss">:some_fancy_counter</span><span class="p">)</span>
</code></pre></div></div>
<p>With these pieces finished, we have a good starting point to discuss problems with this design.</p>
<h2 id="the-problem">The problem</h2>
<p>What we have so far seems pretty good. And if all you really needed was a simple, in-memory counter, this would probably do the trick. But this is a contrived domain that I’ve intentionally kept simple, so I can focus on other things. Typically the state we deal with is <em>essential</em>. The data that makes up most companies’ core domains is <em>not ephemeral</em>. People will make decisions based on the data that we’re working with. That means this data needs to be persisted. So let’s add persistence. If persisting a counter to a database bothers you, then you can tell yourself that the counter is used to bill clients for the use of your API or something.</p>
<p>The typical recommendation for persistence is to initialize the process with state from the database. When the process receives a message to write or update its internal state, the update is applied to the in-memory data and then persisted. Because we’re utilizing the SGP pattern, any read requests can come directly out of memory, saving us the database roundtrip. Let’s update our counter server to reflect this change:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">CounterServer</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">GenServer</span>
<span class="k">def</span> <span class="n">start_link</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="n">opts</span><span class="p">,</span> <span class="ss">name:</span> <span class="no">Keyword</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="ss">:name</span><span class="p">))</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">increment</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="ss">:incr</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">decrement</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="ss">:decr</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">count</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="ss">:get_count</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">data</span> <span class="o">=</span> <span class="p">%{</span>
<span class="ss">counter:</span> <span class="no">nil</span><span class="p">,</span>
<span class="ss">name:</span> <span class="n">opts</span><span class="p">[</span><span class="ss">:name</span><span class="p">],</span>
<span class="p">}</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="p">{</span><span class="ss">:continue</span><span class="p">,</span> <span class="ss">:load_state</span><span class="p">}}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_continue</span><span class="p">(</span><span class="ss">:load_state</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">initial</span><span class="p">}</span> <span class="o">=</span> <span class="n">get_from_db</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:noreply</span><span class="p">,</span> <span class="p">%{</span><span class="n">data</span> <span class="o">|</span> <span class="ss">counter:</span> <span class="no">Counter</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">initial</span><span class="p">)}}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_call</span><span class="p">(</span><span class="ss">:incr</span><span class="p">,</span> <span class="n">_from</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span> <span class="k">do</span>
<span class="n">new_counter</span> <span class="o">=</span> <span class="no">Counter</span><span class="o">.</span><span class="n">incr</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">counter</span><span class="p">)</span>
<span class="ss">:ok</span> <span class="o">=</span> <span class="n">put_in_db</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="n">new_counter</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="ss">:ok</span><span class="p">,</span> <span class="p">%{</span><span class="n">data</span> <span class="o">|</span> <span class="ss">counter:</span> <span class="n">new_counter</span><span class="p">}}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_call</span><span class="p">(</span><span class="ss">:decr</span><span class="p">,</span> <span class="n">_from</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span> <span class="k">do</span>
<span class="n">new_counter</span> <span class="o">=</span> <span class="no">Counter</span><span class="o">.</span><span class="n">decr</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">counter</span><span class="p">)</span>
<span class="ss">:ok</span> <span class="o">=</span> <span class="n">put_in_db</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="n">new_counter</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="ss">:ok</span><span class="p">,</span> <span class="p">%{</span><span class="n">data</span> <span class="o">|</span> <span class="ss">counter:</span> <span class="n">new_counter</span><span class="p">}}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_call</span><span class="p">(</span><span class="ss">:get_count</span><span class="p">,</span> <span class="n">_from</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="no">Counter</span><span class="o">.</span><span class="n">count</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">counter</span><span class="p">),</span> <span class="n">data</span><span class="p">}</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>When the counter process starts, we load the state in a <code class="language-plaintext highlighter-rouge">handle_continue</code> callback. If we receive an increment or decrement message, we update the counter and shove the new counter into the database. Reads are returned from our in-memory representation saving us that database call.</p>
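<p>The helpers <code class="language-plaintext highlighter-rouge">get_from_db/1</code> and <code class="language-plaintext highlighter-rouge">put_in_db/2</code> are left undefined above. As a sketch (the ETS table standing in for a real database is my assumption, not the post’s actual persistence code), note that the server loads an initial <em>integer</em> on startup, so the stand-in persists the materialized count rather than the counter struct:</p>

```elixir
defmodule CounterStore do
  # ETS stands in for a real database here. The GenServer above expects
  # get_from_db/1 to return {:ok, initial_integer}, so we persist the
  # materialized count (i.e. the caller would pass Counter.count(counter)).
  def start do
    :ets.new(:counter_store, [:set, :public, :named_table])
    :ok
  end

  def get_from_db(name) do
    case :ets.lookup(:counter_store, name) do
      [{^name, count}] -> {:ok, count}
      [] -> {:ok, 0}
    end
  end

  def put_in_db(name, count) when is_integer(count) do
    true = :ets.insert(:counter_store, {name, count})
    :ok
  end
end
```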
<p>This seems great! And it <em>is</em> great - right up until you need to run on more than one node. There are systems in the world that can get by running a single node and maybe your company is one of them. But most companies end up needing to run more than one node at some point whether for resiliency or to handle scale. Most of the time, we run both of these nodes behind a load balancer.</p>
<p>We’ve hit our first problem. The counter we’ve created is node-local. Assuming that we start these counters on demand, it’s only a matter of time before we end up with duplicate counters on both of our nodes.</p>
<p><a href="/assets/images/sgp/unconnected_nodes.jpg">
<img src="/assets/images/sgp/unconnected_nodes.jpg" alt="load balanced nodes" />
</a></p>
<p>If this situation occurs, then we have a high likelihood of returning incorrect counts from memory because we can’t know if another node has updated the counter in the database. We’re also likely to overwrite the previously stored values, which means losing data.</p>
<p>A quick aside about data integrity. Inconsistent data issues are some of the evilest bugs you’ll encounter when working with distributed systems. Bugs like these suck because there’s never a good indication that something is going wrong at the moment it’s happening. There’s no crash or stack trace to look at. Unless you’re constantly monitoring your data integrity, the only way you’ll find out that you have an issue is when you end up charging a client 10,000 dollars or -100 dollars or NaN dollars. There are ways to build eventually consistent systems. But that has to be a conscious choice.</p>
<h2 id="some-partial-solutions">Some partial solutions</h2>
<p>There are a few partial solutions to this problem. The ones that I see most people reach for are either persistent connections, sticky sessions, or some combination of the two. Unfortunately, none of these really eliminate the possibility of starting the same counter on two nodes, particularly if more than one user can interact with a counter at a time. Additionally, introducing sticky sessions is a fast way to end up with “hot” nodes due to unfair distribution of work. However, if you <em>can</em> use sticky sessions and are willing to give up on some levels of consistency, then this might work for you.</p>
<p>Another partial solution is to always use <a href="https://en.wikipedia.org/wiki/Compare-and-swap">Compare and Swap</a> (CAS) operations when updating the database. Assuming your database implements CAS correctly, you can eliminate the possibility of trampling data. But you will still return incorrect values until you do a write or find some other way to get an update from the database.</p>
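<p>To make the CAS idea concrete, here is a sketch against an ETS table standing in for the database (the module and function names are mine, not from the post; with a SQL database the equivalent would be an <code class="language-plaintext highlighter-rouge">UPDATE ... WHERE value = expected</code> style query). <code class="language-plaintext highlighter-rouge">:ets.select_replace/2</code> replaces the row atomically only if it still holds the value we read:</p>

```elixir
defmodule CAS do
  # Atomic compare-and-swap on an ETS row. The match spec only fires if
  # the row still contains `expected`, so a concurrent writer can't be
  # silently trampled - the loser sees {:error, :conflict} instead.
  def swap(table, key, expected, new) do
    spec = [{{key, expected}, [], [{:const, {key, new}}]}]

    case :ets.select_replace(table, spec) do
      1 -> :ok
      0 -> {:error, :conflict}
    end
  end
end

table = :ets.new(:counts, [:set, :public])
:ets.insert(table, {:api_counter, 10})

CAS.swap(table, :api_counter, 10, 11) # => :ok, the stored value was still 10
CAS.swap(table, :api_counter, 10, 12) # => {:error, :conflict}, it is now 11
```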
<p>Neither of these solutions entirely solves the problem, but in conjunction with each other, they might be good enough for your use case.</p>
<h2 id="distributed-erlang-will-save-us-all">Distributed Erlang will save us all.</h2>
<p>Distribution is always the solution that’s begging for a problem. And there’s no better set of challenges than the ones introduced by an SGP. So let’s indulge ourselves and walk down this path for a bit.</p>
<p>I’ll assume that we’ve found a way to discover and connect our nodes together. Now that we’ve done that we need to register our counters across the cluster. For this example, I’m going to use <code class="language-plaintext highlighter-rouge">:global</code> because it’s built into OTP and easy to use. But the failures I’m about to describe are not limited to <code class="language-plaintext highlighter-rouge">:global</code>. You can induce these same failures with virtually any of the process registries that exist in elixir and erlang.</p>
<p>Converting our process to use the global registry is straightforward. We only need to change the <code class="language-plaintext highlighter-rouge">start_link</code> function.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">CounterServer</span> <span class="k">do</span>
<span class="k">def</span> <span class="n">start_link</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">name</span> <span class="o">=</span> <span class="no">Keyword</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="ss">:name</span><span class="p">)</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="n">opts</span><span class="p">,</span> <span class="ss">name:</span> <span class="p">{</span><span class="ss">:global</span><span class="p">,</span> <span class="n">name</span><span class="p">})</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Now when we want to access our counter, we’ll be able to find it globally regardless of what box we’re connected to.</p>
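<p>Here is a single-node sketch of what that resolution looks like (registering the current process so the example is self-contained; in a real cluster the registered pid would be the counter server, reachable from any connected node):</p>

```elixir
# :global resolves a registered name to a pid no matter which node in the
# cluster the process lives on.
pid = self()
:yes = :global.register_name(:some_fancy_counter, pid)

# From any connected node, this returns the same pid:
^pid = :global.whereis_name(:some_fancy_counter)

# GenServer calls can then be addressed by name without knowing the node:
# GenServer.call({:global, :some_fancy_counter}, :get_count)
```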
<p><a href="/assets/images/sgp/connected_nodes.jpg">
<img src="/assets/images/sgp/connected_nodes.jpg" alt="connected nodes with a single counter" />
</a></p>
<p>This solution will work well, at least until we encounter the ever-present specter of distributed systems; the netsplit.</p>
<h2 id="the-netsplit-bogeyman">The netsplit bogeyman</h2>
<p>Netsplit is a catch-all word that probably gets tossed around too much. I know I’ve been guilty of it. Colloquially it’s used to describe any and all faults that you could see in a distributed system. The odds of seeing a netsplit will depend on your cluster size and the reliability of your network. You’re much more likely to see faults in a 60 node cluster running in Kubernetes on Amazon’s crappy network than if you’re running 2 bare-metal boxes hard-lined into each other sitting in a co-lo somewhere. But if you run a system for long enough, you’ll eventually see faults. When you see those faults, you need to have a plan for handling inconsistent state - even if that plan is “Fuck it who cares.”</p>
<p>Unfortunately, the SGP doesn’t lend itself to graceful recovery after a netsplit. Let’s talk through some of the issues.</p>
<p>If your boxes have a netsplit, at a high level, it means that 1 of 2 things has happened: a node has shut down expectedly or unexpectedly, or the nodes have disconnected from each other but are all still running. The trick is that from a single node’s point of view you can’t really deduce which it was.</p>
<p><a href="/assets/images/sgp/partitioned_nodes.jpg">
<img src="/assets/images/sgp/partitioned_nodes.jpg" alt="partitioned nodes" />
</a></p>
<p>Unless the nodes have a consistent way to talk about cluster membership - Raft, Paxos or similar - all an individual node can <em>really</em> know is “I can no longer talk to these N nodes, and I don’t know why.” This has a secondary effect which happens to make our lives even harder: Deployments and scaling events can start to look identical to netsplits. So while real partitions might be rare, deployments may induce the same failures.</p>
<p>During a partition, the two nodes may not be able to talk. But that doesn’t mean that they aren’t reachable from a client. A client may issue a request, and that request may get load balanced to either node.
If the request happens to land on the node that holds the counter, then we’re OK. But if the request ends up on the node that doesn’t hold the counter, we have problems. As I described above the node can’t know if the counter process is truly gone. The default solution is to assume that the process doesn’t exist and start up a new one. We’re back to having 2 counter processes running on separate nodes again!</p>
<p><a href="/assets/images/sgp/partitioned_nodes_with_counters.jpg">
<img src="/assets/images/sgp/partitioned_nodes_with_counters.jpg" alt="partitioned nodes with counters" />
</a></p>
<p>When the partition heals, we’ll need to reconcile which counter is the canonical one. By default <code class="language-plaintext highlighter-rouge">:global</code> discards one at random. Other registries such as Horde give you more control over this reconciliation process. But you’ll still need to take care in how you reconcile this state.</p>
<h2 id="node-monitors-will-not-save-you">Node monitors will not save you</h2>
<p>One of the ways that we can try to solve this is by using node monitors.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="ss">:net_kernel</span><span class="o">.</span><span class="n">monitor_nodes</span><span class="p">(</span><span class="no">true</span><span class="p">,</span> <span class="p">[</span><span class="ss">:nodedown_reason</span><span class="p">])</span>
</code></pre></div></div>
<p>Calling this function in a GenServer will cause node events to be sent to the process as messages. Unfortunately, this isn’t really enough to know the state of your cluster. From any node’s view, it’s just not possible to tell if another node has left for good, been autoscaled away, or been disconnected because it couldn’t keep up with health checks.</p>
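<p>For completeness, consuming those events looks roughly like this (a sketch; the module name and logging are mine). Notice how little the messages actually tell you - a <code class="language-plaintext highlighter-rouge">:nodedown</code> with reason <code class="language-plaintext highlighter-rouge">:connection_closed</code> looks the same whether the node crashed, deployed, or partitioned:</p>

```elixir
defmodule ClusterWatcher do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def init(_opts) do
    # Subscribe to node up/down events, with reasons attached to downs.
    :net_kernel.monitor_nodes(true, [:nodedown_reason])
    {:ok, %{}}
  end

  # All we learn here is that the node is unreachable - not whether it
  # was stopped, scaled away, or partitioned from us.
  def handle_info({:nodedown, node, info}, state) do
    reason = Keyword.get(info, :nodedown_reason)
    IO.puts("lost contact with #{node}: #{inspect(reason)}")
    {:noreply, state}
  end

  def handle_info({:nodeup, node, _info}, state) do
    IO.puts("connected to #{node}")
    {:noreply, state}
  end
end
```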
<p>So how can we solve this? I’m going to focus on three solutions. But there are many others and several variants of each. Hopefully, these will give you some general ideas.</p>
<h2 id="consistent-hashing-and-oracles">Consistent Hashing and Oracles</h2>
<p><a href="https://en.wikipedia.org/wiki/Consistent_hashing">Consistent hashing</a> is my default way to solve this problem and is also the most naive. The basic scheme is that you’ll use a consistent hashing algorithm to decide what node a given counter process lives on (Discord has a <a href="https://github.com/discordapp/ex_hash_ring">robust library for this</a>). One caveat to this approach is that during a netsplit your counter process may not be reachable and thus will be unavailable for the duration of the split.</p>
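<p>The placement itself can be sketched very simply (this is deliberately naive modulo-hashing, not a real hash ring - it reshuffles far more keys when the node list changes, which is exactly why a library like Discord’s is worth using):</p>

```elixir
defmodule Placement do
  # Deterministically map a counter name onto one node of a *static*,
  # agreed-upon node list. Every node must use the same list, in the
  # same order, or they will disagree about ownership.
  def owner(name, nodes) when is_list(nodes) and nodes != [] do
    index = :erlang.phash2(name, length(nodes))
    Enum.at(nodes, index)
  end
end

nodes = [:"a@host", :"b@host", :"c@host"]
# Always returns the same node for the same name and node list:
Placement.owner(:some_fancy_counter, nodes)
```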
<p>The other caveat is that you need a way to specify the canonical set of nodes in your cluster. We know that we can’t reliably use node events so we’ll need to solve cluster management some other way.</p>
<p>The most straightforward method is to use static clusters. If you need to add new nodes to your cluster, you bring up an entirely new cluster with the new nodes. Once everything is up, you redirect traffic to the new cluster and shut down the old one. Obviously, this increases the time it takes to deploy and scale, but if you have reasonable scaling needs, this can work well.</p>
<p>If you need to be able to change the cluster size dynamically, then you’re going to have to do more work to control your deploy process. One way to automate this is to issue a specific <code class="language-plaintext highlighter-rouge">ClusterChange</code> RPC to all nodes in the cluster. If you go this route, then you need to ensure that you publish RPCs to each node directly instead of relying on the nodes’ internal distribution. The reason you can’t rely on the nodes to propagate cluster changes is that if you try to change the cluster during a partition, you can end up in a situation where only half of the cluster knows about the change.</p>
<p><a href="/assets/images/sgp/cluster_change_during_partition.jpg">
<img src="/assets/images/sgp/cluster_change_during_partition.jpg" alt="cluster change during partition" />
</a></p>
<p>A third solution is to use an external, consistent store to manage cluster state. Most often, this means using something like etcd or ZooKeeper, which can provide highly available, consistent lookups.</p>
<p>In any of these scenarios, autoscaling, at least as we typically think of it, is off the table. You’ll need to invest serious time into your deployment and cluster management to pull this off.</p>
<h2 id="crdts-everywhere">CRDTs everywhere</h2>
<p>Another way to solve the SGP inconsistency problem is to use <a href="https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type">CRDTs</a> everywhere. You’ll still get incorrect data during a netsplit, and when persisting data to a durable store, you’ll have to take care not to overwrite data. To avoid this problem, you’ll need to merge the data in your process with the data in the database and then replace what’s in the database using a CAS operation.</p>
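<p>To give a feel for the merge step, here is a toy PN-counter (strictly an illustration - this is my sketch, and as noted below you should not hand-roll CRDTs for production). Each replica tracks its own increment and decrement totals, and merging takes the per-node maximums, which makes merges idempotent, commutative, and associative:</p>

```elixir
defmodule PNCounter do
  # Toy PN-counter: a map of node => {increments, decrements}.
  # Illustration only - use a vetted CRDT library in production.
  def new, do: %{}

  def incr(c, node), do: Map.update(c, node, {1, 0}, fn {p, n} -> {p + 1, n} end)
  def decr(c, node), do: Map.update(c, node, {0, 1}, fn {p, n} -> {p, n + 1} end)

  def value(c) do
    Enum.reduce(c, 0, fn {_node, {p, n}}, acc -> acc + p - n end)
  end

  # Element-wise max: replaying the same merge never changes the result.
  def merge(a, b) do
    Map.merge(a, b, fn _node, {p1, n1}, {p2, n2} -> {max(p1, p2), max(n1, n2)} end)
  end
end

# Two replicas diverge during a partition, then converge on merge:
a = PNCounter.new() |> PNCounter.incr(:node_a) |> PNCounter.incr(:node_a)
b = PNCounter.new() |> PNCounter.decr(:node_b)
PNCounter.value(PNCounter.merge(a, b)) # => 1
```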
<p>CRDTs have a lot of excellent qualities to them, but you need to ensure you are using them correctly. It’s entirely possible to misuse a CRDT and end up in an inconsistent state. Consistency is not a composable property of software. I also strongly advise you not to build your own CRDTs (other than for fun) and instead use something like LASP in production.</p>
<p>If you do decide to go down the CRDT route, you need to be aware that your states may diverge at some point. Each process and the database will have its own view of the world, and those views may all be different. In the fullness of time, you may converge back to a steady-state. But you might not. This is more common in large clusters where your data has a high rate of change. But this divergence can make it very hard to reason about the state of the world.</p>
<h2 id="just-dont-use-the-sgp">Just don’t use the SGP</h2>
<p>Most of these problems go away if you simply don’t use a single global process to hold your state. This doesn’t mean that you give up on modeling your application as pure functions. Instead, it means giving up on maintaining that data in a long-running process. It’s an easy pattern to reach for and it <em>feels</em> elegant. But you really need to step back and ask yourself why you’re doing this and if you’re ready to solve all of the additional problems this solution will bring. If you only need a cache of values, then maybe you’re better off replicating your state to some subset of your nodes or building a cache in an ETS table instead.</p>
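<p>The ETS-cache alternative is small enough to sketch here (names are mine). Each node keeps its own copy, so stale reads are possible, but there is no single process to partition away and no globally registered write path to corrupt:</p>

```elixir
defmodule Cache do
  # A node-local read-through cache backed by ETS.
  def start do
    :ets.new(:cache, [:set, :public, :named_table, read_concurrency: true])
    :ok
  end

  # Return the cached value, or compute it with `fallback` and cache it.
  def fetch(key, fallback) when is_function(fallback, 0) do
    case :ets.lookup(:cache, key) do
      [{^key, value}] ->
        value

      [] ->
        value = fallback.()
        :ets.insert(:cache, {key, value})
        value
    end
  end
end

Cache.start()
Cache.fetch(:greeting, fn -> "hello" end) # => "hello" (computed, then cached)
Cache.fetch(:greeting, fn -> raise "not called" end) # => "hello" (served from ETS)
```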
<p>If you’re trying to serialize side-effects then maybe you’re better off relying on idempotency and the consistency guarantees of your database. If you like these “event-sourcey” semantics, perhaps you can utilize something like <a href="https://github.com/toniqsystems/maestro">Maestro</a> or <a href="https://www.datomic.com">Datomic</a> to solve the consistency problems for you. There are ways to maintain similar semantics without incurring the same issues.</p>
<h2 id="when-is-an-sgp-ok">When is an SGP ok?</h2>
<p>There are always tradeoffs when building software, and there are times when an SGP is a reasonable solution to the problem at hand. For me, this is when the process is short-lived and will mutate no external state. In fact, the other day, I needed to build a search feature. For the feature to work, we needed to gather data from lots of downstream sources and join it all together. The searching was done as the user typed so rather than do the data fetching every time the search query was changed slightly, I did it once, shoved the state in a process, and then was able to quickly search through everything I had already found with very few additional calls required. This pattern worked quite well because I wasn’t trying to execute mutations from inside the process. If the process didn’t receive a message within 15 seconds, it shut itself back down.</p>
<p>I also think SGPs are fine if you genuinely don’t care about the consistency of the data you’re manipulating. That sorta state isn’t the norm in my experience. But it does exist and modeling it in a process this way is reasonable.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I want to re-emphasize that everything in Saša’s post is excellent and I agree with it. If you can design your problem such that you’re mostly manipulating data with pure functions, that generally leads to robust systems. I think the problem is how the community has begun to utilize the ideas in that post and not the post itself. I’m also not trying to say that there’s never any place for an SGP. My goal is to demonstrate that while easy to build and conceptually very elegant, the SGP is one of the more complicated patterns you can add to your system. I personally think it’s a model that is overused in elixir and I don’t believe it should be the default choice. You need to have a good reason to put your state in a single, unique process. Otherwise, you’re probably better served by relying on a database to help enforce your data consistency.</p>
<p>Most importantly, if you <em>do</em> decide that an SGP is the right solution for your use case, I want you to be aware of the kinds of problems that you’ll face and the types of solutions that you’ll need to work through. If you’re prepared to do that and it’s a meaningful use of your company’s time, then it can work well for you. But you need to have clear eyes to see what you’re facing.</p>
<p>A huge thanks to José Valim, Lance Halvorsen, Jeff Weiss, Greg Mefford, Neil Menne and others for reviewing this post.</p>Chris KeathleyThere are a few things in the Elixir/Erlang ecosystem that I consider required reading. To spawn, or not to spawn? by Saša Jurić is definitely one of them. If you haven’t read it, you need to. It’ll change the way you think about building elixir applications.Soft deletion with Ecto2019-01-10T04:07:00+00:002019-01-10T04:07:00+00:00http://keathley.github.io/blog/soft-deletion-with-ecto<p>A common need in web applications is to “undo” a deletion event. This is referred
to as a soft-deletion. The record still exists, but it’s hidden from the user.
Soft deleting allows the user to restore that data in the event that they need it in
the future.</p>
<p>Implementing this behaviour is a question that comes up a lot with ecto and I wanted to show my strategy for handling these sorts of situations. We’re going to use
postgres as our database of choice but these concepts should translate pretty
well to other systems.</p>
<h2 id="the-solution">The Solution</h2>
<p>If you want to just see all the code together the full source for these examples
is <a href="https://github.com/keathley/soft_delete">here</a>.</p>
<p>I’m assuming you have a working elixir application with Ecto already set up.
With that assumption in mind, let’s work on creating some tables. For our
purposes we’re going to allow people to soft-delete widgets from the system. So
we need a widgets table:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">SoftDelete</span><span class="o">.</span><span class="no">Repo</span><span class="o">.</span><span class="no">Migrations</span><span class="o">.</span><span class="no">CreateWidgets</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">Migration</span>
<span class="k">def</span> <span class="n">change</span> <span class="k">do</span>
<span class="n">create</span> <span class="n">table</span><span class="p">(</span><span class="ss">:widgets</span><span class="p">)</span> <span class="k">do</span>
<span class="n">add</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">:string</span>
<span class="n">add</span> <span class="ss">:deleted</span><span class="p">,</span> <span class="ss">:boolean</span><span class="p">,</span> <span class="ss">default:</span> <span class="no">false</span>
<span class="k">end</span>
<span class="n">create</span> <span class="n">index</span><span class="p">(</span><span class="ss">:widgets</span><span class="p">,</span> <span class="p">[</span><span class="ss">:deleted</span><span class="p">])</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>We’re adding a new table with a <code class="language-plaintext highlighter-rouge">deleted</code> column that defaults to false. The
schema looks similar:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">SoftDelete</span><span class="o">.</span><span class="no">Widget</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">Schema</span>
<span class="kn">import</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">Query</span><span class="p">,</span> <span class="ss">only:</span> <span class="p">[</span><span class="ss">from:</span> <span class="mi">2</span><span class="p">]</span>
<span class="n">schema</span> <span class="s2">"widgets"</span> <span class="k">do</span>
<span class="n">field</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">:string</span>
<span class="n">field</span> <span class="ss">:deleted</span><span class="p">,</span> <span class="ss">:boolean</span><span class="p">,</span> <span class="ss">default:</span> <span class="no">false</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>When we want to “delete” a record, a convenient way is a dedicated changeset function:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">SoftDelete</span><span class="o">.</span><span class="no">Widget</span> <span class="k">do</span>
<span class="kn">import</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">Changeset</span>
<span class="k">def</span> <span class="n">mark_for_deletion</span><span class="p">(</span><span class="n">widget</span><span class="p">)</span> <span class="k">do</span>
<span class="n">widget</span>
<span class="o">|></span> <span class="n">change</span><span class="p">(%{</span><span class="ss">deleted:</span> <span class="no">true</span><span class="p">})</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Which can be used like so:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>iex(1)> w = SoftDelete.Repo.get(SoftDelete.Widget, 1)
16:52:46.496 [debug] QUERY OK source="widgets" db=1.1ms decode=1.2ms queue=1.2ms
SELECT w0."id", w0."name", w0."deleted" FROM "widgets" AS w0 WHERE (w0."id" = $1) [1]
%SoftDelete.Widget{
__meta__: #Ecto.Schema.Metadata<:loaded, "widgets">,
deleted: false,
id: 1,
name: "foo"
}
iex(2)> w |> SoftDelete.Widget.mark_for_deletion() |> SoftDelete.Repo.update
16:53:11.386 [debug] QUERY OK db=1.7ms queue=0.6ms
UPDATE "widgets" SET "deleted" = $1 WHERE "id" = $2 [true, 1]
{:ok,
%SoftDelete.Widget{
__meta__: #Ecto.Schema.Metadata<:loaded, "widgets">,
deleted: true,
id: 1,
name: "foo"
}}
</code></pre></div></div>
<p>Now that we can mark widgets as deleted, we need some way to scope our queries
as well. A good first approach is a composable query function:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">SoftDelete</span><span class="o">.</span><span class="no">Widget</span> <span class="k">do</span>
<span class="k">def</span> <span class="n">alive</span><span class="p">(</span><span class="n">query</span><span class="p">)</span> <span class="k">do</span>
<span class="n">from</span> <span class="n">w</span> <span class="ow">in</span> <span class="n">query</span><span class="p">,</span>
<span class="ss">where:</span> <span class="n">w</span><span class="o">.</span><span class="n">deleted</span> <span class="o">==</span> <span class="no">false</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Now we can compose this function with other queries in order to only fetch
“alive” widgets from the database:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">iex</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="o">></span> <span class="no">Widget</span> <span class="o">|></span> <span class="no">Widget</span><span class="o">.</span><span class="n">alive</span> <span class="o">|></span> <span class="no">Repo</span><span class="o">.</span><span class="n">all</span>
<span class="mi">17</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mf">06.985</span> <span class="p">[</span><span class="n">debug</span><span class="p">]</span> <span class="no">QUERY</span> <span class="no">OK</span> <span class="n">source</span><span class="o">=</span><span class="s2">"widgets"</span> <span class="n">db</span><span class="o">=</span><span class="mi">1</span><span class="o">.</span><span class="err">6</span><span class="n">ms</span> <span class="n">decode</span><span class="o">=</span><span class="mi">2</span><span class="o">.</span><span class="err">0</span><span class="n">ms</span> <span class="n">queue</span><span class="o">=</span><span class="mi">0</span><span class="o">.</span><span class="err">8</span><span class="n">ms</span>
<span class="no">SELECT</span> <span class="n">w0</span><span class="o">.</span><span class="s2">"id"</span><span class="p">,</span> <span class="n">w0</span><span class="o">.</span><span class="s2">"name"</span><span class="p">,</span> <span class="n">w0</span><span class="o">.</span><span class="s2">"deleted"</span> <span class="no">FROM</span> <span class="s2">"widgets"</span> <span class="no">AS</span> <span class="n">w0</span> <span class="no">WHERE</span> <span class="p">(</span><span class="n">w0</span><span class="o">.</span><span class="s2">"deleted"</span> <span class="o">=</span> <span class="no">FALSE</span><span class="p">)</span> <span class="p">[]</span>
<span class="p">[</span>
<span class="p">%</span><span class="no">SoftDelete</span><span class="o">.</span><span class="no">Widget</span><span class="p">{</span>
<span class="ss">__meta__:</span> <span class="c1">#Ecto.Schema.Metadata<:loaded, "widgets">,</span>
<span class="ss">deleted:</span> <span class="no">false</span><span class="p">,</span>
<span class="ss">id:</span> <span class="mi">2</span><span class="p">,</span>
<span class="ss">name:</span> <span class="s2">"bar"</span>
<span class="p">},</span>
<span class="p">%</span><span class="no">SoftDelete</span><span class="o">.</span><span class="no">Widget</span><span class="p">{</span>
<span class="ss">__meta__:</span> <span class="c1">#Ecto.Schema.Metadata<:loaded, "widgets">,</span>
<span class="ss">deleted:</span> <span class="no">false</span><span class="p">,</span>
<span class="ss">id:</span> <span class="mi">3</span><span class="p">,</span>
<span class="ss">name:</span> <span class="s2">"foo"</span>
<span class="p">},</span>
<span class="p">%</span><span class="no">SoftDelete</span><span class="o">.</span><span class="no">Widget</span><span class="p">{</span>
<span class="ss">__meta__:</span> <span class="c1">#Ecto.Schema.Metadata<:loaded, "widgets">,</span>
<span class="ss">deleted:</span> <span class="no">false</span><span class="p">,</span>
<span class="ss">id:</span> <span class="mi">4</span><span class="p">,</span>
<span class="ss">name:</span> <span class="s2">"bar"</span>
<span class="p">}</span>
<span class="p">]</span>
</code></pre></div></div>
<p>Perfect! We can now soft-delete widgets.</p>
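<p>Restoring a widget is just the mirror image of deleting one. A minimal sketch (the <code class="language-plaintext highlighter-rouge">restore/1</code> function name is my own, not part of the example repo):</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code>defmodule SoftDelete.Widget do
  import Ecto.Changeset

  # Flip the flag back; the widget becomes visible to "alive" queries again.
  def restore(widget) do
    change(widget, %{deleted: false})
  end
end
</code></pre></div></div>
<p>Piping a widget through <code class="language-plaintext highlighter-rouge">SoftDelete.Widget.restore/1</code> and <code class="language-plaintext highlighter-rouge">Repo.update/1</code> is all the “undo” we need.</p>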
<h2 id="an-improvement">An improvement</h2>
<p>It can be tedious to include this function every time we need to fetch widgets from
the database. Arguably, it shouldn’t take additional steps to do the thing we always
want to do. Plus, if you’re using Ecto associations, then query composition can become…
tricky. While there are a few ways to improve this situation, my personal
preference is to use views.</p>
<p>In order to put our view solution into place we’ll need a new migration and a new
schema:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">SoftDelete</span><span class="o">.</span><span class="no">Repo</span><span class="o">.</span><span class="no">Migrations</span><span class="o">.</span><span class="no">CreateAliveWidgets</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">Migration</span>
<span class="nv">@up</span> <span class="s2">"CREATE VIEW alive_widgets AS select id, name from widgets where not deleted;"</span>
<span class="nv">@down</span> <span class="s2">"DROP VIEW IF EXISTS alive_widgets;"</span>
<span class="k">def</span> <span class="n">change</span> <span class="k">do</span>
<span class="n">execute</span><span class="p">(</span><span class="nv">@up</span><span class="p">,</span> <span class="nv">@down</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">defmodule</span> <span class="no">SoftDelete</span><span class="o">.</span><span class="no">AliveWidget</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">Ecto</span><span class="o">.</span><span class="no">Schema</span>
<span class="n">schema</span> <span class="s2">"alive_widgets"</span> <span class="k">do</span>
<span class="n">field</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">:string</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The view itself is easy to manage and update. Admittedly, we’re potentially taking
a performance hit here, since the view’s query runs on every read. In that case it
would probably be better to use a materialized view. But that involves setting up
triggers, scheduled refreshes, and some other concepts. Our basic solution should
work well for most cases. If you have so much data that this is causing you
performance problems, then you probably already understand your use case, and I’m
sure you can re-implement this with materialized views.</p>
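<p>For reference, a materialized view version of the migration might look something like this (the module name and refresh strategy are my own assumptions, not from the example repo):</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code>defmodule SoftDelete.Repo.Migrations.CreateAliveWidgetsMaterialized do
  use Ecto.Migration

  @up "CREATE MATERIALIZED VIEW alive_widgets AS SELECT id, name FROM widgets WHERE NOT deleted;"
  @down "DROP MATERIALIZED VIEW IF EXISTS alive_widgets;"

  def change do
    execute(@up, @down)
  end
end
</code></pre></div></div>
<p>Unlike a plain view, you’d then have to run <code class="language-plaintext highlighter-rouge">REFRESH MATERIALIZED VIEW alive_widgets;</code> after writes or on a schedule, which is exactly the extra machinery mentioned above.</p>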
<p>Huzzah for building your own solutions.</p>
<p>Now when we need to select our widgets we can use the <code class="language-plaintext highlighter-rouge">AliveWidget</code> schema.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">iex</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span><span class="o">></span> <span class="no">Repo</span><span class="o">.</span><span class="n">all</span><span class="p">(</span><span class="no">AliveWidget</span><span class="p">)</span>
<span class="p">[</span>
<span class="p">%</span><span class="no">SoftDelete</span><span class="o">.</span><span class="no">AliveWidget</span><span class="p">{</span>
<span class="ss">__meta__:</span> <span class="c1">#Ecto.Schema.Metadata<:loaded, "alive_widgets">,</span>
<span class="ss">id:</span> <span class="mi">2</span><span class="p">,</span>
<span class="ss">name:</span> <span class="s2">"bar"</span>
<span class="p">},</span>
<span class="p">%</span><span class="no">SoftDelete</span><span class="o">.</span><span class="no">AliveWidget</span><span class="p">{</span>
<span class="ss">__meta__:</span> <span class="c1">#Ecto.Schema.Metadata<:loaded, "alive_widgets">,</span>
<span class="ss">id:</span> <span class="mi">3</span><span class="p">,</span>
<span class="ss">name:</span> <span class="s2">"foo"</span>
<span class="p">},</span>
<span class="p">%</span><span class="no">SoftDelete</span><span class="o">.</span><span class="no">AliveWidget</span><span class="p">{</span>
<span class="ss">__meta__:</span> <span class="c1">#Ecto.Schema.Metadata<:loaded, "alive_widgets">,</span>
<span class="ss">id:</span> <span class="mi">4</span><span class="p">,</span>
<span class="ss">name:</span> <span class="s2">"bar"</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="mi">17</span><span class="p">:</span><span class="mi">09</span><span class="p">:</span><span class="mf">50.343</span> <span class="p">[</span><span class="n">debug</span><span class="p">]</span> <span class="no">QUERY</span> <span class="no">OK</span> <span class="n">source</span><span class="o">=</span><span class="s2">"alive_widgets"</span> <span class="n">db</span><span class="o">=</span><span class="mi">1</span><span class="o">.</span><span class="err">4</span><span class="n">ms</span> <span class="n">queue</span><span class="o">=</span><span class="mi">2</span><span class="o">.</span><span class="err">2</span><span class="n">ms</span>
<span class="no">SELECT</span> <span class="n">a0</span><span class="o">.</span><span class="s2">"id"</span><span class="p">,</span> <span class="n">a0</span><span class="o">.</span><span class="s2">"name"</span> <span class="no">FROM</span> <span class="s2">"alive_widgets"</span> <span class="no">AS</span> <span class="n">a0</span> <span class="p">[]</span>
</code></pre></div></div>
<p>This provides a nice read-only interface and allows us to easily compose our schema
with the rest of Ecto.</p>
<h2 id="potential-improvements">Potential Improvements</h2>
<p>There are a few other improvements we could make to this solution. As I mentioned
above, if you’re craving read performance and have a relatively low number of
writes (and you’re on Postgres), you could look into materialized views.</p>
<p>Another improvement would be to replace the boolean column with a timestamp.
Many people opt to do this because it provides an easy way to ask, “When was
this deleted?”. A simple implementation would be to allow the column to be null
and set a timestamp when the widget is deleted. Our view would need to change
to select only rows that have a null <code class="language-plaintext highlighter-rouge">deleted_at</code> column. I leave this as an
exercise for the reader.</p>
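<p>If you want a starting point for that exercise, a rough sketch might look like this (the column name and type here are my choices, not from the example repo):</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code>defmodule SoftDelete.Repo.Migrations.AddDeletedAtToWidgets do
  use Ecto.Migration

  def change do
    alter table(:widgets) do
      add :deleted_at, :utc_datetime
    end
  end
end

# In the schema: field :deleted_at, :utc_datetime
# Marking for deletion becomes (truncating because :utc_datetime rejects microseconds):
#   change(widget, %{deleted_at: DateTime.utc_now() |> DateTime.truncate(:second)})
# And the view becomes:
#   CREATE VIEW alive_widgets AS SELECT id, name FROM widgets WHERE deleted_at IS NULL;
</code></pre></div></div>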
<p>More than anything, I hope that this post shows how to think through these kinds
of problems and encourages you to build these solutions on your own instead of
reaching for a library. Libraries certainly have their place. But when it comes
to data management I’m always inclined to do it on my own.</p>Chris KeathleyA common need in web applications is to “undo” a deletion event. This is referred to as a soft-deletion. The record still exists but its hidden from the user. Soft deleting allows the user to restore that data in the event that they need it in the future.