Turning Business Logic Errors into Compile-time Errors

Tyler Langlois

One of the most useful principles that I've adopted after working in languages with powerful type systems is modeling data types with purposeful intent to write better, more reliable software.

Bugs can manifest in many forms, but some are shallower than others. It’s easier to find problems that the compiler understands (such as syntax errors), so the more application logic we can convince the compiler to check for us, the better. Its code review is much more rigorous than that of any individual engineer! Today I want to talk about how one language feature in particular, exhaustive pattern matching over well-modeled types, has become instrumental to keeping our software reliable by turning high-level business logic errors into compile-time errors using Rust's type system.

Most of the ideas here aren't new — experienced developers are likely familiar with many of the tactics I'll describe. With that said, I certainly wish I had picked up these strategies earlier in my own career!

A Tale of Two Services

To illustrate these concepts, we'll take an example from our own software stack.

We bundle our server-side software into a network appliance that we call a Controller: a virtual machine or container that includes additional services like a backup daemon or reverse proxy. In the case of the reverse proxy, it's there not only to serve as a front-facing entry point to our own APIs and static assets, but as a defense-in-depth measure to ensure that access to the Controller's API is… controlled. This lets us focus on our own APIs and services and delegates other capabilities to software purpose-built and optimized for those specialized functions (like serving static assets with a reverse proxy).

The coupling point between our server daemon and the reverse proxy is a REST API. As with many systems, the point of integration between one system and another is a prime place for mistakes to happen: not only do we need to ensure that user preferences are kept synchronized with an external service, but drifts in state require distinctly different actions, such as removing an extant setting, changing ones that differ, and so on. We do these things not because they are easy, but because we thought they were going to be easy.

Note: Caddy is our reverse proxy layer and includes an administrative API. It’s a great tool, especially for our purpose, since it ships with modern defaults and can make runtime configuration changes on-the-fly without the need to restart and disrupt user sessions. Many of the examples here will draw from its configuration norms.

Let's take one specific case of this point of interaction: IP address filtering. At the presentation layer in the Controller's web application, users may define one or more IP addresses or CIDR ranges that the web application will block access from. The actual blocking mechanism occurs in the reverse proxy, so we'll need to propagate those changes through to its REST API from our own server process.

Sketching the Problem

At either end of this system, you can imagine how user settings might be modeled:

In our application, we store a set of IP addresses and CIDR ranges that should be blocked.
In the reverse proxy, we adhere to the application's notion of how to detect and block specific addresses (in Caddy's case, this is a combination of a matcher against a client IP that results in a static response).

Our high-level requirement is to keep these preferences synchronized, and our low-level task is to implement a solution.

I think many software developers could envision what you might write here: get the desired value and the current value, compare them, and take the appropriate action, whether that may be a POST request, a DELETE, or something else. There are a few nuances to bear in mind with this particular task:

We don't want to simply overwrite or clobber some settings that may be adjacent to other, related settings. For example, we want to avoid erasing any reverse proxy directives that do the actual proxying to our backend application when blocking IP ranges.
Caddy often (helpfully) enforces the accepted HTTP verb depending on the request destination. For example, you may want to send a PUT which will only create new API objects and avoid overwriting existing objects.
Our internal configuration representation may differ significantly from the reverse proxy's representation. Comparing the two needs to occur in a stable way that correctly asserts equality and doesn't induce flapping (repeatedly mutating a value accidentally).

Those requirements are table stakes for a reliable system, but we'd like to take it one step further: can we model the problem to glean compile-time assurances as well? It’s one thing to make the solution work, but we can lock in even better stability guarantees if the language itself can check our work and catch logical bugs for us.

Let's find out!

A Rust Type Primer

Before getting into the details, let's quickly review some of the fundamental Rust types that our solution leverages.

The Option<T> type is a polymorphic type that may or may not hold some other type, here denoted by T. In code you'll create values of type Option<T> with one of two constructors, either None or Some(T) — the former when no inner value is stored, and the latter when you do have an inner value to work with. For example, we might express a desired set of addresses to block as a custom type called BlockPreferences and store the user preference internally as Option<BlockPreferences>: either a user has addresses they'd like us to block (the Some(BlockPreferences) case), or not (the None case).

(Note: keen observers might recognize that you could model this even more accurately as "a collection with zero members" versus "a collection with one or more members". We're going with Option<T> to keep it simple here, but there’s more than one way to model this problem.)

Inside of BlockPreferences we can define a few fields, like the collection of addresses to block and some other customizable options like the HTTP status code to respond with.

struct BlockPreferences {
    addresses: HashSet<IpCidr>,
    code: u16,
}

In most cases, Rust's growable vector type Vec<T> is sufficient to express a collection of elements. However, we're choosing an unordered HashSet<T> for two reasons: the order in which we express blocked addresses in Caddy doesn't matter, and comparing two HashSet<T> values is order-agnostic. This helps us avoid making repeated REST calls and potentially causing flapping at the reverse proxy layer if, for some reason, a user alters the order of blocked addresses without actually changing the overall list. We only care about the equality of overall content and not the order.
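To make the ordering point concrete, here is a minimal sketch using only the standard library. It uses std::net::IpAddr as a stand-in for IpCidr, and the helper names (parse_all, differs_as_vec, differs_as_set) are hypothetical and not part of our codebase:

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Parses a list of address literals, panicking on malformed input
/// (acceptable for a sketch, not for production code).
fn parse_all(raw: &[&str]) -> Vec<IpAddr> {
    raw.iter()
        .map(|s| s.parse().expect("valid IP address"))
        .collect()
}

/// Positional comparison: reordering the same addresses counts as a change.
fn differs_as_vec(a: &[IpAddr], b: &[IpAddr]) -> bool {
    a != b
}

/// Membership comparison: only the set of addresses matters
/// (duplicates are ignored as well).
fn differs_as_set(a: &[IpAddr], b: &[IpAddr]) -> bool {
    let a: HashSet<&IpAddr> = a.iter().collect();
    let b: HashSet<&IpAddr> = b.iter().collect();
    a != b
}
```

Given the lists ["1.1.1.1", "10.0.0.1"] and ["10.0.0.1", "1.1.1.1"], the positional comparison reports a difference and would trigger a needless write to the proxy, while the set comparison does not.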

Similarly, we could model Caddy's API responses as their own types (and indeed, we do this in our codebase). As a small example, this is an abbreviated Caddy route that will respond to any requests from the remote IP 1.1.1.1 with a static 404:

{
    "apps": {
        "http": {
            "servers": {
                "srv0": {
                    "listen": [
                        ":80"
                    ],
                    "routes": [
                        {
                            "handle": [
                                {
                                    "handler": "static_response",
                                    "status_code": 404
                                }
                            ],
                            "match": [
                                {
                                    "remote_ip": {
                                        "ranges": [
                                            "1.1.1.1"
                                        ]
                                    }
                                }
                            ]
                        }
                    ]
                }
            }
        }
    }
}

We model each of these JSON objects as Rust structs. For example, the ranges field can be modeled as a HashSet<IpCidr> (the field accepts either single IP addresses or CIDR ranges, and we use HashSet<T> for the same reason as we do for BlockPreferences). We wrap these more basic types in higher-level types like CaddyServerRoute or CaddyMatch so that a GET against http://127.0.0.1:2019/config/apps/http/servers/bowtie/routes can be directly deserialized into a Vec<CaddyServerRoute> value (note that the API response is an array of CaddyServerRoute values, like a collection of routes in a routing table).

At this point, we should call out the incredible work that the serde project has done to make converting between Rust types and formats like JSON easier. In most cases, a simple #[derive(Deserialize, Serialize)] is all you need to translate between native types and their JSON representation.

Great! We can model user settings inside our application with Option<BlockPreferences> and the reverse proxy's routes with Vec<CaddyServerRoute>. The next step is to unify them to ensure any desired settings are configured correctly.

A Match Made in the Compiler

Modeling our types concisely and accurately unlocks some useful tools in Rust's type system. In particular, reconciling values that differ between our desired settings versus actively configured settings is a great candidate for exhaustive pattern matching.

Pattern matching is perhaps most easily illustrated by providing some concrete Rust code. If we're working with a variable named blocked_addresses of type Option<BlockPreferences>, then we can destructure its concrete value with the match keyword:

match blocked_addresses {
    None => println!("User has not defined any addresses to block."),
}

We can take action based upon whether the value has some inner content or not (in this case, None means the user has not defined any addresses they wish to block).

Exhaustive pattern matching means that the compiler will actively ensure that we have every possible case covered. The previous example has no arm for the case where the user has defined blocked addresses, Some(preferences), so the compiler rejects it with a non-exhaustive patterns error until we handle that case. In a simple example like this, the principle seems useful, although not tremendously significant. But when you start stacking up layers of abstraction and composing different pieces of code, the ability to rely on predictable behaviors scales in useful ways.

For example, we can compose our blocking preferences with the reverse proxy configuration into a tuple and then pattern match on that:

match (blocked_addresses, proxy_routes.is_empty()) {
    (Some(preferences), true) => println!("We have addresses to block and none are configured"),
    // Compiler will remind you about "address to block with an existing configuration", etc.
}

You can take this as far as you want, and the compiler will dutifully remind you if there's some combination of patterns you're not handling. This is also a useful way to plan your work: match against the state of your values and address each possible combination that the compiler enumerates. Moreover, we aren't limited to destructuring native types like Option<T> only: our own structs, fields, and enums are all fair game for a match.
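As a rough sketch of where the tuple match above ends up once every combination is handled, consider the following. The trimmed-down BlockPreferences, the plan function, and the returned messages are hypothetical stand-ins for illustration:

```rust
/// Hypothetical, trimmed-down stand-in for the article's `BlockPreferences`.
struct BlockPreferences {
    code: u16,
}

/// Describes what to do for every combination of "does the user want block
/// rules?" and "is the proxy's route list empty?". Deleting any arm below
/// turns the missing case into a compile-time error.
fn plan(blocked_addresses: Option<&BlockPreferences>, routes_empty: bool) -> String {
    match (blocked_addresses, routes_empty) {
        (Some(prefs), true) => {
            format!("install block rules responding with {}", prefs.code)
        }
        (Some(_), false) => "compare against the existing routes".to_string(),
        (None, true) => "nothing to do".to_string(),
        (None, false) => "check for leftover block rules".to_string(),
    }
}
```

Because the tuple has exactly four possible shapes, the four arms cover it completely and no catch-all `_` arm is needed. Avoiding the catch-all is deliberate: it keeps the compiler's reminders intact if a new case appears later.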

Pattern Match Soup

Let's bring together all of the concepts we've covered so far: modeling our types, matching against their concrete values, and exhaustive pattern matching.

We draw user preferences for blocked addresses from the database and call it blocked_addresses. We issue a GET against Caddy's administrative API and call the value we retrieve caddy_route.

Then we construct an almost-complete (for the sake of illustration) match statement:

match (blocked_addresses, caddy_route) {
    // `ours` is an unwrapped BlockPreferences value.
    (Some(ours), None) => { todo!("We need to install block rules"); },
    // `theirs` is an unwrapped `CaddyServerRoute` value from the proxy.
    (None, Some(theirs)) => { todo!("We need to remove existing rules"); },
    // ...and so on
}

The trailing comment alludes to the best part: the compiler will hold our hand to enumerate each additional case that we need to handle. We've reached the point where high-level application logic — for example, that a user desires blocked IP ranges but we haven't configured them yet — is a compile-time check.

Hint: if you're using rust-analyzer (Rust's standard tool for language server features), you can even start writing the match like this:

match (blocked_addresses, caddy_route) { }

Invoking a code action request on the match keyword will offer to fill the arms of the block, which means that each possible value of the tuple (like (Some(_), None) and so on) will be automatically filled for you. It's a very useful feature!

Comparing Different Types

We can write a little more code to make it easier to reconcile user preferences against the reverse proxy configuration. Continuing the match from the previous section, we eventually arrive at the following arm:

   (Some(ours), Some(theirs)) => { todo!("check the proxy's value"); },

This is fine, but these types are ours: BlockPreferences and theirs: CaddyServerRoute. We know that both are well-formed after being deserialized, but we can't directly compare them to determine whether we should make REST calls to change the value in Caddy — they’re fundamentally different types.

Consider a simplified example of how we might represent the web filtering preferences as an application value versus how it may be represented in Caddy:

/// Note: Caddy names some JSON fields `match`, so we need to rely on
/// a serde feature to rename it in order to avoid conflicts with the
/// native keyword.
///
/// The `derive` of `PartialEq` here is important as it lets us test equality between
/// two CaddyServerRoute values.
#[derive(PartialEq, Serialize, Deserialize)]
struct CaddyServerRoute {
    handle: Option<Vec<CaddyHandle>>,
    #[serde(rename = "match")]
    matchers: Option<Vec<CaddyMatch>>,
}

/// Handlers always indicate their kind with the `handler` field, but
/// may have differing fields based upon what kind of handler they
/// are.
#[derive(PartialEq, Serialize, Deserialize)]
struct CaddyHandle {
    handler: String,
    status_code: Option<u16>,
}

#[derive(PartialEq, Serialize, Deserialize)]
struct CaddyMatch {
    remote_ip: Option<CaddyClientIps>,
}

#[derive(PartialEq, Serialize, Deserialize)]
struct CaddyClientIps {
    ranges: HashSet<IpCidr>,
}

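/// `Clone` lets us convert a copy of the preferences (for example, inside a
/// match guard) without giving up ownership of the original value.
#[derive(Clone, PartialEq, Serialize, Deserialize)]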
struct BlockPreferences {
    addresses: HashSet<IpCidr>,
    code: u16,
}

impl From<BlockPreferences> for CaddyServerRoute {
    fn from(b: BlockPreferences) -> Self {
        Self {
            handle: Some(vec![CaddyHandle {
                handler: "static_response".to_string(),
                status_code: Some(b.code),
            }]),
            matchers: Some(vec![CaddyMatch {
                remote_ip: Some(CaddyClientIps {
                    ranges: b.addresses,
                }),
            }]),
        }
    }
}

The Rust From trait is a standard way to convert values between different types. In the code above, we've implemented From<BlockPreferences> for CaddyServerRoute to let us turn internal user preferences into the equivalent form that Caddy understands.

Along with the #[derive(PartialEq)] directly above CaddyServerRoute, this rewards us with a few tools when we convert our address blocking preferences into a CaddyServerRoute:

We can compare values to assert whether installed blocking rules are what we expect them to be, and
we can easily prepare our payload to load into Caddy by converting it into the right type before initiating a PUT or POST.
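Both points can be sketched with the standard library alone. The Preferences and Route types below are deliberately flattened, hypothetical stand-ins for BlockPreferences and CaddyServerRoute (with String in place of IpCidr), and in_sync is an illustrative helper rather than a function from our codebase:

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `BlockPreferences`.
#[derive(Debug, Clone, PartialEq)]
struct Preferences {
    addresses: HashSet<String>,
    code: u16,
}

/// Hypothetical, flattened stand-in for `CaddyServerRoute`.
#[derive(Debug, PartialEq)]
struct Route {
    ranges: HashSet<String>,
    status_code: Option<u16>,
}

impl From<Preferences> for Route {
    fn from(p: Preferences) -> Self {
        Self {
            ranges: p.addresses,
            status_code: Some(p.code),
        }
    }
}

/// True when the proxy's route already matches what the user asked for.
fn in_sync(ours: &Preferences, theirs: &Route) -> bool {
    // `Into` comes for free with the `From` impl above.
    let expected: Route = ours.clone().into();
    // The derived `PartialEq` compares field by field; the `HashSet`
    // field makes the comparison insensitive to address order.
    expected == *theirs
}
```

The same converted value that drives the comparison can also serve as the request body, so the "what we expect" and "what we send" representations cannot drift apart.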

Armed with this new capability, we can turn our pattern match into:

    // The new `if` predicate uses our `From` implementation. A guard can only
    // borrow `ours`, so we convert a clone (this assumes `BlockPreferences`
    // derives `Clone`):
    (Some(ours), Some(theirs)) if CaddyServerRoute::from(ours.clone()) != theirs => {
        // To complete the example, this is what a write to the API
        // might look like using the `reqwest` library, with the `caddy_api`
        // variable being of type `Url`.
        reqwest::Client::new()
            .post(caddy_api)
            // Once again we can leverage our `impl From`.
            .json(&CaddyServerRoute::from(ours))
            .send()
            .await?
            .error_for_status()?;
    },
    (Some(_), Some(_)) => { info!("blocking rules are up-to-date"); },

Recall that with this relatively short piece of code, we definitively know that:

The Caddy REST response is well-formed and parsed into a known value,
we can’t make inaccurate comparisons if we ever change our user preference type down the road because the compiler will ensure our implementation of the From trait is up-to-date,
the compiler is asserting that we've handled the high-level business logic of "a user has asked us to block addresses, Caddy has blocked addresses defined, but they don't match", and
every other combination we fill into the arms of our match will get checked to ensure every case is handled.
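Putting the pieces together, the whole reconciliation can be sketched as a pure function that decides what to do and leaves the HTTP calls to the caller. Everything here is a hypothetical simplification: Preferences and Route stand in for BlockPreferences and CaddyServerRoute, the Action enum and reconcile function are illustrative names, and the conversion is implemented on a reference so the match guard can compare without cloning or moving the preferences:

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `BlockPreferences`.
#[derive(Debug, PartialEq)]
struct Preferences {
    addresses: HashSet<String>,
    code: u16,
}

/// Hypothetical, flattened stand-in for `CaddyServerRoute`.
#[derive(Debug, PartialEq)]
struct Route {
    ranges: HashSet<String>,
    status_code: u16,
}

// Converting from a reference lets the guard below borrow `ours`.
impl From<&Preferences> for Route {
    fn from(p: &Preferences) -> Self {
        Self {
            ranges: p.addresses.clone(),
            status_code: p.code,
        }
    }
}

/// What the caller should do against the proxy's API.
#[derive(Debug, PartialEq)]
enum Action {
    Install(Route),
    Replace(Route),
    Remove,
    Nothing,
}

fn reconcile(ours: Option<Preferences>, theirs: Option<Route>) -> Action {
    match (ours, theirs) {
        // Rules are wanted but none are configured.
        (Some(ours), None) => Action::Install(Route::from(&ours)),
        // Rules are configured but no longer wanted.
        (None, Some(_)) => Action::Remove,
        // Rules are configured but differ from what is wanted.
        (Some(ours), Some(theirs)) if Route::from(&ours) != theirs => {
            Action::Replace(Route::from(&ours))
        }
        // Either everything matches or there is nothing to configure.
        (Some(_), Some(_)) | (None, None) => Action::Nothing,
    }
}
```

Returning an Action instead of performing requests inside the arms is one possible design choice. It keeps the decision logic testable without a running proxy, and a later match on Action is itself checked for exhaustiveness.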

We did it! Pat yourself on the back and rest easy knowing that we've eliminated several categories of incorrect or unhandled behavior from our code.

Summary

We hope that illustrating these concepts with concrete examples both highlights their utility for your own projects and serves as one more testament to how useful Rust abstractions are in the real world. Exhaustive pattern matching isn't a new idea, but coupled with other powerful tools in the ecosystem like serde and language features like the match keyword, it forms a high-level and ergonomic toolkit.

…and if you're in the market for a SASE solution and you like the idea of building atop reliable, correct-by-construction code and services, consider giving Bowtie a look.
