Type theory for named arguments

One of the big challenges in maintaining open source software, or service-oriented systems, or indeed any program where a function and its callers can evolve independently and be maintained by different sets of people, is that of maintaining compatibility. When we ship a new release of a package, we usually want users to be able to upgrade without their programs breaking as a result. This is especially tricky when the language we’re using does not have a direct representation of a computational concept we’re using, as it leaves room for a function’s author and its users to have different interpretations of what the function does and how it should be used.

I notice this quite frequently in my JavaScript work. As I’ve noted in Breaking Changes, JavaScript’s Object type is overloaded with many different uses. It’s most frequently used to represent records: compound values with a fixed set of predefined fields with distinct types, where the program accesses specific fields by name. But it also serves as a map type: an open-ended collection of key-value pairs where the program doesn’t expect any particular keys to exist and accesses all keys dynamically, often by iteration.

There are grey areas and overlaps and other uses that Object is put to, and in order to make backward-compatible changes to programs we need to be aware of which of these uses we’re dealing with anywhere an Object is used. In this article we’ll look at a very common use of the JavaScript Object type whose behaviour is distinct from both records and maps: named arguments. Whereas records are a widely covered concept in type theory, and find many applications across programming languages in the wild, named arguments are less consistent in their implementation and use, and it would help us to have a well-defined conception of what named arguments actually are, what purpose they serve, and how they ought to work.

Before we develop a theory, it will be helpful to look at an example from a language with named arguments: Python. Most mainstream languages use positional arguments; the arguments to a function are bound to its parameters in the order they appear. In Python, function parameters can, by default, also be addressed by name. Take this example of a function that generates a URL string:

def url(host, port, path):
    return 'http://%s:%d%s' % (host, port, path)

We can call this function with positional arguments, that is with the arguments given in the order of the parameters they correspond to:

url('example.com', 80, '/')
# -> 'http://example.com:80/'

But we can also call the function with named arguments, where instead of giving the arguments in a fixed order we give the name of the parameter each should be assigned to:

url(path='/', host='example.com', port=80)
# -> 'http://example.com:80/'

We can even mix these schemes, giving some arguments positionally and some with names:

url('example.com', path='/', port=80)
# -> 'http://example.com:80/'

In this way, a function’s parameter names are part of its public interface, and changing them can break any existing caller. This is a kind of coupling not present in many languages and it might look potentially brittle, but named arguments also give us a lot of flexibility when it comes to maintaining compatibility.

We should define what we mean by compatibility. Here I am specifically talking about backward compatibility, which is when a new version of a package works with all existing callers of a previous version. Version B of a package is backward compatible with version A if any program that ran successfully using version A would also run successfully using version B. If there are programs that would have worked with version A but will not work with B, then B contains breaking changes, making it harder to upgrade from A to B. This is what we’re trying to avoid. To determine this, we need to consider all the valid ways of using version A and determine whether they still work with version B.

Above we showed a version of the url() function and some valid uses of it: 1. positional arguments, 2. named arguments, and 3. mixed:

def url(host, port, path):
    return 'http://%s:%d%s' % (host, port, path)

url('example.com', 80, '/')                 # 1
url(path='/', host='example.com', port=80)  # 2
url('example.com', path='/', port=80)       # 3

Imagine we’re writing version B of this function and want to add a new parameter to change the URL scheme, so users can build URLs with https:, ws:, and so on. Suppose we do this by adding a new parameter to the beginning of the list:

def url(scheme, host, port, path):
    return '%s//%s:%d%s' % (scheme, host, port, path)

Cases 1 and 3 are now clearly broken; none of the positional arguments will bind to the correct parameters. Case 2 is less drastically broken; the named host, port and path arguments will still bind correctly, but we’re not providing a scheme argument which the function needs. Python throws an error in all 3 cases here.

We fix this by putting scheme at the end of the parameter list and giving it a default value:

def url(host, port, path, scheme='http:'):
    return '%s//%s:%d%s' % (scheme, host, port, path)

Cases 1, 2 and 3 are now all valid calls, because scheme is optional and none of the existing parameters have changed their positions or names. Callers that upgrade to version B can begin using the scheme parameter either positionally or by name:

url('example.com', 443, '/', 'https:')
# -> 'https://example.com:443/'

url(scheme='ws:', path='/', host='example.com', port=80)
# -> 'ws://example.com:80/'

From these examples we see that named arguments have a couple of main uses:

  • They provide a better interface when a function has lots of parameters whose order is hard to remember, or is likely to be unstable
  • They allow some parameters to be omitted by callers by making them optional and giving them default values

Both of these can be used as strategies for achieving backward compatibility, because they make it easier to add new parameters to functions without breaking existing callers. However, it is important to know which of these reasons you’re appealing to when using named arguments – whether you want arguments to be unordered or optional – because it affects how we should think about their types.

Note that Python considers it an error to pass an argument with a name the target function does not define:

url(path='/', host='example.com', port=80, extra=True)
# -> TypeError: url() got an unexpected keyword argument 'extra'

This is also an important detail when we consider backward compatibility. We’ll return to this idea later, but for now have a think about how rejecting unrecognised named arguments might affect what kinds of changes we’re able to make without breaking the behaviour of existing callers.

Now, let’s consider a language without named arguments. JavaScript does not have this feature, but we often use object literals to simulate it. Here’s our original URL-building function rewritten in JavaScript:

function url(options) {
  let { host, port, path } = options
  return `http://${ host }:${ port }${ path }`
}

If you prefer, the destructuring can be done inside the parameter list to make it look even closer to a language with genuine named arguments.

function url({ host, port, path }) {
  return `http://${ host }:${ port }${ path }`
}

I’ll be using the first version as I will need to refer to this function’s parameter by name frequently.

The url() function has a single parameter named options. Within the body of url() we access the host, port and path fields on this parameter, so to get a meaningful result from this function we must pass an object with those fields as an argument:

url({ host: 'example.com', port: 80, path: '/' })
// -> 'http://example.com:80/'

This is how most JavaScript code simulates named arguments. Let’s see how this function responds when called with incorrect input. If we call url() without some of these required fields, we get a strange result. The program runs without crashing, so we don’t get an error in the formal sense, but producing a URL with undefined where its path should be was probably not what we intended. This will manifest as an error somewhere else in the program, for example when we try to make a request with this URL.

url({ host: 'example.com', port: 80 })
// -> 'http://example.com:80undefined'

Conversely, if we pass an object with fields that url() does not access, it simply ignores them.

url({ host: 'example.com', port: 80, path: '/', scheme: 'ws:' })
// -> 'http://example.com:80/'

A key question in thinking about types in relation to named arguments is: should this be considered an error, or is it expected behaviour? Likewise, should omitting a field be an error, or should the program be able to handle this case? To answer these questions, we need to look at what named arguments are for – how they are used, what good user feedback looks like, and how they affect how an API changes over time.

We’ve already seen how Python implements named arguments: an argument must be given for all non-optional parameters, optional parameters can be omitted when calling a function, but additional unrecognised arguments trigger an error. In JavaScript none of our examples trigger an error in the sense of throwing an exception, and that’s partly explained by JavaScript being designed to be more permissive: it will often let an expression evaluate to undefined rather than throwing an exception. This is still a program error in the sense that the program does not do what we want; it’s just not formally recognised by the language, so it’s easier for errors to happen silently and only be noticed by a thorough test suite or picked up by end users in production.

To get a more complete explanation for this behaviour we need to remember that Object in JavaScript plays a lot of different roles, and it wasn’t designed to represent named arguments specifically. We’re just abusing its syntax to do something that resembles named arguments, but doesn’t actually work in the same way. The behaviour above is better explained by noting that JavaScript primarily thinks of objects as records, and then considering how subtyping works for records.

In type theory, a record is a compound structure that consists of a set of named fields and their corresponding types. So given simple types like number and string – atomic values with no internal structure – a record type would be something like { host: string, port: number }. Records are a key building block of type theory because they let us talk about arbitrary structures, not just single values, in a way that maps onto features in most programming languages.

Let’s revisit our url() function from above:

function url(options) {
  let { host, port, path } = options
  return `http://${ host }:${ port }${ path }`
}

What can we say about the type of the options parameter here? We can see the function body accesses the host, port and path fields from it, and all these are then interpolated into a string, so they can actually be any type that converts to a string. Let’s say we ascribe options the following type to represent these requirements, giving some more specific field types that match our expected usage. The notation x: T means the parameter x has type T.

options: { host: string, port: number, path: string }

Here we inferred the type of options based on how it’s used in the body of url(). Similarly, we can infer the types of argument expressions from the syntax used to generate them.

url({ host: 'example.com', port: 80, path: '/' })
// -> 'http://example.com:80/'

Here, the argument expression is an object literal with three fields: host, which maps to a string, port which maps to a number, and path which maps to a string. So this argument expression has the type:

{ host: string, port: number, path: string }

This is exactly the type required by url(), so this call works. Now let’s examine the example calls that didn’t work:

url({ host: 'example.com', port: 80 })
// -> 'http://example.com:80undefined'

Here the argument’s type is { host: string, port: number } – it’s missing the path field. url() doesn’t throw an exception here but it does do something we’d consider erroneous, so we can infer that this argument type is not compatible with the type of the options parameter in url(). What about the case where we provide additional fields?

// argument: { host: string, port: number, path: string, scheme: string }

url({ host: 'example.com', port: 80, path: '/', scheme: 'ws:' })
// -> 'http://example.com:80/'

Here the url() function behaves normally, insofar as it uses the fields it knows about correctly. So we can say this argument type is compatible with the url() function.

This is an example of an argument’s type not being identical to that required by a function’s parameters, but the call still being valid. In type theory this notion of compatibility is captured by subtyping: a type S is a subtype of type T if, anywhere the type T is required, an expression of type S can be used. The notation “S <: T” is used to mean “S is a subtype of T”. In our examples above, we see that { host: string, port: number, path: string, scheme: string } is a subtype of { host: string, port: number, path: string } – the former can be used where the latter is required. However { host: string, port: number } is not a subtype as its use leads to erroneous behaviour.

In general a record type S is a subtype of another record type T, if:

  1. every field name that appears in T also appears in S, and
  2. for every field k that appears in T, S[k] <: T[k].

That is, S has at least all the same fields as T and possibly more, and all those fields’ types in S are compatible with those in T.
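
These two rules can be sketched as a small predicate. This is only an illustration under some assumptions of mine: a record type is represented as a plain object mapping field names to field types, a primitive type is represented by a string such as 'number', and primitive types are taken to be subtypes only of themselves. The names isSubtype and isRecordSubtype are hypothetical.

```javascript
// Assumed representation: a record type is a plain object mapping field
// names to field types; a primitive type is a string like 'number'.
function isSubtype(s, t) {
  let sIsRecord = typeof s === 'object' && s !== null
  let tIsRecord = typeof t === 'object' && t !== null

  if (sIsRecord && tIsRecord) return isRecordSubtype(s, t)

  // Assumption: primitive types are subtypes only of themselves
  return s === t
}

function isRecordSubtype(s, t) {
  return Object.keys(t).every((k) => {
    // Rule 1: every field name that appears in T must also appear in S
    if (!Object.prototype.hasOwnProperty.call(s, k)) return false
    // Rule 2: the field's type in S must be a subtype of its type in T
    return isSubtype(s[k], t[k])
  })
}

const Options = { host: 'string', port: 'number', path: 'string' }

isRecordSubtype({ host: 'string', port: 'number', path: 'string', scheme: 'string' }, Options)
// -> true

isRecordSubtype({ host: 'string', port: 'number' }, Options)
// -> false
```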

Why is this definition of subtyping for records useful? It has to do with our need to use polymorphism in our programs: to write reusable functions that work with many different types of objects, so long as those objects have certain properties. Our url() function can work on any object that possesses the fields host, port and path; for example, an instance of a class that defines these will work:

class Request {
  constructor(host, port, path) {
    this.host = host
    this.port = port
    this.path = path
  }
}

url(new Request('example.com', 80, '/'))
// -> 'http://example.com:80/'

But making url() accept only objects with those exact fields might be overly restrictive; for example, it would prevent us from calling it with objects that have additional properties or methods:

import querystring from 'querystring'

class QueryRequest {
  constructor(host, port, path, params) {
    this.host = host
    this.port = port
    this.path = path
    this.params = params
  }

  getQueryString() {
    return querystring.encode(this.params)
  }
}

url(new QueryRequest('example.com', 80, '/', { q: 'hello' }))
// -> 'http://example.com:80/'

An instance of QueryRequest has two fields that url() does not use: params and getQueryString (in JavaScript, a method is just a property whose value is a function). url() uses the fields it cares about and ignores the presence of everything else. We can say that Request and QueryRequest are both subtypes of the url() parameter { host: string, port: number, path: string }, because they possess the required fields and so are acceptable as input to the url() function. Note that this notion of subtyping is distinct from class inheritance; QueryRequest is not a subclass of Request but both are subtypes of the anonymous type { host: string, ... } that url() takes.

A class that does not implement the required interface should not be acceptable – the following code should be considered an error:

class City {
  constructor(name) {
    this.name = name
  }
}

url(new City('San Francisco'))

This mechanism of functions using only the parts of an object relevant to them, and ignoring the object’s other aspects, is what enables the bulk of polymorphism in object-oriented languages. Requiring argument types to exactly match the types of function parameters would be too restrictive and would require many functions to be written anew for every new class they should work on. Therefore we allow argument types to be subtypes of the parameter types, under the rules for record subtyping given above.
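
JavaScript will not reject the City call for us, but a function can check at runtime that its argument has the fields it needs while still ignoring everything else. The following is a minimal sketch of such a record-style check. requireFields is a hypothetical helper, not a built-in, and it only checks that the fields are present, not what types they have.

```javascript
// Hypothetical helper: throws if any field in `names` is undefined on
// `object`, and deliberately ignores any additional fields it may have.
function requireFields(object, names) {
  let missing = names.filter((name) => object[name] === undefined)
  if (missing.length > 0) {
    throw new TypeError(`missing required fields: ${ missing.join(', ') }`)
  }
}

function url(options) {
  requireFields(options, ['host', 'port', 'path'])
  let { host, port, path } = options
  return `http://${ host }:${ port }${ path }`
}

class City {
  constructor(name) {
    this.name = name
  }
}

// Additional fields are still accepted, as record subtyping allows
url({ host: 'example.com', port: 80, path: '/', params: { q: 'hello' } })
// -> 'http://example.com:80/'

try {
  url(new City('San Francisco'))
} catch (error) {
  console.log(error.message)
  // -> 'missing required fields: host, port, path'
}
```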

What types of changes do these subtyping rules allow? Imagine that url() is part of an open source package installed in many applications. Let’s think about how we could change it and how that would affect any existing caller. First, we could remove one of the required fields, say port:

function url(options) {
  let { host, path } = options
  return `http://${ host }:80${ path }`
}

This function now makes fewer demands of its callers – whereas previously it required the host, port and path fields, it now only requires host and path. Any caller that fulfilled the old requirements certainly fulfils the new ones and will therefore keep running fine. In general, replacing a parameter type with one that’s less specific – a supertype of the old type – is safe.

Now, imagine we add a new required field, scheme:

function url(options) {
  let { host, port, path, scheme } = options
  return `${ scheme }//${ host }:${ port }${ path }`
}

This function makes more demands of its callers; we can’t assume that any existing caller passes an argument with a scheme field, so this could break existing callers. By making the parameter type more specific, replacing it with a subtype, we make previously valid calls invalid. Only calls that already conform to the new type will still work once this version of url() is installed; all others will break silently, by inserting undefined into the URLs they generate.

Subtyping also gives us rules for what changes we are able to make while preserving compatibility. Replacing a function parameter type with a supertype is safe; replacing it with a subtype is not. Our type theory for named arguments should reflect what kinds of changes are safe.
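
As a rough illustration of the record rule, we can compare the field names required by the old and new versions of a function. This sketch assumes a parameter type can be summarised by the list of field names the function requires, and it ignores the types of the fields entirely. classifyChange is a hypothetical name.

```javascript
// Assumption: a parameter type is summarised by the list of field names
// the function requires; field types are ignored for simplicity.
function classifyChange(oldFields, newFields) {
  // Under record subtyping, the new parameter type is a supertype of the
  // old one if it requires no field that the old one did not require.
  let added = newFields.filter((name) => !oldFields.includes(name))
  return added.length === 0 ? 'compatible' : 'breaking'
}

// Removing the required port field
classifyChange(['host', 'port', 'path'], ['host', 'path'])
// -> 'compatible'

// Adding a required scheme field
classifyChange(['host', 'port', 'path'], ['host', 'port', 'path', 'scheme'])
// -> 'breaking'
```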

So what should the subtyping rules for named arguments be? What kinds of changes should be considered compatible or not? Normally, we want to be able to add new options to a library function without breaking existing callers. Indeed, as we saw in the Python example, making new fields optional by giving them default values makes this possible. So how about adding scheme as an option to url():

function url(options) {
  let { host, port, path, scheme = 'http:' } = options
  return `${ scheme }//${ host }:${ port }${ path }`
}

By setting a default value for the scheme field, we have kept existing callers that do not pass this field working:

url({ host: 'example.com', port: 80, path: '/' })
// -> 'http://example.com:80/'

And callers can opt in to using the new field should they wish to:

url({ scheme: 'https:', host: 'example.com', port: 443, path: '/' })
// -> 'https://example.com:443/'

This is the opposite of what we saw for record subtyping, where adding a new field was a breaking change. Adding new options is a desirable compatible change, so our subtyping rules need to reflect that. What about removing a field, such as port:

function url(options) {
  let { host, path } = options
  return `http://${ host }:80${ path }`
}

Now, any caller that was using the previous set of options will keep running and returning a result without crashing:

url({ host: 'example.com', port: 80, path: '/' })
// -> 'http://example.com:80/'

And callers can decide to stop using the option now that it’s been removed – the following call was previously not valid, but is now legal:

url({ host: 'example.com', path: '/' })
// -> 'http://example.com:80/'

However, users might be surprised to see that an option they pass in is not being used:

url({ host: 'example.com', port: 443, path: '/' })
// -> 'http://example.com:80/'

Coming from the argument about object polymorphism above, this might appear harmless, but consider how named arguments are normally used. Their names are usually given inline at each call site – the names host, port and path appear literally in the above calls. When we’re working with class instances, their fields are usually defined once in the class definition and do not appear inline:

url(new Request('example.com', 80, '/'))

This means that each call site where url() is used is not concerned with the interface contract between url() and the passed-in Request object. That contract is hidden away, in the definitions of url() and Request. If we had to spell it out at every use of url(), it would become much harder to use polymorphism effectively to save ourselves effort.

As named arguments appear at every call site, that means changing the names of the arguments requires changing every call site, rather than a few class definitions. It also means we have an opportunity to mistype the argument names every time we use the function, for example:

url({ hot: 'example.com', pot: 80, path: '/' })
// -> 'http://undefined:undefined/'

If a mistyped parameter name is silently ignored, it can be very hard to notice it’s not being used, especially if the called function has a default value for the option you meant to type. Here the difference in the returned string is immediately obvious but if this is buried deep inside your system you might not be looking at it directly, and the fact your options are being ignored can manifest in very indirect ways.

It’s often more helpful to the user to raise an error on any unrecognised arguments, just as Python and many command-line programs do. Whereas an object having many fields not needed by the current function is an expected aspect of polymorphic programming, unrecognised named arguments more often than not indicate a mistake that makes the input ambiguous, and the called function should crash immediately.
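
One way a function could do this in JavaScript is to compare the keys it receives against the set of names it recognises. This is a minimal sketch rather than a definitive implementation, and rejectUnknownOptions is a hypothetical helper.

```javascript
// Hypothetical helper: throws if `options` has any key not listed in `known`.
function rejectUnknownOptions(options, known) {
  let unknown = Object.keys(options).filter((key) => !known.includes(key))
  if (unknown.length > 0) {
    throw new TypeError(`unexpected options: ${ unknown.join(', ') }`)
  }
}

function url(options) {
  rejectUnknownOptions(options, ['host', 'port', 'path'])
  let { host, port, path } = options
  return `http://${ host }:${ port }${ path }`
}

try {
  url({ hot: 'example.com', pot: 80, path: '/' })
} catch (error) {
  console.log(error.message)
  // -> 'unexpected options: hot, pot'
}
```

With this check in place the mistyped names are reported at the call site, rather than surfacing later as undefined in a URL.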

There is also a subtle effect from callers being allowed to pass unrecognised names that makes it more risky for new options to be added. Say we have our original url() function with host, port and path options, and some callers are passing an additional option called query containing a query string:

url({ host: 'example.com', port: 80, path: '/', query: 'q=hello' })
// -> 'http://example.com:80/'

We then deploy a new version of url() that has a query option, which is expected to be an object to be passed to querystring.encode():

function url(options) {
  let { host, port, path, query = {} } = options
  query = querystring.encode(query)
  return `http://${ host }:${ port }${ path }?${ query }`
}

The caller passing query as a string will now get an incorrect result, compared to a caller using this option as intended:

// incorrect:
url({ host: 'example.com', port: 80, path: '/', query: 'q=hello' })
// -> 'http://example.com:80/?'

// correct:
url({ host: 'example.com', port: 80, path: '/', query: { q: 'hello' } })
// -> 'http://example.com:80/?q=hello'

Again we see a silent failure here: querystring.encode() happens to return an empty string if given a string as input, but we could imagine uses of query that would cause more serious problems.

If we forbid callers from using unrecognised argument names, then we can be sure that whenever we add a new option, there will be no existing callers using that name, and therefore no possibility of strange behaviour when the new version is deployed. So as well as giving users better feedback about mistakes when functions are first used, rejecting unrecognised names helps those functions’ maintainers keep their behaviour stable.

We’ve established that named arguments invert some of the rules for record subtyping: an argument that omits fields from the parameter type is valid, and an argument that introduces additional fields is invalid. What about the types of the fields themselves – is it valid to pass a field whose type is a supertype of the one the function is expecting? What if we passed an object instead of a string for scheme:

url({ scheme: {}, host: 'example.com', port: 443, path: '/' })
// -> '[object Object]//example.com:443/'

This looks like an error, so it seems that argument values still need to be subtypes of the fields in the parameter type.

We now have enough information to give the subtyping rules for named arguments. If a function has a set of named parameters given type T, and we pass a set of arguments given type S, then S <: T if:

  1. every field name that appears in S also appears in T, and
  2. for every field k that appears in S, S[k] <: T[k].

That is, any names used in the caller’s arguments must be present in the function’s parameters, and the argument types must be compatible with those the function is expecting. Rule 1 inverts S and T compared to the record subtyping rule, and rule 2 is essentially the same: it covers the set of fields common to both S and T, which if rule 1 is satisfied is the same as the fields in S here.
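
Expressed as a predicate, the rules look something like the following. This is only a sketch under some assumptions of mine: a type is represented as a plain object mapping field names to primitive type names such as 'string', and primitive types are taken to be subtypes only of themselves. isNamedArgsSubtype is a hypothetical name.

```javascript
// Assumption: a type is a plain object mapping field names to primitive
// type names, and primitives are subtypes only of themselves.
function isNamedArgsSubtype(s, t) {
  return Object.keys(s).every((k) => {
    // Rule 1: every field name that appears in S must also appear in T
    if (!Object.prototype.hasOwnProperty.call(t, k)) return false
    // Rule 2: the field's type in S must be a subtype of its type in T
    return s[k] === t[k]
  })
}

const Params = { host: 'string', port: 'number', path: 'string', scheme: 'string' }

isNamedArgsSubtype({ host: 'string', port: 'number', path: 'string' }, Params)
// -> true: omitting scheme is allowed

isNamedArgsSubtype({ host: 'string', query: 'string' }, Params)
// -> false: query is not a recognised name

isNamedArgsSubtype({ host: 'string', scheme: 'object' }, Params)
// -> false: scheme has an incompatible type
```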

Early in this piece I mentioned that it’s important to know whether you’re using named arguments to provide order-independence or to make inputs optional, and now we can see why. Required arguments work like records: you have to provide all the required fields, so providing any object with at least those fields is fine. Optional arguments work by the rules given above, where you may provide only the defined options and no others. The way argument validation works for these two sets is entirely different, and you may want to separate required and optional arguments into distinct parameters.

For example in our url() example we could break the options down as follows:

  • host is required, there is no reasonable default value for it
  • path is optional and defaults to '/'
  • scheme is optional and defaults to 'http:'
  • port is optional and a default is chosen based on scheme

So we might decide to make host a required argument via a parameter that works like a record, and the other parameters optional via a second parameter obeying the named argument rules:

const DEFAULT_PORTS = {
  'http:': 80,
  'https:': 443
}

function url(props, options = {}) {
  let { host } = props
  let { path = '/', scheme = 'http:', port = DEFAULT_PORTS[scheme] } = options

  return `${ scheme }//${ host }:${ port }${ path }`
}

url({ host: 'example.com' })
// -> 'http://example.com:80/'

url({ host: 'example.com' }, { scheme: 'https:' })
// -> 'https://example.com:443/'

url({ host: 'example.com' }, { scheme: 'https:', port: 9000 })
// -> 'https://example.com:9000/'

This split might feel unnatural in some cases, although it may make more sense once we’ve seen how to make JS enforce named argument semantics. The right design depends on how these arguments tend to get used – are these object values typically the results of function calls, or inline object literals, and how are they generated and combined? The fact JavaScript makes no type-level distinction between an instance of a class, and some options that differ at every call site, means you need to put a little more thought into getting to a design that will evolve nicely over time.

In later articles, we’ll look at how we can implement the desired behaviour for named arguments in JavaScript, and look at how this concept is realised in other languages.