One of the big challenges in maintaining open source software, or service-oriented systems, or indeed any program where a function and its callers can evolve independently and be maintained by different sets of people, is that of maintaining compatibility. When we ship a new release of a package, we usually want users to be able to upgrade without their programs breaking as a result. This is especially tricky when the language we’re using does not have a direct representation of a computational concept we’re using, as it leaves room for a function’s author and its users to have different interpretations of what the function does and how it should be used.
I notice this quite frequently in my JavaScript work. As I’ve noted in Breaking Changes, JavaScript’s Object type is overloaded with many different uses. It’s most frequently used to represent records: compound values with a fixed set of predefined fields with distinct types, where the program accesses specific fields by name. But it also serves as a map type: an open-ended collection of key-value pairs where the program doesn’t expect any particular keys to exist and accesses all keys dynamically, often by iteration.
There are grey areas and overlaps and other uses that Object is put to, and in order to make backward-compatible changes to programs we need to be aware of which of these uses we’re dealing with anywhere an Object is used. In this article we’ll look at a very common use of the JavaScript Object type whose behaviour is distinct from both records and maps: named arguments. Whereas records are a widely covered concept in type theory, and find many applications across programming languages in the wild, named arguments are less consistent in their implementation and use, and it would help us to have a well-defined conception of what named arguments actually are, what purpose they serve, and how they ought to work.
Before we develop a theory, it will be helpful to look at an example from a language with named arguments: Python. Most mainstream languages use positional arguments; the arguments to a function are bound to its parameters in the order they appear. In Python, all function parameters can also be addressed by name. Take this example of a function that generates a URL string:
def url(host, port, path):
    return 'http://%s:%d%s' % (host, port, path)
We can call this function with positional arguments, that is with the arguments given in the order of the parameters they correspond to:
url('example.com', 80, '/')
# -> 'http://example.com:80/'
But we can also call the function with named arguments, where instead of giving the arguments in a fixed order we give the name of the parameter each should be assigned to:
url(path='/', host='example.com', port=80)
# -> 'http://example.com:80/'
We can even mix these schemes, giving some arguments positionally and some with names:
url('example.com', path='/', port=80)
# -> 'http://example.com:80/'
In this way, a function’s parameter names are part of its public interface, and changing them can break any existing caller. This is a kind of coupling not present in many languages and it might look potentially brittle, but named arguments also give us a lot of flexibility when it comes to maintaining compatibility.
We should define what we mean by compatibility. Here I am specifically talking about backward compatibility, which is when a new version of a package works with all existing callers of a previous version. Version B of a package is backward compatible with version A if any program that ran successfully using version A would also run successfully using version B. If there are programs that would have worked with version A but will not work with B, then B contains breaking changes, making it harder to upgrade from A to B. This is what we’re trying to avoid. To determine this, we need to consider all the valid ways of using version A and determine whether they still work with version B.
Above we showed a version of the url() function and some valid uses of it: 1. positional arguments, 2. named arguments, and 3. mixed:
def url(host, port, path):
    return 'http://%s:%d%s' % (host, port, path)
url('example.com', 80, '/') # 1
url(path='/', host='example.com', port=80) # 2
url('example.com', path='/', port=80) # 3
Imagine we’re writing version B of this function and want to add a new parameter to change the URL scheme, so users can build URLs with https:, ws:, and so on. Suppose we do this by adding a new parameter to the beginning of the list:
def url(scheme, host, port, path):
    return '%s//%s:%d%s' % (scheme, host, port, path)
Cases 1 and 3 are now clearly broken; none of the positional arguments will bind to the correct parameters. Case 2 is less drastically broken; the named host, port and path arguments will still bind correctly, but we’re not providing a scheme argument which the function needs. Python raises a TypeError in all three cases here, because each call leaves at least one required parameter without a value.
We fix this by putting scheme at the end of the parameter list and giving it a default value:
def url(host, port, path, scheme='http:'):
    return '%s//%s:%d%s' % (scheme, host, port, path)
Cases 1, 2 and 3 are now all valid calls, because scheme is optional and none of the existing parameters have changed their positions or names. Callers that upgrade to version B can begin using the scheme parameter either positionally or by name:
url('example.com', 443, '/', 'https:')
# -> 'https://example.com:443/'
url(scheme='ws:', path='/', host='example.com', port=80)
# -> 'ws://example.com:80/'
From these examples we see that named arguments have a couple of main uses:
- They provide a better interface when a function has lots of parameters whose order is hard to remember, or is likely to be unstable
- They allow some parameters to be omitted by callers by making them optional and giving them default values
Both of these can be used as strategies for achieving backward compatibility, because they make it easier to add new parameters to functions without breaking existing callers. However, it is important to know which of these reasons – whether you want arguments to be unordered or optional – you’re appealing to when using named arguments, and it affects how we should think about their types.
Note that Python considers it an error to pass an argument with a name the target function does not define:
url(path='/', host='example.com', port=80, extra=True)
# -> TypeError: url() got an unexpected keyword argument 'extra'
This is also an important detail when we consider backward compatibility. We’ll return to this idea later, but for now have a think about how rejecting unrecognised named arguments might affect what kinds of changes we’re able to make without breaking the behaviour of existing callers.
Now, let’s consider a language without named arguments. JavaScript does not have this feature, but we often use object literals to simulate it. Here’s our original URL-building function rewritten in JavaScript:
function url(options) {
  let { host, port, path } = options
  return `http://${ host }:${ port }${ path }`
}
If you prefer, the destructuring can be done inside the parameter list to make it look even closer to a language with genuine named arguments:
function url({ host, port, path }) {
  return `http://${ host }:${ port }${ path }`
}
I’ll be using the first version as I will need to refer to this function’s parameter by name frequently.
The url() function has a single parameter named options. Within the body of url() we access the host, port and path fields on this parameter, so to get a meaningful result from this function we must pass an object with those fields as an argument:
url({ host: 'example.com', port: 80, path: '/' })
// -> 'http://example.com:80/'
This is how most JavaScript code simulates named arguments. Let’s see how this function responds when called with incorrect input. If we call url() without some of these required fields, we get a strange result. We don’t get an error in the sense that the program does run without crashing, but producing a URL with undefined where its path should be was probably not what we intended. This will manifest as an error somewhere else in the program, for example when we try to make a request with this URL.
url({ host: 'example.com', port: 80 })
// -> 'http://example.com:80undefined'
Conversely, if we pass an object with fields that url() does not access, it simply ignores them.
url({ host: 'example.com', port: 80, path: '/', scheme: 'ws:' })
// -> 'http://example.com:80/'
A key question in thinking about types in relation to named arguments is: should this be considered an error, or is it expected behaviour? Likewise, should omitting a field be an error, or should the program be able to handle this case? To answer these questions, we need to look at what named arguments are for – how they are used, what good user feedback looks like, and how they affect how an API changes over time.
We’ve already seen how Python implements named arguments: an argument must be given for all non-optional parameters, optional parameters can be omitted when calling a function, but additional unrecognised arguments trigger an error. In JavaScript none of our examples trigger an error in the sense of throwing an exception, and that’s partly explained by JavaScript being designed to be more permissive: it will often let an expression evaluate to undefined rather than throwing an exception. This is still a program error in the sense that the program does not do what we want; it’s just not recognised formally by the language, and so it’s easier for errors to happen silently and only be noticed by a thorough test suite or picked up by end users in production.
To get a more complete explanation for this behaviour we need to remember that Object in JavaScript plays a lot of different roles, and it wasn’t designed to represent named arguments specifically. We’re just abusing its syntax to do something that resembles named arguments, but doesn’t actually work in the same way. The behaviour above is better explained by noting that JavaScript primarily thinks of objects as records, and then considering how subtyping works for records.
In type theory, a record is a compound structure that consists of a set of named fields and their corresponding types. So given simple types like number and string – atomic values with no internal structure – a record type would be something like { host: string, port: number }. Records are a key building block of type theory because they let us talk about arbitrary structures, not just single values, in a way that maps onto features in most programming languages.
Let’s revisit our url() function from above:
function url(options) {
  let { host, port, path } = options
  return `http://${ host }:${ port }${ path }`
}
What can we say about the type of the options parameter here? We can see the function body accesses the host, port and path fields from it, and all these are then interpolated into a string, so they can actually be any type that converts to a string. Let’s say we ascribe options the following type to represent these requirements, giving some more specific field types that match our expected usage. The notation x: T means the parameter x has type T.
options: { host: string, port: number, path: string }
Here we inferred the type of options based on how it’s used in the body of url(). Similarly, we can infer the types of argument expressions from the syntax used to generate them.
url({ host: 'example.com', port: 80, path: '/' })
// -> 'http://example.com:80/'
Here, the argument expression is an object literal with three fields: host, which maps to a string, port, which maps to a number, and path, which maps to a string. So this argument expression has the type:
{ host: string, port: number, path: string }
This is exactly the type required by url(), so this call works. Now let’s examine the other example calls from earlier, starting with the one that didn’t work:
url({ host: 'example.com', port: 80 })
// -> 'http://example.com:80undefined'
Here the argument’s type is { host: string, port: number } – it’s missing the path field. url() doesn’t throw an exception here but it does do something we’d consider erroneous, so we can infer that this argument type is not compatible with the type of the options parameter in url(). What about the case where we provide additional fields?
// argument: { host: string, port: number, path: string, scheme: string }
url({ host: 'example.com', port: 80, path: '/', scheme: 'ws:' })
// -> 'http://example.com:80/'
Here the url() function behaves normally, insofar as it uses the fields it knows about correctly. So we can say this argument type is compatible with the url() function.
This is an example of an argument’s type not being identical to that required by a function’s parameters, but the call still being valid. In type theory this notion of compatibility is captured by subtyping: a type S is a subtype of type T if, anywhere the type T is required, an expression of type S can be used. The notation “S <: T” is used to mean “S is a subtype of T”. In our examples above, we see that { host: string, port: number, path: string, scheme: string } is a subtype of { host: string, port: number, path: string } – the former can be used where the latter is required. However { host: string, port: number } is not a subtype as its use leads to erroneous behaviour.
In general a record type S is a subtype of another record type T, if:
- every field name that appears in T also appears in S, and
- for every field k that appears in T, S[k] <: T[k].
That is, S has at least all the same fields as T and possibly more, and all those fields’ types in S are compatible with those in T.
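The two conditions above can be sketched as a small check. This is only an illustration under some assumptions: record types are modelled as plain objects mapping field names to type names, the field-level check S[k] <: T[k] is simplified to equality, and the names hasField and isRecordSubtype are hypothetical.

```javascript
// Hypothetical helper: true if obj has its own field called name
function hasField(obj, name) {
  return Object.prototype.hasOwnProperty.call(obj, name)
}

// Assumption: a record type is a plain object mapping field names to type
// names, and the field-level subtype check is simplified to equality
function isRecordSubtype(S, T) {
  return Object.keys(T).every((k) => hasField(S, k) && S[k] === T[k])
}

let params = { host: 'string', port: 'number', path: 'string' }

// Extra fields are allowed
isRecordSubtype({ host: 'string', port: 'number', path: 'string', scheme: 'string' }, params)
// -> true

// Missing fields are not
isRecordSubtype({ host: 'string', port: 'number' }, params)
// -> false
```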
Why is this definition of subtyping for records useful? It has to do with our need to use polymorphism in our programs: to write reusable functions that work with many different types of objects, so long as those objects have certain properties. Our url() function can work on any object that possesses the fields host, port and path; for example, an instance of a class that defines these will work:
class Request {
  constructor(host, port, path) {
    this.host = host
    this.port = port
    this.path = path
  }
}
url(new Request('example.com', 80, '/'))
// -> 'http://example.com:80/'
But making url() accept only objects with those exact fields might be overly restrictive; for example, it would prevent us calling it with objects with additional properties or methods:
import querystring from 'querystring'
class QueryRequest {
  constructor(host, port, path, params) {
    this.host = host
    this.port = port
    this.path = path
    this.params = params
  }
  getQueryString() {
    return querystring.encode(this.params)
  }
}
url(new QueryRequest('example.com', 80, '/', { q: 'hello' }))
// -> 'http://example.com:80/'
An instance of QueryRequest has two fields that url() does not use: params and getQueryString (in JavaScript, a method is just a property whose value is a function). url() uses the fields it cares about and ignores the presence of everything else. We can say that Request and QueryRequest are both subtypes of the url() parameter type { host: string, port: number, path: string }, because they possess the required fields and so are acceptable as input to the url() function. Note that this notion of subtyping is distinct from class inheritance; QueryRequest is not a subclass of Request but both are subtypes of the anonymous type { host: string, ... } that url() takes.
A class that does not implement the required interface should not be acceptable – the following code should be considered an error:
class City {
  constructor(name) {
    this.name = name
  }
}
url(new City('San Francisco'))
// -> 'http://undefined:undefinedundefined'
This mechanism of functions using only the parts of an object relevant to them, and ignoring the object’s other aspects, is what enables the bulk of polymorphism in object-oriented languages. Requiring argument types to exactly match the types of function parameters would be too restrictive and would require many functions to be written anew for every new class they should work on. Therefore we allow argument types to be subtypes of the parameter types, under the rules for record subtyping given above.
What types of changes do these subtyping rules allow? Imagine that url() is part of an open source package installed in many applications. Let’s think about how we could change it and how that would affect any existing caller. First, we could remove one of the required fields, say port:
function url(options) {
  let { host, path } = options
  return `http://${ host }:80${ path }`
}
This function now makes fewer demands of its callers – whereas previously it required the host, port and path fields, it now only requires host and path. Any caller that fulfilled the old requirements certainly fulfills the new ones and will therefore keep running fine. In general, replacing a parameter type with one that’s less specific – a supertype of the old type – is safe.
Now, imagine we add a new required field, scheme:
function url(options) {
  let { host, port, path, scheme } = options
  return `${ scheme }//${ host }:${ port }${ path }`
}
This function makes more demands of its callers; we can’t assume that any existing caller passes an argument with a scheme field, so this could break existing callers. By making the parameter type more specific, replacing it with a subtype, we make previously valid calls invalid. Only calls that already conform to this new type will still work if this new version of url() is installed, and all others will break, silently, by inserting undefined into the URLs they generate.
Subtyping also gives us rules for what changes we are able to make while preserving compatibility. Replacing a function parameter type with a supertype is safe, replacing with a subtype is not. Our type theory for named arguments should reflect what kinds of changes are safe.
So what should the subtyping rules for named arguments be? What kinds of changes should be considered compatible or not? Normally, we want to be able to add new options to a library function without breaking existing callers. Indeed, as we saw in the Python example, making new fields optional by giving them default values makes this possible. So how about adding scheme as an option to url():
function url(options) {
  let { host, port, path, scheme = 'http:' } = options
  return `${ scheme }//${ host }:${ port }${ path }`
}
By setting a default value for the scheme field, we have kept existing callers that do not pass this field working:
url({ host: 'example.com', port: 80, path: '/' })
// -> 'http://example.com:80/'
And callers can opt in to using the new field should they wish to:
url({ scheme: 'https:', host: 'example.com', port: 443, path: '/' })
// -> 'https://example.com:443/'
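One detail of this technique is worth a quick sketch: destructuring defaults in JavaScript apply only when a field is missing or explicitly undefined, so other values such as null are passed through untouched. The calls below are illustrative only and repeat the url() definition so the block stands alone.

```javascript
function url(options) {
  let { host, port, path, scheme = 'http:' } = options
  return `${ scheme }//${ host }:${ port }${ path }`
}

// An explicit undefined is treated like an omitted field
url({ scheme: undefined, host: 'example.com', port: 80, path: '/' })
// -> 'http://example.com:80/'

// null is a real value, so the default is not applied
url({ scheme: null, host: 'example.com', port: 80, path: '/' })
// -> 'null//example.com:80/'
```

In both cases, callers that omit scheme altogether are unaffected by the new field.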
This is the opposite of what we saw for record subtyping, where adding a new field was a breaking change. Adding new options is a desirable compatible change, so our subtyping rules need to reflect that. What about removing a field, such as port:
function url(options) {
  let { host, path } = options
  return `http://${ host }:80${ path }`
}
Now, any caller that was using the previous set of options will keep running and returning a result without crashing:
url({ host: 'example.com', port: 80, path: '/' })
// -> 'http://example.com:80/'
And callers can decide to stop using the option now that it’s been removed – the following call was previously not valid, but is now legal:
url({ host: 'example.com', path: '/' })
// -> 'http://example.com:80/'
However, users might be surprised to see that an option they pass in is not being used:
url({ host: 'example.com', port: 443, path: '/' })
// -> 'http://example.com:80/'
Coming from the argument about object polymorphism above, this might appear harmless, but consider how named arguments are normally used. Their names are usually given inline at each call site – the names host, port and path appear literally in the above calls. When we’re working with class instances, their fields are usually defined once in the class definition and do not appear inline:
url(new Request('example.com', 80, '/'))
This means that each call site where url() is used is not concerned with the interface contract between url() and the passed-in Request object. That contract is hidden away, in the definitions of url() and Request. If we had to spell it out at every use of url(), it would become much harder to use polymorphism effectively to save ourselves effort.
As named arguments appear at every call site, that means changing the names of the arguments requires changing every call site, rather than a few class definitions. It also means we have an opportunity to mistype the argument names every time we use the function, for example:
url({ hot: 'example.com', pot: 80, path: '/' })
// -> 'http://undefined:undefined/'
If a mistyped parameter name is silently ignored, it can be very hard to notice it’s not being used, especially if the called function has a default value for the option you meant to type. Here the difference in the returned string is immediately obvious but if this is buried deep inside your system you might not be looking at it directly, and the fact your options are being ignored can manifest in very indirect ways.
It’s often more helpful to the user to raise an error on any unrecognised arguments, just as Python and many command-line programs do. Whereas an object having many fields not needed by the current function is an expected aspect of polymorphic programming, unrecognised named arguments more often than not indicate a mistake that makes the input ambiguous, and the called function should crash immediately.
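As a rough sketch of what failing fast could look like in JavaScript, the function can compare the names it receives against the names it knows. The helper name rejectUnknown and the error message are assumptions made for illustration, not an established API.

```javascript
// Hypothetical helper: throws if options contains a name not listed in allowed
function rejectUnknown(options, allowed) {
  for (const name of Object.keys(options)) {
    if (!allowed.includes(name)) {
      throw new TypeError(`unexpected option: ${ name }`)
    }
  }
}

function url(options) {
  rejectUnknown(options, ['host', 'port', 'path'])
  let { host, port, path } = options
  return `http://${ host }:${ port }${ path }`
}

try {
  url({ hot: 'example.com', pot: 80, path: '/' })
} catch (error) {
  console.log(error.message)
  // -> 'unexpected option: hot'
}
```

Because Object.keys() only lists an object’s own enumerable properties, this check reports the first mistyped name in the order the caller wrote them.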
There is also a subtle effect from callers being allowed to pass unrecognised names that makes it more risky for new options to be added. Say we have our original url() function with host, port and path options, and some callers are passing an additional option called query containing a query string:
url({ host: 'example.com', port: 80, path: '/', query: 'q=hello' })
// -> 'http://example.com:80/'
We then deploy a new version of url() that has a query option, which is expected to be an object to be passed to querystring.encode():
function url(options) {
  let { host, port, path, query = {} } = options
  query = querystring.encode(query)
  return `http://${ host }:${ port }${ path }?${ query }`
}
The caller passing query as a string will now get an incorrect result, compared to a caller using this option as intended:
// incorrect:
url({ host: 'example.com', port: 80, path: '/', query: 'q=hello' })
// -> 'http://example.com:80/?'
// correct:
url({ host: 'example.com', port: 80, path: '/', query: { q: 'hello' } })
// -> 'http://example.com:80/?q=hello'
Again we see a silent failure here: querystring.encode() happens to return an empty string if given a string as input, but we could imagine uses of query that would cause more serious problems.
If we forbid callers from using undefined argument names, then we can be sure that whenever we add a new option, there will be no existing callers using that name, and therefore no possibility of strange behaviour when the new version is deployed. So as well as providing better feedback about mistakes to users when functions are first used, it also helps those functions’ maintainers keep their behaviour stable.
We’ve established that named arguments invert some of the rules for record subtyping: an argument that omits fields from the parameter type is valid, and an argument that introduces additional fields is invalid. What about the types of the fields themselves – is it valid to pass a field whose type is a supertype of what the function is expecting? What if we passed an object instead of a string for scheme:
url({ scheme: {}, host: 'example.com', port: 443, path: '/' })
// -> '[object Object]//example.com:443/'
This looks like an error, so it seems that argument values still need to be subtypes of the fields in the parameter type.
We now have enough information to give the subtyping rules for named arguments. If a function has a set of named parameters given type T, and we pass a set of arguments given type S, then S <: T if:
- every field name that appears in S also appears in T, and
- for every field k that appears in S, S[k] <: T[k].
That is, any names used in the caller’s arguments must be present in the function’s parameters, and the argument types must be compatible with those the function is expecting. The first rule inverts S and T compared to the record subtyping rule, and the second rule is essentially the same: it covers the set of fields common to both S and T, which if the first rule is satisfied is the same as the fields in S here.
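These rules can be sketched as a check that mirrors the record rule with S and T swapped in the first condition. As before, this is only an illustration: types are modelled as plain objects mapping field names to type names, the field-level check is simplified to equality, every parameter is treated as optional, and the names hasField and isNamedArgumentsSubtype are hypothetical.

```javascript
// Hypothetical helper: true if obj has its own field called name
function hasField(obj, name) {
  return Object.prototype.hasOwnProperty.call(obj, name)
}

// Assumption: S and T are plain objects mapping field names to type names,
// and the field-level subtype check is simplified to equality
function isNamedArgumentsSubtype(S, T) {
  return Object.keys(S).every((k) => hasField(T, k) && S[k] === T[k])
}

let params = { host: 'string', port: 'number', path: 'string', scheme: 'string' }

// Omitting a defined name is allowed
isNamedArgumentsSubtype({ host: 'string', port: 'number', path: 'string' }, params)
// -> true

// Introducing an undefined name is not
isNamedArgumentsSubtype({ host: 'string', port: 'number', path: 'string', query: 'string' }, params)
// -> false

// Nor is passing a value of the wrong type
isNamedArgumentsSubtype({ host: 'string', port: 'number', path: 'string', scheme: 'object' }, params)
// -> false
```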
Early in this piece I mentioned that it’s important to know whether you’re using named arguments to provide order-independence or to make inputs optional, and now we can see why. Required arguments work like records: you have to provide all the required fields, so providing any object with at least those fields is fine. Optional arguments work by the rules given above, where you may provide only the defined options and no others. The way argument validation works for these two sets is entirely different, and you may want to separate required and optional arguments into distinct parameters.
For instance, in our url() example we could break the options down as follows:
- host is required, there is no reasonable default value for it
- path is optional and defaults to '/'
- scheme is optional and defaults to 'http:'
- port is optional and a default is chosen based on scheme
So we might decide to make host a required argument via a parameter that works like a record, and the other parameters optional via a second parameter obeying the named argument rules:
const DEFAULT_PORTS = {
  'http:': 80,
  'https:': 443
}
function url(props, options = {}) {
  let { host } = props
  let { path = '/', scheme = 'http:', port = DEFAULT_PORTS[scheme] } = options
  return `${ scheme }//${ host }:${ port }${ path }`
}
url({ host: 'example.com' })
// -> 'http://example.com:80/'
url({ host: 'example.com' }, { scheme: 'https:' })
// -> 'https://example.com:443/'
url({ host: 'example.com' }, { scheme: 'https:', port: 9000 })
// -> 'https://example.com:9000/'
This split might feel unnatural in some cases, although it may make more sense once we’ve seen how to make JS enforce named argument semantics. The right design depends on how these arguments tend to get used – are these object values typically the results of function calls, or inline object literals, and how are they generated and combined? The fact JavaScript makes no type-level distinction between an instance of a class, and some options that differ at every call site, means you need to put a little more thought into getting to a design that will evolve nicely over time.
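To make the difference between the two parameters concrete, here is one possible sketch in which props is checked by the record rules and options by the named argument rules. It is not necessarily the approach a fuller implementation would take; the OPTION_NAMES list and the error messages are assumptions made for illustration.

```javascript
const DEFAULT_PORTS = {
  'http:': 80,
  'https:': 443
}

// Hypothetical list of the option names this version of url() defines
const OPTION_NAMES = ['path', 'scheme', 'port']

function url(props, options = {}) {
  // props follows the record rules: host must be present, extras are ignored
  if (props.host === undefined) {
    throw new TypeError('missing required field: host')
  }
  // options follows the named argument rules: unknown names are rejected
  for (const name of Object.keys(options)) {
    if (!OPTION_NAMES.includes(name)) {
      throw new TypeError(`unexpected option: ${ name }`)
    }
  }
  let { host } = props
  let { path = '/', scheme = 'http:', port = DEFAULT_PORTS[scheme] } = options
  return `${ scheme }//${ host }:${ port }${ path }`
}

// Extra fields on props are fine
url({ host: 'example.com', params: { q: 'hello' } })
// -> 'http://example.com:80/'

try {
  url({ host: 'example.com' }, { schem: 'https:' })
} catch (error) {
  console.log(error.message)
  // -> 'unexpected option: schem'
}
```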
In later articles, we’ll look at how we can implement the desired behaviour for named arguments in JavaScript, and look at how this concept is realised in other languages.