APIs

Today we're going to be discussing APIs, which in many ways are central to providing more advanced functionality. Take payments as an example: it used to be the case that if you wanted to take payments on your site, you had to build code to handle credit card processing and pass extremely difficult audits to store and manage that information. Nowadays there are a host of APIs you can call from the backend to manage the provided payment information.

To work with a company's API, you're somewhat reliant on that company setting up facilities to make it easy and safe to develop and test your code. The one you'll be working with in your assignment is rather complete in this regard, but you should be aware that the spectrum is pretty broad.

RESTful APIs

When we covered Cookies, I introduced the structure of an HTTP request. At the time I mentioned that GET and POST were the most common verbs used, but in APIs a broader range is typically employed to provide more meaningful endpoints. Here's the basic meaning of those that are commonly used:

GET
Used to retrieve data; requests using this verb should not alter the system
POST
Make a general change to the system
PUT
Create or replace a specific resource
DELETE
Delete a resource
PATCH
Partially modify a resource

It might be easiest to understand how these concepts work with an example. Suppose we wanted to create an API for placing and viewing orders. Here are the endpoints that might be created and what they do:

GET /orders/5
Retrieve information about the order with ID of 5
POST /orders
Create an order
PUT /orders/5
Create an order with the ID of 5, or replace that order if it already exists
DELETE /orders/5
Remove the order with the ID of 5
PATCH /orders/5
Update the order with the ID of 5, leaving any unspecified information as the old value (e.g. a new items list is sent but the address was excluded and therefore didn't change)
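To make those semantics concrete, here's a toy in-memory sketch of the same endpoints in Python. The class and method names are made up for illustration; this isn't from any real framework:

```python
class OrdersAPI:
    """A toy in-memory stand-in for the /orders endpoints."""

    def __init__(self):
        self._orders = {}
        self._next_id = 1

    def get(self, order_id):
        # GET /orders/<id>: retrieve, never modify
        return self._orders[order_id]

    def post(self, data):
        # POST /orders: create a new order, the server picks the ID
        order_id = self._next_id
        self._next_id += 1
        self._orders[order_id] = data
        return order_id

    def put(self, order_id, data):
        # PUT /orders/<id>: create or fully replace
        self._orders[order_id] = data

    def delete(self, order_id):
        # DELETE /orders/<id>: remove the order
        del self._orders[order_id]

    def patch(self, order_id, partial):
        # PATCH /orders/<id>: merge changes, keep unspecified fields
        self._orders[order_id].update(partial)
```

Note how PATCH merges into the existing record, leaving unspecified fields alone, while PUT replaces the record outright.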

These endpoints represent an abstraction on top of orders, but the actual implementation can vary widely. Let's consider a few different ways the underlying abstraction could be implemented.

The most obvious choice, and the one used by most of the web, is a database table. Here's how the methods map to SQL operations:

GET
SELECT
POST
INSERT
PUT
Combination of INSERT and UPDATE, more on this later
DELETE
DELETE
PATCH
UPDATE
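As a sketch of that mapping using Python's built-in sqlite3 module. The orders table here is invented for illustration, and the PUT row uses SQLite's upsert syntax, which requires SQLite 3.24 or newer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, address TEXT)")

# POST /orders -> INSERT (the database assigns the ID)
cur = conn.execute("INSERT INTO orders (address) VALUES (?)", ("1 Main St",))
new_id = cur.lastrowid

# GET /orders/<id> -> SELECT
row = conn.execute("SELECT address FROM orders WHERE id = ?", (new_id,)).fetchone()

# PUT /orders/5 -> the INSERT-or-UPDATE combination, often called an "upsert"
conn.execute(
    "INSERT INTO orders (id, address) VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET address = excluded.address",
    (5, "2 Oak Ave"),
)

# PATCH /orders/<id> -> UPDATE
conn.execute("UPDATE orders SET address = ? WHERE id = ?", ("3 Elm Rd", new_id))

# DELETE /orders/<id> -> DELETE
conn.execute("DELETE FROM orders WHERE id = ?", (new_id,))
```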

Another possibility would be to represent it as files in a file system:

GET
Get the contents of the file
POST
Create a new file
PUT
Create or replace a file
DELETE
Delete a file
PATCH
Apply a diff to an existing file
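A minimal sketch of the file-backed version using Python's pathlib; the file names and contents are invented for illustration:

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # stand-in for wherever orders live

# PUT /orders/5 -> create or replace the file wholesale
(root / "5.txt").write_text("address: 1 Main St\n")

# GET /orders/5 -> read the file's contents
contents = (root / "5.txt").read_text()

# DELETE /orders/5 -> remove the file
(root / "5.txt").unlink()
```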

You aren't limited to the obvious choices, though; it's also possible to get way out there and implement it on something like a blockchain. Ethereum is able to store and retrieve information, so it would be equally valid as an implementation.

The important thing to understand is that all of these different implementations could serve the same API. If you needed to change out implementations of the API for some reason, the people calling it wouldn't have to change, and wouldn't even need to be aware that anything was different.

People often refer to this structuring of an API as a REST (or RESTful) API, which stands for "representational state transfer." As with many academic concepts in Computer Science, though, the colloquial meaning has drifted quite a bit from the formal definition. You can read about the formal meaning if you're interested, but I've focused on how people use this concept day to day.

Idempotence

One thing that might be difficult to understand is the difference between PUT and POST. The core difference between the two is idempotency: an idempotent operation changes the system the same way whether it's called once or many times. In our example, if you called POST /orders 10 times, 10 records would be created; but if you called PUT /orders/5 10 times, a new record would be created only on the first call.
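A quick way to see the difference is to simulate both behaviors with a plain dictionary:

```python
# POST: each call creates a fresh record under a new key
post_store = {}
for _ in range(10):
    post_store[len(post_store) + 1] = {"items": ["book"]}

# PUT /orders/5: every call writes the same key, so only the
# first call actually creates anything
put_store = {}
for _ in range(10):
    put_store[5] = {"items": ["book"]}

print(len(post_store), len(put_store))  # 10 1
```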

Idempotent operations are also valuable when performing database migrations. Sometimes a migration fails partway through; if the change is idempotent, you can simply rerun the migration and the already-changed records won't be affected.

Formats

The vast majority of APIs are implemented using JSON; it's simple to understand, and the libraries to handle it are fairly ubiquitous. It definitely has its limitations, though, such as limited data types and a lack of extensibility. Most people get around these limitations by convention, but other solutions also exist.
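As a small illustration of the data-type limitation, Python's json module can't serialize a datetime directly, so the usual convention is to convert it to an ISO 8601 string first:

```python
import json
from datetime import datetime, timezone

order = {"id": 5, "placed_at": datetime(2024, 1, 2, tzinfo=timezone.utc)}

# json.dumps(order) would raise a TypeError, since JSON has no datetime
# type; the common convention is to serialize it as an ISO 8601 string
order["placed_at"] = order["placed_at"].isoformat()
payload = json.dumps(order)
print(payload)  # {"id": 5, "placed_at": "2024-01-02T00:00:00+00:00"}
```

The receiving side has to know by convention that the field holds a timestamp; nothing in the JSON itself says so.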

An old but still prominent player is SOAP, an XML-based protocol that allows for a formal definition of the request structure. It is extensible enough to allow more robust types, which can make interaction more stable, but it is also much more cumbersome to interact with than JSON. There's nothing wrong with SOAP, but expect to spend more time upfront figuring out how to get the request structured correctly.

One drawback of SOAP is that it's extremely verbose, and sending/receiving those extra characters can eat a lot of bandwidth. Protocol Buffers provide a lot of the same benefits of SOAP, but serialize into a binary format that's extremely compact. It has some general popularity, but the most prominent place you will encounter it is libraries developed by Google, as they designed the format originally.

Authentication

Oftentimes you will need some way of proving your identity in order to interact with an API. You'll encounter a range of different ways to do this; as with most things in software engineering, people choose different approaches because of the various tradeoffs involved.

The simplest way to do this is with an API key, which is a static string generated for you that's sent with every request in some way. Sometimes this is sent in a header, but sometimes people will want you to send it somewhere in the body of the request. While simpler, since the key is sent with every request, there are more chances for it to be stolen and your application impersonated.

There's technically no header dedicated to sending the same API key in every request, so people will sometimes co-opt the Authorization header. The format of this header is Authorization: <auth-scheme> <authorization-parameters>, and people use the Bearer scheme to send the key every time. This scheme was really designed for sending tokens obtained through an OAuth authentication flow, more on that later.

The Authorization header has a number of different schemes, and you may end up encountering others. The details of the various schemes are typically negotiated by libraries. One common scheme is Basic, which is used if you have a username and password; usually your library will handle this for you, but technically the credentials are sent as a base64-encoded "username:password" string.
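Here's a sketch of what both headers look like when built by hand in Python; the key and credentials below are made up:

```python
import base64

# Bearer scheme: the key or token is sent as-is
api_key = "sk_test_example"  # an invented key for illustration
bearer = {"Authorization": f"Bearer {api_key}"}

# Basic scheme: "username:password" is base64 encoded (encoded, not
# encrypted: anyone who sees the header can decode it)
credentials = base64.b64encode(b"alice:s3cret").decode("ascii")
basic = {"Authorization": f"Basic {credentials}"}
print(basic["Authorization"])  # Basic YWxpY2U6czNjcmV0
```

This is why Basic auth is only safe over HTTPS: base64 is a transport encoding, not a form of secrecy.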

The most secure method you'll commonly encounter is OAuth. This scheme technically has several different forms depending on the use case; to better understand how it works, I'll describe what happens under the covers when you log in to a website with something like Facebook or Google. Here's the basic sequence of events if we were to implement authentication with Facebook on North of Boston:

  1. You visit North of Boston
  2. You click the link to "Login with Facebook"
  3. North of Boston sends you to a page hosted by Facebook with some information identifying the request as coming from North of Boston
  4. You approve access
  5. You get redirected back to a page specific to OAuth on North of Boston with an access token
  6. North of Boston makes a request to Facebook with the access token, receiving a more permanent token it can use in future API requests
  7. You get redirected somewhere else on the site and are fully authenticated

There are libraries around to help make this process easier, but because it involves coordinating many requests requiring user interaction, they're typically a little difficult to get working well. That being said, the process is nuanced enough that it's worth looking around a bit for one, as it can be tricky to implement correctly.
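As a rough sketch, step 3 usually boils down to building a URL like the following. The provider host and parameter names follow the common OAuth 2.0 authorization-code pattern, but all of the specific values here are invented:

```python
from urllib.parse import urlencode

# Step 3 of the flow: construct the URL that sends the user to the
# provider's approval page. Every value below is invented for illustration.
params = {
    "client_id": "north-of-boston-app",  # identifies the site making the request
    "redirect_uri": "https://example.com/oauth/callback",  # where step 5 lands
    "response_type": "code",
    "scope": "email",
}
login_url = "https://provider.example/oauth/authorize?" + urlencode(params)
```

The redirect_uri is the OAuth-specific page from step 5; the provider sends the user back there with the short-lived token that step 6 exchanges for a longer-lived one.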

Status Codes

We discussed status codes briefly when we covered cookies, but it's helpful to understand them a little bit more thoroughly when working with APIs. The first thing to understand is that the first digit conveys the category of the response:

1xx
Informational - you won't see these all that much
2xx
Success - in API requests this is typically what you hope to get
3xx
Redirect - not used in APIs all that much
4xx
Client error - something went wrong on your end
5xx
Server error - something went wrong on the API side

For most of these ranges, all you really need to know is the meaning of that first digit. Since 4xx responses indicate something wrong with how you're interacting with the API, though, here are a few specific codes you'll probably find useful to know off the top of your head:

400
Bad Request - the server is basically telling you it didn't know what to do with what you sent
401
Unauthorized - you need to authenticate in some way to call this endpoint
403
Forbidden - you're authenticated, but still not allowed to interact with this endpoint
404
Not Found - I'll assume you already know this one :)
418
I'm a teapot - well, if the API is a teapot I feel like that's important to know
429
Too Many Requests - many APIs will limit the rate at which you can make requests; you'll get to know this one well in API-heavy applications

Don't feel like you have to go out and memorize these; you'll get pretty familiar with them the more you work with APIs. I work with APIs quite a bit and I still occasionally have to remind myself of the difference between 401 and 403.
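Here's a sketch of how that first-digit logic often ends up looking in client code; the returned strings are just illustrative advice, not anything standardized:

```python
def describe_status(code):
    """Map an HTTP status code to a rough plan of action."""
    if 200 <= code < 300:
        return "success"
    if code == 429:
        return "back off and retry later"  # rate limited
    if code in (401, 403):
        return "check credentials or permissions"
    if 400 <= code < 500:
        return "fix the request"  # some other client error
    if 500 <= code < 600:
        return "retry or report; the problem is on their end"
    return "unexpected"

print(describe_status(429))  # back off and retry later
```

Checking specific codes like 429 before falling through to the range check is a common pattern, since a handful of codes warrant special handling.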

Exploration and Testing

The process of building logic to work with an API can be a little tricky. It's easy enough to make a simple request, but creating logic to handle all of the ways that request can fail is complicated. On top of that, many APIs will charge for each call that you make, so limiting the number of requests wherever possible can become important quickly.

The better APIs will provide a sandbox or test account that you can use to build your system. In these systems, you have the ability to try out requests and examine responses, even run automated tests to ensure that interaction still works as well as it did before. If you're lucky, they'll even have specific data in place to simulate different types of failures that can occur; for example, Twilio offers magic phone numbers that are configured in predictable ways such as invalid phone numbers or ones where text messaging is blocked.

Many APIs these days are specified using something called OpenAPI, an open standard that split off from Swagger. Some people will still call it Swagger, as that toolset gained a lot of popularity before the standard was separated out. When an API is specified in this manner, it's extremely popular to generate documentation from the specification, and the documentation generation tools will often provide a way to interactively call the API. This can sometimes be helpful during exploration, but be wary, as occasionally it just won't work with how the API is designed.
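For a sense of what such a specification looks like, here's a minimal, hypothetical OpenAPI fragment describing the GET /orders/{id} endpoint from earlier; a real spec would cover every endpoint, schema, and error response:

```yaml
openapi: 3.0.3
info:
  title: Orders API        # illustrative name
  version: "1.0"
paths:
  /orders/{id}:
    get:
      summary: Retrieve an order
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: integer
      responses:
        "200":
          description: The requested order
```

Documentation generators and interactive consoles are driven entirely by files like this one.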

When I need to start building interaction with an API, I typically find it helpful to first play around with it a little bit to get a sense of how it works. I'll sometimes try to use the interactive documentation I mentioned above, but because it often doesn't work, I'll usually drop into a more robust tool fairly quickly. Many people like Postman, and that's a perfectly fine tool as long as you don't mind constantly ignoring their nudges to upgrade to the paid version. As a grizzled Unix programmer, though, I'll often just use curl as it's already on most systems.

Once I have a sense of how it operates, I'll try to strip back to the simplest request I can possibly make. It's tempting to throw information into the request until it works and then never touch it again, but typically this leads to a poor understanding of what's actually going on. By getting something simple working and then layering on information to achieve different objectives, I end up with a much deeper understanding of how the API actually operates.

Once I have a strong understanding of the endpoints I need to use, I'll create a class dedicated to interaction with that API. Even if my language has a package to work specifically with that API, I'll layer my own class on top of that to communicate my understanding of how it works. This layer also allows me to build an abstraction that can be swapped out during automated testing, as API calls are slow and potentially expensive.
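Here's a sketch of that kind of wrapper class in Python, with the transport injected so a fake can stand in during automated tests; all of the names here are illustrative:

```python
class OrdersClient:
    """Thin wrapper around a hypothetical orders API.

    The transport is injected so tests can substitute a fake instead
    of making slow (and possibly billed) network calls.
    """

    def __init__(self, transport):
        # transport: anything with a .request(method, path, body) method
        self._transport = transport

    def get_order(self, order_id):
        return self._transport.request("GET", f"/orders/{order_id}", None)

    def create_order(self, items, address):
        body = {"items": items, "address": address}
        return self._transport.request("POST", "/orders", body)


class FakeTransport:
    """Stand-in used during automated tests; records what was sent."""

    def __init__(self):
        self.calls = []

    def request(self, method, path, body):
        self.calls.append((method, path, body))
        return {"id": 1}  # canned response


fake = FakeTransport()
client = OrdersClient(fake)
client.create_order(["book"], "1 Main St")
print(fake.calls[0][:2])  # ('POST', '/orders')
```

In production you'd pass in a real transport that makes HTTP calls; the rest of your code only ever talks to OrdersClient, so swapping the two is invisible to it.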