Get Your API Right
Every project I’ve worked on in the last two years has heavily involved the use of web APIs. Libersy at the time (no idea about now) had an architecture that was extensively API based, even for communication between internal applications (an architecture I strongly argued against, bee tea dubs). Since then I’ve futzed with web APIs almost exclusively: from very narrowly focused ones like University of Michigan’s Bluestream Service, to broader but still fairly local APIs like the Ann Arbor District Library’s soon-to-be-updated API, all the way to the APIs of major web applications like Twitter and Flickr.
Constant exposure has turned me into a bit of a snob: I can’t stand working with a poorly designed API! If you’re about to design or release an API for the web and want to avoid the ire of your developers, I’ve summed up the best (and worst) of what I’ve seen into 8 rules:
1) Use HTTP
I’ll grant that HTTP isn’t perfect but it’s at least well understood and nearly universal. Nobody is going to want to write against your custom application layer atop UDP. Nobody cares about the RPC system you built in 1997 that is “running just fine.” Unless you have some major reasons HTTP cannot work and are willing to provide solid client libraries in a dozen or more languages, stick with HTTP.
2) Use Your Verbs
Every HTTP request comes with a verb. These verbs have specific meaning and should be used correctly. Use GET to retrieve items, POST to add new items to your service, PUT to update existing items, DELETE to remove items, and HEAD for uses similar to GET where the programmer doesn’t need body content (e.g. cache checking).
If you’re concerned that not all client libraries implement the HTTP verbs (web browsers, for example, can’t submit PUT and DELETE requests from a form), allow for HTTP verb faking. The emerging convention is to send the request as a POST and pass a _method parameter in the request body with the intended verb as a lowercase value (put, delete, or head).
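Here is a minimal sketch of how a server might honor that convention. The function name and the rule that only POST requests may be overridden are my assumptions, not part of any particular framework:

```python
# Verbs this hypothetical service lets clients fake via the _method parameter.
FAKEABLE_VERBS = {"put", "delete", "head"}


def effective_method(http_method, form_params):
    """Return the verb the request should be treated as.

    Only POST requests may be overridden (an assumption of this sketch),
    so a plain GET link can never be turned into a DELETE.
    """
    override = form_params.get("_method", "").lower()
    if http_method.upper() == "POST" and override in FAKEABLE_VERBS:
        return override.upper()
    return http_method.upper()


print(effective_method("POST", {"_method": "delete"}))  # DELETE
print(effective_method("GET", {"_method": "delete"}))   # GET
```

Resolving the verb once, before routing, means the rest of the application never needs to know the request was faked.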
If you don’t use all the verbs, you won’t be able to give your users access to all possible actions on your data without resorting to specialty urls. It’s possible to create an API that uses only POST requests:
POST /photos/create
POST /photos/show/decafbebad
POST /photos/update/decafbebad
POST /photos/delete/decafbebad
By doing this you’re encoding the intended action directly in the url; this is needlessly redundant. HTTP already gives you a slot for specifying the action (the verb). Just use it:
POST /photos
GET /photos/decafbebad
PUT /photos/decafbebad
DELETE /photos/decafbebad
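On the server side, the four requests above could be dispatched by a small table keyed on verb and URL pattern. This is only a sketch: the action names (create, show, update, destroy) and the route function are hypothetical, chosen for illustration.

```python
import re

# (verb, URL pattern, action name). The action names are illustrative.
ROUTES = [
    ("POST",   re.compile(r"^/photos$"),               "create"),
    ("GET",    re.compile(r"^/photos/(?P<id>[^/]+)$"), "show"),
    ("PUT",    re.compile(r"^/photos/(?P<id>[^/]+)$"), "update"),
    ("DELETE", re.compile(r"^/photos/(?P<id>[^/]+)$"), "destroy"),
]


def route(verb, path):
    """Return (action, params) for a request, or None if nothing matches."""
    for route_verb, pattern, action in ROUTES:
        match = pattern.match(path)
        if route_verb == verb and match:
            return action, match.groupdict()
    return None


print(route("GET", "/photos/decafbebad"))  # ('show', {'id': 'decafbebad'})
```

Three of the four routes share one URL; only the verb tells them apart, which is exactly the point.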
Twitter’s API tries to have it both ways with destroying favorites. You can use either POST or DELETE as your software allows. To support client libraries that can’t DELETE, the deletion is additionally referenced in the URI (http://twitter.com/favorites/destroy/id.format).
Flickr only uses GET and POST, necessitating specialty urls.
If you’re not allowing clients to create new data, or update/delete existing data on your system, then you do not have an API. You have a feed. There’s nothing wrong with providing read-only access to your data (it’s laudable, in fact), but I’m often disappointed to hear “Yeah! We have an API” only to find the person really meant they offered a number of customizable data feeds as XML.
The current Ann Arbor Library API is really a data feed.
Shopify’s API gets it right.
3) Keep Your URL/URIs Consistent
One of the most important principles of REST (which is quickly becoming the preferred method of organizing web APIs) is “every resource is uniquely addressable using Uniform Resource Identifiers”. In addition to making these URIs unique, it’s important to make them patterned and consistent.
The Bluestream API, for example, uses the following four URIs to create a resource, retrieve a resource, find items related to that resource, and see the resource’s history of edits.
POST /ams/upload
GET /ams/rest/asset/A1001001A06B17B43948J49293
GET /ams/rest/related/A1001001A06B17B43948J49293
GET /ams/rest/history/A1001001A06B17B43948J49293
Two problems with the URIs above. First, the URI for creating new assets is wildly different than the other URIs. You’re better off sticking with a pattern: POST /ams/rest/asset/.
Second, the collection of assets related to A1001001A06B17B43948J49293 is conceptually a sub-asset. Typically you’ll see the collection nested within the full URI of its conceptual parent: /ams/rest/asset/A1001001A06B17B43948J49293/related. Same goes for the history of edits: /ams/rest/asset/A1001001A06B17B43948J49293/history.
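A patterned scheme like this is easy to generate from a single helper. The sketch below assumes the nested layout I’m suggesting, not the URIs Bluestream actually exposes; asset_uri is a hypothetical name.

```python
BASE = "/ams/rest/asset"


def asset_uri(asset_id=None, sub_resource=None):
    """Build a URI under one consistent pattern.

    asset_uri()              -> the collection (POST here to create)
    asset_uri(id)            -> a single asset
    asset_uri(id, "related") -> a collection nested under that asset
    """
    parts = [BASE]
    if asset_id is not None:
        parts.append(asset_id)
        if sub_resource is not None:
            parts.append(sub_resource)
    return "/".join(parts)


print(asset_uri("A1001001A06B17B43948J49293", "history"))
# /ams/rest/asset/A1001001A06B17B43948J49293/history
```

If a client can build every URI from one rule, the API needs far less documentation.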
Most data hierarchies will be fairly flat like Twitter’s so you frequently won’t need to expose urls with deeply nested data relationships. But if you do, many client libraries are designed to interact with REST APIs that follow a patterned URI scheme.
The pattern of Digg’s data feeds is easy to understand. You barely need any more documentation than a sentence for the urls of each type. Sadly, you can’t get data in using these urls but “endpoints for participating” are coming soon.
4) Use Your Status Codes
Every HTTP response comes back with a status code. Like HTTP request verbs, these response statuses have specific meaning. Use them correctly! Many API providers send back only two codes: 200 for requests that worked and 500 for requests that didn’t work.
I once snapped this shot while using the Ann Arbor District Library’s current API:

Sending back 200 OK when you mean 404 Not Found is decidedly not OK.
So, when someone PUTs to a URI that only accepts GET and POST, send back a 405 Method Not Allowed. When someone attempts to access data without authenticating, send back 401 Unauthorized. Most client libraries will convert these statuses into native errors/exceptions, keeping the amount of body parsing required to a minimum (I got back data with 200 OK, but was it a “200 OK here’s what you wanted” or a “200 OK shit is not ok”?).
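As a rough sketch of the decision a handler has to make, here is one way to pick a status. The in-memory PHOTOS store, the allowed-method set, and the status_for function are all made up for illustration; the constants come from Python’s standard http module.

```python
from http import HTTPStatus

# Hypothetical service state, for illustration only.
ALLOWED_METHODS = {"GET", "POST"}
PHOTOS = {"decafbebad": {"title": "sunset"}}


def status_for(method, photo_id, authenticated):
    """Pick the status code that describes what actually happened."""
    if method not in ALLOWED_METHODS:
        return HTTPStatus.METHOD_NOT_ALLOWED  # 405
    if not authenticated:
        return HTTPStatus.UNAUTHORIZED        # 401
    if photo_id not in PHOTOS:
        return HTTPStatus.NOT_FOUND           # 404, never a 200 with an error body
    return HTTPStatus.OK                      # 200


print(int(status_for("PUT", "decafbebad", True)))  # 405
```

The order of the checks is a design choice of this sketch; what matters is that each failure maps to its own code rather than a catch-all 200 or 500.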
5) Expose (And Accept) Multiple Data Formats
You should both expose data and accept incoming data in at least XML and json/jsonp. These two data formats represent the standard data serializations of the web. Both should allow for nested, related data. There is some variance in how this data should be formatted, but there is an emerging pattern:
For XML:
Enclose a data type in an element of its name, not as bare data
Do this
<?xml version="1.0"?>
<person>
<name>...</name>
<age>...</age>
</person>
not this
<?xml version="1.0"?>
<name>...</name>
<age>...</age>
Enclose a collection in a tag of its type, pluralized. Each item should be in an element of its type
Do this
<?xml version="1.0"?>
<dogs>
<dog>
<name>...</name>
</dog>
<dog>
<name>...</name>
</dog>
</dogs>
not this
<?xml version="1.0"?>
<dog>
<name>...</name>
</dog>
<dog>
<name>...</name>
</dog>
Although most client libraries will happily parse either, the first allows you to place additional useful data about the collection as an attribute of the collection itself:
<?xml version="1.0"?>
<dogs count="2" last-added="...">
...
</dogs>
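The collection rule is simple to follow when generating XML. A minimal sketch using Python’s standard xml.etree.ElementTree; dogs_to_xml is a hypothetical helper and the dog data is invented:

```python
import xml.etree.ElementTree as ET


def dogs_to_xml(dogs):
    """Serialize a list of dicts as <dogs><dog>...</dog></dogs>."""
    root = ET.Element("dogs")              # the collection: pluralized tag
    for dog in dogs:
        item = ET.SubElement(root, "dog")  # each item: tag of its type
        for key, value in dog.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(dogs_to_xml([{"name": "fido"}, {"name": "killer"}]))
# <dogs><dog><name>fido</name></dog><dog><name>killer</name></dog></dogs>
```

Because there is always exactly one root element, the output stays well-formed no matter how many dogs are in the list, including zero.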
Use elements for data, not attributes.
Client library support for XML attributes is inconsistent and what distinguishes data in elements from data in attributes isn’t always clear.
Do this:
<asset>
<id>A1001001A06C13B45909C08771</id>
<name>AMS_TEST_ITEM</name>
<display-name>AMS_TEST_ITEM</display-name>
<size>80638</size>
<thumbnail>
<is-default>true</is-default>
<url>http://www.example.com/ams/icons/gif.gif</url>
</thumbnail>
</asset>
not this:
<asset ID="A1001001A06C13B45909C08771">
<entity name="AMS_TEST_ITEM" display-name="AMS_TEST_ITEM" />
<metadata name="AMS_SZ" display-name="Size" namespace="Info">80638</metadata>
<thumbnail isDefault="true">http://www.example.com/ams/icons/gif.gif</thumbnail>
</asset>
One valid use of attributes is to provide metadata. A common example is including suggested type casting, since XML (unlike json) only sends string data:
<asset>
<id>A1001001A06C13B45909C08771</id>
<name>AMS_TEST_ITEM</name>
<display-name>AMS_TEST_ITEM</display-name>
<size type='integer'>80638</size>
<thumbnail>
<is-default type='boolean'>true</is-default>
<url>http://www.example.com/ams/icons/gif.gif</url>
</thumbnail>
</asset>
Support for this in client libraries is spotty, however.
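Where a library doesn’t handle the hints for you, casting by hand is not much work. This sketch assumes only the two hint values shown in the example above (integer and boolean) and treats everything else as a string; cast_element is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Casters for the type hints used above; anything else stays a string.
CASTERS = {
    "integer": int,
    "boolean": lambda text: text.strip().lower() == "true",
}


def cast_element(element):
    """Convert an element's text using its optional type attribute."""
    caster = CASTERS.get(element.get("type"), str)
    return caster(element.text)


doc = ET.fromstring(
    "<asset><size type='integer'>80638</size>"
    "<name>AMS_TEST_ITEM</name></asset>"
)
values = {child.tag: cast_element(child) for child in doc}
print(values)  # {'size': 80638, 'name': 'AMS_TEST_ITEM'}
```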
For json the structure is more formalized since json parses into native javascript data types. There are no attributes, only slots; strings, numbers, booleans, nested object literals, and nested arrays can be sent as data in these slots. There’s really only one decision needed for json:
Have the outer object contained in a slot named for its type, or don’t
This one varies from API to API, both formats work well:
{
"name" : "...",
"age" : 22,
"dogs" : [
{"name" : "fido", "breed" : "mutt", "is_spayed" : true},
{"name" : "killer", "breed" : "poodle", "is_spayed" : false}
]
}
or
{
"person" : {
"name": "...",
"age" : 22,
"dogs" : [...]
}
}
The latter will allow you to send back additional data about the response
{
"access" : "read-only",
"person" : {
"name": "...",
"age" : 22
}
}
But either format is acceptable.
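Since both shapes are in the wild, a client can cope with either in a few lines. A minimal sketch using Python’s standard json module; unwrap is a hypothetical helper and the sample payloads are invented:

```python
import json


def unwrap(payload, type_name):
    """Return the object whether or not it sits in a slot named for its type."""
    data = json.loads(payload)
    if isinstance(data, dict) and isinstance(data.get(type_name), dict):
        return data[type_name]  # wrapped: {"person": {...}}
    return data                 # bare: {...}


wrapped = '{"access": "read-only", "person": {"name": "sam", "age": 22}}'
bare = '{"name": "sam", "age": 22}'
print(unwrap(wrapped, "person") == unwrap(bare, "person"))  # True
```

Note that unwrapping discards sibling slots such as access; a client that needs them should read them before unwrapping.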
For both XML and JSON, accept images as either multipart/form-data or a url
Images are tricky to manage via an API over the web (check out all the trouble that Twitter has had). Sending a multipart/form-data POST with the image data as content is probably the correct answer, but some http libraries (notably Ruby’s) have buggy or incomplete multipart implementations. Do your developers a favor and accept urls referencing images in addition to data POSTed as multipart body content.
6) Protect Your Users with OAuth
One thorny issue of user-specific API data is how you allow third party access. Twitter, until recently, required users to give their twitter username and password to third parties. You don’t ever want your users handing out their passwords! Instead, like Twitter, MySpace, Google Data, Get Satisfaction, Tripit, and many others, control access to your users’ data with OAuth. OAuth (not to be confused with the very different OpenID) lets your users give access to a third party without giving away their credentials.
Libraries for adding OAuth to your API and for third-parties to consume data are available in all major languages.
7) Don’t Shut Off HTTP Authentication Entirely
OAuth isn’t always the answer. For cases where the end user isn’t involving a third party directly (e.g. a desktop or iPhone application that only communicates with your servers), allowing the user to simply enter their user name and password provides a more desirable user experience. At the very least, allow HTTP Authentication to obtain a token for these kinds of applications and then use that token for all other requests.
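The token exchange can be sketched in a few lines. Everything here is an assumption for illustration: the in-memory USERS and TOKENS stores, the function names, and the choice of a random hex token. A real service would store hashed passwords and only accept Basic credentials over HTTPS.

```python
import base64
import secrets

USERS = {"alice": "s3cret"}  # hypothetical store; real ones hash passwords
TOKENS = {}                  # token -> username


def basic_auth_header(username, password):
    """Client side: build the Authorization value for HTTP Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def issue_token(authorization_header):
    """Server side: swap valid Basic credentials for a token, else None."""
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme != "Basic":
        return None
    decoded = base64.b64decode(encoded).decode("utf-8")
    username, _, password = decoded.partition(":")
    if USERS.get(username) != password:
        return None
    token = secrets.token_hex(16)
    TOKENS[token] = username
    return token


token = issue_token(basic_auth_header("alice", "s3cret"))
print(TOKENS[token])  # alice
```

After this one exchange the application can throw the password away and send only the token with later requests.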
8) Document, Document, Document
Finally, fully document your API. You’ll want to demonstrate the following: valid URIs, required and optional data, HTTP status codes and what causes them, sample requests, and sample responses.
Having your documentation on the web in a structured format is best. PBWorks (formerly PBWiki) is very popular for API documentation: Twitter, Digg, and a few others are using it.
