Structuring Your URLs (or, URL-Driven Design)
September 23rd, 2010
For the sake of argument, I'm going to assume you're reading this blog directly on appsCanadian.ca and not in an RSS reader or some other fancy software. Now, I'd like to direct your attention to the address bar in your web browser, and take note of the URL:
http://www.appscanadian.ca/archives/structuring-your-urls-or-url-driven-design
In the old days of the internet, this URL would indicate to the reader that I had created a real directory on my web server called "archives" and placed in it a file called "structuring-your-urls-or-url-driven-design" containing the HTML you're viewing right now. These are not the old days of the internet, however, and I can assure you that there is no "archives" directory on my server, nor is there a file called structuring-your-urls-or-url-driven-design.
The way this usually works today is that a configuration file on the server (most often called .htaccess on Apache servers) examines incoming requests and internally rewrites them to point at a file elsewhere on the server. The browser never sees this happen; the address bar still shows the original URL. More often than not, all requests for web pages (i.e. content that isn't something static like CSS or images) get routed to the same file, usually called index.php. In fact, the page you're currently reading was processed by a file called index.php on my server, and that script knows to serve this particular page because of a GET variable called 'p' which has a value of 139. This means that, for all intents and purposes, asking appsCanadian.ca to serve /archives/structuring-your-urls-or-url-driven-design is exactly the same as asking it for /index.php?p=139.
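To make that mapping concrete, here is a minimal sketch of the idea, written in Python rather than as real Apache rewrite rules or PHP. The POST_IDS table and the rewrite function are assumptions made up for illustration; an actual blog would look the ID up in its database.

```python
from typing import Optional

# Hypothetical lookup table: slug -> post ID.
# A real blog would query its database instead.
POST_IDS = {
    "structuring-your-urls-or-url-driven-design": 139,
}


def rewrite(path: str) -> Optional[str]:
    """Map a pretty /archives/<slug> path to its internal index.php form.

    Returns None when the path does not match a known post.
    """
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "archives":
        post_id = POST_IDS.get(parts[1])
        if post_id is not None:
            return f"/index.php?p={post_id}"
    return None


print(rewrite("/archives/structuring-your-urls-or-url-driven-design"))
# prints /index.php?p=139
```

The point is only that the pretty URL and the index.php URL are two names for the same page, and something on the server has to hold the mapping between them.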
Obviously, the server needs to understand how one URL maps to another, so it makes sense to spend some time planning your URL design before you actually begin building your website. Planning your URLs forces you to think carefully about how your site is actually going to work, and to break your features into their component parts.
An Example
As I write this, the highest voted question on the popular programming Q&A site Stack Overflow is located at the following URL:
http://stackoverflow.com/questions/194812/list-of-freely-available-programming-books
Let's break down this URL into its component parts to see what makes it tick.
First is the typical site identification stuff, the "http://stackoverflow.com" part of the URL. We don't much care about that since it's pretty standard. I will say that you should give a bit of thought to how you'll deal with subdomains. Google treats subdomains as separate sites, so it's probably best to keep everything under one roof: either have everything under a 'www' subdomain, or no subdomain at all.
The real fun begins when we examine the path. The first part of the path points to a "directory" called "/questions" which (as with my blog example above) doesn't actually exist. The first "directory" in a dynamic URL like this typically points to a specific "controller" in the application. In MVC-based applications (which I'll talk about in more detail in a future post), controllers can sort of be described as "sub-applications." A website is typically made up of several different controllers which can perform a variety of actions. They denote the boundaries of a specific portion of the website. In this case, the "questions" controller is in charge of, at a minimum, submitting and displaying questions on the site.
Next, there is another component which looks like a directory called '194812'. The precise meaning of this is something which is left to the specific site to determine, but in a lot of cases, the "second directory" will be a specific action that the first controller is meant to execute. This could take the form of a URL like "/questions/display" - the meaning of which should be obvious: the site is to 'display' a 'question' which is specified later in the URL. In the case of Stack Overflow, the website designers have decided to forgo the 'action' directive, and instead immediately give a target for which a default action will be applied. Specifically, the number 194812 is a question identifier, and the system knows that the default action for such an identifier is to display the question and its answers.
Finally, we have what looks like a filename. As with the blog example above, I can assure you that there is no file on Stack Overflow's web servers named list-of-freely-available-programming-books (with the possible exception of some sort of file-based cache). Instead, the filename-like identifier "list-of-freely-available-programming-books" is what is referred to as a "slug." A slug is a bit of information which is contained in a URL entirely for decorative purposes. A site designer may choose to include additional information in a URL to make it more descriptive and alluring to end users, or to search engines. In the example, the words "list-of-freely-available-programming-books" are actually the title of the question being displayed. In most cases, slugs can be changed or removed entirely without altering how the site displays the page.
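The breakdown above can be sketched in a few lines of Python. This is not how Stack Overflow actually parses its URLs (I have no knowledge of that); the Route type, the parse_route function, and the choice of "display" as the default action name are all assumptions for illustration.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Route(NamedTuple):
    controller: str
    action: str
    target_id: Optional[int]
    slug: Optional[str]


# Assumed default, borrowed from the "/questions/display" example in the text.
DEFAULT_ACTION = "display"


def parse_route(url: str) -> Route:
    """Split a URL path into controller, action, numeric target, and slug."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    controller = segments[0] if segments else ""
    rest = segments[1:]

    # A non-numeric second segment is treated as an explicit action;
    # otherwise the default action applies to the numeric target.
    if rest and not rest[0].isdecimal():
        action, rest = rest[0], rest[1:]
    else:
        action = DEFAULT_ACTION

    target_id = int(rest[0]) if rest and rest[0].isdecimal() else None
    # The trailing text is decoration only, so it is optional.
    slug = rest[1] if target_id is not None and len(rest) > 1 else None
    return Route(controller, action, target_id, slug)


print(parse_route(
    "http://stackoverflow.com/questions/194812/"
    "list-of-freely-available-programming-books"
))
```

Under these assumptions, "/questions/194812" and "/questions/display/194812" parse to the same controller, action, and target, which is the "default action" behaviour described above.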
Designing Your Structure
Figuring out how your URLs will look really isn't the hardest part of designing a website, but I do think it is an important part. By working out your URL structures, you get a feel for how your site actually comes together. One way of doing this is to write out a brief explanation of what your site is and how you expect it to work. Here's an example I wrote up for Stack Overflow:
Stack Overflow is a Q&A site for programmers. Users can ask and answer questions and are awarded reputation points when the community decides they have a) asked a clear and useful question, or b) provided a correct answer to a question. These points are awarded based on votes provided by the community of users. Questions can be assigned up to 5 tags relating to the question, such as programming languages or platforms identified in the question. These tags can be edited by other users with a sufficient amount of reputation, and the question itself can be edited by users with even higher levels of reputation. Participation in the site is also rewarded by a set of "badges" which are awards for performing tasks within the system. Users can find questions by browsing based on one or more tags, or by the age or vote total of a question. When viewing a question, answers can also be sorted by age and votes. Similarly, the site will have a search function for finding questions containing certain language.
This is a pretty basic explanation of the site, but it works well enough. If you read over that description, you can probably identify some distinct sections of the site, and some actions which can be performed within those sections. Rather than review Stack Overflow's actual URL structure (of which I have an incomplete knowledge), let's try to create our own URL structure.
The key components of the site seem to be asking questions, providing answers, and voting. Users also specify tags and can be awarded badges for good behaviour. Users can search for questions based on terms and tags, or they can browse questions freely.
Let's use the key terms from that summary to create some "controllers":
- /questions
- /answers
- /votes
- /users
- /tags
- /badges
- /search
- /browse
These are the preliminary candidates, and not all of them may be suitable. Use your own judgment regarding which potential controllers will actually work for your own website.
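As a rough sketch of what "the first segment picks the controller" means in practice, here is a tiny dispatcher in Python. The handler functions, the CONTROLLERS table, and the return strings are all made up for illustration, and only two of the candidate controllers are filled in.

```python
from typing import Callable, Dict, List


# Hypothetical controllers: each receives the remaining path segments.
def questions_controller(args: List[str]) -> str:
    return f"questions controller received {args}"


def tags_controller(args: List[str]) -> str:
    return f"tags controller received {args}"


# The first path segment selects the controller.
CONTROLLERS: Dict[str, Callable[[List[str]], str]] = {
    "questions": questions_controller,
    "tags": tags_controller,
}


def dispatch(path: str) -> str:
    """Hand the request to the controller named by the first path segment."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        return "home page"
    handler = CONTROLLERS.get(segments[0])
    if handler is None:
        return "404: no such controller"
    return handler(segments[1:])


print(dispatch("/questions/194812"))
# prints questions controller received ['194812']
```

Writing the table out like this is a quick way to notice which candidates deserve to be controllers and which are really just actions on another controller.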
Next, we'll need to identify actions for these controllers to take. For brevity, let's just review the questions controller:
- The whole point of the site is for users to ask questions. So 'ask' is a good action for a question.
- Once asked, the question will need to be available for other users to answer, which means we'll need to display it. Let's add a 'display' action.
- We need to be able to uniquely identify the question, so we'll use an ID number to tell the 'display' action what to show us.
- For SEO purposes, let's also add a slug based on the question.
- We have to display answers with our questions, and answers can be sorted in a number of ways:
  - Newest
  - Oldest
  - Most votes
- A user may want to edit a question he or she asked.
- A user may want to delete his or her question completely.
This gives us the following URL options for questions:
/question
  /ask
  /display
    /id
      /slug
        /new
        /old
        /votes
This seems pretty good to me, but there are two issues you might want to consider:
- The first potential issue is that a slug should be regarded as decoration, and the absence or misspelling of it shouldn't be fatal to your application. You can easily store the "preferred slug" for an item in your database, and redirect to the proper form of your URL when an incorrect version is received.
- The second potential issue is that the /new, /old, and /votes sort items probably shouldn't form their own URLs. To a search engine, these will all appear to be separate links, and your potential page rank will be split across them. These options should probably be set in GET variables, but if you're really dead set on having perfectly pretty URLs, you can get around this by using server-side URL rewriting to invisibly turn a pretty URL into a GET variable.
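Both suggestions can be sketched together in Python. Everything named here is an assumption for illustration: the PREFERRED_SLUGS table stands in for a database, "sort" is a made-up GET variable name, and the resolve function returns a simple (status, detail) pair instead of a real HTTP response.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical stand-in for a database table: question ID -> preferred slug.
PREFERRED_SLUGS = {194812: "list-of-freely-available-programming-books"}

# Allowed values for the assumed "sort" GET variable; the first is the default.
SORT_OPTIONS = ("votes", "new", "old")


def resolve(url: str) -> Tuple[int, str]:
    """Return (status, detail) for a /questions/<id>/<slug> URL."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if (len(segments) < 2 or segments[0] != "questions"
            or not segments[1].isdecimal()):
        return 404, "not found"

    question_id = int(segments[1])
    slug = PREFERRED_SLUGS.get(question_id)
    if slug is None:
        return 404, "not found"

    canonical_path = f"/questions/{question_id}/{slug}"
    if parts.path != canonical_path:
        # A missing or misspelled slug is not fatal: redirect to the
        # preferred form and carry the query string along.
        query = f"?{parts.query}" if parts.query else ""
        return 301, canonical_path + query

    # Sort order lives in a GET variable, so every ordering shares one path.
    sort = parse_qs(parts.query).get("sort", [SORT_OPTIONS[0]])[0]
    if sort not in SORT_OPTIONS:
        sort = SORT_OPTIONS[0]
    return 200, f"display question {question_id} sorted by {sort}"


print(resolve("/questions/194812/wrong-slug?sort=new"))
# prints (301, '/questions/194812/list-of-freely-available-programming-books?sort=new')
```

The ID is the only part that decides which question is shown. The slug is checked purely so that every visitor, and every search engine, ends up on a single canonical address.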
If you really think about your site, and what you want to do with it, it should be pretty straightforward to design a URL strategy. When you're done, you should have a better overall grasp on how your site will work, and the rest of your design will start to fall into place.