I'm building yet another new toy in PHP (and no, I haven't finished the others...) and I ran into a familiar problem. I use ICDSoft for my hosting for appsCanadian, since when I bought the hosting I only intended to host this blog on it and I didn't need anything fancy or expensive. I just went with the cheapest option that had positive reviews. Now, I'm not knocking ICDSoft at all. In fact, they've been a perfectly good host, and they've been amazingly responsive to any and all inquiries I've had. However, being an economy host, I'm working on a shared server with no ssh access. This means that I have no ability to install Subversion, which is still my go-to version control system, largely since I've never bothered to use Git or Mercurial.

This presents a minor problem for me, since I really like developing on this server, for various reasons like the fact that my laptop is running Windows and I'm just paranoid enough to not want to develop on a different OS than I'll ultimately be deploying to. My reasons aren't that important, and if you're reading this it's likely that you have your own reasons for wanting to develop on some remote host you can't install things on or ssh into.

I've tried to find workarounds for this, and last time around I decided that I would just run Ubuntu in a virtual machine on my laptop. I found this to be a bit of a burden as I don't normally spend a lot of time using Linux, although I am relatively comfortable working with such a box, and I didn't like having to boot up a whole other machine just to make a quick change to my project. This time around, I decided to find a better way.

After a bit of brainstorming and a few searches on Google, I stumbled across WinSCP, which is an open source (S)FTP/SCP client for Windows. What hooked me was its scripting support, which includes commands for synchronization and directory monitoring. I've written a small script which does an initial sync and then monitors for changes:

sync.txt
option batch on
option confirm off
open WINSCP_SESSION
option exclude .svn/
synchronize remote -delete "LOCAL_DIRECTORY" "REMOTE_DIRECTORY"
keepuptodate -delete "LOCAL_DIRECTORY" "REMOTE_DIRECTORY"

And I pass this script to WinSCP from a Windows batch file:

sync.bat
"C:\Program Files\WinSCP\WinSCP.com" /script=sync.txt

The execution of this script is pretty straightforward. First we open a particular WinSCP session, defined in advance via the GUI, which contains credential information for your (S)FTP/SCP connection. Next, we do an initial synchronization of the remote directory with the contents of the local directory, opting also to delete any remote files which no longer exist locally. You can also replace the "remote" option with "local" to synchronize the other way around, or use "both" to do a two-way sync moving the most recent files from each side to the other (note that the delete option obviously has no effect when syncing both ways). Finally we use the keepuptodate command to tell WinSCP to monitor your local directory and push any changes up to your remote directory. Since the whole point of this exercise is to be able to develop remotely while still being able to check out a working copy from svn locally, I've also used an exclusion option to tell WinSCP not to propagate any changes having to do with a .svn directory, since this is Subversion's work area and it's never needed on the server.

With this script running, I can make any and all changes necessary to my project on the local files, and as I save these changes WinSCP will quietly send them to my server where I can see them immediately. I just have to be careful not to make any changes directly to the server, or I'll run the risk of having them overwritten by any local changes that get made.

One final caveat. There seems to be a known issue where a directory deletion made locally will NOT get propagated to the remote directory, though it seems its contents will be deleted without issue. It seems that because WinSCP has the directory open to monitor changes within it, Windows isn't able to complete the deletion until the monitoring process ends. Because the deletion isn't completed until the monitoring process ends, the completed deletion won't be noticed by WinSCP and will therefore not be applied to the server. It's a bit of a catch-22. I did make a bug report on the WinSCP forums, but the developer responded saying he's seen the issue before but is unsure how he can fix it. I'm therefore not going to hold my breath for a bug fix, but luckily there's an easy workaround. If you either need to recreate the directory locally (Windows won't allow its recreation until the deletion of the original has completed, for obvious reasons) or if the existence of the directory on the server is a problem, you can simply restart the batch script and the initial synchronization process will clean up for you.

Since WinSCP is open source, I would love to be able to contribute a fix to this, but the code is in C++ and a lot above my ability. If anyone out there knows how to fix this issue, I'm sure a patch would be appreciated by the developer.

I hope that this information can help someone else who finds themselves in a similar situation to mine. If so, I'd love to hear from you if for no other reason than to convince myself that I'm not crazy for using such a setup.

I read a blog post just now via Hacker News where a guy called Matt Swanson was lamenting that someone else had beaten him to the punch and launched a website similar to an idea he had. By the end of the post, he had come to the conclusion that this wasn't the end of the world - it taught him that the idea had merit and he was simply going to compete with them, and that he could still win because he was building a better mousetrap.

This post sort of struck a chord with me, since I had the same experience a number of months ago. In fact, my experience was almost identical since my idea was also a tech-oriented book site. Finding these competitors is definitely a blow, but I've risen above it by taking stock of my motivations and realizing that the product was never the point for me.

After a few years of working with ancient technologies such as COBOL, I found myself in a bit of a depression since my ambitions had been corrupted. I felt rather like the character of Marshall Eriksen on the CBS sitcom How I Met Your Mother. Marshall began his legal career with the goal of becoming an Environmental Lawyer, but found himself working for an evil megacorporation to pay the bills. He's in the right industry, but he's no longer pursuing the career he wanted.

My goals for my book site are a lot different than Matt's: I'm not so much developing a product as trying to teach myself something new. This will be the first website I've ever developed from scratch, and it's taking a long time because I'm being a bit of a perfectionist with the project. I figure if this is worth learning, then it's worth learning right. It's much harder to unlearn something you've learned incorrectly than it is to learn it right the first time. Learning something right, however, is a more time-intensive process.

So Hacker News didn't ruin my morning like it ruined Matt's. My goals haven't changed, and my goals were never based on being the first mover for this idea. My goal was to use this project to develop my skills so that next time around I'll be better equipped to compete. If I learn well this time and make the development process second nature to me, I can compete on execution alone next time. Maybe then I'll win.

Laying Things Out On The Web

October 19th, 2010

We've already talked about the basics of separating the logic of your application from its presentation, and we've put in place the building blocks of how your code will load its components as needed. Before we talk about creating the Controller classes which will handle incoming requests and determine their output, I want to talk a bit about creating the HTML templates that will make up that output.

Fixed vs. Fluid Layouts

There are two main methods to laying out a website - fixed layouts and fluid ones. There are also methods which combine these two, but they're beyond the scope of this discussion.

In a fixed layout, the site is structured not to exceed a certain width. These sites tend to be contained within a single <div> container element, which has been set to be exactly a certain pixel width. From there, the site can be designed largely as if it were on a piece of paper. If you have an image that you know is 200x200 pixels, and you know that your site will always be shown with a width of 1000 pixels, you know that your picture will take up one-fifth of the horizontal space on the page. Because of this ability to create pixel-perfect designs up front, fixed width designs are much easier to create than fluid designs.

Fluid designs, on the other hand, have no concept of a hard site width. Instead, element widths are specified as percentages of the total horizontal screen real estate available. As a result, content can easily stretch to use all available horizontal space in the user's browser, eliminating dead, unused space.

To better visualize the difference between these two layouts, I have created two very simple pages to illustrate each layout:

Sizing Your Wrapper

Largely because I'm new to web design in general, I'm going to use a fixed-width layout for my site. I figure it's best to start with the easy stuff before moving on to more involved techniques.

The first thing I need to do is decide what size I want my site to be. The two leading sizes are 760 pixels and 960 pixels, which roughly correspond to the 800x600 and 1024x768 screen resolutions, respectively. These numbers were decided by taking the horizontal width of the screen (800 and 1024 pixels, respectively) and shaving off a bit to accommodate things like scroll bars and other browser chrome.

Since 800x600 resolutions are pretty rare these days, most people opt to design using the larger 960 pixel option. Those with lower resolutions will see a horizontal scroll bar at the bottom of their screen when using your site. A good tip would be to use the right-most 20% of your design for less important, or non-essential information, as it is this portion of your design that would be "hidden" by default for users with an 800x600 resolution. 20% of a 960 pixel design is 192 pixels, roughly the difference between the 760 and 960 pixel options.

Creating the Wrapper

Creating the wrapper for your fixed-width layout is dead simple: just place a <div> around your content, styled with the following CSS:

#wrapper {
  width:  960px;
  margin: 0 auto;
}

This CSS breaks down as follows:

  • The #wrapper selector just says that the following styles only apply to the element whose id is 'wrapper'. There are lots of ways we could have specified this, like using a class rather than an id, but since we're planning on wrapping everything inside this element, it makes sense to use the id since there will only be one of them.

  • The width property has already been discussed at length, so we don't need to go over that again here.

  • The margin property is a bit interesting. The two values given are just shorthand, which says "remove margins from the top and bottom of this element, and automatically set its left and right margins." Setting the left and right margins to auto tells the browser to margin the element the same way on both sides, effectively centering the element on the screen.

One Container, or Many?

One thing I find a bit odd about a lot of the sites whose markup I've peeked at is that they all use a single massive container to hold all of their content. I'd like to propose that we can separate this massive container into smaller ones.

Instead of this:

#container {
  width:  960px;
  margin: 0 auto;
}

<div id="container">
  <!-- stuff... -->

</div>

We could instead have this:

.frame {
  width:  960px;
  margin: 0 auto;
}

<div id="header"  class="frame"> <!--header  stuff--> </div>
<div id="content" class="frame"> <!--content stuff--> </div>
<div id="footer"  class="frame"> <!--footer  stuff--> </div>

I think it gives a little added flexibility, and modularizes your page, at little extra cost. Just a thought.

Conclusion

Hopefully this has shed a bit of light on how different layouts work, and can get you started with building your first one.

Stay Classy, My Friends

October 14th, 2010

Motivation can be a tricky thing, and realizing you did an absolutely piss-poor job of something is a great way to destroy any remaining motivation you might have. Such was the case with my last post, which I count as among the worst missteps I've had in a while.

You see, I was thinking about naming conventions so I wanted to do a blog post about it. Unfortunately, I was thinking about naming types at the time, rather than naming variables, so talking about Hungarian notation was entirely the wrong thing to do. I do believe Hungarian notation still has its place, but that blog post really wasn't it.

So let's just pretend that stinker never happened and talk about what I really meant....

PHP Includes

I haven't done a lot of heavy lifting with PHP, but those that have will know that including files can be rather annoying. The reason for this is that a lot of people, for very good reasons, like to keep distinct bits of code (usually classes) in separate source files, and PHP requires you to declare every file you need before you can use its contents. This means that the tops of your functions or other files can get loaded with a lot of include directives quickly, and you need to be careful that you include everything you need, everywhere you need it, in long patches of code that are nothing but these housekeeping lines.

Since this can be tedious and sometimes error prone, the PHP developers added autoloading functions to hand the work of finding and including files over to PHP, leaving the developer free to worry about building the core logic of the application.

Autoloading Classes

The simplest way to get PHP to load your classes for you is to define a function called __autoload with a single parameter which will hold the name of the class we're trying to find - i.e. function __autoload($className). In this function, we will write some logic to look for the class in question and include it into our application.

The upside of this is that PHP knows by default to look for the __autoload function when it can't find a class. The downside is that this can become fragile if you start including external libraries which may have their own __autoload functions defined. As with the Highlander, there can be only one.

To deal with this issue, we have the spl_autoload functions.

spl_autoload

Where __autoload is just a single function you can define to determine the location of missing classes, the spl_autoload functions allow you to define as many different functions as you need to handle your cases. The benefit of this is that you no longer need to worry about library code having an __autoload function which collides with yours. You can simply use the spl_autoload_register function to define a list of different functions to use to find your classes.

For example, let's say you've defined the standard __autoload function to find classes you've written yourself. This works fine until one day you decide that you really need to use the AwesomeFramework in your code. When you download the framework, you discover that it has its own __autoload function. Since PHP doesn't know what to do with two __autoload functions, it just dies.

The solution is simple: just pass your autoload function (after you've renamed it to avoid confusion with the other __autoload function) as a parameter to spl_autoload_register and PHP will use it to look for the missing class.

The problem with this solution is that the spl_autoload_register function will override the default __autoload function, rendering AwesomeFramework's __autoload function useless. The fix is just as simple: add a second call to spl_autoload_register, this time passing the __autoload function from AwesomeFramework as the parameter. If the function is inside a class, you can either refer to it via the normal double-colon operator (i.e. spl_autoload_register("AwesomeFramework::__autoload")) or by passing an array as the parameter, where the array has only two items: the class name, and the function name (i.e. spl_autoload_register(array("AwesomeFramework", "__autoload"))). I honestly don't know what the difference is between these methods, though I would definitely appreciate some comments to fill me in on the issue.
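As a rough sketch of what registering two loaders side by side might look like: the names myAutoload, AwesomeFramework::autoload, My_Thing and Awesome_Widget below are all made up for illustration, and eval() stands in for the require_once a real loader would do, purely so the example fits in one file.

```php
<?php
// Hypothetical stand-in for a third-party framework's loader.
class AwesomeFramework
{
    // Records every class name this loader was asked for.
    public static $seen = array();

    public static function autoload($className)
    {
        self::$seen[] = $className;
        if ($className === 'Awesome_Widget') {
            // A real loader would require_once a file here.
            eval('class Awesome_Widget {}');
        }
    }
}

// Our own loader, renamed so it no longer claims the __autoload name.
function myAutoload($className)
{
    if ($className === 'My_Thing') {
        // A real loader would require_once a file here.
        eval('class My_Thing {}');
    }
}

// Loaders are tried in the order they were registered.
spl_autoload_register('myAutoload');
// Array form for a static method; the string 'AwesomeFramework::autoload'
// names the same method.
spl_autoload_register(array('AwesomeFramework', 'autoload'));

$mine   = new My_Thing();       // resolved by myAutoload
$theirs = new Awesome_Widget(); // myAutoload passes, the framework loader resolves it
```

Because myAutoload defines My_Thing itself, the framework's loader is never consulted for that class; it only gets asked about names the first loader could not resolve.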

I've created some sample code that you can play with to get a feel for how the autoloader works.

Naming Conventions

If you're going to keep your code separated into different directories, like I am, it helps to have a naming convention for your classes so that your autoloader can easily navigate your directory tree. I decided on a relatively simple convention for my names - I simply append _Model, _View, or _Controller to classes that belong to each respective group. My autoloader uses this to determine the type of class it's looking for, and tries to find the rest of the name as a filename in the directory for that type of class:

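public static function loadClass($class) {
  // Split "Foo_Model" into its parts; the last part names the class type.
  $nameComponents = explode("_", $class);
  if (count($nameComponents) > 1) {
    $classType = $nameComponents[count($nameComponents) - 1];
    // Everything before the final "_<type>" suffix is the class name proper.
    $className = substr($class, 0, -(strlen($classType) + 1));
  } else {
    // No suffix at all, so the whole name is the class name.
    $classType = "";
    $className = $class;
  }

  // Pick the directory to search based on the class type.
  switch($classType) {
    case "Model":
      $classPath = MODELS;
      break;
    case "View":
      $classPath = VIEWS;
      break;
    case "Controller":
      $classPath = CONTROLLERS;
      break;
    default:
      // Anything else is assumed to be an entity; an unrecognized
      // suffix is part of the entity's own name, so keep it.
      $classPath = ENTITIES;
      $className = $class;
      break;
  }

//...<snip>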

The uppercase symbols in this code are constants holding the paths to the respective directories for each class type. As you can see, I've also created a default path which assumes that if the class isn't a model, view, or controller, the class must be an entity (i.e. an object representing an actual "thing" in the code, rather than an abstract concept used to separate concerns). Some people might not like using these sorts of fact-repository classes, but I tend to think it's cleaner to have distinct objects floating freely across the different concerns of the application. In this way, each section can handle its data however it wishes, yet still have a standard to conform to when communicating with other portions of your design.

The rest of the function is fairly basic, and simply determines the ultimate filename in which we think the class is defined, and logic to require_once the file if it is readable (remember, just because a file exists doesn't mean your application actually has permission to use it).
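The snipped portion isn't reproduced here, but the "require it only if it's readable" step might look something like the sketch below. The helper name requireIfReadable, the ".php" extension, and the throwaway temp directory standing in for MODELS are all assumptions made for the sake of a runnable example.

```php
<?php
// Hypothetical helper; not the actual remainder of loadClass().
function requireIfReadable($classPath, $className)
{
    // Assumes one class per file, named after the class minus its type suffix.
    $fileName = rtrim($classPath, '/\\') . DIRECTORY_SEPARATOR . $className . '.php';

    // is_readable() is false both for missing files and for files
    // the web server user has no permission to read.
    if (is_readable($fileName)) {
        require_once $fileName;
        return true;
    }

    // Not found here; the next registered autoloader gets a turn.
    return false;
}

// Demo: a throwaway directory playing the role of MODELS.
$models = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'models_' . uniqid();
mkdir($models);
file_put_contents(
    $models . DIRECTORY_SEPARATOR . 'User.php',
    "<?php\nclass User_Model {}\n"
);

$found   = requireIfReadable($models, 'User');  // file exists and is readable
$missing = requireIfReadable($models, 'Ghost'); // no such file
```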

The beauty of this function is that if it fails to find and include the needed class, it will simply fall through to the next defined autoload function until it does. If it can't find it at all, your application will error out (unless you implement some error handling to handle the problem). Typically you should only see these errors in test, since you will have tested thoroughly enough to ensure all needed files are discoverable and accessible.

Conclusion

Hopefully this has been a somewhat enlightening guide to how to let your classes get out of your way so you can deal with what really matters. I'll leave it as an exercise for the reader to determine how best to handle the basics of the index.php file, so that this autoloader (and the constants it depends on) can be properly used.

In my next post, I'm going to start actually dealing with the presentation of the application, and will deal with some basic HTML and CSS.

Cheers!

I wanted to write about building the router portion of our MVC application today, but I think it's worth taking some time to talk about Hungarian notation first.

Hungarian notation is the name given to a type of variable naming convention used by some programmers. The point of this notation, invented by a man called Charles Simonyi while at Microsoft, was that the type of a variable should be made obvious as part of its identifier.

Now, if you're old enough to remember pre-.Net Visual Basic, you were probably taught to use Hungarian notation somewhere down the line, and you've probably learned to hate it. You're not alone, and you're not wrong, you just don't understand the history. You see, there are actually two types of Hungarian notation: one of them - the one you probably know - is more or less useless, but the other can be quite handy.

Systems Hungarian

As I said, there are two types of Hungarian notation, and they were both born at Microsoft, but to two different teams. The folks who worked on Windows came up with the version we now refer to as "Systems Hungarian" and it is this version that most people know and despise.

When the systems division first learned of Hungarian notation, they mistakenly believed that when Simonyi's original research paper referred to the "type" of a variable, it literally meant the data type of that variable. This led to programmers prefixing their variables with such useless noise as "str" to denote a string or "int" to denote an integer. You may have been taught this in school, or even forced to use it, and for a while, you may have even thought this made sense (I did, for longer than I'd care to admit). Eventually, however, you would have realized, or had it pointed out to you, that it really didn't add any extra value to your code.

The reason most people use to denounce Systems Hungarian is that your functions and classes should tend to be highly cohesive. This means that your code focuses on a specific task, rather than attempting to be everything to everyone. By maintaining this cohesion, you will reduce the number of variables in play at once, and the ones that remain will have data types that are relatively obvious.

In short, people don't like Systems Hungarian because it adds no semantic value to your code, and so it just serves to make your variable names less readable.

Applications Hungarian

While the systems programmers at Microsoft were busy cluttering up their code, the applications division (i.e. the people behind products like Microsoft Office) had another way of doing things. They had what a lot of people consider a better way.

Rather than using a prefix to explain the variable's data type, the applications developers used prefixes to establish a variable's purpose, and add semantic information to help future developers understand it. In this way, variables which utilized the same underlying data type would be distinguished as being incompatible with each other, where under the Systems Hungarian notation they might not.

An Example

A common example of this uses the Excel source code to make its point. If you were creating Excel, you would probably need some way of indicating a particular column in a spreadsheet. Since describing fractions of a particular column doesn't make any sense, any rational developer would use a whole number to store the column position - i.e. an integer.

If you were to use Systems Hungarian, you might end up with a variable name which looks like intColumnPosition. As you can see, the "int" prefix here doesn't explain much beyond the fact that the column position is a whole number. That might seem like a good thing at first, but really the fact that a column position is a whole number would be obvious to most people.

In Apps Hungarian, however, the same variable would probably have a name like colPosition. In this example, the prefix is used to describe the purpose of the variable (i.e. as a column in a spreadsheet), and leaves the "obvious" information, like the data type, out.

It also has the added bonus of making certain classes of mistakes become more obvious to the casual observer. Take the following code, for example:

intCurPos = rows.GetCurrent();

There doesn't seem to be anything wrong there, right? How about this line:

Column.Select(intCurPos);

Perhaps you can now see the problem: just a few moments ago, we assigned a value based on a row position, and now we're assigning it to a column? That makes no sense! You're probably also thinking that it's a pretty obvious mistake, right? Any idiot would notice the error. Well, it might seem that way, but that's only because these lines were presented to you close together. What if, instead of having only a line of text between them, there was a whole function of code dividing these two lines? What if they were member variables in a class and were set and retrieved in different methods? Would you see them then? Let's try this again:

rwCurPos = rows.GetCurrent();
// ... lots of code here ...
Column.Select(rwCurPos);

To an eye trained for Apps Hungarian, that last line would stand out as wrong no matter where it was in the code.

Now, you're probably asking me why we can't do something like this:

currentRowPosition = rows.GetCurrent();
// ...code, code, code...
Column.Select(currentRowPosition); //wrong!

By making "Row" part of the variable name, it becomes obvious that it shouldn't be used to select a column. You could definitely do this, but I think in a lot of cases a form of Apps Hungarian probably works better, for four reasons:

  1. It's shorter, so...
  2. You can pack more descriptors into a smaller space if you need to, and...
  3. They're always in the same spot, right up front...
  4. ...which makes it easier to maintain a team/application/company-wide standard.

Conclusion

Now that I've explained the different types of Hungarian notation, I think we're better positioned to talk about our MVC router in the next post. I needed to get this out of the way first because we're going to use a special PHP function to include our classes for us based on the name of each class. Since I'd like to keep the source code directory tree relatively clean and organized, I'm going to use a form of Apps Hungarian to name my classes and filenames to help this special function navigate my file structure.

Extra Reading

When I was still in college, I used to read a blog called Joel On Software. My understanding of Hungarian notation largely stems from one particular post he wrote. It's called Making Wrong Code Look Wrong. Check it out.

If you've done any sort of web programming recently, you have probably heard about MVC.

MVC, which stands for "Model-View-Controller," is a particular design pattern which attempts to separate the logic of an application from its presentation. It is particularly popular with web programmers, although it can be applied to other areas of programming as well.

Until recently, I was pretty clueless about MVC. My only knowledge of the subject came from passing references to it found in various blog posts and podcasts. When I started planning out my new web project, I accidentally reinvented MVC in my attempts to keep my design relatively clean. When I was told how close I was to MVC, I decided to do a little more research into the subject, and this blog post is the result of that research.

What is MVC?

There are four main components to MVC: the eponymous models, views, and controllers, and something called a router, which is essentially another controller, but which I consider enough of a special case to warrant special mention.

Routers and Front Controllers

Front Controllers are the first entry point into your application. As explained in my previous post, Apache web servers can be configured to redirect all requests into a single file, usually called index.php. This index.php file would be the front controller for the application.

It is from this file that your application is bootstrapped, and configuration files are loaded. These configuration files can and should include definitions which map your directory tree to constant values in your application (using define() statements in PHP), so you can quickly and easily find files you need.
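A configuration file along those lines could be as small as the sketch below. The constant names and the directory names are assumptions for illustration; use whatever matches your own tree.

```php
<?php
// Hypothetical layout: every directory hangs off the folder holding this file.
define('ROOT', __DIR__);

// One constant per kind of file the application needs to find.
define('MODELS',      ROOT . DIRECTORY_SEPARATOR . 'models');
define('VIEWS',       ROOT . DIRECTORY_SEPARATOR . 'views');
define('CONTROLLERS', ROOT . DIRECTORY_SEPARATOR . 'controllers');
define('ENTITIES',    ROOT . DIRECTORY_SEPARATOR . 'entities');
```

Anchoring everything to one ROOT constant means the whole application can be moved to another directory, or another server, without touching any of the other paths.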

Once the application is prepped and ready to go, the router takes over and directs (routes) processing to specific controllers which are better suited to handle that particular form of request. In practice, a lot of the time this is simply a matter of extracting variables out of a URL to find a controller name (remember from last time that, by design, URLs often begin with a controller name), verifying it exists, then passing control to that controller.

You may have noticed by now that front controllers and routers have some overlap in responsibility. I consider them to be the same thing, but some purists may want to separate routing capabilities into their own class, so I have tried to separate them a bit in this description.
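To make the routing step a little more concrete, here is a minimal sketch of pulling a controller name out of a URL and checking that it exists. The function name resolveController, the "_Controller" class suffix, the list of known controllers, and the "home" default are all assumptions, not part of any particular framework.

```php
<?php
// Hypothetical router helper: maps a URL to a controller class name,
// or returns null when no such controller is known.
function resolveController($url, array $knownControllers, $default = 'home')
{
    // Keep only the path portion, e.g. "/questions/ask/42".
    $path = (string) parse_url($url, PHP_URL_PATH);

    // Drop the empty segments produced by leading or trailing slashes.
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));

    // By convention the first segment names the controller.
    $name = count($segments) > 0 ? strtolower($segments[0]) : $default;

    // Verify it exists before handing over control.
    if (!in_array($name, $knownControllers, true)) {
        return null; // the caller decides how to report a "not found"
    }

    return ucfirst($name) . '_Controller';
}

$known = array('home', 'questions');
$class = resolveController('/questions/ask/42', $known);
```

Checking the name against a fixed list before building a class name from it also keeps arbitrary user input from deciding which class gets instantiated.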

Controllers

Controllers are essentially gatherers. They are the middle managers of your application. Their entire role is to know what is needed to fulfill a task, and who to ask for it. They aren't there to do any real work (a fact many people seem to miss, leading to an epidemic of "fat controllers"); they are there to facilitate the work of others.

When a request is passed to a controller from the router, the constructor for that controller will once again extract data out of the URL to determine which action is required. In the "questions" controller example used in the last post, the questions controller would need to decide between the "ask" and "display" actions available to it based on the data in the URL.

Once a suitable action is identified, a method is called to execute that action. This method will have a grocery list of data it needs to collect, and it will use any number of specific models to get that data. When data is returned from a model, it gets stored in the controller (or potentially a special data repository object) for later use.

Once the controller has collected all the data it needs, it will determine how the data is meant to be displayed. For example, the same data could be output as an HTML page, an RSS feed, or an email. When the controller knows what sort of display is needed, it passes all the data to an object called a view which is responsible for rendering the data in the appropriate format.
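Put together, the flow described above might be sketched like this. Every class and method name here (Question_Model, Question_View, Questions_Controller, dispatch, displayAction) is hypothetical, and the model returns canned data where a real one would hit a database.

```php
<?php
// Hypothetical model: knows how to fetch a question.
class Question_Model
{
    public function find($id)
    {
        // Canned data standing in for a database lookup.
        return array('id' => $id, 'title' => 'How do routers work?');
    }
}

// Hypothetical view: turns collected data into HTML.
class Question_View
{
    public function render(array $data)
    {
        return '<h1>' . htmlspecialchars($data['question']['title']) . '</h1>';
    }
}

// The controller only gathers data and hands it to a view.
class Questions_Controller
{
    private $data = array();

    // Map an action name from the URL onto a method of this class.
    public function dispatch($action, array $params)
    {
        $method = $action . 'Action';
        if (!method_exists($this, $method)) {
            throw new InvalidArgumentException("Unknown action: $action");
        }
        return $this->$method($params);
    }

    public function displayAction(array $params)
    {
        $model = new Question_Model();
        $this->data['question'] = $model->find((int) $params[0]);

        $view = new Question_View();
        return $view->render($this->data);
    }
}

$controller = new Questions_Controller();
$html = $controller->dispatch('display', array('42'));
```

Note how little the controller itself does: it asks the model for data, stores it, and passes it along, which is the "middle manager" role in a nutshell.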

Models

You may have taken a class in school which explained the fundamentals of object-oriented programming (OOP). In this class, your professor probably made a big deal about how this paradigm can be used to create software models of real world objects for your application to interact with. The models in MVC refer to exactly the sort of models your professor was talking about.

Models can take many forms. The most common examples given for models online tend to take the form of database abstraction, but that is just the tip of the iceberg. If you were creating a banking application, you could have a Mortgage class to give approval, determine rates, and calculate payments. This mortgage class would be a Model.

Essentially, all of your business logic, all of your database logic, and practically anything that doesn't deal directly with displaying information to the user would be considered a model.
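As an illustration of a model that has nothing to do with databases, here is a minimal sketch of the Mortgage idea. The class shape is an assumption, and it uses the standard fixed-rate amortization formula with monthly compounding; a real banking application would have far more rules than this.

```php
<?php
// Hypothetical model holding pure business logic, no display code.
class Mortgage
{
    private $principal;
    private $annualRate; // e.g. 0.06 for 6%
    private $years;

    public function __construct($principal, $annualRate, $years)
    {
        $this->principal  = $principal;
        $this->annualRate = $annualRate;
        $this->years      = $years;
    }

    // Fixed monthly payment: P * r * (1 + r)^n / ((1 + r)^n - 1),
    // where r is the monthly rate and n the number of payments.
    public function monthlyPayment()
    {
        $n = $this->years * 12;
        $r = $this->annualRate / 12;

        if ($r == 0) {
            // Interest-free loan: just spread the principal evenly.
            return $this->principal / $n;
        }

        $growth = pow(1 + $r, $n);
        return $this->principal * $r * $growth / ($growth - 1);
    }
}

$mortgage = new Mortgage(100000, 0.06, 30);
$payment  = round($mortgage->monthlyPayment(), 2);
```

Nothing in this class knows whether the payment will end up on a web page, in an email, or in a report, which is exactly what makes it a model.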

Views

I've covered views a bit already as part of my explanations of models and controllers, but let's explain it again anyway.

Views are the interface between your application and the end user. They are the HTML displayed by a user's web browser, the message received by the user's email client, and the feed in your user's RSS reader.

By the time a controller passes control of the application to a view, all of the heavy lifting has already been done. All of the data to be displayed to a user is wrapped up in a shiny package and presented to the view with a bow on top. In fact, some people prefer that views aren't objects at all, but rather simple scripts with placeholders into which precomputed values are dumped. I happen to take a slightly different approach, and think that views can and should contain methods of their own. These methods, quite obviously, should only contain presentation logic, such as choosing between displaying a logged-in user's information, or a generic "log in or sign up" type of message. By separating this sort of logic out into methods, the main template of a view becomes that much cleaner.
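A view with a small presentation method of its own might look like the sketch below. The Header_View class, its greeting() method, and the shape of the user array are all made up for illustration.

```php
<?php
// Hypothetical view with one presentation-only decision in it.
class Header_View
{
    private $user;

    // $user is null for anonymous visitors, or an array with a 'name' key.
    public function __construct($user = null)
    {
        $this->user = $user;
    }

    // Presentation logic only: which message to show, not how to log in.
    public function greeting()
    {
        if ($this->user === null) {
            return '<a href="/login">Log in or sign up</a>';
        }
        return 'Welcome back, ' . htmlspecialchars($this->user['name']);
    }

    // The "template" itself stays tiny because the branching lives above.
    public function render()
    {
        return '<div id="header">' . $this->greeting() . '</div>';
    }
}

$anonymous = new Header_View();
$loggedIn  = new Header_View(array('name' => 'Ann'));

$anonymousHtml = $anonymous->render();
$loggedInHtml  = $loggedIn->render();
```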

Conclusion

Before I looked into MVC in any depth, I was convinced that it was a needlessly complicated framework that I would spend weeks trying to wrap my mind around. In the end, it only took me an hour or two one morning to get everything sorted out.

I encourage you, if you're starting your first MVC project, to take the time to build your own mini framework for a simple project you want to build. While premade frameworks like CodeIgniter are great, I think it's always a good idea to try to roll your own at least once. By immersing yourself in the details, you will have a better understanding of what's going on should you decide to use an open source framework later on.

Happy coding!

For the sake of argument, I'm going to assume you're reading this blog directly on appsCanadian.ca and not in an RSS reader or some other fancy software. Now, I'd like to direct your attention to the address bar in your web browser, and take note of the URL:

http://www.appscanadian.ca/archives/structuring-your-urls-or-url-driven-design

In the old days of the internet, this URL would indicate to the reader that I had created a real directory on my web server called "archives" and in that directory I had placed a file called "structuring-your-urls-or-url-driven-design" which contained the HTML file you're viewing right now. These are not the old days of the internet, however, and I can assure you that there is no "archives" directory on my server, nor is there a file called structuring-your-urls-or-url-driven-design.

The way this sort of thing often happens today is that there is a configuration file on the server (most often called .htaccess for Apache servers) containing rules that examine incoming requests and quietly route them to a file elsewhere on the server, without the address in the browser changing. More often than not, all requests for web pages (i.e. content that isn't something static like CSS or images) get routed to the same file, usually called index.php. In fact, this page you're currently reading was processed by a file called index.php on my server, and that script knows to serve this particular page because of a GET variable called 'p' which has a value of 139. This means that, for all intents and purposes, asking appsCanadian.ca to serve /archives/structuring-your-urls-or-url-driven-design is exactly the same as asking it for /index.php?p=139.
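The mapping itself can be pictured as a simple lookup. The sketch below is in Python and is not how this blog actually does it; the POST_IDS table and the to_internal_url function are made up for illustration, and a real site would consult its database rather than a hard-coded dictionary.

```python
from typing import Optional

# Hypothetical lookup table; a real blog would query its database instead.
POST_IDS = {
    "structuring-your-urls-or-url-driven-design": 139,
}


def to_internal_url(path: str) -> Optional[str]:
    """Map a pretty path like /archives/<slug> to /index.php?p=<id>."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "archives":
        return None  # not a blog post URL
    post_id = POST_IDS.get(parts[1])
    if post_id is None:
        return None  # unknown post
    return f"/index.php?p={post_id}"
```

With this table, the path /archives/structuring-your-urls-or-url-driven-design maps to /index.php?p=139, and anything the function doesn't recognize comes back as None.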

Obviously, the server needs to understand how one URL maps to another, so it makes a good deal of sense to spend some time planning your URL design before you actually begin making your website. By planning your various URLs, you will force yourself to spend a good deal of time thinking about how your site is actually going to work, breaking your features into their component parts.

An Example

As I write this, the highest voted question on the popular programming Q&A site Stack Overflow is located at the following URL:

http://stackoverflow.com/questions/194812/list-of-freely-available-programming-books

Let's break down this URL into its component parts to see what makes it tick.

First is the typical site identification stuff, the "http://stackoverflow.com" part of the URL. We don't much care about that since it's pretty standard. I will say that you should give a bit of thought about how you'll deal with subdomains. Google treats subdomains as separate sites, so it's probably best to keep everything under one roof - either have everything under a 'www' subdomain, or no subdomain at all.

The real fun begins when we examine the path. The first part of the path points to a "directory" called "/questions" which (as with my blog example above) doesn't actually exist. The first "directory" in a dynamic URL like this typically points to a specific "controller" in the application. In MVC-based applications (which I'll talk about in more detail in a future post), controllers can sort of be described as "sub-applications." A website is typically made up of several different controllers which can perform a variety of actions. They denote the boundaries of a specific portion of the website. In this case, the "questions" controller is in charge of, at a minimum, submitting and displaying questions on the site.

Next, there is another component which looks like a directory called '194812'. The precise meaning of this is something which is left to the specific site to determine, but in a lot of cases, the "second directory" will be a specific action that the first controller is meant to execute. This could take the form of a URL like "/questions/display" - the meaning of which should be obvious: the site is to 'display' a 'question' which is specified later in the URL. In the case of Stack Overflow, the website designers have decided to forgo the 'action' directive, and instead immediately give a target for which a default action will be applied. Specifically, the number 194812 is a question identifier, and the system knows that the default action for such an identifier is to display the question and its answers.

Finally, we have what looks like a filename. As with the blog example above, I can assure you that there is no file on Stack Overflow's web servers which has the filename list-of-freely-available-programming-books (with the possible exception of some sort of file-based cache). Instead, the filename-like identifier "list-of-freely-available-programming-books" is what is referred to as a "slug." A slug is a bit of information which is contained in a URL entirely for decorative purposes. A site designer may choose to include additional information in a URL to make it more descriptive and alluring to end users, or to search engines. In the example, the words "list-of-freely-available-programming-books" are actually the title of the question being displayed. In most cases, slugs can be changed or removed entirely without altering how the site displays the page.
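Putting the three pieces together, the breakdown above can be sketched as a small parser. This is Python, and it only illustrates the idea; I have no knowledge of how Stack Overflow really parses its URLs. The Route type and the parse_url function are names I made up.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class Route(NamedTuple):
    controller: str       # first "directory", e.g. "questions"
    target_id: int        # the item the default action applies to
    slug: Optional[str]   # decorative title text, may be missing


def parse_url(url: str) -> Route:
    """Split /<controller>/<id>[/<decorative text>] into its parts."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2:
        raise ValueError("expected at least /<controller>/<id>")
    slug = segments[2] if len(segments) > 2 else None
    return Route(segments[0], int(segments[1]), slug)
```

Because the decorative part is optional here, a URL that ends at /questions/194812 still parses, which matches the idea that the title text can be dropped without changing which page is shown.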

Designing Your Structure

Figuring out how your URLs will look really isn't the hardest part of designing a website, but I do think it is an important part. By working out your URL structures, you get a feel for how your site actually comes together. One way of doing this is to write out a brief explanation of what your site is and how you expect it to work. Here's an example I wrote up for Stack Overflow:

Stack Overflow is a Q&A site for programmers. Users can ask and answer questions and are awarded reputation points when the community decides they have a) asked a clear and useful question, or b) provided a correct answer to a question. These points are awarded based on votes provided by the community of users. Questions can be assigned up to 5 tags relating to the question, such as programming languages or platforms identified in the question. These tags can be edited by other users with a sufficient amount of reputation, and the question itself can be edited by users with even higher levels of reputation. Participation in the site is also rewarded by a set of "badges" which are awards for performing tasks within the system. Users can find questions by browsing based on one or more tags, or by the age or vote total of a question. When viewing a question, answers can also be sorted by age and votes. Similarly, the site will have a search function for finding questions containing certain language.

This is a pretty basic explanation of the site, but it works well enough. If you read over that description, you can probably identify some distinct sections of the site, and some actions which can be performed within those sections. Rather than review Stack Overflow's actual URL structure (of which I have an incomplete knowledge), let's try to create our own URL structure.

The key components of the site seem to be asking questions, providing answers, and voting. Users also specify tags and can be awarded badges for good behaviour. Users can search for questions based on terms and tags, or they can browse questions freely.

Let's use the key terms from that summary to create some "controllers":

  • /questions
  • /answers
  • /votes
  • /users
  • /tags
  • /badges
  • /search
  • /browse

These are the preliminary candidates, and not all of them may be suitable. Use your own judgment regarding which potential controllers will actually work for your own website.

Next, we'll need to identify actions for these controllers to take. For brevity, let's just review the questions controller:

  • The whole point of the site is for users to ask questions. So 'ask' is a good action for a question.
  • Once asked, the question will need to be available to other users to answer, so we'll need to display it. Let's add a 'display' action.
    • We need to be able to uniquely identify the question, so we'll use an ID number to tell the 'display' action what to show us.
    • For SEO purposes, let's also add a slug based on the question.
    • We have to display answers with our questions, and answers can be sorted in a number of ways:
      • Newest
      • Oldest
      • Most votes
  • A user may want to edit a question he or she asked.
  • A user may want to delete his or her question completely.

This gives us the following URL options for questions:

/questions
  /ask
  /display
    /id
      /slug
        /new
        /old
        /votes

This seems pretty good to me, but there are two issues you might want to consider:

  • The first potential issue is that a slug should be regarded as decoration, and the absence or misspelling of it shouldn't be fatal to your application. You can easily store the "preferred slug" for an item in your database, and redirect to the proper form of your URL when an incorrect version is received.
  • The second potential issue is that the /new, /old, and /votes sort items probably shouldn't form their own URLs. To a search engine, these will all appear to be separate links, and your potential page rank will be split across them. These options should probably be set in GET variables, but if you're really dead set on having perfectly pretty URLs, you can get around this by having the server invisibly rewrite a pretty URL into a GET variable.
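Both suggestions fit in a few lines. The sketch below is in Python; the PREFERRED_SLUGS table, the redirect_target function, and the "sort" GET variable are all assumptions of mine rather than anything taken from a real site. It returns the URL the browser should be redirected to when the slug is missing or wrong, and carries the sort option along as a GET variable.

```python
from typing import Optional

# Hypothetical stored data: question ID -> preferred slug.
PREFERRED_SLUGS = {194812: "list-of-freely-available-programming-books"}
SORT_OPTIONS = {"new", "old", "votes"}


def redirect_target(question_id: int, slug: Optional[str],
                    sort: Optional[str] = None) -> Optional[str]:
    """Return the canonical URL to redirect to, or None if none is needed."""
    preferred = PREFERRED_SLUGS[question_id]
    if slug == preferred:
        return None  # already canonical, just render the page
    target = f"/questions/display/{question_id}/{preferred}"
    if sort in SORT_OPTIONS:
        target += f"?sort={sort}"  # sorting is a GET variable, not a path segment
    return target
```

A request with a misspelled slug gets pointed back at the preferred form, while a request that already uses the preferred slug is simply displayed.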

If you really think about your site, and what you want to do with it, it should be pretty straightforward to design a URL strategy. When you're done, you should have a better overall grasp on how your site will work, and the rest of your design will start to fall into place.

Protect Your Source

September 19th, 2010

If you are serious about your programming project, then you should be serious about your source code too. Since your source code is your product, you need to protect it from getting damaged and becoming corrupt and unusable.

You might wonder how a digital file could get damaged, since in the general sense it doesn't have a physical form. I mean, if you take a picture on film, that film will naturally degrade over time, as will the pictures developed from it. Digital pictures, on the other hand, are just bits of information stored on a drive. If you make a copy of a copy of a copy of that original file, the picture contained in that copy will be exactly the same as the picture in the original. Since source code is safely locked away in the same sort of files, then surely it will stay in a pristine state in exactly the same sort of way, right?

Wrong!

Unlike your photos of your grandparents' 60th anniversary, which are static items that are created and never changed (or which have some minor one-time touch-ups just after they're created to get rid of things like red eye), your source code is in a constant state of flux. As you test your software, you will inevitably find bugs in it, and those bugs will need to be fixed. When you do this, your code changes. When you find a better, faster way of performing a task, you refactor your code to take advantage of the new method, and your code changes. When you introduce new features, you may need to make tweaks to your existing product to make the new features work, and your code changes. Have you realized the issue yet? The biggest danger to your code is you!

Fortunately for you, this is a common issue. So common, in fact, that there already exist a variety of products to help save you from yourself. Collectively, these products are referred to as source control (or revision control, or version control, or configuration management, or...) and they allow you to keep a record of all the changes you make to your code and, if necessary, undo them. There are many different version control systems out there, but the one I'm going to use is called Subversion (often abbreviated 'svn').

With Subversion, all of your code is kept in something called a repository. The repository is a directory or database controlled by the Subversion software that keeps a running history of everything you have ever changed about a directory under Subversion's control. Each change to a repository is called a revision, and by giving Subversion a specific revision number, you can retrieve a copy of how your source code looked at any period in its history.

When you first create a repository, it is said to be at "Revision 0", which is a state where your repository holds just an empty directory. You create a new repository by telling the 'svnadmin' command to 'create' it at a specific location in your file system:

$ svnadmin create ~/repositories/my_project

After you create the repository, you should consider the directory which contains it to be off limits. The only process that should interact with the repository at all is Subversion itself. When you have changes you would like to make, you will make them in a directory called a "working copy" which is a copy of the data held in the repository that you can use and modify without actually changing the canonical version in the repository (until you want to, anyway; more on that later). To create a working copy, simply create a directory to hold it anywhere you would like, then tell Subversion to "check out" a copy of the most recent revision of a specific repository:

$ mkdir ~/my_project
$ cd ~/my_project
$ svn checkout file:///home/user/repositories/my_project .

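The trailing "." tells Subversion to check out into the current directory; without it, Subversion would create a second my_project directory inside the one you just made. When you check out a repository, Subversion will output a list of files and directories it creates (though in this example there are no files or directories in the repository to create), and it will output a message telling you which revision of your repository you're working with (e.g. "Checked out revision 0.").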

In order for Subversion to properly keep track of the changes you make in your working copy, you need to babysit it a little bit. While Subversion will notice when you change a file already under its control, changes to the file system (for example, creating new files or directories, or moving them to new locations) have to be explicitly identified to Subversion. This is done by issuing certain file system altering commands from within the 'svn' application, i.e. using "svn cp", "svn rm", "svn mv" and "svn mkdir" rather than simply issuing "cp", "rm", "mv", or "mkdir" commands directly to the file system [I'm assuming a *nix OS here, but the Windows commands also work - "svn copy", "svn del", "svn rename"]. Note that when you create a brand new file in your working copy, this too needs to be identified to Subversion, using the "svn add" command.

Once you are done making modifications to your code in your working copy, such as after finishing writing a new function, you can tell Subversion to create a new revision in your repository which contains the updates now present in your working copy. To do this, you use the "svn commit" command:

$ svn commit -m "Message explaining changes."

As you can see from this example, you don't need to tell Subversion which repository to commit to: it assumes that you are committing to the same repository you checked out from. The "-m" in the command is used to specify a "commit message," which is a note that you give to Subversion to outline the changes you made to the repository. This is useful when you want to look back at previous revisions and quickly see what you did on each commit, possibly to find a specific change you made. It is also useful when you are working as part of a team of developers, rather than individually, because the other developers can quickly be brought up to speed on the changes made since they checked out their copies simply by reading these summaries.

When using the "-m" switch, you specify your commit message within quotes immediately thereafter. If you need to commit a longer message, you can write your message to a file before committing, then pass the filename to Subversion as part of your commit by using the "-F" switch (note the capital F). If neither the "-m" nor the "-F" switch is used, Subversion will check to see if you have defined a text editor program, for example in the $SVN_EDITOR environment variable (it also falls back to $VISUAL and $EDITOR). If you have, Subversion will open the editor for you to enter your message when you try to commit. When you close the editor, Subversion will read your file to determine your commit message. If no editor is defined, however, Subversion will issue an error message and will abort the commit until a message is provided.

If, for whatever reason, you decide not to include a commit message (not recommended!), you can simply pass an empty string to the "-m" switch during your commit.

Before committing your changes, however, there are two additional commands which you should get in the habit of utilizing:

The first command is "svn update" which checks if any updates have been made to your repository since you checked out your working copy. If there are changes, "svn update" will download them and update your working copy. If the updates interfere with your changes, Subversion will notify you of the conflicts and ask you to resolve them (either by choosing one copy or another to be used, or - more frequently - by editing your working copy to integrate the changes manually). You should try to "svn update" your working copy often while you make your changes, to reduce the number of conflicts you have to handle at one time, and to ensure you are always developing against up-to-date code. It is especially important to run this command immediately before attempting to commit your changes, as Subversion will refuse to commit a file that has changed in the repository since you last updated it.

The second command is "svn status" which will give you a list of all changes you've made to your working copy since checking it out. The output of this command is simply a list of files that have been changed, preceded by a character indicating the type of change made. The four most common characters used for this purpose are as follows:

  • A - This indicates that a new file or directory is to be added to the repository for the first time.
  • D - This indicates that the file or directory is to be deleted. Note that this simply means that the file will no longer be listed with future revisions. Earlier revisions in the repository will still show the file.
  • M - This indicates that the file has been modified in the working copy since it was checked out.
  • ? - This indicates that Subversion has found a file or directory in your working copy that it isn't aware of. This happens largely when you create a new file in your working copy, but simply forget to use "svn add" to indicate to Subversion that you want to add it to the repository.
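If you ever want to process this output in a script, the format is easy to pick apart. The Python sketch below is a simplification under stated assumptions: it looks only at the first character of each line and treats the rest of the line as the path, so it ignores the extra flag columns Subversion can print for things like property changes. The summarize_status function and its labels are my own invention, and the text it reads is meant to be output you have captured from "svn status" yourself.

```python
from collections import defaultdict
from typing import Dict, List

# Labels for the four status characters described above.
LABELS = {"A": "added", "D": "deleted", "M": "modified", "?": "unversioned"}


def summarize_status(output: str) -> Dict[str, List[str]]:
    """Group paths by the status character in the first column of each line."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        code, path = line[0], line[1:].strip()
        groups[LABELS.get(code, "other")].append(path)
    return dict(groups)
```

Anything listed under "unversioned" is a good candidate for a forgotten "svn add".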

When you commit a change to Subversion, a new revision is created in its history. The real power of Subversion comes from the understanding that your new revision does not overwrite previous revisions - it simply replaces them as the most up-to-date copy of your repository. Every revision you have ever created still exists within Subversion, and can be accessed and reviewed. This allows you to "look back in time" and see earlier versions of your code, and allows you to understand why changes were made, or to undo those changes entirely.
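If the idea of revisions piling up without overwriting each other feels abstract, this toy model may help. It is written in Python and is emphatically not how Subversion stores its data; ToyRepository and its methods are invented purely to show the behaviour described above, with each commit adding a new snapshot and leaving every older one intact.

```python
from typing import Dict, List


class ToyRepository:
    """Toy in-memory revision history; not Subversion's real storage format."""

    def __init__(self) -> None:
        # Revision 0 is the empty directory.
        self._revisions: List[Dict[str, str]] = [{}]
        self._messages: List[str] = [""]

    def commit(self, files: Dict[str, str], message: str) -> int:
        """Store a new snapshot and return its revision number."""
        self._revisions.append(dict(files))
        self._messages.append(message)
        return len(self._revisions) - 1

    def checkout(self, revision: int = -1) -> Dict[str, str]:
        """Return a copy of the files as they were at the given revision."""
        return dict(self._revisions[revision])
```

After two commits, checking out revision 1 still returns the files exactly as they were at the first commit, even though revision 2 is now the most recent.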

Hopefully this has been a meaningful introduction to source control, and Subversion in particular. For more information, there is an excellent book called "Version Control with Subversion" which explains Subversion in amazing detail. You can download a PDF copy of this book for free at http://svnbook.red-bean.com or, if you prefer to read your books in dead tree format, you can order a physical copy from Amazon (check it out on Amazon.com or Amazon.ca).

The Project Begins

September 16th, 2010

Learning new programming languages isn't always the most fun experience if all you're doing is making trivial toy apps that whatever book you're reading decided was simple enough to fit their pages. The real learning, and the real fun, in programming comes from sinking your teeth into a real, non-trivial, useful application.

For a long time, I've struggled with coming up with project ideas. I really couldn't come up with much that excited me beyond the same simple crap used in those book examples. Fortunately for me, that problem seems to have disappeared. Over the last few months, I've come up with three separate projects that I want to build, and each new idea has felt more worthy than the last.

Now, I've decided that enough is enough, and it's time I got off my ass and built one of these projects. Actually, I decided this a few months ago, but unfortunately this decision came right when my wrist finally decided it had been abused enough. I've spent the past three months or so trying to work out my RSI symptoms (I'll likely write about that later), but I'm finally able to type without pain for reasonable stretches of time, so it's time to get to coding.

Since the project is probably pretty trivial for most web programmers to hack out, I'm going to keep the actual idea to myself for a while. In fact, a few days ago I discovered a new site that is a bit of a competitor, but I'm hoping that aiming and designing for a specific niche of users will eventually allow me to win against - or at least peacefully coexist with - this other site.

In the days and weeks to come, I'm going to use this space to write about whatever I learn while completing this project. Since I've never done a significant web project before (see the "toys" problem above), and my CSS is pretty rusty, I'm hoping that there will be enough good content to write about to justify my taking up your valuable time.

I'm also hoping this will become something of a two-way street, with your comments serving to correct my misunderstandings (and mark my words: there will be misunderstandings) and to help me become a better developer creating a better product.