Stay Classy, My Friends

October 14th, 2010

Motivation can be a tricky thing, and realizing you did an absolutely piss-poor job of something is a great way to destroy any remaining motivation you might have. Such was the case with my last post, which I count as among the worst missteps I've had in a while.

You see, I was thinking about naming conventions so I wanted to do a blog post about it. Unfortunately, I was thinking about naming types at the time, rather than naming variables, so talking about Hungarian notation was entirely the wrong thing to do. I do believe Hungarian notation still has its place, but that blog post really wasn't it.

So let's just pretend that stinker never happened and talk about what I really meant....

PHP Includes

I haven't done a lot of heavy lifting with PHP, but those who have will know that including files can be rather annoying. A lot of people, for very good reasons, like to keep distinct bits of code (usually classes) in separate source files, and PHP requires you to include every file you need before you can use its contents. This means that the tops of your files and functions can quickly fill up with include directives, and you need to be careful to include everything you need, everywhere you need it, in long patches of code that are nothing but housekeeping lines.

Since this can be tedious and sometimes error prone, the PHP developers added autoloading functions to hand the work of finding and including files over to PHP, leaving the developer free to worry about building the core logic of the application.

Autoloading Classes

The simplest way to get PHP to load your classes for you is to define a function called __autoload with a single parameter which will hold the name of the class we're trying to find - i.e. function __autoload($className). In this function, we will write some logic to look for the class in question and include it into our application.

The upside of this is that PHP knows by default to look for the __autoload function when it can't find a class. The downside is that this can become fragile if you start including external libraries which may have their own __autoload functions defined. As with the Highlander, there can be only one.

To deal with this issue, we have the spl_autoload functions.

spl_autoload

Where __autoload is just a single function you can define to determine the location of missing classes, the spl_autoload functions allow you to register as many different functions as you need to handle your cases. The benefit of this is that you no longer need to worry about library code having an __autoload function which collides with yours. You can simply use the spl_autoload_register function to build up a list of different functions to use to find your classes.

For example, let's say you've defined the standard __autoload function to find classes you've written yourself. This works fine until one day you decide that you really need to use the AwesomeFramework in your code. When you download the framework, you discover that it has its own __autoload function. Since PHP won't let the same function be declared twice, it just dies with a fatal error.

The solution is simple: just pass your autoload function (after you've renamed it to avoid confusion with the other __autoload function) as a parameter to spl_autoload_register and PHP will use it to look for the missing class.

The problem with this solution is that once you call spl_autoload_register, PHP stops calling the default __autoload function, which renders AwesomeFramework's __autoload function useless. The fix is just as simple: add a second call to spl_autoload_register, this time passing the __autoload function from AwesomeFramework as the parameter. If the function is inside a class, you can either refer to it via the normal double-colon operator (i.e. spl_autoload_register("AwesomeFramework::__autoload")) or pass an array containing exactly two items: the class name and the function name (i.e. spl_autoload_register(array("AwesomeFramework", "__autoload"))). For a static method the two forms are interchangeable, since both are ordinary PHP callbacks. The array form is simply the more general one: its first item can also be an object, which lets you register a non-static method.
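Here's a minimal sketch of that arrangement, so you can see both loaders living side by side. Everything in it is made up for illustration: AwesomeFramework is a stand-in class, its loader is a static method I've called autoload, myAppAutoload is a hypothetical name for our renamed function, and the classes are conjured with eval only so the whole thing fits in one file. I've avoided the name __autoload entirely, because PHP 8.0 and later no longer support it.

```php
<?php
// Hypothetical stand-in for the library in the story; not a real framework.
class AwesomeFramework
{
    public static function autoload($class)
    {
        if ($class === 'Awesome_Widget') {
            // eval stands in for require_once to keep the sketch in one file.
            eval('class Awesome_Widget {}');
        }
    }
}

// Our own loader, renamed so it cannot collide with anyone's __autoload.
function myAppAutoload($class)
{
    if ($class === 'Invoice') {
        eval('class Invoice {}');
    }
}

// Register both loaders; PHP tries them in the order they were registered.
spl_autoload_register('myAppAutoload');
spl_autoload_register(array('AwesomeFramework', 'autoload'));
// The string 'AwesomeFramework::autoload' names the same static method.

var_dump(class_exists('Invoice'));                   // bool(true)
var_dump(class_exists('Awesome_Widget'));            // bool(true)
var_dump(is_callable('AwesomeFramework::autoload')); // bool(true)
```

Neither loader needs to know the other exists, which is the whole point of registering them instead of fighting over a single function name.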

I've created some sample code that you can play with to get a feel for how the autoloader works.

Naming Conventions

If you're going to keep your code separated into different directories, like I am, it helps to have a naming convention for your classes so that your autoloader can easily navigate your directory tree. I decided on a relatively simple convention for my names - I simply append _Model, _View, or _Controller to classes that belong to each respective group. My autoloader uses this to determine the type of class it's looking for, and tries to find the rest of the name as a filename in the directory for that type of class:

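public static function loadClass($class) {
  // Split on underscores; the last piece, if there is one, is the class type.
  $nameComponents = explode("_", $class);
  if (count($nameComponents) > 1) {
    $classType = $nameComponents[count($nameComponents) - 1];
    // Everything before the final "_Suffix" is the name to look for on disk.
    $className = substr($class, 0, -(strlen($classType) + 1));
  } else {
    $classType = "";
    $className = $class;
  }

  // Pick the directory that holds this type of class.
  switch ($classType) {
    case "Model":
      $classPath = MODELS;
      break;
    case "View":
      $classPath = VIEWS;
      break;
    case "Controller":
      $classPath = CONTROLLERS;
      break;
    default:
      // Not a model, view, or controller, so treat the whole name as an entity.
      $classPath = ENTITIES;
      $className = $class;
      break;
  }

//...<snip>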

The uppercase symbols in this code are paths to the respective directories for each class type. As you can see, I've also created a default path which assumes that if the class isn't a model, view, or controller, the class must be an entity (i.e. an object representing an actual "thing" in the code, rather than an abstract concept used to separate concerns). Some people might not like using these sorts of fact-repository classes, but I tend to think it's cleaner to have distinct objects floating freely across the different concerns of the application. In this way, each section can handle its data however it wishes, yet still have a standard to conform to when communicating with other portions of your design.

The rest of the function is fairly basic: it works out the filename in which we think the class is defined, then calls require_once on that file if it is readable (remember, just because a file exists doesn't mean your application actually has permission to use it).
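Since that part of the function is snipped above, here is one way it might look. This is a sketch under my own assumptions, not the actual code: I'm assuming one class per file, a file named after the class name with a .php extension, and a helper name (includeClassFile) that I've invented for the example.

```php
<?php
// Hypothetical tail of loadClass(): build the filename, include it if we can.
// Assumes one class per file, stored as "<ClassName>.php" inside $classPath.
function includeClassFile($classPath, $className)
{
    $file = rtrim($classPath, '/') . '/' . $className . '.php';

    // is_readable() is false both for missing files and for files
    // that exist but that we lack permission to read.
    if (!is_readable($file)) {
        return false; // decline, so the next registered autoloader can try
    }

    require_once $file;
    return true;
}

// Usage: write a throwaway class file into a temp directory, then load it.
$dir = sys_get_temp_dir() . '/autoload_sketch_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/User.php', '<?php class User_Model {}');

$loaded  = includeClassFile($dir, 'User'); // file is there and readable
$missing = includeClassFile($dir, 'Nope'); // no such file
```

Returning quietly when the file isn't readable, rather than raising an error, is what leaves room for other loaders to have a go.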

The beauty of this function is that if it fails to find and include the needed class, PHP simply falls through to the next registered autoload function, and the next, until one of them succeeds. If none of them can find the class, your application will error out (unless you implement some error handling to deal with the problem). Typically you should only see these errors in testing, since by release you will have tested thoroughly enough to ensure all needed files are discoverable and accessible.
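To watch the fall-through happen, here's a small sketch with two throwaway loaders that log every class they're asked about. The loaders and class names are hypothetical, and eval stands in for require_once so the example fits in one file. The try/catch at the end assumes PHP 7 or later, where a missing class raises a catchable Error.

```php
<?php
// Records which loader was asked for which class, in order.
$calls = array();

// First loader (hypothetical): only knows about User_Model.
spl_autoload_register(function ($class) use (&$calls) {
    $calls[] = "first:$class";
    if ($class === 'User_Model') {
        eval('class User_Model {}'); // stands in for require_once
    }
});

// Second loader (hypothetical): only knows about Report_View.
spl_autoload_register(function ($class) use (&$calls) {
    $calls[] = "second:$class";
    if ($class === 'Report_View') {
        eval('class Report_View {}');
    }
});

$foundModel = class_exists('User_Model');       // first loader handles it
$foundView  = class_exists('Report_View');      // first declines, second handles it
$foundGhost = class_exists('Ghost_Controller'); // both decline

// Instantiating the missing class is what makes the application error out.
$caught = null;
try {
    new Ghost_Controller();
} catch (Error $e) {
    $caught = $e;
}
```

Checking with class_exists first, as the sketch does, is one way to handle a missing class gracefully instead of letting the error surface.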

Conclusion

Hopefully this has been a somewhat enlightening guide to how to let your classes get out of your way so you can deal with what really matters. I'll leave it as an exercise for the reader to determine how best to handle the basics of the index.php file, so that this autoloader (and the constants it depends on) can be properly used.

In my next post, I'm going to start actually dealing with the presentation of the application, and will deal with some basic HTML and CSS.

Cheers!

I wanted to write about building the router portion of our MVC application today, but I think it's worth taking some time to talk about Hungarian notation first.

Hungarian notation is the name given to a variable naming convention used by some programmers. The point of this notation, invented by Charles Simonyi and popularized during his time at Microsoft, was that the type of a variable should be made obvious as part of its identifier.

Now, if you're old enough to remember pre-.NET Visual Basic, you were probably taught to use Hungarian notation somewhere down the line, and you've probably learned to hate it. You're not alone, and you're not wrong; you just might not know the history. You see, there are actually two types of Hungarian notation: one of them - the one you probably know - is more or less useless, but the other can be quite handy.

Systems Hungarian

As I said, there are two types of Hungarian notation, and they were both born at Microsoft, but to two different teams. The folks who worked on Windows came up with the version we now refer to as "Systems Hungarian" and it is this version that most people know and despise.

When the systems division first learned of Hungarian notation, they mistakenly believed that when Simonyi's original research paper referred to the "type" of a variable, it literally meant the data type of that variable. This led to programmers prefixing their variables with such useless noise as "str" to denote a string or "int" to denote an integer. You may have been taught this in school, or even forced to use it, and for a while, you may have even thought this made sense (I did, for longer than I'd care to admit). Eventually, however, you would have realized, or had it pointed out to you, that it really didn't add any extra value to your code.

The reason most people use to denounce Systems Hungarian is that your functions and classes should tend to be highly cohesive. This means that your code focuses on a specific task, rather than attempting to be everything to everyone. By maintaining this cohesion, you will reduce the number of variables in play at once, and the ones that remain will have data types that are relatively obvious.

In short, people don't like Systems Hungarian because it adds no semantic value to your code, and so it just serves to make your variable names less readable.

Applications Hungarian

While the systems programmers at Microsoft were busy cluttering up their code, the applications division (i.e. the people behind products like Microsoft Office) had another way of doing things. They had what a lot of people consider a better way.

Rather than using a prefix to explain the variable's data type, the applications developers used prefixes to establish a variable's purpose, and add semantic information to help future developers understand it. In this way, variables which shared the same underlying data type but served different purposes would be marked as incompatible with each other, where under the Systems Hungarian notation they would look interchangeable.

An Example

A common example of this uses the Excel source code to make its point. If you were creating Excel, you would probably need some way of indicating a particular column in a spreadsheet. Since describing fractions of a particular column doesn't make any sense, any rational developer would use a whole number to store the column position - i.e. an integer.

If you were to use Systems Hungarian, you might end up with a variable name which looks like intColumnPosition. As you can see, the "int" prefix here doesn't explain much beyond the fact that the column position is a whole number. That might seem like a good thing at first, but really the fact that a column position is a whole number would be obvious to most people.

In Apps Hungarian, however, the same variable would probably have a name like colPosition. In this example, the prefix is used to describe the purpose of the variable (i.e. as a column in a spreadsheet), and leaves the "obvious" information, like the data type, out.

It also has the added bonus of making certain classes of mistakes become more obvious to the casual observer. Take the following code, for example:

intCurPos = rows.GetCurrent();

There doesn't seem to be anything wrong there, right? How about this line:

Column.Select(intCurPos);

Perhaps you can now see the problem: just a few moments ago, we assigned a value based on a row position, and now we're using it to select a column? That makes no sense! You're probably also thinking that it's a pretty obvious mistake, right? Any idiot would notice the error. Well, it might seem that way, but that's only because these lines were presented to you close together. What if, instead of having only a line of text between them, there was a whole function of code dividing these two lines? What if the value was a member variable in a class, set and retrieved in different methods? Would you see the mistake then? Let's try this again:

rwCurPos = rows.GetCurrent();
// ... lots of code here ...
Column.Select(rwCurPos);

To an eye trained for Apps Hungarian, that last line would stand out as wrong no matter where it was in the code.
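Since the rest of this series is in PHP, here's the same idea as a small runnable PHP sketch. The spreadsheet helpers are hypothetical and exist only for the example; the rw and col prefixes follow the convention described above.

```php
<?php
// Hypothetical spreadsheet helpers, named with Apps Hungarian prefixes:
// rw = a row position, col = a column position. Both are plain integers.
function rwCurrent(array $sheet)
{
    return $sheet['row'];
}

function colCurrent(array $sheet)
{
    return $sheet['col'];
}

function selectColumn($col)
{
    return "column $col selected";
}

$sheet = array('row' => 7, 'col' => 2);

$rwCurPos  = rwCurrent($sheet);
$colCurPos = colCurrent($sheet);

// Reads as wrong at a glance: a rw value handed to a col function.
// PHP runs it without complaint, because an integer is an integer.
$oops = selectColumn($rwCurPos);

// The prefixes line up, so this reads as right.
$ok = selectColumn($colCurPos);
```

Nothing in the language stops the first call, which is exactly why the prefix has to do the work of making it look wrong.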

Now, you're probably asking me why we can't do something like this:

currentRowPosition = rows.GetCurrent();
// ...code, code, code...
Column.Select(currentRowPosition); //wrong!

By making "Row" part of the variable name, it becomes obvious that it shouldn't be used to select a column. You could definitely do this, but I think in a lot of cases a form of Apps Hungarian probably works better, for four reasons:

  1. It's shorter, so...
  2. You can pack more descriptors into a smaller space if you need to, and...
  3. They're always in the same spot, right up front...
  4. ...which makes it easier to maintain a team/application/company-wide standard.

Conclusion

Now that I've explained the different types of Hungarian notation, I think we're better positioned to talk about our MVC router in the next post. I needed to get this out of the way first because we're going to use a special PHP function to include our classes for us based on the name of each class. Since I'd like to keep the source code directory tree relatively clean and organized, I'm going to use a form of Apps Hungarian to name my classes and filenames to help this special function navigate my file structure.

Extra Reading

When I was still in college, I used to read a blog called Joel on Software. My understanding of Hungarian notation largely stems from one particular post Joel wrote. It's called Making Wrong Code Look Wrong. Check it out.