Hungarian Notation Considered Useful (To Me, Anyway)
September 27th, 2010
I wanted to write about building the router portion of our MVC application today, but I think it's worth taking some time to talk about Hungarian notation first.
Hungarian notation is the name given to a variable naming convention used by some programmers. The point of this notation, invented by Charles Simonyi, who brought it with him to Microsoft, was that the type of a variable should be made obvious as part of its identifier.
Now, if you're old enough to remember pre-.NET Visual Basic, you were probably taught to use Hungarian notation somewhere down the line, and you've probably learned to hate it. You're not alone, and you're not wrong; you just don't know the history. You see, there are actually two types of Hungarian notation: one of them - the one you probably know - is more or less useless, but the other can be quite handy.
Systems Hungarian
As I said, there are two types of Hungarian notation, and they were both born at Microsoft, but to two different teams. The folks who worked on Windows came up with the version we now refer to as "Systems Hungarian" and it is this version that most people know and despise.
When the systems division first learned of Hungarian notation, they mistakenly believed that when Simonyi's original paper referred to the "type" of a variable, it literally meant the data type of that variable. This led to programmers prefixing their variables with such useless noise as "str" to denote a string or "int" to denote an integer. You may have been taught this in school, or even forced to use it, and for a while, you may have even thought this made sense (I did, for longer than I'd care to admit). Eventually, however, you would have realized, or had it pointed out to you, that it really didn't add any extra value to your code.
The usual argument against Systems Hungarian is that your functions and classes should be highly cohesive. This means that your code focuses on a specific task, rather than attempting to be everything to everyone. By maintaining this cohesion, you reduce the number of variables in play at once, and the ones that remain will have data types that are relatively obvious.
In short, people don't like Systems Hungarian because it adds no semantic value to your code, and so it just serves to make your variable names less readable.
Applications Hungarian
While the systems programmers at Microsoft were busy cluttering up their code, the applications division (i.e. the people behind products like Microsoft Office) had another way of doing things. They had what a lot of people consider a better way.
Rather than using a prefix to explain the variable's data type, the applications developers used prefixes to establish a variable's purpose, and to add semantic information that helps future developers understand it. In this way, variables which shared the same underlying data type could still be marked as incompatible with each other, where under Systems Hungarian they would look interchangeable.
An Example
A common example of this uses the Excel source code to make its point. If you were creating Excel, you would probably need some way of indicating a particular column in a spreadsheet. Since describing fractions of a particular column doesn't make any sense, any rational developer would use a whole number to store the column position - i.e. an integer.
If you were to use Systems Hungarian, you might end up with a variable name which looks like intColumnPosition. As you can see, the "int" prefix here doesn't explain much beyond the fact that the column position is a whole number. That might seem like a good thing at first, but really the fact that a column position is a whole number would be obvious to most people.
In Apps Hungarian, however, the same variable would probably have a name like colPosition. In this example, the prefix is used to describe the purpose of the variable (i.e. as a column in a spreadsheet), and leaves the "obvious" information, like the data type, out.
It also has the added bonus of making certain classes of mistakes become more obvious to the casual observer. Take the following code, for example:
intCurPos = rows.GetCurrent();
There doesn't seem to be anything wrong there, right? How about this line:
Column.Select(intCurPos);
Perhaps you can now see the problem: just a few moments ago, we assigned a value based on a row position, and now we're using it to select a column? That makes no sense! You're probably also thinking that it's a pretty obvious mistake, right? Any idiot would notice the error. Well, it might seem that way, but that's only because these lines were presented to you close together. What if, instead of having only a line of text between them, there was a whole function of code dividing these two lines? What if the value was stored in a member variable of a class, set in one method and read in another? Would you spot the mistake then? Let's try this again:
rwCurPos = rows.GetCurrent();
// ... lots of code here ...
Column.Select(rwCurPos);
To an eye trained for Apps Hungarian, that last line would stand out as wrong no matter where it was in the code.
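Here is a minimal runnable sketch of the same idea, written in Python. The Rows and Columns classes and their methods are hypothetical stand-ins I made up for illustration; they are not Excel's real objects. The point it tries to show is that both values are plain integers, so nothing in the language objects to the wrong call, and the prefix is the only thing that makes it look wrong.

```python
# Hypothetical stand-ins for the spreadsheet objects used in the post.
class Rows:
    def __init__(self, current):
        self._current = current

    def get_current(self):
        return self._current


class Columns:
    def __init__(self, current):
        self._current = current
        self.selected = None

    def get_current(self):
        return self._current

    def select(self, col_position):
        self.selected = col_position


rows = Rows(current=7)
columns = Columns(current=2)

# Apps Hungarian: the prefix names the kind of value, not its data type.
rw_cur_pos = rows.get_current()      # "rw" marks a row position
col_cur_pos = columns.get_current()  # "col" marks a column position

# Both values are ordinary ints, so either call runs without complaint.
columns.select(rw_cur_pos)   # accepted, but "rw" going into a column call reads as wrong
columns.select(col_cur_pos)  # prefixes agree: a "col" value into a column call
```

Because a type check sees two integers, the mismatch in the first select call can only be caught by a reader, and the prefix is what gives the reader something to catch.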
Now, you're probably asking me why we can't do something like this:
currentRowPosition = rows.GetCurrent();
// ...code, code, code...
Column.Select(currentRowPosition); //wrong!
By making "Row" part of the variable name, it becomes obvious that it shouldn't be used to select a column. You could definitely do this, but I think in a lot of cases a form of Apps Hungarian probably works better, for four reasons:
- It's shorter, so...
- You can pack more descriptors into a smaller space if you need to, and...
- They're always in the same spot, right up front...
- ...which makes it easier to maintain a team/application/company-wide standard.
Conclusion
Now that I've explained the different types of Hungarian notation, I think we're better positioned to talk about our MVC router in the next post. I needed to get this out of the way first because we're going to use a special PHP function to include our classes for us based on the name of each class. Since I'd like to keep the source code directory tree relatively clean and organized, I'm going to use a form of Apps Hungarian to name my classes and filenames to help this special function navigate my file structure.
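To give a rough feel for what "navigating the file structure by class name" could look like, here is a small sketch. It is written in Python rather than PHP purely to keep it short, and everything specific in it is an assumption for illustration: the prefixes (ctl, mdl, vw), the directory names, the "app" base directory, and the class_file_path helper are all made up here, and the real convention is left for the next post.

```python
import re
from pathlib import PurePosixPath

# Hypothetical prefix-to-directory table; the post does not define one.
PREFIX_DIRS = {
    "ctl": "controllers",
    "mdl": "models",
    "vw": "views",
}


def class_file_path(class_name, base_dir="app"):
    """Map a prefixed class name such as 'ctlHome' to a source file path."""
    # Split a lowercase prefix from the capitalized remainder of the name.
    match = re.match(r"^([a-z]+)([A-Z]\w*)$", class_name)
    if match is None:
        raise ValueError(f"no prefix found in class name: {class_name!r}")
    prefix, rest = match.groups()
    if prefix not in PREFIX_DIRS:
        raise ValueError(f"unknown prefix: {prefix!r}")
    # The prefix picks the directory; the rest of the name picks the file.
    return str(PurePosixPath(base_dir) / PREFIX_DIRS[prefix] / f"{rest}.php")


print(class_file_path("ctlHome"))
print(class_file_path("mdlUserAccount"))
```

An autoloading function could then include whatever file a lookup like this returns, so the prefix does double duty: it tells the reader what kind of class it is, and it tells the loader where to look.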
Extra Reading
When I was still in college, I used to read a blog called Joel On Software. My understanding of Hungarian notation largely stems from one particular post its author, Joel Spolsky, wrote. It's called Making Wrong Code Look Wrong. Check it out.
September 28th, 2010 at 5:20 am
I personally can't be bothered with Hungarian notation. If you use an IDE, the benefit from using it is practically zero, because as soon as you put an incorrect parameter into a function call the IDE will let you know. Even if it didn't, you could probably see the type of the variable by hovering over it or something like that.
Code Complete (or was it Pragmatic Programmer?) has a good example, similar to your "init var - much code here - use var".
When you have code which uses a variable, you should declare the variable as close to the using code as possible. This is exactly because of what you said - if the variable is visible when used, it's easy to see what it is. If it's declared close to where it's used, you again don't need Hungarian notation for anything.
September 28th, 2010 at 6:06 am
Hey Jani, nice to hear from you again.
You seem to be a bit confused between the two types of Hungarian notation. What you're describing, with IDEs which catch incorrect data types, is Systems Hungarian. I tried to lay it out that Systems Hungarian is bad, but Apps Hungarian can be good.
I agree with your second premise, that variables should be used as close to their declaration as possible, but there are times when variables can have a longer life, or be part of a more complicated code block, and then these sorts of prefixes can be useful.
I should also note that this topic got way out of control as I was writing it. I tend to start with a single thought in mind, then write as a sort of stream of consciousness. This post is a particularly bad example, because it sort of started from the wrong point. I wrote this in response to a certain plan I had for naming classes, not a plan on how to name variables.
In my next post, I'm going to put this plan in action, using a Hungarian-looking naming convention to create class names that can be interpreted by PHP's __autoload() function, which includes class definitions as they're needed, rather than having a set of includes littered throughout your code.
Stay tuned. Hopefully this will make sense soon, and this particular post will get lost in time as a bad mistake.