Serlock Works: Programming Idealism, Avoiding Hungarian Notation

Hungarian Notation

From wikipedia, hungarian notation is an identifier naming convention in computer programming, in which the name of a variable or function indicates its type or intended use. There are two types of Hungarian notation: Systems Hungarian notation and Apps Hungarian notation.
System Hungarian is intended to emphasize the variable's type. It is extremely useful in interpret / dynamic language such as javascript or php, and useless at all in static programming language. Especially in compiled oop language such as Java and C#, where data contract and type casting is the major problem, it has no benefit at all.
Apps Hungarian is intended to describe the functionality of given variable, regardless of it's type. As Joel Spoolsky has been explained in his article, there are some variable that is prone to error, even though already has compiled-type checking. One of his example is between unsafe and safe string (encoded html tags for example), in which the type is same but serve different purpose.
The article is posted at 2005. It means it already there for more than 7 years around. Given current ability of compiler and programming language, what can we do to improve the design?

Problem

There lies one and only one problem in Joel's solution, that is the code can still pass compilation phase. As stated by Mark Seeman in his article, faster feedback means less costs to correct errors. Ideally, it is the best when we can get all the system's error during compilation phase, but it mostly impossible for some reasons (such as parsing error or business rule error, in which cannot be caught by compiler). In short, you need to create compile error as much as possible to catch wrong code, rather than getting run time exceptions.

The Proposed Design

Using Joel's example for safe and unsafe string, we need to create a design where we can handle safe and unsafe string which can give compile error. By using C# syntax, as usual for oop language, first I define some classes. The class is for unsafe string.

public class DecodedHtmlString
{
    public DecodedHtmlString(string decodedString)
    {
        this.decodedString = decodedString;
    }

    private string decodedString;
    public override string ToString()
    {
        return decodedString;
    }
}

Simple enough. It gives no benefit but gives you a self-documenting data type. The class represent a html string in a decoded way, and no encoding happen here. Next, for the safe (encoded string).

public class EncodedHtmlString
{
    public EncodedHtmlString(DecodedHtmlString decodedString)
    {
        this.encodedString = System.Web.HttpUtility.HtmlEncode(decodedString.ToString());
    }

    private string encodedString;
    public override string ToString()
    {
        return encodedString;
    }
}

Again, a self explaining class accepting encoded string from a decoded string. Now we want to make both of the classes communicate each other. We have several options such as type casting or static parsing, which is easy enough in C# that I won't explain. In here I will do constructor injection and To type casting instead. For the DecodedHtmlString, we add a constructor and ToEncodedHtmlString method:

    public DecodedHtmlString(EncodedHtmlString encodedString)
    {
        this.decodedString = System.Web.HttpUtility.HtmlDecode(encodedString.ToString());
    }
    public EncodedHtmlString ToEncodedHtmlString()
    {
        return new EncodedHtmlString(this);
    }

And for the EncodedHtmlString side:

    public static EncodedHtmlString FromEncodedString(string encodedString)
    {
        EncodedHtmlString result = new EncodedHtmlString();
        result.encodedString = encodedString;
    }
    public DecodedHtmlString ToDecodedHtmlString()
    {
        return new DecodedHtmlString(this);
    }

Consumer

Let's see from consumer point of view:

string unsafeString = Request.Forms["CUSTOM_INPUT"]; // input from form
string safeString = System.Web.HttpUtility.HtmlEncode(unsafeString); // encoded safe string for reference
DecodedHtmlString decoded;
EncodedHtmlString encoded;

// initial creation
decoded = new DecodedHtmlString(unsafeString); // correct
encoded = EncodedHtmlString.FromEncodedString(safeString); //correct

// type casting
encoded = decoded.ToEncodedHtmlString(); // correct
encoded = new EncodedHtmlString(decoded); // also correct
decoded = encoded.ToDecodedHtmlString(); // correct
decoded = new DecodedHtmlString(encoded); // also correct

// wrong initial creation
decoded = new DecodedHtmlString(safeString); // wrong
encoded = EncodedHtmlString.FromEncodedString(unsafeString); //wrong

// to primitive
unsafeString = decoded.ToString(); // correct
safeString = encoded.ToString(); // correct

// wrong to primitive
unsafeString = encoded.ToString(); // wrong
safeString = decoded.ToString(); // wrong

We got 4 possible wrong code, that is from primitive and to primitive parameter assignment, and for other scenarios it is correct. Now let's see whether we can exploit the data type validation with parameter accepting data type.

public void WriteToDatabase(EncodedHtmlString encoded)
{
    string encodedString = encoded.ToString();
    // doing with encodedString
}

WriteToDatabase(unsafeString); // compile error
WriteToDatabase(safeString); // compile error
WriteToDatabase(decoded); // compile error
WriteToDatabase(encoded); // correct

Now we got 3 compile error and one correct code. If you favor to get a compile error, it is an improvement since now you can protect myself from 3 possible parameter assignment errors. And if you carefully using the two data types instead of passing from primitive string, it will be fine. The only two things that can pass the compile error is when casting to primitive, or from primitive.

But hey, isn't most of the operation (at least safe and unsafe string) is using primitive type? If we take account Response.Write and Database operations, it is very clear that most of the critical operation is using primitive type. (even for url, etc). Moreover, we add 2 more classes for this design.

Conclusion

We can get the design where we will receive compile error instead of run time error or buggy code. However, we still cannot get one hundred percent buggy-code free with this design, and most of the operations are error-prone here. Additionally, it introduces two dependent classes as well, making it more tight coupling.

In the end, it is still the framework's support that do the decide. If the framework support the Encoded and Decoded datatype by default, and suggesting you to use the datatype instead of primitives, maybe it is worth it. However, with current framework design, it is very unlikely for this design to give decent benefit.

Programming Idealism, Avoiding Hungarian Notation

Hungarian Notation

Problem

The Proposed Design

Consumer

Conclusion

No comments: