Saturday 5 January 2013

How to Write Super Fast Code

The great thing about programming is that the proof is in the pudding.  If you've done a good job, your program works, meets the requirements and doesn't throw exceptions at the user.  The same goes for performance.

Writing uber fast code is a fun challenge that can be very rewarding.

Consider a conversation like this:

Code reviewer:  If you create a new variable and typecast it, and then check it for null, it will be faster than if you just use "is."

Developer:  Why do you say that?

Code reviewer:  Check out www.programming-guru.com/whatisayistruth.  In his article, Sir Jon Skeet MD PHP MCSD, says so.

Developer:  Here's some code which tests the performance of both versions.  My version is 0.00001 milliseconds faster.

The conversation is over, and there's no need for a pointless debate.
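
If you ever need to settle a debate like this yourself, a micro-benchmark along these lines will do it.  The Animal and Dog types and the iteration count below are invented purely for illustration, and the numbers will vary by machine and runtime:

using System;
using System.Diagnostics;

class Animal { }
class Dog : Animal { }

static class IsVersusCastBenchmark
{
    static void Main()
    {
        Animal animal = new Dog();
        const int iterations = 10000000;
        var matches = 0;

        // Version 1: just use "is"
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            if (animal is Dog) matches++;
        }
        watch.Stop();
        Console.WriteLine("is:        {0} ms", watch.ElapsedMilliseconds);

        // Version 2: cast into a new variable with "as", then check it for null
        watch.Restart();
        for (var i = 0; i < iterations; i++)
        {
            var dog = animal as Dog;
            if (dog != null) matches++;
        }
        watch.Stop();
        Console.WriteLine("as + null: {0} ms", watch.ElapsedMilliseconds);

        Console.WriteLine(matches);   // use the result so the loops aren't optimised away
    }
}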

Performance is something that a programmer should keep in the back of his mind at all times; however, in most cases the readability of the code is more important.  You only need to ensure that something performs quickly if there's a chance it will be noticeably slow.

Many years ago, when I was still in school, I wrote a program that needed to be amazingly fast.  I wanted to squeeze in as many calculations per clock cycle as I possibly could.  Because of my need for speed, I chose to write it in assembler.  And so, it looked something like thousands of lines of this:

MOV AX, 42
JMP CX
DOG BRK
FOX 1
R2  D2
SUB MRN, DX
C3  PO
RAB BIT
etc., etc....

Okay, I've forgotten a lot of assembler, so the example may be inaccurate, but at least it worked.  Well, not exactly... because from time to time it would access invalid memory addresses and I'd have absolutely no idea why, and my computer would just freeze, or restart, or whatever computers used to do back in the DOS days.

So, how does one write super fast, stable, readable code?

Here are the steps:
  1. Find the problem areas.  There's no need to make any changes in code that is not slow.
  2. Measure the time taken.
  3. Write functionality tests.  If you're using a test-driven approach, you should already have tests to ensure that any changes will not change the functionality.  If you don't, then write them.
  4. Additionally, you could write performance tests: tests that, like unit tests, indicate a problem if an area of code becomes slower than it should be.
  5. Make your changes.
  6. Measure the performance.

1.  Finding the problem areas.

Areas that have the potential to cause trouble are calls to slow hardware (such as network calls and loading large files), calls to slow third-party methods, and loops within loops.

If you've already written the code and you want to find out where the bottlenecks are, you can use a profiler.    I like ANTS Performance Profiler for .NET.  Start by fixing the biggest bottlenecks.

If you're writing the code and just want to make sure it's efficient, then look out for the three potential problem areas I mentioned earlier: calls to slow hardware, calls to slow third-party methods, and loops within loops.

On the subject of loops within loops, I want to make a quick comment about LINQ, .NET's slick way of working with lists.  The important thing to understand about LINQ is when it actually runs the code: it only runs the code when you ask for the values (this is called deferred execution).

For example, if you have something like this:

public IEnumerable<Dog> Dogs
{
    get
    {
        return Animals.OfType<Dog>();
    }
}

It would be very efficient to do something like this...

var dogs = Dogs;

...because the variable dogs will know how to return a list of dogs, but won't actually loop through the list of Animals until it's used.

However, this would be very inefficient...

for(int i = 0; i < Dogs.Count(); i++)   // Count() also loops through the Animals list on every check
{
    Dogs.ElementAt(i).Bark(); // It will loop through the Animals list here
}

...because it will loop through the entire Animals list every time you call Dogs.ElementAt() (and every time Dogs.Count() is evaluated).

This is far better:

foreach(Dog dog in Dogs)
{
    dog.Bark();
}

The following, although not ideal, is useful if you want to work with i, and will be almost as fast as the above code:

var dogs = Dogs.ToList(); // It will only loop through the Animals list here
for(int i = 0; i < dogs.Count; i++)
{
    dogs[i].Bark();
}

2.  Measure the time taken

You can time a specific piece of code in C# like this:

var watch = new Stopwatch();   // Stopwatch lives in System.Diagnostics

watch.Start();
for (int i = 0; i < 1000000; i++) { }   // Execute the task to be timed
watch.Stop();

Console.WriteLine(watch.ElapsedMilliseconds);   // Time taken in milliseconds

To cater for caching and JIT compilation, it's usually good to run your code once just before timing it.
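
For example (RunTask here stands in for whatever you're measuring):

RunTask();                                      // warm-up run: JIT-compiles the code and warms any caches

var watch = Stopwatch.StartNew();
RunTask();                                      // the run that actually gets measured
watch.Stop();

Console.WriteLine(watch.ElapsedMilliseconds);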

3.  Write functionality tests
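
As mentioned in the steps above, if you're following a test-driven approach you should already have these.  If not, a functionality test can be as simple as this MSTest sketch, which uses the Dogs property from earlier; the Zoo and Cat classes are invented for the example:

[TestMethod]
public void Dogs_ReturnsOnlyTheDogs()
{
    var zoo = new Zoo();                  // Zoo, Cat and the Animals list are invented for this example
    zoo.Animals.Add(new Dog());
    zoo.Animals.Add(new Cat());

    var dogs = zoo.Dogs.ToList();

    Assert.AreEqual(1, dogs.Count);       // the cat should have been filtered out
}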

4.  Write performance tests

If you want to ensure that neither you nor anyone else changes anything that drastically increases the time taken, you could write a performance test.

In C#, the easiest way to test this is probably using the Timeout attribute, e.g.

[Timeout(200)]

...but using a timer, like the Stopwatch class, will allow you to be more specific about exactly what you're testing and why you need it to be fast.  In fact, the Timeout attribute can be extremely unreliable, especially if the test runs before any of the other tests, so it's better to use a timer.
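
If you do go with the attribute, it might look something like this; the test name and the 200 millisecond limit are purely illustrative:

[TestMethod]
[Timeout(200)]   // fail the test if it takes longer than 200 milliseconds
public void Foo_CompletesWithinTwoHundredMilliseconds()
{
    target.Foo();
}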

The acceptable time in this test should be a lot longer than the code actually takes, because the speed of a test varies with the computer it runs on and whatever other processes are running at the same time.

A more accurate way to do this is to keep an exact copy of your high-performance code in the test, and check that the live version isn't much slower than the copy.  It's not ideal, but I don't think there's a perfect way to do this.

For example:

[TestMethod]
public void MyTest()
{
    const int iterations = 10;
    const long buffer = 200 * iterations;
    var originalCodeTimer = new Stopwatch();
    var currentCodeTimer = new Stopwatch();
    for(var i = 0; i < iterations; i++)
    {
        originalCodeTimer.Start();
        MyOriginalCode();
        originalCodeTimer.Stop();

        currentCodeTimer.Start();
        target.Foo();
        currentCodeTimer.Stop();
    }
    if (currentCodeTimer.ElapsedMilliseconds > 
        originalCodeTimer.ElapsedMilliseconds + buffer)
    {
        Assert.Inconclusive();
    }
}

private void MyOriginalCode()
{
    // A complete copy of the code or a call to a copy of the code
    // that I originally wrote and timed
    // ...
}

In order to get more accurate timings it's usually best to run the method being tested a few times and get an average time.

Of course, since this isn't an exact measurement, it should probably be treated differently to normal tests.  You may want to keep these tests separate from the others, so they are not run every time your unit tests and integration tests run.  Another option, if you're using MSTest, is to use Assert.Inconclusive().  Instead of showing a red X or a green tick, this shows a yellow exclamation mark and places the result under Skipped Tests.  "Skipped" isn't exactly what it means here, but it will attract attention, and people can easily see why the test was flagged if you make the reason clear in your test.

You could do something like this:

if(timeTaken > expectedDuration * 2)
{
    if(timeTaken < expectedDuration * 4)
        Assert.Inconclusive("Took two to four times the expected duration.");
    else
        Assert.Fail("Took more than four times the expected duration.");
}

5.  Make your changes.

In most cases, what you'll be looking to do is this:

    Any slow code that can be called less often should be called less often.
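
A typical example is hoisting a slow call out of a loop and caching its result.  Here's a sketch; GetExchangeRate stands in for any slow call, such as a network or database request, and the order fields are invented:

// Before: one slow call per order
foreach (var order in orders)
{
    order.Total = order.Amount * GetExchangeRate(order.Currency);   // slow call inside the loop
}

// After: one slow call per currency, cached in a dictionary
var rates = new Dictionary<string, decimal>();
foreach (var order in orders)
{
    decimal rate;
    if (!rates.TryGetValue(order.Currency, out rate))
    {
        rate = GetExchangeRate(order.Currency);     // slow call, now made far less often
        rates[order.Currency] = rate;
    }
    order.Total = order.Amount * rate;
}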

You may be tempted to use multi-threading in an attempt to increase speed.  Multi-threading increases the complexity of your application and doesn't always make it faster.  Measure the performance and decide if the increased complexity is worth the performance gain.  If you decide to go with multi-threading, check that you've considered other alternatives, the target environment, and thread safety.
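
If you do try it, measure both versions.  A sketch like this makes the comparison easy; ProcessImage and images are invented stand-ins for a CPU-heavy operation and its inputs:

var watch = Stopwatch.StartNew();
foreach (var image in images)
{
    ProcessImage(image);                                      // sequential version
}
watch.Stop();
Console.WriteLine("Sequential: {0} ms", watch.ElapsedMilliseconds);

watch.Restart();
Parallel.ForEach(images, image => ProcessImage(image));       // parallel version, from System.Threading.Tasks
watch.Stop();
Console.WriteLine("Parallel:   {0} ms", watch.ElapsedMilliseconds);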

6.  Measure the performance.

Only keep your changes if your measurements confirm that the code is faster and it still passes your tests.


Wednesday 2 January 2013

The Characteristics of a Master of Programming

Today I was thinking about the characteristics that a master of programming has.  From experience I have found that some personality traits can be learned.

This is the list of characteristics that I came up with:

  • He loves programming.
  • He enjoys being creative.
  • He is organized.
  • He is calm.
  • He has lots of knowledge and experience.
  • He enjoys learning.
  • He is open minded, so he likes asking questions and getting his work peer reviewed, and does the best he can with the information he receives.
  • He loves challenges and is confident in his ability to solve any problem.
  • He is 100% focused on creating perfect software.  Nothing else matters.
  • He is always looking for the most effective way to write a piece of code or solve a problem.

What do you think?
