Thursday, June 28, 2012

Unit Testing and Assumptions


When we write a unit test, for better or worse, we're locking down a section of code. This is good when we want to make sure the logic doesn't change. This can be bad, however, when our tests make it difficult to make good changes. Maybe you need to fix a bug, or change the details of an algorithm.

The problem is, sometimes unit tests make too many assumptions about the code under test. We assume that an algorithm will be implemented in a certain way. Perhaps the method contains a lot of side effects.

In my mind, the best-case scenario involves feeding input to a function and getting something in return. I give you a sentence, and you capitalize every word for me. I don't care how you do it, just that the output is what I expect:

def test_upper(self):
  sentence = "this is a sentence."  # avoid shadowing the built-in input()
  result = upper(sentence)
  self.assertEqual("This Is A Sentence.", result)


These tests are short, easy to write, and they make no assumptions about the underlying code. It turns out that the fewer side effects a method has, the easier it is to test.
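To make the "I don't care how you do it" point concrete, here is a rough sketch with two interchangeable implementations. Both function names (upper_with_split and upper_with_loop) and the check_upper helper are made up for illustration; the point is only that one input/output check passes against either of them.

```python
def upper_with_split(sentence):
    """Hypothetical implementation 1: split on spaces, capitalize each piece."""
    return " ".join(word[:1].upper() + word[1:] for word in sentence.split(" "))


def upper_with_loop(sentence):
    """Hypothetical implementation 2: walk the characters one by one."""
    result = []
    start_of_word = True
    for char in sentence:
        result.append(char.upper() if start_of_word else char)
        start_of_word = (char == " ")
    return "".join(result)


def check_upper(upper):
    # The check only looks at input and output, never at how the work is done.
    assert upper("this is a sentence.") == "This Is A Sentence."


for implementation in (upper_with_split, upper_with_loop):
    check_upper(implementation)
```

Swapping one implementation for the other requires no change to the check, which is exactly the property I'm after.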

Consider the following code:

class box:
  def __init__(self, length, width):
    self.__length = length
    self.__width = width


  def compute_area(self):
    return self.__length * self.__width


The constructor just sets two private fields. This is difficult to test without either accessing the internals of the class or exposing the fields through getters. Even then, we would be testing implementation details that are subject to change. I would argue that we should test the constructor indirectly by testing compute_area as follows:


def test_compute_area(self):
  my_box = box(4, 5)
  self.assertEqual(20, my_box.compute_area())


What we're really interested in is not that two private fields get set in the constructor, but that the object can compute its area.
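As a sketch of why this matters, suppose the class is later refactored to store only the area. The precomputed_box class below is a hypothetical variant I made up for illustration (it keeps the lowercase class naming used above). A test that goes through compute_area passes against both versions, whereas a test that peeked at the length and width fields would break.

```python
import unittest


class box:
    """Original layout: keep both dimensions in private fields."""

    def __init__(self, length, width):
        self.__length = length
        self.__width = width

    def compute_area(self):
        return self.__length * self.__width


class precomputed_box:
    """Hypothetical refactoring: compute the area once and store only that."""

    def __init__(self, length, width):
        self.__area = length * width

    def compute_area(self):
        return self.__area


class BoxTest(unittest.TestCase):
    def test_compute_area(self):
        # The same behavioral check covers both internal layouts.
        for box_class in (box, precomputed_box):
            my_box = box_class(4, 5)
            self.assertEqual(20, my_box.compute_area())
```

Saved to a file, this can be run with python -m unittest followed by the file name.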

Saturday, June 23, 2012

My First Open Source Project

In my first post, I mentioned the first project I worked on in Python, called BatchEdit. I recently decided to host it on GitHub, both to motivate myself to work on it some more and to generate interest in the project.

BatchEdit is a command-line batch photo editor. Basically, you give it an input and output folder and specify the adjustments to apply, such as resizing, sharpening, boosting contrast, adding a border, etc. Here is an example of what the command looks like to auto-rotate, increase contrast, convert to grayscale, resize to 720 pixels, sharpen, add a gray border of 5 pixels, and overlay a watermark:

python scripts\BatchEdit.zip --input C:\input --output C:\output
--autorotate --resize 720 --grayscale --contrast 1.2 --sharpen 1.3
--border 5,gray --watermark C:\logo_transparent.png

Here is what the resulting transformation looks like:


In my mind, the most common use case for this program is preparing photos to upload to the web. Often photographers want to auto-rotate, resize, sharpen, and apply some basic adjustments such as boosting the contrast or saturation before uploading. If you have many photos to edit, doing this manually in a program like Photoshop is quite tedious.

Under the hood, the code takes advantage of the Python Imaging Library (PIL) for all image manipulation.
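Before any image work happens, the flags from the example command have to be parsed. Here is a minimal sketch of that step, assuming the standard-library argparse module. This is not the actual BatchEdit source, which may parse its options differently, and build_parser and parse_border are hypothetical names.

```python
import argparse


def parse_border(value):
    """Turn a string such as '5,gray' into the tuple (5, 'gray')."""
    width, _, color = value.partition(",")
    if not width.isdigit() or not color:
        raise argparse.ArgumentTypeError("expected WIDTH,COLOR, e.g. 5,gray")
    return int(width), color


def build_parser():
    # Hypothetical parser that mirrors the flags in the example command.
    parser = argparse.ArgumentParser(description="Batch photo editor (sketch)")
    parser.add_argument("--input", required=True, help="folder of source photos")
    parser.add_argument("--output", required=True, help="folder for edited photos")
    parser.add_argument("--autorotate", action="store_true")
    parser.add_argument("--grayscale", action="store_true")
    parser.add_argument("--resize", type=int, metavar="PIXELS")
    parser.add_argument("--contrast", type=float, metavar="FACTOR")
    parser.add_argument("--sharpen", type=float, metavar="FACTOR")
    parser.add_argument("--border", type=parse_border, metavar="WIDTH,COLOR")
    parser.add_argument("--watermark", metavar="IMAGE")
    return parser


# Parse the same arguments as the example command, passed as a list.
args = build_parser().parse_args([
    "--input", r"C:\input", "--output", r"C:\output",
    "--autorotate", "--resize", "720", "--grayscale",
    "--contrast", "1.2", "--sharpen", "1.3",
    "--border", "5,gray", "--watermark", r"C:\logo_transparent.png",
])
```

Each parsed value would then be handed to the matching PIL operation.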

Currently there are a few shortcomings of the program. First, it requires that the end user have Python and PIL installed on his system. Second, it does not yet have a GUI on top of it. These limitations would probably keep all but the most expert users from using it.

Batch photo editors have certainly been done before, but I think mine is simpler than most. To see the source code, click here.

Thursday, June 21, 2012

Lean on the Compiler

I've been thinking a lot lately about dynamic typing, and about how it supposedly sets you free from the bonds of the compiler and makes you 10x more productive. Some blog posts even go so far as to claim that dynamic typing will replace static typing as unit testing becomes more widespread. The advocates rightly point out that just because a program compiles does not mean it is correct. Unit tests are therefore a better safety net. Now, provided the code has full coverage, I agree that we can probably do without compile-time checking.

But is a fully-covered code base a reality? Not likely. Especially when you're working with legacy code.

Today at work I was making quite a few signature changes to my methods: adding and removing parameters in several methods across many different classes. In situations like this, I'm glad to have the compiler to make sure I don't miss anything, even before I have a chance to run what tests I do have. "Lean on the compiler" is the phrase Michael Feathers uses in Working Effectively with Legacy Code. Great read, by the way.

Today I realized that this is one technique that I really miss in dynamically typed languages. Rather than holding me back, in this case the compile-time checking provided confidence to make changes quickly.
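For contrast, here is a small sketch of the same kind of change in Python. The resize and export functions are hypothetical stand-ins, not real project code. After a parameter is added, a stale call site goes unnoticed until that exact branch runs.

```python
def resize(image, width, height):
    """New signature: a height parameter was added during the change."""
    return (image, width, height)


def export(image, thumbnail=False):
    if thumbnail:
        # Stale call site that still uses the old two-argument signature.
        return resize(image, 100)
    # This call site was updated along with the signature.
    return resize(image, 720, 480)


# The common path works, so nothing looks wrong at first.
export("photo.jpg")

# The mistake only surfaces when the rarely used branch is executed.
try:
    export("photo.jpg", thumbnail=True)
except TypeError as error:
    print("Only discovered at run time:", error)
```

A compiler would reject the stale call before the program ever ran. Here, only a test (or a user) that exercises the thumbnail branch will find it.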

Saturday, June 9, 2012

When TDD Breaks Down

There's been a lot of hype recently about Test-Driven Development (TDD). I really like it for the most part. It forces the programmer to think about how the code will be used, which often results in an easier-to-use API. It's also a good way to explore all of the edge cases and make sure they are covered with a unit test.

I've found, though, that sometimes having to write the test before the code is a little too restrictive. Sometimes you have no idea what the resulting code will look like or how exactly it will achieve your goal. In this case, I am an advocate of what I like to call exploratory coding. Code first until you have an idea of what your code needs to do. This may be accompanied by some ad-hoc testing. Then, when you have the functionality nailed down, cover it with tests. Of course, this only works if you're careful to keep your code modular and testable. If not, you may need to refactor first or rewrite the code altogether following TDD.

We often joke at work that the proper way to do TDD is to write the code first and comment it out, then get your failing test and fix it by uncommenting the code. This just goes to show how counter-intuitive test-first can be at times.

I often hear people argue that if you write the code first, there's a chance you'll get lazy and won't get around to testing it afterwards. I disagree. This just requires a little discipline, which I would argue is essential to being a good developer in the first place.

Essentially, the principle behind TDD is the importance of modular, well-tested code. Indeed, I would say that the rise of unit testing is one of the biggest developments to improve the overall quality of code in recent times.

Sunday, June 3, 2012

The Virtues of Dynamic Typing

Last semester, I took a class in Python. Coming from a C++/C# background, many things were new to me. I liked not having to use curly braces. I also liked not having to declare the type of my variables.

A few things caught me off guard, however. I found it strange that Python did not allow me to declare my class variables as private. I was also disappointed that I couldn't explicitly define an interface.

What I initially considered limitations of the language, I eventually regarded as liberating. I felt empowered to be a responsible programmer without being babysat by the compiler. I knew what types I was using, and I knew which variables I intended to be private. When I wanted to use an interface, I simply used the same method signature in several different classes. Indeed, my productivity was increased significantly as a result of this flexibility.
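A sketch of what that looks like in practice: the Grayscale and Sharpen classes and the run_pipeline function below are hypothetical examples, not code from the class project. No interface is declared anywhere. Both classes simply provide an apply method with the same signature, and a leading underscore marks the fields I intend to be private.

```python
class Grayscale:
    def __init__(self):
        self._applied = 0  # leading underscore: private by convention only

    def apply(self, photo):
        self._applied += 1
        return photo + " -> grayscale"


class Sharpen:
    def __init__(self, amount):
        self._amount = amount  # nothing enforces privacy; it is a signal to readers

    def apply(self, photo):
        return "{} -> sharpen({})".format(photo, self._amount)


def run_pipeline(photo, adjustments):
    # Any object with a matching apply(photo) method is accepted.
    for adjustment in adjustments:
        photo = adjustment.apply(photo)
    return photo


result = run_pipeline("photo.jpg", [Grayscale(), Sharpen(1.3)])
```

Adding a new adjustment means writing another class with an apply method; nothing else needs to change.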

At the end of the class we had a chance to build a final project. In just three weeks, I was able to build a command-line batch photo editor. Working on it was a joy, and the resulting code was very clean and terse.

The one frustration I never resolved to my satisfaction was deployment. Either the end user needs to have Python installed on his system, or you must somehow package up the interpreter with the application. Neither solution is ideal.

Overall, I concluded that in many cases, static typing is an unnecessary burden to programmers. Dynamic typing is not without its problems, but Python will definitely hold a prominent place in my quiver from now on.