Bit Builder

Friday, April 19, 2013

"Not Invented Here" Considered Helpful

We are constantly told not to "re-invent the wheel," as if doing so would be an utter waste of time. Yet doing so is extremely worthwhile, if only for the sake of experience alone. Maybe a better way to put it would be "dissect the wheel, learn about all of its intricacies, then rebuild it from scratch." Maybe someone has already done it. Maybe they've done it very well. But none of that matters. Who says your design won't be better? Who says it won't better apply to your needs? And what of the grand sense of accomplishment that you've done something by yourself?

In programming, Not Invented Here is spoken of as strictly a bad thing. "There's a library for that," they say. They're right—there probably is a library, but there's something critical missing in these discussions. Using a library does not necessarily give you understanding as to how it works.

In my undergraduate compilers course, we used a few libraries to help us parse our language. But before using a new library, the professor would ensure that we had a clear understanding of how it worked. Before using regular expressions, for example, he made sure we had a good background in finite automata and state machines. "Given enough time," he would ask, "could you have implemented this yourself?"

More and more, I'm convinced that we should have an intimate understanding of the libraries that we use. Where necessary, we should sculpt them, tear them apart—even rewrite them.

Now, when you're on the clock, your opportunities for exploration may be limited. But at a bare minimum, take the time to do so in your personal projects. After all, why else are you doing personal projects, if not to learn and have fun?

Monday, February 25, 2013

Cut the Ceremony

I'm happy to see a growing interest in languages that attempt to cut out the ceremony—things in a language, often keywords, that take your attention away from the code you actually care about. Often these are things that you type only to appease the compiler, or to make the language easier to parse (I'm looking at you, semicolon). To see what I mean, compare

public virtual void Method(IFactory arg)
{
   arg.DoStuff();
}

to Ruby's

def method arg
   arg.doStuff
end

In this case, Ruby does away with ceremony mainly through dynamic typing. This frees the programmer from having to specify types in method arguments or returns. The downside to this approach, of course, is that you won’t find out about type errors until run time. While this problem can be mitigated through discipline and good testing, larger projects are arguably better off with static type checking—that is, unless you really need the flexibility that dynamic languages provide.

In fact, there are languages that manage to reduce ceremony without sacrificing static checking. Scala, for example, is a compiled, statically typed language that almost looks like Ruby if you squint hard enough. Scala is able to reduce the noise by leveraging such things as type inference and implicit returns. Consider the following example seen here:

def doubleValue(number : Int) : Int = {
   return number * 2
}

By making the return implicit, it can be reduced to the following:

def doubleValue(number : Int) = {
   number * 2
}

Finally, we can eliminate the braces since the function fits on a single line:

def doubleValue(number : Int) = number * 2

Thus, languages like Scala are able to achieve the terseness of Ruby while maintaining the static type safety and performance of C#.

QED.

Wednesday, January 16, 2013

Tests or Clean Code: You Pick

If you had to choose between clean code and untidy code that is covered by tests, which would you pick?

The more that I do TDD, the more I have come to realize a minor flaw: TDD optimizes for test coverage, at the potential expense of untidy code. It's entirely possible with TDD to end up with code that passes all the tests, yet is not particularly readable. Now, if the "Refactor" step is performed properly, this can mitigate the problem. But even with refactoring, TDD's baby-steps approach to solving a problem does not always produce the most readable code. Too often, we are focused only on passing the current test and we lose sight of the overall design (assuming, of course, that we have a design).

The classic alternative is to do some initial design work, code, then cover the code with tests. But one of the main arguments for TDD is that writing tests after the fact often gets neglected by lazy programmers. Then who is to say that the same thing won't happen with TDD? That is, the "Refactor" step can fall by the wayside just as easily.

The question, then, is what are you optimizing for?

If the "Refactor" step is neglected, TDD can lead to working, yet untidy, code. Upfront design can lead to clean code, yet you haven't proved that it works if you fail to cover it with tests afterwards.

If it came down to it, I would probably pick covered, messy code. The next developer to work in the code would be less likely to break my functionality if there are tests in place, even if it initially may be less readable.

Ideally, of course, there would be no need to make such a compromise in the first place. What are your thoughts? How can we avoid this?

Sunday, December 23, 2012

Basement Hackers and Developer Elitism

I have read a lot of blogs complaining about the state of affairs among developers. Jeff Atwood asks, "Why Can't Programmers.. Program?" Jason Gorman goes so far as to say:

Consider that not all developers are equal, and some developers achieve more than others. In reality, 80% of the working code in operation today can probably be attributed to small proportion of us. The rest just get in the way. If anything, if we thinned down the herd to just the stronger programmers, more might get done.

Certainly, skill level varies among developers. But to make ridiculous claims like this is nothing more than a case of developer elitism and almost sounds like some sort of ethnic cleansing campaign.

What's really going on is that, due to the advent of affordable PCs and the internet, it's possible for someone who has never set foot in a university to learn to code from online tutorials. Тhe barrier to entry is lower than ever before. This is not necessarily a bad thing. But I can see why it would cause the older folks to become bitter.

The same thing happened with photography. It used to be that in order to become a photographer, you needed access to a dark room. You needed to know about the different chemicals. You needed to work for some time as an apprentice.

When digital photography and photo editing came along, the barrier to entry became significantly lower. And, of course, some seasoned professionals are now bitter that legions are becoming photographers without ever needing to enter a darkroom.

A traditional education in computer science is extremely helpful. It's important to have a foundation in algorithms and data structures. But there are plenty of people who are successful developers that have obtained their education through different means.

Instead of pointing fingers at newcomers and attempting to "thin down the herd" as Jason puts it, we should welcome to the fold those who truly have a desire to become great programmers.

Now, were I in a position to hire developers, I would definitely test their ability to code. This is one case where it's appropriate to evaluate skills and make a judgement call. But to make blanket statements like those above is just plain ignorant.

Monday, December 3, 2012

Code Ownership

In economics, the Tragedy of the Commons tells of a proverbial pasture that is common to several herders. The observation goes as follows: "...it is in each herder's interest to put the next (and succeeding) cows he acquires onto the land, even if the quality of the common is damaged for all as a result, through overgrazing. The herder receives all of the benefits from an additional cow, while the damage to the common is shared by the entire group. If all herders make this individually rational economic decision, the common will be depleted or even destroyed, to the detriment of all."

I often wonder how this principle applies to a software team. Imagine that you are adding a feature that requires you to add some methods to an existing class that is already pretty big. The simplest thing to do is to just add the methods and move on. It's a tempting choice because it provides immediate results to you, and the costs won't be borne until a later date. At that time they'll be shared across the whole team and not borne by you alone. It takes a disciplined programmer to take the additional steps to pull out the methods into a new class in hopes of reducing future team maintenance.

How do we solve this dilemma? In the case of the pasture, the economist would say to give a private pasture to each herder that he alone is responsible for. Can this same solution be applied to our software teams?

I think a certain sense of code ownership is beneficial. This allows a developer to specialize in a certain part of the system. She will be more likely to keep the code clean if she knows she'll be working in it again in the future. She'll be familiar with the code whenever a feature needs to be added there. The problem with this, though, is that developers typically have big egos. If you are given "ownership" over a section of code, you will be less open to other developers making improvements to your algorithms or design. It discourages team collaboration and dissemination of knowledge.

At the other extreme, you may get exposure to more parts of the system, but you would likely not have any personal investment in any specific section of the code.

I think the solution is some combination of the two. When working on a particular feature or bug fix, focus your efforts on getting to know that area of code. Focus your clean up efforts in that area, knowing that you (or a team member) are likely to encounter the code again in the future. When you are finished, you can move on to another section of code and repeat the process. Over time, each member of the team will ideally have some sense of ownership in each module of the system.

Here are some practices that I think provide the appropriate balance of ownership:

Small Teams

Ideally, teams should consist of 3-4 developers working on a particular, well-defined feature. This means that each developer has a significant investment in the feature, sharing at least a quarter of the responsibility for the quality of the code. Because the team is often working in the same section of code for the duration of the feature, there is a greater incentive to keep the code clean.

Code Reviews

Code reviews, whether done continually through pairing or just before check-in, serve as an extra opportunity to consider the impact of your changes on the code base, including how it will affect future maintenance by the team as a whole.

Ownership of Bugs

Where reasonable, any bugs that come up during development ought to be addressed by the individual that caused them. Attitude is important here. This is not merely an exercise of pointing blame and public humiliation. Rather, that developer is often more familiar with the section of code. This also helps developers to have ownership of the quality of the code that they produce.

The key here is that successful systems are written by developers who take personal responsibility for their code, yet at the same time realize they are part of a team. With the understanding that everyone likes to work in a clean code base, the team, and each individual, must commit to practices that produce quality, maintainable code.

Monday, October 29, 2012

Don't Parse with RegEx

At some point, I think all developers have tried to use regular expressions to parse input. It may work at first, but it quickly becomes unwieldy for all but the most trivial of inputs.

Engineers learned early on that it simplifies things drastically to separate the compiler into separate modules – first find the individual tokens, then use a grammar to see if the tokens are arranged correctly, then assign semantic meaning to the statements, and so forth.

Regular expressions are great for getting the individual tokens. This is the scanning, or lexing phase. With the tokens in hand, we're ready to do the actual parsing. To do this properly, we need to specify a formal grammar. This is typically done in BNF. Grammars can deal with nesting and other constructs that regular expressions don't handle well. For example, imagine trying to match up opening and closing sets of parentheses using regular expressions. This is trivial to specify in a grammar.

Separation of concerns is another advantage of doing it this way. If we truly follow the single responsibility principle, we have no justification for trying to lex and parse at the same time.

Regular expressions are powerful tools by themselves, but don't abuse them – especially if you're dealing with a full-fledged domain-specific language.

Sunday, September 16, 2012

State vs. Interaction Testing

In my post Unit Testing and Assumptions, I described my ideal type of test: "I don't care how [the algorithm does] it, just that the output is what I expect. These tests are short, easy to write, and they make no assumptions about the underlying code." What I didn't realize at the time is that I was simply making the case for state-based testing.

State-based testing focuses on the results of a computation, not on the specific steps of an algorithm. Instead of verifying that add(2, 2) was called on a mock, for example, we simply assert that the result is 4. This makes the tests less brittle and usually easier to write because of less setup. It's also better from a TDD perspective since you don't need to know the details of the code-to-be-implemented in order to write your failing test. For better or worse, these type of tests can quickly turn into "mini-integration" tests since they lack the test isolation that mocks provide.

State-based testing stands in contrast to interaction or behavioral testing. Here the focus is, well, on the interaction of the system under test with a mock object. This can be useful when the method delegates some work to a collaborator. The collaborator object is already tested, so we just need to verify that the delegation takes place with the correct parameters. Indeed, this is essentially the only way to unit test code that makes calls to a web service or database. The downside, of course, is that the test locks down the behavior of the code. If that behavior ever changes, even if the results stay the same, we must update the test accordingly.

I tend to be a proponent of state-based testing whenever possible. I use it in cases where my class under test has no collaborators, or when the collaborators are lightweight. I like that this type of testing encourages return values over side-effects. I'm also usually more concerned that the results themselves are correct than how the code computed them.

If I have a good reason to isolate the class completely, then I use mocks to verify behavior. This is usually on the seams of the system, such as near the data access layer.

Ultimately, I end up doing whatever feels right at the time given the situation.