Monday, December 2, 2013

Implicit Callbacks

While working in C# this morning, I found myself wanting to force the creator of an object to subscribe to an event. I had something like the following:
var obj = new MyObject();
obj.RequiredEvent += EventHandler;
My first solution was to extract a sort of factory method and use that whenever I wanted to construct the object:
private MyObject CreateObject()
{
  var obj = new MyObject();
  obj.RequiredEvent += EventHandler;

  return obj;
}
This is good, but it's still possible to construct the object without calling the helper method. What I really wanted was for the compiler to enforce that someone subscribe to the event, similar to how using constructor injection ensures that an object is passed all of its dependencies.

Callbacks


With this thought in mind, it occurred to me that if I really want to require subscribing to an event, maybe an event is not the right approach. What if the constructor simply took a callback function as a parameter, as follows?
var obj = new MyObject(EventHandler);
In effect, we have taken an implicit callback and made it explicit. Indeed, this is simpler and gives the compile-time check that I was after.

Pick Your Poison


As always, it's a trade-off. The callback approach limits me to one handler, whereas an event may have multiple subscribers. Furthermore, the callback approach forces the consumer to provide a handler, whereas with the event approach it is entirely optional.

Impact of Language


The language you're using may make one approach more natural than the other. In C#, I usually reach for events. In JavaScript, I'm more likely to use a callback handler. This is mainly because C# has language support for events, while JavaScript does not. Passing around functions is also more idiomatic in JavaScript, so callbacks are a good fit.
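To make that concrete, here is what the explicit-callback version of the earlier example might look like in JavaScript. This is only a sketch; MyObject, onChange, and update are hypothetical names invented for illustration, not from any real library:

```javascript
// A constructor that requires a callback up front, making the implicit
// "event subscription" explicit. MyObject, onChange, and update are
// hypothetical names used only for illustration.
function MyObject(onChange) {
  if (typeof onChange !== 'function') {
    throw new TypeError('MyObject requires a callback');
  }
  this.onChange = onChange;
}

// Any state change notifies the one handler the caller was forced to supply.
MyObject.prototype.update = function (value) {
  this.value = value;
  this.onChange(value);
};

var obj = new MyObject(function (value) {
  console.log('changed to ' + value);
});
obj.update(42);  // prints "changed to 42"
```

Forgetting the callback now fails loudly at construction time rather than silently producing an object with no subscribers.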

In fact, were it not for my exposure to more functional languages like JavaScript, I probably wouldn't have thought of using the callback pattern at all. Yet another reason to learn more languages.

Saturday, October 19, 2013

SourceDiff

I wanted a new challenge, so I wrote a side-by-side diff tool.

My goal was to make a simple diff tool with a clean user interface. A hundred commits later, I've started using it in my own git workflow.

I decided to build it using web technologies, mostly because I haven't seen many good side-by-side diff tools in the browser. In the end, though, I also wanted a desktop version so that I could integrate with git and other source control tools. For this, I used Adobe AIR to build a thin executable shell that accepts command-line arguments.

Here's a screenshot of the AIR client:



It's not the fanciest diff tool out there, but again, I built it mostly for the experience. I used the well-known Longest Common Subsequence problem as the basis of my diff algorithm. First, I look for and ignore lines that are common to the beginning or end of each file. This is simply an optimization step that takes advantage of the fact that usually a large portion of the code is left unchanged. Then, I apply the LCS algorithm on a line-by-line basis. That is, initially I only find inserted or deleted lines. If I find an insert and delete on the same line, then I report it as a modified line and proceed to find the character differences for these lines. In other words, for each modified line, I apply the LCS algorithm again to find which characters changed. From there, I improve the character diffs by cleaning up what Neil Fraser calls Semantic Chaff.
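As a rough illustration of the line-level step, here is the textbook dynamic-programming LCS over arrays of lines. This is a simplified sketch of the general algorithm, not SourceDiff's actual implementation:

```javascript
// Classic dynamic-programming Longest Common Subsequence over lines.
// A simplified sketch of the idea, not SourceDiff's actual code.
function lcs(a, b) {
  // table[i][j] holds the LCS length of a[0..i) and b[0..j).
  var table = [];
  for (var i = 0; i <= a.length; i++) {
    table.push(new Array(b.length + 1).fill(0));
  }
  for (i = 1; i <= a.length; i++) {
    for (var j = 1; j <= b.length; j++) {
      table[i][j] = a[i - 1] === b[j - 1]
        ? table[i - 1][j - 1] + 1
        : Math.max(table[i - 1][j], table[i][j - 1]);
    }
  }
  // Walk back through the table to recover the common lines.
  var common = [];
  for (i = a.length, j = b.length; i > 0 && j > 0;) {
    if (a[i - 1] === b[j - 1]) {
      common.unshift(a[i - 1]);
      i--;
      j--;
    } else if (table[i - 1][j] >= table[i][j - 1]) {
      i--;
    } else {
      j--;
    }
  }
  return common;  // lines absent from this sequence were inserted or deleted
}
```

Lines of the old file that are not in the common subsequence are deletions; lines of the new file that are not in it are insertions, which is exactly the line-level report described above.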

Diff tools are hard to get right. If an edit consists mostly of small changes, diff tools as a rule do a great job. However, as soon as you start doing large refactors or moving things around, most tools report a mess of changes and are less than helpful. Some diff tools try to address this by being more language-aware. Some tools go so far as to parse the code and only report semantic changes at a higher level.

I like what these new tools are trying to accomplish, but at the end of the day, a text diff is usually sufficient and provides a simple, precise record of everything that changed.

Thursday, September 26, 2013

The Happiness Metric

Measuring the productivity of a software team is a challenging task, much to the dismay of many a manager. Still, managers often grasp for metrics, as if by tweaking variable X or Y their developers will suddenly become more productive.

Unfortunately, most metrics in software miss the mark. At best, measuring any sort of development activity causes developers to focus on improving that metric at the expense of other valuable activities. At worst, it causes developers—consciously or not—to game the system.

Let's consider the metric of code coverage. Suppose the department sets a policy that any new code that goes into the system must have 80% code coverage. The managers get their beloved data, and the quality of the code will certainly improve—right?

Now let's assume the developers make an honest effort to meet that goal. They may even go so far as to test all of their getters and setters.

Great, they met the goal, but is the quality of the code improved? Not necessarily. Most would agree that testing getters and setters is a waste of time. Further, just because code is covered by a test does not mean that the test is meaningful.

What about measuring the number of features shipped? This may work fine in the short term, but it definitely does not take into account code quality, or the value of said features to the customer.

The truth is, good developers thrive when granted autonomy. Setting arbitrary metrics for them to meet undermines whatever autonomy you claim to extend. It takes the "self" out of self-organized teams. It follows then that, ideally, metrics should grow within the team rather than be imposed by management.

Personally, the only metric I trust is a gut feeling. I call it the Happiness Metric.

To convert my gut feeling into hard data, I take a short survey every few weeks with prompts like the following, rated on a scale from 1 to 5:

  • I delivered value to my customers
  • I focused on quality
  • I worked at a sustainable pace
  • I was intellectually engaged
  • I was happy

For individuals and teams, these questions prompt introspection and serve as discussion points. They will help you evaluate whether you worked on the most important thing for your customers.

It may not be exactly what management is looking for, but I don't know of any better way to measure the productivity of a team.

I'd love to hear about metrics that have worked for your teams. Follow me on Twitter @justin_hewlett.

Thursday, August 8, 2013

CheckDoSomething()

I have gotten a lot of use out of what I might call the Check Execute Method pattern. The general idea is that you often want to conditionally execute some action. For example, you may have something like the following appear in several different places:
if condition
  doSomething()
end
If you're not careful, code like this may get sprinkled all over the place. If many callers are checking the same condition, we can DRY up the code as follows:
def checkDoSomething(condition)
  if condition
    doSomething()
  end

  return condition
end
Now we have encapsulated the decision of whether to execute the method. We no longer need to repeat the check everywhere. The 'check' prefix in the method name makes it clear that the execution of the action is dependent on the parameters or state of the object. Additionally, if the caller needs to know whether or not the action was performed, they can use the boolean return value.
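The same pattern translates directly to other languages. Here is a JavaScript sketch using a hypothetical checkSave method on a document object; the Doc type, the dirty flag, and the save action are all invented for illustration:

```javascript
// The Check Execute Method pattern in JavaScript. Doc, dirty, and
// checkSave are hypothetical names used only for illustration.
function Doc() {
  this.dirty = false;
  this.saveCount = 0;
}

Doc.prototype.checkSave = function () {
  if (!this.dirty) {
    return false;  // condition not met: skip the action, tell the caller
  }
  this.saveCount++;  // stand-in for the real save action
  this.dirty = false;
  return true;       // the action ran
};
```

Every caller simply invokes doc.checkSave() and, if it cares, branches on the boolean; none of them repeat the dirty check.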

Some may disagree with returning a success value, claiming that this violates command query separation. Fine, but what is the alternative? We could rely on the caller to check some condition before calling the method, but this is problematic if we have more than a few callers who are all checking the same thing. A caller may even forget to make the check.

Alternatively, we could change the method to do the check internally and throw an exception if the condition is not met. But then we're using exceptions for control flow, and each caller would need to wrap the method call in a try-catch block.

There are trade-offs to be made, but I think this approach strikes a nice balance.

Tuesday, July 23, 2013

Syntax Highlighting

In my previous post about lexers, I mentioned in passing that a lexer could be used to implement a syntax highlighter. Well, I decided to do just that.

With the lexer already written, the rest was fairly straightforward — mostly just grabbing the snippet from the DOM and surrounding each token with the correct CSS class.
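For example, given the tokens produced by the lexer, the wrapping step might look roughly like this. This is a sketch of the idea, not SyntaxMarker's actual code, and the token shape ({ type, text }) is an assumption:

```javascript
// Wrap each token's text in a span whose CSS class is the token type.
// A sketch only; real code would also escape HTML entities in token.text.
function markUp(tokens) {
  return tokens.map(function (token) {
    return '<span class="' + token.type + '">' + token.text + '</span>';
  }).join('');
}
```

The stylesheet then just needs one rule per token type (keyword, string, comment, and so on).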

Here's some example HTML that demonstrates how to use it to highlight a snippet of Ruby:


<html>
<head>
    <title></title>
    <link rel="stylesheet" type="text/css" href="../src/style.css"/>
</head>
<body>

<pre class="syntax-marker-highlight ruby">
def test()
    strings = 'single' + "double"
    sym = :symbol
    num = 4

    return strings + sym.to_s + num.to_s  #stringify
end
</pre>

<script src="../lib/lexer.js"></script>
<script src="../src/syntaxMarker.js"></script>
<script src="../src/markers/rubyMarker.js"></script>
<script type="text/javascript">
    SyntaxMarker.mark();
</script>
</body>
</html>


This would render the following:

def test()
    strings = 'single' + "double"
    sym = :symbol
    num = 4

    return strings + sym.to_s + num.to_s  #stringify
end

SyntaxMarker is still a work in progress, but the core functionality is there. At this point, all that needs to be done is to add support for more languages. This is as simple as writing some regular expressions for language keywords, identifiers, etc.

Again, it was neat to see that most of the actual work is done by the lexer. From there, we can use the resulting tokens for many different purposes.

Wednesday, June 5, 2013

Tokenize This

As a man of my word, I have reinvented the wheel once again. I wrote a simple lexer in JavaScript.

A lexer (or scanner) is a tool in programming language design that is responsible for breaking an input string into individual tokens. These tokens can then be handed to a parser for further processing. For example, consider the following statement in a hypothetical language:

double := 2 * x   //double the number

Here, our lexer might give back the following tokens:

ID (double), ASSIGN, NUMBER, TIMES, ID (x)

Note that the lexer discarded the whitespace and the comment. The rest of the input was matched to a predefined token.

Most lexers use regular expressions to specify the matching rules. Each of these rules, when matched, returns some specified token. Here is how it would look in Token.JS:

var lexer = new TokenJS.Lexer(
  'double := 2 * x   //double the number', {
  root: [
    [/\/\/.*/, TokenJS.Ignore],  //ignore comments
    [/\s+/, TokenJS.Ignore],  //ignore whitespace
    [/[a-zA-Z]+/, 'ID'],
    [/[0-9]+/, 'NUMBER'],
    [/:=/, 'ASSIGN'],
    [/\*/, 'TIMES']
  ]
});

var tokens = lexer.tokenize();

Even if you're not writing a compiler, lexing has many other uses. For starters, imagine you wanted to write a syntax highlighter for a language. This would be trivial with a lexer. You would simply specify a regular expression for identifiers, literals, reserved words, etc. in the given language. The 'token' in each case could be a color to highlight the match with.

In general, string processing is a big part of many applications, so it's good to know what tools you have at your disposal.
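The engine behind such a lexer can be surprisingly small: at each position, try the rules in order and emit a token for the first one that matches. Here is a minimal sketch of that loop; it mimics Token.JS's rule format but is not Token.JS itself, and a null token name here means "ignore the match":

```javascript
// A minimal regex-based lexer loop; a sketch of the idea, not Token.JS
// itself. Each rule is [regex, tokenName]; null means "ignore the match".
function tokenize(input, rules) {
  var tokens = [];
  var pos = 0;
  while (pos < input.length) {
    var matched = false;
    for (var i = 0; i < rules.length; i++) {
      // Recompile with the sticky flag so the match must start exactly at pos.
      var regex = new RegExp(rules[i][0].source, 'y');
      regex.lastIndex = pos;
      var match = regex.exec(input);
      if (match && match[0].length > 0) {
        if (rules[i][1] !== null) {
          tokens.push({ type: rules[i][1], text: match[0] });
        }
        pos += match[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new Error('No rule matches at position ' + pos);
    }
  }
  return tokens;
}

var tokens = tokenize('double := 2 * x   //double the number', [
  [/\/\/.*/, null],   // ignore comments
  [/\s+/, null],      // ignore whitespace
  [/[a-zA-Z]+/, 'ID'],
  [/[0-9]+/, 'NUMBER'],
  [/:=/, 'ASSIGN'],
  [/\*/, 'TIMES']
]);
// tokens: ID (double), ASSIGN, NUMBER, TIMES, ID (x)
```

Running it on the example statement from above yields the same token stream the post describes, with the comment and whitespace silently discarded.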

Friday, April 19, 2013

"Not Invented Here" Considered Helpful

We are constantly told not to "re-invent the wheel," as if doing so would be an utter waste of time. Yet doing so is extremely worthwhile, if only for the sake of the experience. Maybe a better way to put it would be "dissect the wheel, learn about all of its intricacies, then rebuild it from scratch." Maybe someone has already done it. Maybe they've done it very well. But none of that matters. Who says your design won't be better? Who says it won't better fit your needs? And what about the sense of accomplishment of having done something yourself?

In programming, Not Invented Here is spoken of as strictly a bad thing. "There's a library for that," they say. They're right—there probably is a library, but there's something critical missing in these discussions. Using a library does not necessarily give you understanding as to how it works.

In my undergraduate compilers course, we used a few libraries to help us parse our language. But before using a new library, the professor would ensure that we had a clear understanding of how it worked. Before using regular expressions, for example, he made sure we had a good background in finite automata and state machines. "Given enough time," he would ask, "could you have implemented this yourself?"

More and more, I'm convinced that we should have an intimate understanding of the libraries that we use. Where necessary, we should sculpt them, tear them apart—even rewrite them.

Now, when you're on the clock, your opportunities for exploration may be limited. But at a bare minimum, take the time to do so in your personal projects. After all, why else are you doing personal projects, if not to learn and have fun?

Monday, February 25, 2013

Cut the Ceremony

I'm happy to see a growing interest in languages that attempt to cut out the ceremony—things in a language, often keywords, that take your attention away from the code you actually care about. Often these are things that you type only to appease the compiler, or to make the language easier to parse (I'm looking at you, semicolon). To see what I mean, compare
public virtual void Method(IFactory arg)
{
   arg.DoStuff();
}
to Ruby's
def method arg
   arg.doStuff
end
In this case, Ruby does away with ceremony mainly through dynamic typing. This frees the programmer from having to specify types in method arguments or returns. The downside to this approach, of course, is that you won’t find out about type errors until run time. While this problem can be mitigated through discipline and good testing, larger projects are arguably better off with static type checking—that is, unless you really need the flexibility that dynamic languages provide.

In fact, there are languages that manage to reduce ceremony without sacrificing static checking. Scala, for example, is a compiled, statically typed language that almost looks like Ruby if you squint hard enough. Scala is able to reduce the noise by leveraging features such as type inference and implicit returns. Consider the following example:
def doubleValue(number : Int) : Int = {
   return number * 2
}
By making the return implicit, it can be reduced to the following:
def doubleValue(number : Int) = {
   number * 2
}
Finally, we can eliminate the braces since the function fits on a single line:
def doubleValue(number : Int) = number * 2
Thus, languages like Scala are able to achieve the terseness of Ruby while maintaining the static type safety and performance of C#.

QED.

Wednesday, January 16, 2013

Tests or Clean Code: You Pick

If you had to choose between clean code and untidy code that is covered by tests, which would you pick?

The more that I do TDD, the more I have come to realize a minor flaw: TDD optimizes for test coverage, at the potential expense of clean code. It's entirely possible with TDD to end up with code that passes all the tests, yet is not particularly readable. Now, if the "Refactor" step is performed properly, this can mitigate the problem. But even with refactoring, TDD's baby-steps approach to solving a problem does not always produce the most readable code. Too often, we are focused only on passing the current test and we lose sight of the overall design (assuming, of course, that we have a design).

The classic alternative is to do some initial design work, code, then cover the code with tests. But one of the main arguments for TDD is that writing tests after the fact often gets neglected by lazy programmers. Then who is to say that the same thing won't happen with TDD? That is, the "Refactor" step can fall by the wayside just as easily.

The question, then, is what are you optimizing for?

If the "Refactor" step is neglected, TDD can lead to working, yet untidy, code. Upfront design can lead to clean code, yet you haven't proved that it works if you fail to cover it with tests afterwards.

If it came down to it, I would probably pick covered, messy code. The next developer to work in the code would be less likely to break my functionality if there are tests in place, even if it initially may be less readable.

Ideally, of course, there would be no need to make such a compromise in the first place. What are your thoughts? How can we avoid this?