Bit Builder

Styles in JavaScript

2015-01-18T22:51:00.000-08:00

Chris Chedeau gave an excellent presentation about using JavaScript for styling.

Inspired by Chris' talk, I decided to try it out on one of my side projects, the game of Reversi (Othello) implemented in React.js (demo). React is a good candidate for styling in JavaScript because it converts JavaScript objects to inline CSS:

render: function() {
    var styles = {
        textAlign: 'center',
        [...]
    };

    return <p style={styles}>{this.props.message}</p>;
}

Let's take a look at what we can do. Out of the box, we have mixins in the form of JavaScript functions:

var Player = require('../lib/Player');

function getBackgroundImage(player) {
    if (player === Player.One) return 'url("img/red.png")';
    if (player === Player.Two) return 'url("img/blue.png")';

    return 'none';
}

module.exports = function(player) {
    return {
        backgroundImage: getBackgroundImage(player),
        [...]
    };
};

We can extend styles with this mixin by merging JavaScript objects:

var Player = require('../lib/Player');
var extend = require('object-assign');
var cellStyle = require('../styles/cell');

function buildStyles(owner, playerHint) {
    var cellAppearance = (owner !== Player.None)
        ? owner
        : playerHint;

    return extend({
        border: '1px solid black'
    }, cellStyle(cellAppearance));
}

And, of course, we have variables and constants:

module.exports = {
    fontSize: 24
};
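To see how the three pieces fit together, here is a minimal sketch that runs without React. The names `constants`, `cellMixin`, and `buildCellStyle` are hypothetical, and the built-in `Object.assign` stands in for the `object-assign` package used above.

```javascript
// Hypothetical constants, standing in for a shared constants module.
var constants = { fontSize: 24 };

// A "mixin" is just a function that returns a style object.
function cellMixin(color) {
    return { backgroundColor: color, fontSize: constants.fontSize };
}

// Merge the mixin into the base styles; later sources win on conflicts.
function buildCellStyle(color) {
    return Object.assign({ border: '1px solid black' }, cellMixin(color));
}

console.log(buildCellStyle('red'));
```

The resulting object could be handed to a `style` attribute as in the first example.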

Hopefully the advantages to this approach over something like Sass are clear. Since it's Just JavaScript, you don't need to learn another language to get the features of a CSS preprocessor.

While I'm not completely convinced, this is a compelling approach to styling your applications, especially if you're already using React.

Timeboxing

2014-03-09T21:40:00.001-07:00

Most people in the agile world agree that it's good to timebox meetings. Interestingly, many in this group don't like the idea of having a time-boxed sprint. Instead, they prefer some sort of continuous delivery, such as Kanban.

So how can we explain this divide? The argument for timeboxing meetings is pretty simple: it helps keep the meeting focused and prevents it from lasting too long. Wouldn't this be good for the development process as well? Why do we see the value in timeboxing meetings, but not the development process?

I wonder if this group of people just has a sour taste in their mouths from their experience with scrum. Sprints in scrum often bring back horrible memories of day-long planning meetings and terrible estimates.

I don't want to go back to that, either, but couldn't our development work benefit from some sort of soft date or at least an overriding goal to work towards? This does not need to be a firm commitment; it simply helps in setting priorities. If we explicitly state that our goal for the next few weeks is to add feature X, we'll set aside other tasks that don't help with the stated goal.

Let me be clear — this does not set anything in stone. It is much more lightweight than a scrum sprint and does not carry the same level of commitment or predictability. It's also important to note that this would not rule out continuous delivery — code could still be released in the middle of the time box as soon as a logical chunk of work was finished.

Just as timeboxing meetings helps us make sure we discuss the most important topics, doing the same thing with our development process helps the highest priority items surface to the top and ensures the team is on the same page.

Implicit Callbacks

2013-12-02T22:40:00.001-08:00

While working in C# this morning, I found myself wanting to force the creator of an object to subscribe to an event. I had something like the following:

var obj = new MyObject();
obj.RequiredEvent += EventHandler;

My first solution was to extract a sort of factory method and use that whenever I wanted to construct the object:

private MyObject CreateObject()
{
  var obj = new MyObject();
  obj.RequiredEvent += EventHandler;

  return obj;
}

This is good, but it's still possible to construct the object without calling the helper method. What I really wanted was for the compiler to enforce that someone subscribe to the event, similar to how using constructor injection ensures that an object is passed all of its dependencies.

Callbacks

With this thought in mind, it occurred to me that if I really want to require subscribing to an event, maybe an event is not the right approach. What if the constructor simply took a callback function as a parameter, as follows?

var obj = new MyObject(EventHandler);

In effect, we have taken an implicit callback and made it explicit. Indeed, this is simpler and gives the compile-time check that I was after.

Pick Your Poison

As always, it's a trade-off. The callback approach limits me to one handler, whereas an event may have multiple subscribers. Furthermore, the callback approach forces the consumer to provide a handler, whereas with the event approach it is entirely optional.

Impact of Language

The language you're using may make one approach more natural than the other. In C#, I usually reach for events. In JavaScript, I'm more likely to use a callback handler. This is mainly due to the fact that C# has language support for events, while JavaScript does not. Passing around functions is also more idiomatic in JavaScript, so callbacks are a good fit.
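As a rough sketch of the callback approach in JavaScript: there is no compiler to enforce the argument, so the closest equivalent is a check at construction time. The `trigger` method and the error message are hypothetical names used only for illustration.

```javascript
// Hypothetical class: the constructor refuses to build an object without a handler.
function MyObject(onRequiredEvent) {
    if (typeof onRequiredEvent !== 'function') {
        throw new TypeError('MyObject requires a callback function');
    }
    this.onRequiredEvent = onRequiredEvent;
}

// Invoke the single handler that the creator was forced to supply.
MyObject.prototype.trigger = function (data) {
    this.onRequiredEvent(data);
};

var received = [];
var obj = new MyObject(function (data) { received.push(data); });
obj.trigger('hello');
```

Calling `new MyObject()` with no handler fails immediately, which is the run-time analogue of the compile-time check described above.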

In fact, were it not for my exposure to more functional languages like JavaScript, I probably wouldn't have thought of using the callback pattern at all. Yet another reason to learn more languages.

SourceDiff

2013-10-19T22:47:00.001-07:00

I wanted a new challenge, so I wrote a side-by-side diff tool.

My goal was to make a simple diff tool with a clean user interface. A hundred commits later, I've started using it in my own git workflow.

I decided to build it using web technologies, mostly because I haven't seen many good side-by-side diff tools in the browser. In the end, though, I also wanted a desktop version so that I could integrate with git and other source control tools. For this, I used Adobe AIR to build a thin executable shell that accepts command-line arguments.

Here's a screenshot of the AIR client:

It's not the fanciest diff tool out there, but again, I built it mostly for the experience.

I used the well-known Longest Common Subsequence (LCS) problem as the basis of my diff algorithm. First, I look for and ignore lines that are common to the beginning or end of each file. This is simply an optimization step that takes advantage of the fact that a large portion of the code is usually left unchanged. Then, I apply the LCS algorithm on a line-by-line basis. That is, initially I only find inserted or deleted lines. If I find an insert and a delete on the same line, I report it as a modified line and proceed to find the character differences for these lines. In other words, for each modified line, I apply the LCS algorithm again to find which characters changed. From there, I improve the character diffs by cleaning up what Neil Fraser calls Semantic Chaff.
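The line-level pass can be sketched roughly as follows. This is a minimal illustration of the technique under my own assumptions, not SourceDiff's actual code; the function name `diffLines` and the `{ type, line }` result shape are hypothetical.

```javascript
// Returns a list of { type, line } entries; type is 'same', 'delete', or 'insert'.
function diffLines(oldLines, newLines) {
    // Optimization: skip lines shared at the start and end of both files.
    var start = 0;
    while (start < oldLines.length && start < newLines.length &&
           oldLines[start] === newLines[start]) {
        start++;
    }
    var oldEnd = oldLines.length;
    var newEnd = newLines.length;
    while (oldEnd > start && newEnd > start &&
           oldLines[oldEnd - 1] === newLines[newEnd - 1]) {
        oldEnd--;
        newEnd--;
    }
    var a = oldLines.slice(start, oldEnd);
    var b = newLines.slice(start, newEnd);

    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
    var lcs = [];
    var i, j;
    for (i = 0; i <= a.length; i++) {
        lcs.push(new Array(b.length + 1).fill(0));
    }
    for (i = a.length - 1; i >= 0; i--) {
        for (j = b.length - 1; j >= 0; j--) {
            lcs[i][j] = a[i] === b[j]
                ? lcs[i + 1][j + 1] + 1
                : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
        }
    }

    function same(line) { return { type: 'same', line: line }; }

    // Walk the table to emit the edit script for the changed middle section.
    var result = oldLines.slice(0, start).map(same);
    i = 0;
    j = 0;
    while (i < a.length || j < b.length) {
        if (i < a.length && j < b.length && a[i] === b[j]) {
            result.push(same(a[i]));
            i++;
            j++;
        } else if (j >= b.length || (i < a.length && lcs[i + 1][j] >= lcs[i][j + 1])) {
            result.push({ type: 'delete', line: a[i] });
            i++;
        } else {
            result.push({ type: 'insert', line: b[j] });
            j++;
        }
    }
    return result.concat(oldLines.slice(oldEnd).map(same));
}
```

A delete immediately followed by an insert would then be treated as a modified line, and the same function could be run on the two lines split into characters to find the character-level changes.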

Diff tools are hard to get right. If an edit consists mostly of small changes, diff tools as a rule do a great job. However, as soon as you start doing large refactors or moving things around, most tools report a mess of changes and are less than helpful. Some diff tools try to address this by being more language-aware. Some tools go so far as to parse the code and only report semantic changes at a higher level.

I like what these new tools are trying to accomplish, but at the end of the day, a text diff is usually sufficient and provides a simple, precise record of everything that changed.

The Happiness Metric

2013-09-26T21:53:00.000-07:00

Measuring the productivity of a software team is a challenging task, much to the dismay of many a manager. Still, managers often grasp for metrics, as if by tweaking variable X or Y their developers will suddenly become more productive.

Unfortunately, most metrics in software miss the mark. At best, measuring any sort of development activity causes developers to focus on improving that metric at the expense of other valuable activities. At worst, it causes developers—consciously or not—to game the system.

Let's consider the metric of code coverage. Suppose the department sets a policy that any new code that goes into the system must have 80% code coverage. The managers get their beloved data, and the quality of the code will certainly improve—right?

Now let's assume the developers make an honest effort to meet that goal. They may even go so far as to test all of their getters and setters.

Great, they met the goal, but is the quality of the code improved? Not necessarily. Most would agree that testing getters and setters is a waste of time. Further, just because code is covered by a test does not mean that the test is meaningful.

What about measuring the number of features shipped? This may work fine in the short term, but it definitely does not take into account code quality, or the value of said features to the customer.

The truth is, good developers thrive when granted autonomy. Setting arbitrary metrics for them to meet undermines whatever autonomy you claim to extend. It takes the "self" out of self-organized teams. It follows then that, ideally, metrics should grow within the team rather than be imposed by management.

Personally, the only metric I trust is a gut feeling. I call it the Happiness Metric.

To convert my gut feeling into hard data, I take a short survey every few weeks with prompts like the following, rated on a scale from 1 to 5:

I delivered value to my customers
I focused on quality
I worked at a sustainable pace
I was intellectually engaged
I was happy

For individuals and teams, these questions will cause introspection and serve as a discussion point. They will help you evaluate whether you worked on the most important thing for your customers.

It may not be exactly what management is looking for, but I don't know of any better way to measure the productivity of a team.

I'd love to hear about metrics that have worked for your teams. Follow me on Twitter @justin_hewlett.

CheckDoSomething()

2013-08-08T22:47:00.000-07:00

I have gotten a lot of use out of what I might call the Check Execute Method pattern. The general idea is that you often want to conditionally execute some action. For example, you may have something like the following appear in several different places:

if condition
  doSomething()
end

If you're not careful, code like this may get sprinkled all over the place. If many callers are checking the same condition, we can DRY up the code as follows:

def checkDoSomething(condition)
  if condition
    doSomething()
  end

  return condition
end

Now we have encapsulated the decision of whether to execute the method. We no longer need to repeat the check everywhere. The 'check' prefix in the method name makes it clear that the execution of the action is dependent on the parameters or state of the object. Additionally, if the caller needs to know whether or not the action was performed, they can use the boolean return value.
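Here is a small sketch of the same pattern where the condition lives in the object's state rather than in a parameter, written in JavaScript. The `Draft` class and its `checkSave` method are hypothetical examples, not taken from a real code base.

```javascript
// Hypothetical example: a draft that is saved only when it has unsaved changes.
function Draft() {
    this.dirty = false;
    this.saveCount = 0;
}

Draft.prototype.edit = function () {
    this.dirty = true;
};

// "Check" method: decides internally whether to act, and reports what it did.
Draft.prototype.checkSave = function () {
    if (!this.dirty) {
        return false;
    }
    this.saveCount++;   // stand-in for the real save operation
    this.dirty = false;
    return true;
};

var draft = new Draft();
var savedBeforeEdit = draft.checkSave();
draft.edit();
var savedAfterEdit = draft.checkSave();
```

Callers never repeat the `dirty` check themselves, and the ones that care can branch on the returned boolean.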

Some may disagree with returning a success value, claiming that this violates command query separation. Fine, but what is the alternative? We could rely on the caller to check some condition before calling the method, but this is problematic if we have more than a few callers who are all checking the same thing. A caller may even forget to make the check.

Alternatively, we could change the method to do the check internally and throw an exception if the condition is not met. But then we're using exceptions for control flow, and each caller would need to wrap the method call in a try-catch block.

There are trade-offs to be made, but I think this approach strikes a nice balance.

Syntax Highlighting

2013-07-23T22:35:00.003-07:00

In my previous post about lexers, I mentioned in passing that a lexer could be used to implement a syntax highlighter. Well, I decided to do just that.

With the lexer already written, the rest was fairly straightforward — mostly just grabbing the snippet from the DOM and surrounding each token with the correct CSS class.

Here's some example HTML that demonstrates how to use it to highlight a snippet of Ruby:


<html>
<head>
    <title></title>
    <link rel="stylesheet" type="text/css" href="../src/style.css"/>
</head>
<body>

<pre class="syntax-marker-highlight ruby">
def test()
    strings = 'single' + "double"
    sym = :symbol
    num = 4

    return strings + sym.to_s + num.to_s  #stringify
end
</pre>

<script src="../lib/lexer.js"></script>
<script src="../src/syntaxMarker.js"></script>
<script src="../src/markers/rubyMarker.js"></script>
<script type="text/javascript">
    SyntaxMarker.mark();
</script>
</body>
</html>

This would render the following:

def test()
    strings = 'single' + "double"
    sym = :symbol
    num = 4

    return strings + sym.to_s + num.to_s  #stringify
end

SyntaxMarker is still a work in progress, but the core functionality is there. At this point, all that needs to be done is to add support for more languages. This is as simple as writing some regular expressions for language keywords, identifiers, etc.

Again, it was neat to see that most of the actual work is done by the lexer. From there, we can use the resulting tokens for many different purposes.
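The token-wrapping step might look something like the sketch below. The token shape (`{ type, text }`), the class names, and the function names are assumptions made for illustration and are not SyntaxMarker's actual API. Unlike a compiler's lexer, a highlighter has to keep whitespace, so unstyled tokens are passed through rather than discarded.

```javascript
// Escape characters that have special meaning in HTML.
function escapeHtml(text) {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// tokens: array of { type, text }; a null type means "leave unstyled".
function toHighlightedHtml(tokens) {
    return tokens.map(function (token) {
        var text = escapeHtml(token.text);
        return token.type
            ? '<span class="' + token.type + '">' + text + '</span>'
            : text;
    }).join('');
}

var highlighted = toHighlightedHtml([
    { type: 'keyword', text: 'def' },
    { type: null, text: ' ' },
    { type: 'identifier', text: 'test' }
]);
```

The resulting string could then replace the contents of the `pre` element, with the stylesheet assigning a color to each class.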

Tokenize This

2013-06-05T20:31:00.001-07:00

As a man of my word, I have reinvented the wheel once again. I wrote a simple lexer in JavaScript.

A lexer (or scanner) is a tool in programming language design that is responsible for breaking an input string into individual tokens. These tokens can then be handed to a parser for further processing. For example, consider the following statement in a hypothetical language:

double := 2 * x //double the number

Here, our lexer might give back the following tokens:

ID (double), ASSIGN, NUMBER, TIMES, ID (x)

Note that the lexer discarded the whitespace and the comment. The rest of the input was matched to a predefined token.

Most lexers use regular expressions to specify the matching rules. Each of these rules, when matched, returns some specified token. Here is how it would look in Token.JS:

var lexer = new TokenJS.Lexer(
  'double := 2 * x   //double the number', {
  root: [
    [/\/\/.*/, TokenJS.Ignore],  //ignore comments
    [/\s+/, TokenJS.Ignore],  //ignore whitespace
    [/[a-zA-Z]+/, 'ID'],
    [/[0-9]+/, 'NUMBER'],
    [/:=/, 'ASSIGN'],
    [/\*/, 'TIMES']
  ]
});

var tokens = lexer.tokenize();
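To show the idea behind such a rule table, here is a minimal hand-rolled tokenizer. It is a sketch of the general technique, not Token.JS's implementation: the `tokenize` function is hypothetical, `null` plays the role of `TokenJS.Ignore`, and the first matching rule wins rather than the longest match.

```javascript
// rules: array of [regex, tokenName]; a null tokenName means "match but discard".
function tokenize(input, rules) {
    var found = [];
    var pos = 0;
    while (pos < input.length) {
        var rest = input.slice(pos);
        var matched = false;
        for (var i = 0; i < rules.length; i++) {
            // Anchor each rule so it can only match at the current position.
            var match = new RegExp('^(?:' + rules[i][0].source + ')').exec(rest);
            if (match && match[0].length > 0) {
                if (rules[i][1] !== null) {
                    found.push({ type: rules[i][1], text: match[0] });
                }
                pos += match[0].length;
                matched = true;
                break;
            }
        }
        if (!matched) {
            throw new Error('Unexpected character at position ' + pos);
        }
    }
    return found;
}

var sketchTokens = tokenize('double := 2 * x   //double the number', [
    [/\/\/.*/, null],   // ignore comments
    [/\s+/, null],      // ignore whitespace
    [/[a-zA-Z]+/, 'ID'],
    [/[0-9]+/, 'NUMBER'],
    [/:=/, 'ASSIGN'],
    [/\*/, 'TIMES']
]);
```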

Even if you're not writing a compiler, lexing has many other uses. For starters, imagine you wanted to write a syntax highlighter for a language. This would be trivial with a lexer. You would simply specify a regular expression for identifiers, literals, reserved words, etc. in the given language. The 'token' in each case could be a color to highlight the match with.

In general, string processing is a big part of many applications, so it's good to know what tools you have at your disposal.

"Not Invented Here" Considered Helpful

2013-04-19T21:37:00.001-07:00

We are constantly told not to "re-invent the wheel," as if doing so would be an utter waste of time. Yet doing so is extremely worthwhile, if only for the experience. Maybe a better way to put it would be "dissect the wheel, learn about all of its intricacies, then rebuild it from scratch." Maybe someone has already done it. Maybe they've done it very well. But none of that matters. Who says your design won't be better? Who says it won't better apply to your needs? And what of the grand sense of accomplishment that you've done something by yourself?

In programming, Not Invented Here is spoken of as strictly a bad thing. "There's a library for that," they say. They're right—there probably is a library, but there's something critical missing in these discussions. Using a library does not necessarily give you understanding as to how it works.

In my undergraduate compilers course, we used a few libraries to help us parse our language. But before using a new library, the professor would ensure that we had a clear understanding of how it worked. Before using regular expressions, for example, he made sure we had a good background in finite automata and state machines. "Given enough time," he would ask, "could you have implemented this yourself?"

More and more, I'm convinced that we should have an intimate understanding of the libraries that we use. Where necessary, we should sculpt them, tear them apart—even rewrite them.

Now, when you're on the clock, your opportunities for exploration may be limited. But at a bare minimum, take the time to do so in your personal projects. After all, why else are you doing personal projects, if not to learn and have fun?

Cut the Ceremony

2013-02-25T18:47:00.002-08:00

I'm happy to see a growing interest in languages that attempt to cut out the ceremony—things in a language, often keywords, that take your attention away from the code you actually care about. Often these are things that you type only to appease the compiler, or to make the language easier to parse (I'm looking at you, semicolon). To see what I mean, compare

public virtual void Method(IFactory arg)
{
   arg.DoStuff();
}

to Ruby's

def method arg
   arg.doStuff
end

In this case, Ruby does away with ceremony mainly through dynamic typing. This frees the programmer from having to specify types in method arguments or returns. The downside to this approach, of course, is that you won’t find out about type errors until run time. While this problem can be mitigated through discipline and good testing, larger projects are arguably better off with static type checking—that is, unless you really need the flexibility that dynamic languages provide.

In fact, there are languages that manage to reduce ceremony without sacrificing static checking. Scala, for example, is a compiled, statically typed language that almost looks like Ruby if you squint hard enough. Scala is able to reduce the noise by leveraging such things as type inference and implicit returns. Consider the following example seen here:

def doubleValue(number : Int) : Int = {
   return number * 2
}

By making the return implicit, it can be reduced to the following:

def doubleValue(number : Int) = {
   number * 2
}

Finally, we can eliminate the braces since the function fits on a single line:

def doubleValue(number : Int) = number * 2

Thus, languages like Scala are able to achieve the terseness of Ruby while maintaining the static type safety and performance of C#.

QED.

Tests or Clean Code: You Pick

2013-01-16T22:24:00.001-08:00

If you had to choose between clean code and untidy code that is covered by tests, which would you pick?

The more that I do TDD, the more I have come to realize a minor flaw: TDD optimizes for test coverage, at the potential expense of clean code. It's entirely possible with TDD to end up with code that passes all the tests, yet is not particularly readable. Now, if the "Refactor" step is performed properly, this can mitigate the problem. But even with refactoring, TDD's baby-steps approach to solving a problem does not always produce the most readable code. Too often, we are focused only on passing the current test and we lose sight of the overall design (assuming, of course, that we have a design).

The classic alternative is to do some initial design work, code, then cover the code with tests. But one of the main arguments for TDD is that writing tests after the fact often gets neglected by lazy programmers. Then who is to say that the same thing won't happen with TDD? That is, the "Refactor" step can fall by the wayside just as easily.

The question, then, is what are you optimizing for?

If the "Refactor" step is neglected, TDD can lead to working, yet untidy, code. Upfront design can lead to clean code, yet you haven't proved that it works if you fail to cover it with tests afterwards.

If it came down to it, I would probably pick covered, messy code. The next developer to work in the code would be less likely to break my functionality if there are tests in place, even if it initially may be less readable.

Ideally, of course, there would be no need to make such a compromise in the first place. What are your thoughts? How can we avoid this?

Basement Hackers and Developer Elitism

2012-12-23T22:10:00.001-08:00

I have read a lot of blogs complaining about the state of affairs among developers. Jeff Atwood asks, "Why Can't Programmers.. Program?" Jason Gorman goes so far as to say:

Consider that not all developers are equal, and some developers achieve more than others. In reality, 80% of the working code in operation today can probably be attributed to small proportion of us. The rest just get in the way. If anything, if we thinned down the herd to just the stronger programmers, more might get done.

Certainly, skill level varies among developers. But to make ridiculous claims like this is nothing more than a case of developer elitism and almost sounds like some sort of ethnic cleansing campaign.

What's really going on is that, due to the advent of affordable PCs and the internet, it's possible for someone who has never set foot in a university to learn to code from online tutorials. The barrier to entry is lower than ever before. This is not necessarily a bad thing. But I can see why it would cause the older folks to become bitter.

The same thing happened with photography. It used to be that in order to become a photographer, you needed access to a darkroom. You needed to know about the different chemicals. You needed to work for some time as an apprentice.

When digital photography and photo editing came along, the barrier to entry became significantly lower. And, of course, some seasoned professionals are now bitter that legions are becoming photographers without ever needing to enter a darkroom.

A traditional education in computer science is extremely helpful. It's important to have a foundation in algorithms and data structures. But there are plenty of successful developers who obtained their education through different means.

Instead of pointing fingers at newcomers and attempting to "thin down the herd" as Jason puts it, we should welcome to the fold those who truly have a desire to become great programmers.

Now, were I in a position to hire developers, I would definitely test their ability to code. This is one case where it's appropriate to evaluate skills and make a judgement call. But to make blanket statements like those above is just plain ignorant.

Code Ownership

2012-12-03T22:31:00.002-08:00

In economics, the Tragedy of the Commons tells of a proverbial pasture that is common to several herders. The observation goes as follows: "...it is in each herder's interest to put the next (and succeeding) cows he acquires onto the land, even if the quality of the common is damaged for all as a result, through overgrazing. The herder receives all of the benefits from an additional cow, while the damage to the common is shared by the entire group. If all herders make this individually rational economic decision, the common will be depleted or even destroyed, to the detriment of all."

I often wonder how this principle applies to a software team. Imagine that you are adding a feature that requires you to add some methods to an existing class that is already pretty big. The simplest thing to do is to just add the methods and move on. It's a tempting choice because it provides immediate results to you, and the costs won't be borne until a later date. At that time they'll be shared across the whole team and not borne by you alone. It takes a disciplined programmer to take the additional steps to pull out the methods into a new class in hopes of reducing future team maintenance.

How do we solve this dilemma? In the case of the pasture, the economist would say to give a private pasture to each herder that he alone is responsible for. Can this same solution be applied to our software teams?

I think a certain sense of code ownership is beneficial. This allows a developer to specialize in a certain part of the system. She will be more likely to keep the code clean if she knows she'll be working in it again in the future. She'll be familiar with the code whenever a feature needs to be added there. The problem with this, though, is that developers typically have big egos. If you are given "ownership" over a section of code, you will be less open to other developers making improvements to your algorithms or design. It discourages team collaboration and dissemination of knowledge.

At the other extreme, you may get exposure to more parts of the system, but you would likely not have any personal investment in any specific section of the code.

I think the solution is some combination of the two. When working on a particular feature or bug fix, focus your efforts on getting to know that area of code. Focus your cleanup efforts in that area, knowing that you (or a team member) are likely to encounter the code again in the future. When you are finished, you can move on to another section of code and repeat the process. Over time, each member of the team will ideally have some sense of ownership in each module of the system.

Here are some practices that I think provide the appropriate balance of ownership:

Small Teams

Ideally, teams should consist of 3-4 developers working on a particular, well-defined feature. This means that each developer has a significant investment in the feature, sharing at least a quarter of the responsibility for the quality of the code. Because the team is often working in the same section of code for the duration of the feature, there is a greater incentive to keep the code clean.

Code Reviews

Code reviews, whether done continually through pairing or just before check-in, serve as an extra opportunity to consider the impact of your changes on the code base, including how it will affect future maintenance by the team as a whole.

Ownership of Bugs

Where reasonable, any bugs that come up during development ought to be addressed by the individual that caused them. Attitude is important here. This is not merely an exercise in assigning blame and public humiliation. Rather, that developer is often more familiar with the section of code. This also helps developers to have ownership of the quality of the code that they produce.

The key here is that successful systems are written by developers who take personal responsibility for their code, yet at the same time realize they are part of a team. With the understanding that everyone likes to work in a clean code base, the team, and each individual, must commit to practices that produce quality, maintainable code.

Don't Parse with RegEx

2012-10-29T22:46:00.001-07:00

At some point, I think all developers have tried to use regular expressions to parse input. It may work at first, but it quickly becomes unwieldy for all but the most trivial of inputs.

Engineers learned early on that it simplifies things drastically to split the compiler into separate modules – first find the individual tokens, then use a grammar to see if the tokens are arranged correctly, then assign semantic meaning to the statements, and so forth.

Regular expressions are great for getting the individual tokens. This is the scanning, or lexing phase. With the tokens in hand, we're ready to do the actual parsing. To do this properly, we need to specify a formal grammar. This is typically done in BNF. Grammars can deal with nesting and other constructs that regular expressions don't handle well. For example, imagine trying to match up opening and closing sets of parentheses using regular expressions. This is trivial to specify in a grammar.
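As a small illustration of the parentheses case, here is a sketch of a one-rule grammar and a recursive-descent recognizer for it. The function name `isBalanced` is hypothetical, and the input is assumed to contain only parentheses (that is, the lexing has already been done).

```javascript
// Grammar, in BNF-like notation:
//   <parens> ::= "(" <parens> ")" <parens> | ""
function isBalanced(input) {
    var pos = 0;

    // Consumes one <parens> starting at pos; returns false on a missing ")".
    // The loop handles the trailing <parens> in the rule without recursing.
    function parseParens() {
        while (input[pos] === '(') {
            pos++;                                   // consume "("
            if (!parseParens()) { return false; }    // nested <parens>
            if (input[pos] !== ')') { return false; }
            pos++;                                   // consume ")"
        }
        return true;
    }

    // Valid only if the grammar consumed the entire input.
    return parseParens() && pos === input.length;
}
```

The recursion in `parseParens` mirrors the recursion in the grammar rule, which is exactly the nesting that a plain regular expression cannot track.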

Separation of concerns is another advantage of doing it this way. If we truly follow the single responsibility principle, we have no justification for trying to lex and parse at the same time.

Regular expressions are powerful tools by themselves, but don't abuse them – especially if you're dealing with a full-fledged domain-specific language.

State vs. Interaction Testing

2012-09-16T20:53:00.000-07:00

In my post Unit Testing and Assumptions, I described my ideal type of test: "I don't care how [the algorithm does] it, just that the output is what I expect. These tests are short, easy to write, and they make no assumptions about the underlying code." What I didn't realize at the time is that I was simply making the case for state-based testing.

State-based testing focuses on the results of a computation, not on the specific steps of an algorithm. Instead of verifying that add(2, 2) was called on a mock, for example, we simply assert that the result is 4. This makes the tests less brittle and usually easier to write because of less setup. It's also better from a TDD perspective since you don't need to know the details of the code-to-be-implemented in order to write your failing test. For better or worse, these types of tests can quickly turn into "mini-integration" tests since they lack the test isolation that mocks provide.

State-based testing stands in contrast to interaction or behavioral testing. Here the focus is, well, on the interaction of the system under test with a mock object. This can be useful when the method delegates some work to a collaborator. The collaborator object is already tested, so we just need to verify that the delegation takes place with the correct parameters. Indeed, this is essentially the only way to unit test code that makes calls to a web service or database. The downside, of course, is that the test locks down the behavior of the code. If that behavior ever changes, even if the results stay the same, we must update the test accordingly.
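The contrast between the two styles can be sketched with a toy example. The `Calculator` class and its `adder` collaborator are hypothetical, and the mock is hand-rolled rather than taken from any mocking library.

```javascript
// Hypothetical system under test: delegates the arithmetic to a collaborator.
function Calculator(adder) {
    this.adder = adder;
}

Calculator.prototype.double = function (n) {
    return this.adder.add(n, n);
};

// State-based test: use a real collaborator and check only the result.
var realAdder = { add: function (a, b) { return a + b; } };
var stateResult = new Calculator(realAdder).double(2);
console.log('state-based passes:', stateResult === 4);

// Interaction test: use a hand-rolled mock and check how it was called.
var calls = [];
var mockAdder = { add: function (a, b) { calls.push([a, b]); return 0; } };
new Calculator(mockAdder).double(2);
console.log('interaction passes:',
    calls.length === 1 && calls[0][0] === 2 && calls[0][1] === 2);
```

If `double` were rewritten as `n * 2` without using the collaborator, the state-based check would still pass while the interaction check would fail, which is the brittleness described above.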

I tend to be a proponent of state-based testing whenever possible. I use it in cases where my class under test has no collaborators, or when the collaborators are lightweight. I like that this type of testing encourages return values over side-effects. I'm also usually more concerned that the results themselves are correct than how the code computed them.

If I have a good reason to isolate the class completely, then I use mocks to verify behavior. This is usually on the seams of the system, such as near the data access layer.

Ultimately, I end up doing whatever feels right at the time given the situation.

Learn a New Language

2012-08-21T10:13:00.001-07:00

Students of linguistics have probably heard of the Sapir-Whorf Hypothesis. It essentially states that the languages you know directly influence your understanding of the world. In other words, the languages that you know influence how you think about and approach problems.

Though often applied to natural languages, this can certainly be applied to programming languages as well. If we have had exposure to multiple programming language paradigms, such as declarative, functional, and imperative, we will be better at solving problems by choosing the right tool for the job.

Certain language paradigms work better for certain classes of problems. For example, query languages such as SQL work well as declarative languages because we're more interested in what to find rather than the exact steps of finding it. Declarative languages are also a great way to specify the view for a program. We see this with XML for Android and XAML for WPF applications on Windows.

Then we have functional languages, which extol the virtues of pure functions that are free from state and side effects. This makes it easier to reason about and test our programs. It also allows us to take advantage of lazy evaluation and to parallelize our programs quite trivially.

Finally, imperative languages allow us to specify how to accomplish a given task. This is important when performance is a concern or when we need more granularity.

Fortunately for us, many of the newer languages are multi-paradigm, allowing us to use declarative, functional, and imperative ideas in a single language.
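Python is one such language. As a sketch, here is the same small task (summing the squares of the even numbers in a list) written three ways. The variable names are arbitrary, and the "declarative" label on the last version is a loose one.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# Imperative: spell out each step and mutate an accumulator.
total_imperative = 0
for n in numbers:
    if n % 2 == 0:
        total_imperative += n * n

# Functional: compose filter, map, and reduce without mutating state.
total_functional = reduce(
    lambda acc, n: acc + n,
    map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)),
    0,
)

# Declarative-leaning: describe the result with a generator expression.
total_declarative = sum(n * n for n in numbers if n % 2 == 0)

print(total_imperative, total_functional, total_declarative)  # 56 56 56
```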

So learn a new language, preferably of a paradigm that you're not as familiar with. Even if you rarely utilize the language itself, the concepts gleaned from that language will make you a better programmer and problem-solver.

Back to the Terminal

2012-08-16T11:27:00.001-07:00

One of the big problems of GUI programming is trying to separate business logic from the presentation. Many frameworks and patterns, such as Model-view-controller, exist for just this purpose.

I recently came across a less common approach to this problem. From the StackOverflow Podcast #41:

Atwood: ...the classic UNIX way of developing a GUI was you start with a command-line app, which has a defined set of inputs and outputs, pipes, basically text going in and out of it.
Spolsky: A lot of command-line arguments.
Atwood: And then you put a GUI on top of that, so then you have perfect separation. You don't necessarily have a great app which is the deeper problem, but you have perfect separation, because you can test the command line independently...

Seems like an interesting solution, especially if your target audience is expert users who may prefer a command-line application anyway.

Unknowingly, this is the approach I took with my Batch Photo Editor. From the beginning, it was designed to be a command-line application. This allowed me to focus on the core logic of the application and not have to worry about presentation. Later, I will always have the option of adding a GUI, in which case I would get the business-presentation separation for free.
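To show the shape of this separation, here is a minimal sketch. It is not the actual Batch Photo Editor code; resize_dimensions, main, and the argument names are hypothetical. The core logic is a plain function, and the command-line layer only parses text and prints text, so a GUI could later call the same function.

```python
import argparse


def resize_dimensions(width, height, max_side):
    """Core logic: scale (width, height) so the longer side equals max_side."""
    scale = float(max_side) / max(width, height)
    return int(round(width * scale)), int(round(height * scale))


def main(argv=None):
    """Thin command-line layer: parse text in, print text out."""
    parser = argparse.ArgumentParser(description="Compute resized dimensions.")
    parser.add_argument("width", type=int)
    parser.add_argument("height", type=int)
    parser.add_argument("--resize", type=int, required=True)
    args = parser.parse_args(argv)

    new_width, new_height = resize_dimensions(args.width, args.height, args.resize)
    print("{0}x{1}".format(new_width, new_height))


# Passing the arguments explicitly makes the command-line layer easy to test.
main(["1440", "960", "--resize", "720"])  # prints 720x480
```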

The downside of this approach is that, if you're not careful, the resulting GUI may be nothing more than a wrapper around your original command-line interface. If your main goal is an extremely user-friendly GUI, perhaps you're better off with UI-First Development. When possible, though, it's refreshing to be able to focus on the actual code that you're writing and not the syntax to wire up a button click handler.

Refactor at Your Own Risk

2012-08-12T15:35:00.001-07:00

Often when we're left to work in someone else's code, we think it's poorly written and seek to improve it or, perhaps worse, rewrite it entirely. Refactor at your own risk, however. Doing so often has unintended consequences or provides little value. Here are my general guidelines:

Don't refactor code that won't change

When reworking code, you get the biggest bang for your buck with the code that your programmers work in nearly every day. You know, those places in code that are constantly changing when bugs are discovered or requirements change.

Now contrast this with refactoring that class that hasn't changed in years. Generally we refactor a piece of code so that it will be easier to make changes in the future. If a class is fairly static*, then refactoring provides little to no benefit and only increases the risk of adding new bugs.

Refactor in increments

Refactoring works best when done in small increments. Joel Spolsky reminds us to avoid the big rewrite:

We're programmers. Programmers are, in their hearts, architects, and the first thing they want to do when they get to a site is to bulldoze the place flat and build something grand. We're not excited by incremental renovation: tinkering, improving, planting flower beds.

There's a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:

It’s harder to read code than to write it.

Be wary of refactoring uncovered code

Refactoring is best done when there is a suite of unit tests that assert the correct behavior of the class. In the absence of tests, consider first writing some tests or perhaps avoid refactoring altogether.
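One way to write those first tests is to simply record what the code does today, sometimes called characterization testing. Here is a small sketch; legacy_shipping_cost is a made-up function standing in for whatever code you inherited, and the expected values were obtained by running it, not from a specification.

```python
import unittest


def legacy_shipping_cost(weight_kg):
    """Hypothetical legacy function we want to refactor later."""
    cost = 5.0
    if weight_kg > 10:
        cost = cost + (weight_kg - 10) * 0.5
    if weight_kg > 50:
        cost = cost + 20
    return cost


class LegacyShippingCostTest(unittest.TestCase):
    """Characterization tests: pin down the current behavior before changing it."""

    def test_light_package(self):
        self.assertEqual(5.0, legacy_shipping_cost(10))

    def test_medium_package(self):
        self.assertEqual(10.0, legacy_shipping_cost(20))

    def test_heavy_package(self):
        self.assertEqual(50.0, legacy_shipping_cost(60))
```

With these in place, any refactor that changes a result shows up as a failing test.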

Use refactoring tools

Where possible, use tools that will perform a variety of common refactors, such as renaming variables, extracting methods, and changing method signatures.

Don't rework code for no reason. Make those changes that will increase the readability and maintainability of your code. As always, the benefits must be weighed against the risks. Remember that, to your users, a working product is king.

*no pun intended

Object-Relational Mapping

2012-08-06T21:49:00.000-07:00

I've been thinking a lot lately about the object-relational mismatch. It's interesting to note that relational databases and object-oriented programming basically evolved around the same time, yet they're not very compatible with each other.

There have been a lot of good posts on the subject, such as Jeff Atwood's Object-Relational Mapping is the Vietnam of Computer Science. He basically concludes that there are four solutions to the problem: give up relational databases, give up objects, manually map between them, or use an object-relational mapper.

Recently I had concluded that object-relational mapping is the way to go. Now I'm not so sure. As I was looking at some example code for a Python ORM called Peewee, I realized something: with ORMs, you are still essentially defining all of your models as relations. You're just doing it in code rather than in SQL. For example, consider this model definition for a blog from the Peewee documentation:

class Blog(BaseModel):
    name = CharField() # <-- VARCHAR

class Entry(BaseModel):
    headline = CharField()
    content = TextField() # <-- TEXT
    pub_date = DateTimeField() # <-- DATETIME
    blog = ForeignKeyField(Blog) # <-- INTEGER referencing the Blog table

Notice that we have to specify the type of a database column with each of our attributes. Since Python has no explicitly typed variables from which to infer this information, I guess this is permissible. Further notice that our entry has a ForeignKeyField that specifies the relationship between Entry and Blog. What we're left with is little more than a razor-thin layer of abstraction put on top of a relational database.

These objects would look quite a bit different if we weren't so concerned about mapping them. In particular, Blog would likely have a list of Entry, rather than each Entry having a "foreign key" to Blog.
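As a sketch of what that might look like, here are the same two concepts as plain Python classes with no mapping concerns. This is my own illustration, not Peewee code; the add_entry method is a hypothetical convenience.

```python
import datetime


class Entry(object):
    def __init__(self, headline, content, pub_date):
        self.headline = headline
        self.content = content
        self.pub_date = pub_date


class Blog(object):
    def __init__(self, name):
        self.name = name
        self.entries = []  # the blog owns its entries; no foreign key

    def add_entry(self, entry):
        self.entries.append(entry)


blog = Blog("Bit Builder")
blog.add_entry(Entry("Hello", "First post.", datetime.date(2012, 8, 6)))
print(len(blog.entries))  # 1
```

The relationship now points from Blog to Entry, the opposite direction from the foreign key in the relational version.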

Maybe making our classes look more like relations is the price we pay. It's certainly more appealing than maintaining manual mapping code, and most businesses are wary of using an object store that the developers essentially have exclusive control over.

This is a hard problem. I guess that's why Ted Neward calls it the Vietnam of Computer Science.

Privates Done Right

2012-07-19T23:03:00.002-07:00

When I started learning Python, I quickly learned that there is no way to officially make a member variable private. There is only a convention that any variable starting with a single or double underscore should be treated as an implementation detail that is likely to change. If you try to access it, however, Python will not stop you.
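Here is a short sketch of both conventions. The Account class is hypothetical. A single underscore is purely a convention, while a double underscore triggers name mangling, which renames the attribute but still leaves it reachable.

```python
class Account(object):
    def __init__(self):
        self._balance = 100   # single underscore: convention only
        self.__pin = 1234     # double underscore: name-mangled to _Account__pin


account = Account()
print(account._balance)        # 100
print(account._Account__pin)   # 1234

try:
    account.__pin
except AttributeError:
    print("account.__pin is not directly reachable from outside the class")
```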

I think that this is the right approach. Sometimes there are legitimate reasons for accessing a private variable, such as testing legacy code or using a third-party API. When I have a legitimate reason to do so, I don't want to have to go through some laborious process, such as reflection, to get at the data.

Of course, with new code, the real solution is to write more modular classes with a single responsibility. This way, there are more public entry points to test classes in isolation.

For example, consider this trivial class that performs a calculation and writes the result to the console:

class Logger(object):
    def write(self, value):
        print(value)

class DataDisplayer(object):
    def __init__(self, logger):
        self.__logger = logger

    def display(self, value1, value2):
        value = self.__do_calculation(value1, value2)
        self.__logger.write(value)  # side effect: writes to the console

    def __do_calculation(self, value1, value2):
        return value1 * value2

In order to test the __do_calculation method, we have a few options. We can call the pseudo-private method directly. Or we can test it through its public interface, display. This isn't desirable, though, because it has the side effect of writing a value to the screen.

The ideal solution is to pull the responsibility of __do_calculation into its own class so we can test that in isolation:

class DataDisplayer(object):
    def __init__(self, logger, calculator):
        self.__logger = logger
        self.__calculator = calculator

    def display(self, value1, value2):
        value = self.__calculator.do_calculation(value1, value2)
        self.__logger.write(value)

class Calculator(object):
    def do_calculation(self, value1, value2):
        return value1 * value2

Now we have a public interface to test do_calculation in isolation.
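A sketch of what those tests might look like follows. The two classes are restated so the block runs on its own, and FakeLogger is a hypothetical test double I'm introducing to avoid writing to the console.

```python
import unittest


class Calculator(object):
    def do_calculation(self, value1, value2):
        return value1 * value2


class DataDisplayer(object):
    def __init__(self, logger, calculator):
        self.__logger = logger
        self.__calculator = calculator

    def display(self, value1, value2):
        value = self.__calculator.do_calculation(value1, value2)
        self.__logger.write(value)


class FakeLogger(object):
    """Hypothetical test double that records values instead of printing."""

    def __init__(self):
        self.written = []

    def write(self, value):
        self.written.append(value)


class CalculatorTest(unittest.TestCase):
    def test_do_calculation(self):
        # No logger, no console output: just input and result.
        self.assertEqual(12, Calculator().do_calculation(3, 4))


class DataDisplayerTest(unittest.TestCase):
    def test_display_writes_result(self):
        logger = FakeLogger()
        DataDisplayer(logger, Calculator()).display(3, 4)
        self.assertEqual([12], logger.written)
```

Neither test needs to reach into a private member.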

In other words, when we write our tests and classes correctly, we shouldn't have to access private variables very often. For those rare cases when we need to, though, it's kind of nice when the language doesn't try to prevent us.

Unit Testing and Assumptions

2012-06-28T21:55:00.001-07:00

When we write a unit test, for better or worse, we're locking down a section of code. This is good when we want to make sure the logic doesn't change. This can be bad, however, when our tests make it difficult to make good changes. Maybe you need to fix a bug, or change the details of an algorithm.

The problem is, sometimes unit tests make too many assumptions about the code under test. We assume that an algorithm will be implemented in a certain way. Perhaps the method contains a lot of side effects.

In my mind, the best-case scenario involves feeding input to a function and getting something in return. I give you a sentence, and you capitalize every word for me. I don't care how you do it, just that the output is what I expect:

def test_upper(self):
    input = "this is a sentence."
    output = upper(input)
    self.assertEqual("This Is A Sentence.", output)

These tests are short, easy to write, and they make no assumptions about the underlying code. It turns out that the fewer side effects a method has, the easier it is to test.
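To illustrate that such a test doesn't care about the implementation, here is a sketch with two hypothetical implementations (upper_with_split and upper_with_title are names I made up) that both satisfy the same assertion.

```python
import unittest


def upper_with_split(sentence):
    """One possible implementation: capitalize each space-separated word."""
    return " ".join(word.capitalize() for word in sentence.split(" "))


def upper_with_title(sentence):
    """Another possible implementation: delegate to str.title()."""
    return sentence.title()


class UpperTest(unittest.TestCase):
    def test_both_implementations(self):
        # The same input/output check passes for either implementation.
        for upper in (upper_with_split, upper_with_title):
            self.assertEqual("This Is A Sentence.", upper("this is a sentence."))
```

The two do differ on edge cases such as apostrophes, so a second input/output test would be the place to pin down which behavior is wanted.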

Consider the following code:

class box:
    def __init__(self, length, width):
        self.__length = length
        self.__width = width

    def compute_area(self):
        return self.__length * self.__width

The constructor just sets two private fields. This is difficult to test without either accessing the internals of the class or exposing the fields through getters. Even then, we would be testing implementation details that are subject to change. I would argue that we should test the constructor indirectly by testing compute_area as follows:

def test_compute_area(self):
    my_box = box(4, 5)
    self.assertEqual(20, my_box.compute_area())

What we're really interested in is not that two private fields get set in the constructor, but that the object can compute its area.

My First Open Source Project

2012-06-23T10:41:00.001-07:00

In my first post, I mentioned my first project that I worked on in Python, called BatchEdit. I recently decided to host it on GitHub in order to motivate myself to work on it some more as well as to generate interest in the project.

BatchEdit is a command-line batch photo editor. Basically, you give it an input and output folder and specify some adjustments to be done, such as resizing, sharpening, boosting contrast, adding a border, etc. Here is an example of what the command looks like to auto-rotate, increase contrast, convert to grayscale, resize to 720 pixels, sharpen, add a gray border of 5 pixels, and overlay a watermark:

python scripts\BatchEdit.zip --input C:\input --output C:\output
--autorotate --resize 720 --grayscale --contrast 1.2 --sharpen 1.3
--border 5,gray --watermark C:\logo_transparent.png

Here is what the resulting transformation looks like:

In my mind, the most common use case for this program is preparing photos to upload to the web. Often photographers want to auto-rotate, resize, sharpen, and apply some basic adjustments such as boosting the contrast or saturation before uploading. If you have many photos to edit, doing this manually in a program like Photoshop is quite tedious.

Under the hood, the code takes advantage of the Python Imaging Library (PIL) for all image manipulation.

Currently there are a few shortcomings of the program. First, it requires that the end user have Python and PIL installed on his system. Second, it does not yet have a GUI on top of it. These limitations would probably keep all but the most expert users from using it.

Batch photo editors have certainly been done before, but I think mine is simpler than most. To see the source code, click here.

Lean on the Compiler

2012-06-21T21:33:00.002-07:00

I've been thinking a lot lately about dynamic typing – about how it supposedly sets you free from the bonds of the compiler and makes you 10x more productive. Some blog posts even go so far as to claim that dynamic typing will replace static typing due to the increase of unit testing. The advocates rightly point out that just because a program compiles does not mean it is correct. Unit tests are therefore a better safety net. Now, provided the code has full coverage, I agree that we can probably do without compile-time checking.

But is a fully-covered code base a reality? Not likely. Especially when you're working with legacy code.

Today at work I was making quite a few signature changes to my methods – adding and removing parameters to several methods in many different classes. In situations like this, I'm glad to have the compiler to make sure I don't miss anything, even before I have a chance to run what tests I do have. "Lean on the compiler" is the phrase Michael Feathers uses in Working Effectively with Legacy Code. Great read, by the way.

Today I realized that this is one technique that I really miss in dynamically typed languages. Rather than holding me back, in this case the compile-time checking provided confidence to make changes quickly.
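For comparison, here is a sketch of how a missed call site behaves in Python after a signature change. The function names are hypothetical. Nothing complains when the code is defined or imported; the mistake only surfaces when the stale call actually runs.

```python
def resize(image, width, height):
    """Hypothetical function after a signature change: `height` was added."""
    return (image, width, height)


def rarely_used_path():
    # A stale call site that was missed during the change.
    return resize("photo.jpg", 720)


# Defining the functions above raises nothing. The TypeError appears only
# once the stale call executes, which may be long after the change was made.
try:
    rarely_used_path()
except TypeError as error:
    print("Caught at runtime:", error)
```

Without a test that exercises rarely_used_path, this would go unnoticed until a user hit it.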

When TDD Breaks Down

2012-06-09T11:33:00.000-07:00

There's been a lot of hype recently about Test-Driven Development (TDD). I really like it for the most part. It forces the programmer to think about how the code will be used. This will often result in an easier-to-use API. It's also a good way to explore all of the edge cases and make sure they are covered with a unit test.

I've found, though, that sometimes having to write the test before the code is a little too restrictive. Sometimes you have no idea what the resulting code will look like or how exactly it will achieve your goal. In this case, I am an advocate of what I like to call exploratory coding. Code first until you have an idea of what your code needs to do. This may be accompanied by some ad-hoc testing. Then, when you have the functionality nailed down, cover it with tests. Of course, this only works if you're careful to keep your code modular and testable. If not, you may need to refactor first or rewrite the code altogether following TDD.

We often joke at work that the proper way to do TDD is to write the code first and comment it out, then get your failing test and fix it by uncommenting the code. This just goes to show how counter-intuitive test-first can be at times.

I often hear people argue that if you write the code first, there's a chance you'll get lazy and won't get around to testing it afterwards. I disagree. This just requires a little discipline, which I would argue is essential to being a good developer in the first place.

Essentially, the principle behind TDD is the importance of modular, well-tested code. Indeed, I would say that the rise of unit testing is one of the biggest developments to improve the overall quality of code in recent times.

The Virtues of Dynamic Typing

2012-06-03T20:52:00.001-07:00

Last semester, I took a class in Python. Coming from a C++/C# background, many things were new to me. I liked not having to use curly braces. I also liked not having to declare the type of my variables.

A few things caught me off guard, however. I found it strange that Python did not allow me to declare my class variables as private. I was also disappointed that I couldn't explicitly define an interface.

What I initially considered limitations of the language, I eventually regarded as liberating. I felt empowered to be a responsible programmer without being babysat by the compiler. I knew what types I was using, and I knew which variables I intended to be private. When I wanted to use an interface, I simply used the same method signature in several different classes. Indeed, my productivity was increased significantly as a result of this flexibility.
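Here is a small sketch of that kind of implicit interface. The class and function names are made up for the example. Neither class inherits from a shared base; they simply both provide an apply method with the same signature.

```python
class Doubler(object):
    def apply(self, value):
        return value * 2


class Incrementer(object):
    def apply(self, value):
        return value + 1


def run_pipeline(steps, value):
    """Works with any object that has an apply(value) method."""
    for step in steps:
        value = step.apply(value)
    return value


print(run_pipeline([Doubler(), Incrementer()], 5))  # 11
```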

At the end of the class we had a chance to build a final project. In just three weeks, I was able to build a command-line batch photo editor. Working on it was a joy, and the resulting code was very clean and terse.

My only frustration that did not have a satisfactory conclusion was the problem of deployment. Either the end user needs to have Python installed on his system, or you must somehow package up the interpreter with the application. Neither solution is ideal.

Overall, I concluded that in many cases, static typing is an unnecessary burden to programmers. Dynamic typing is not without its problems, but Python will definitely hold a prominent place in my quiver from now on.