Tuesday, August 21, 2012

Learn a New Language

Students of linguistics have probably heard of the Sapir-Whorf Hypothesis. It essentially states that the languages you know directly influence your understanding of the world. In other words, the languages that you know influence how you think about and approach problems.

Though often applied to natural languages, the hypothesis can certainly be applied to programming languages as well. If we have been exposed to multiple programming paradigms, such as declarative, functional, and imperative, we will be better at solving problems because we can choose the right tool for the job.

Certain language paradigms work better for certain classes of problems. For example, query languages such as SQL work well as declarative languages because we're more interested in what to find than in the exact steps of finding it. Declarative languages are also a great way to specify the view for a program. We see this with XML layouts for Android and XAML for WPF applications.
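To make the contrast concrete, here is a minimal sketch that answers the same question twice: once by spelling out the steps, and once by describing the result in SQL and letting the database work out how to find it. The sample data and the function names are invented for this illustration, and an in-memory SQLite database stands in for a real one.

```python
import sqlite3

# Hypothetical sample data: (name, age) pairs invented for this illustration.
PEOPLE = [("Ada", 36), ("Linus", 17), ("Grace", 45), ("Tim", 12)]


def adults_imperative(people):
    """Imperative style: spell out each step of the search."""
    result = []
    for name, age in people:
        if age >= 18:
            result.append(name)
    result.sort()
    return result


def adults_declarative(people):
    """Declarative style: describe the rows we want and let SQLite find them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
        conn.executemany("INSERT INTO person VALUES (?, ?)", people)
        rows = conn.execute(
            "SELECT name FROM person WHERE age >= 18 ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Both functions return the same names for the same input. The difference is where the "how" lives: in our loop, or inside the database engine.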

Then we have functional languages, which favor pure functions that are free from state and side effects. Pure functions make it easier to reason about and test our programs. They also allow us to take advantage of lazy evaluation and to parallelize our programs with far less effort, since there is no shared mutable state to coordinate.
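These ideas can be tried out even in a language that is not purely functional. The sketch below uses Python's standard library. The function names are made up for the example, and the thread pool is only one simple way to hand independent calls to several workers, not a claim about the fastest approach.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import count, islice


def square(n):
    """Pure: the result depends only on the argument, and nothing is mutated."""
    return n * n


def first_squares(how_many):
    """Lazy: count(1) is infinite, but only `how_many` squares are ever computed."""
    return list(islice(map(square, count(1)), how_many))


def concurrent_squares(numbers):
    """square() shares no state, so its calls can safely be spread across workers."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(square, numbers))
```

Because square() touches nothing outside its argument, testing it needs no setup, and moving the calls onto a pool requires no locks or other coordination.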

Finally, imperative languages allow us to specify exactly how to accomplish a given task. This is important when performance is a concern or when we need finer-grained control.

Fortunately for us, many of the newer languages are multi-paradigm, allowing us to use declarative, functional, and imperative ideas in a single language.

So learn a new language, preferably of a paradigm that you're not as familiar with. Even if you rarely utilize the language itself, the concepts gleaned from that language will make you a better programmer and problem-solver.

Thursday, August 16, 2012

Back to the Terminal

One of the big problems of GUI programming is separating business logic from presentation. Many frameworks and patterns, such as Model-View-Controller, exist for just this purpose.

I recently came across a less common approach to this problem. From the StackOverflow Podcast #41:
Atwood: ...the classic UNIX way of developing a GUI was you start with a command-line app, which has a defined set of inputs and outputs, pipes, basically text going in and out of it.
Spolsky: A lot of command-line arguments.
Atwood: And then you put a GUI on top of that, so then you have perfect separation. You don't necessarily have a great app which is the deeper problem, but you have perfect separation, because you can test the command line independently...

This seems like an interesting solution, especially if your target audience is expert users who may prefer a command-line application anyway.

Without realizing it, I took this approach with my Batch Photo Editor. From the beginning, it was designed to be a command-line application. This allowed me to focus on the core logic of the application and not have to worry about presentation. I will always have the option of adding a GUI later, in which case I would get the business-presentation separation for free.
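As a rough sketch of what this separation can look like (this is not the actual Batch Photo Editor code; the function names and the --max-side flag are invented for illustration), the core logic lives in a plain function and the command-line layer only parses text in and prints text out:

```python
import argparse


def scaled_size(width, height, max_side):
    """Core logic: shrink (width, height) to fit max_side, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def main(argv=None):
    """Command-line front end: parses text arguments and prints a text result."""
    parser = argparse.ArgumentParser(description="Compute a resized image size.")
    parser.add_argument("width", type=int)
    parser.add_argument("height", type=int)
    parser.add_argument("--max-side", type=int, default=1024)
    args = parser.parse_args(argv)
    new_width, new_height = scaled_size(args.width, args.height, args.max_side)
    print(f"{new_width}x{new_height}")
    return 0

# A script entry point would simply call main() under the usual __main__ guard.
```

A GUI added later would call scaled_size() from a button handler and never touch argparse, and a test can call main(["4000", "3000", "--max-side", "1000"]) without opening a window.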

The downside of this approach is that, if you're not careful, the resulting GUI may be nothing more than a wrapper around your original command-line interface. If your main goal is an extremely user-friendly GUI, perhaps you're better off with UI-First Development. When possible, though, it's refreshing to be able to focus on the actual code that you're writing and not the syntax to wire up a button click handler.

Sunday, August 12, 2012

Refactor at Your Own Risk

Often when we're left to work in someone else's code, we think it's poorly written and seek to improve it or, perhaps worse, rewrite it entirely. Refactor at your own risk, however. Doing so often has unintended consequences or provides little value. Here are my general guidelines:

  • Don't refactor code that won't change

When reworking code, you get the biggest bang for your buck with the code that your programmers work in nearly every day. You know, those places in code that are constantly changing when bugs are discovered or requirements change.

Now contrast this with refactoring that class that hasn't changed in years. Generally we refactor a piece of code so that it will be easier to make changes in the future. If a class is fairly static*, then refactoring provides little to no benefit and only increases the risk of adding new bugs.

  • Refactor in increments

Refactoring works best when done in small increments. Joel Spolsky reminds us to avoid the big rewrite:
We're programmers. Programmers are, in their hearts, architects, and the first thing they want to do when they get to a site is to bulldoze the place flat and build something grand. We're not excited by incremental renovation: tinkering, improving, planting flower beds.
There's a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:
It’s harder to read code than to write it.

  • Be wary of refactoring uncovered code

Refactoring is best done when there is a suite of unit tests that assert the correct behavior of the class. In the absence of tests, consider first writing some tests or perhaps avoid refactoring altogether.
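One way to get that safety net for code that has none is to write tests that simply record what the code does today, before touching it. Here is a minimal sketch using unittest. The legacy function and its pricing rules are entirely hypothetical and exist only to give the tests something to pin down.

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    """Hypothetical legacy function that we would like to refactor."""
    cost = 5.0
    if weight_kg > 10:
        cost = cost + (weight_kg - 10) * 0.5
    if express:
        cost = cost * 2
    return cost


class ShippingCostCharacterization(unittest.TestCase):
    """Pins down what the code does today, before any refactoring starts."""

    def test_light_parcel_pays_base_rate(self):
        self.assertEqual(legacy_shipping_cost(2, express=False), 5.0)

    def test_heavy_parcel_pays_per_extra_kg(self):
        self.assertEqual(legacy_shipping_cost(14, express=False), 7.0)

    def test_express_doubles_the_total(self):
        self.assertEqual(legacy_shipping_cost(14, express=True), 14.0)
```

If these tests pass before and after each small change, the refactoring has preserved at least the behavior they cover.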

  • Use refactoring tools

Where possible, use tools that will perform a variety of common refactors, such as renaming variables, extracting methods, and changing method signatures.

Don't rework code for no reason. Make those changes that will increase the readability and maintainability of your code. As always, the benefits must be weighed against the risks. Remember that, to your users, a working product is king.

*no pun intended

Monday, August 6, 2012

Object-Relational Mapping

I've been thinking a lot lately about the object-relational mismatch. It's interesting to note that relational databases and object-oriented programming basically evolved around the same time, yet they're not very compatible with each other.

There have been a lot of good posts on the subject, such as Jeff Atwood's Object-Relational Mapping is the Vietnam of Computer Science. He basically concludes that there are four solutions to the problem: give up relational databases, give up objects, manually map between them, or use an object-relational mapper.

Recently I had concluded that object-relational mapping is the way to go. Now I'm not so sure. As I was looking at some example code for a Python ORM called Peewee, I realized something: with ORMs, you are still essentially defining all of your models as relations. You're just doing it in code rather than in SQL. For example, consider this model definition for a blog from the Peewee documentation:
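from peewee import *

db = SqliteDatabase(':memory:')  # assumed setup, not in the original snippet

class BaseModel(Model):
    class Meta:
        database = db

class Blog(BaseModel):
    name = CharField()  # <-- VARCHAR

class Entry(BaseModel):
    headline = CharField()
    content = TextField()  # <-- TEXT
    pub_date = DateTimeField()  # <-- DATETIME
    blog = ForeignKeyField(Blog)  # <-- INTEGER referencing the Blog table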
Notice that we have to specify the type of a database column with each of our attributes. Since Python has no explicitly typed variables from which to infer this information, I guess this is permissible. Further notice that our entry has a ForeignKeyField that specifies the relationship between Entry and Blog. What we're left with is little more than a razor-thin layer of abstraction put on top of a relational database.

These objects would look quite a bit different if we weren't so concerned about mapping them. In particular, Blog would likely have a list of Entry, rather than each Entry having a "foreign key" to Blog.
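For comparison, here is a sketch of how the same two classes might look if we modeled them purely as objects, with no database in mind. It uses plain dataclasses. The field names mirror the Peewee example, and the sample data is invented.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Entry:
    headline: str
    content: str
    pub_date: datetime


@dataclass
class Blog:
    name: str
    # The blog owns its entries directly; no foreign key is involved.
    entries: List[Entry] = field(default_factory=list)


blog = Blog(name="Example blog")
blog.entries.append(Entry("Hello", "First post", datetime(2012, 8, 6)))
```

Here the relationship points from Blog to its entries, which is the opposite direction from the ForeignKeyField on Entry.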

Maybe making our classes look more like relations is the price we pay. It's certainly more appealing than maintaining manual mapping code, and most businesses are wary of using an object store that the developers essentially have exclusive control over.

This is a hard problem. I guess that's why Ted Neward calls it the Vietnam of Computer Science.