Code Generation

Many applications contain features that are trivial to implement but must be implemented over and over. Perhaps it's taking an array of model objects and preparing a list view, creating classes from database schemata, or creating a list of compile-time constants from a text file.

These situations can usually be automated by generating code. The idea is to express the problem in a succinct representation, then translate that into something that can be incorporated into your program. This is pretty much what a compiler does; though many programming languages are far from succinct, they're still much less unwieldy than the machine's native instruction code.
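As a minimal sketch of that idea, take the constants example from the start of this section. The input format (one `name = value` pair per line, `#` for comments) and the function name `generate_constants` are assumptions made for illustration; the generator here happens to emit Python source, but the same shape applies to any target language.

```python
import keyword


def generate_constants(spec: str) -> str:
    """Translate 'name = value' lines into Python source defining constants."""
    lines = ["# Generated file: do not edit by hand.", ""]
    for number, raw in enumerate(spec.splitlines(), start=1):
        # Drop comments and surrounding whitespace; skip blank lines.
        text = raw.split("#", 1)[0].strip()
        if not text:
            continue
        name, sep, value = (part.strip() for part in text.partition("="))
        if not sep or not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError(f"line {number}: expected name = value, got {raw!r}")
        try:
            # Base 0 accepts decimal, 0x..., 0o... and 0b... literals.
            literal = int(value, 0)
        except ValueError:
            # Anything that is not an integer is emitted as a string.
            literal = value
        lines.append(f"{name.upper()} = {literal!r}")
    return "\n".join(lines) + "\n"


# Hypothetical input, standing in for the text file.
SPEC = """
# limits
max_players = 4
title = Space Race
flags = 0x10
"""

source = generate_constants(SPEC)

# Load the generated source to show that it is valid Python.
namespace = {}
exec(source, namespace)
```

In a real build the generated text would be written to a file and imported or compiled with the rest of the program, rather than passed to `exec`.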

Writing Your Own Generator Shouldn't Be A First Resort

Just as a code generator makes it easier to create a product, it makes that product harder to debug. For a concrete example, consider the autotools build system discussed earlier in this chapter. Imagine that a developer is looking into a reported problem in which one of the tests fails (a problem that I had to deal with today). The log file tells them which C program encapsulated the test, but the developer cannot just modify that program. They must discover where the configure script generates that program, and what it's trying to achieve by doing so. They must then find out where in configure.ac that section of the shell script is generated, and work out a change to the m4 macros that will result in the desired change to the C program, two steps later.

In short, if your target environment offers facilities to solve your problem natively, such a solution will require less reverse engineering when diagnosing later problems. It's only if such a solution is overly expensive or error-prone that code generation is a reasonable alternative.

Many of the cases given at the beginning of this section were data-driven, such as deriving class descriptions from a database schema for an Object-Relational Mapping (ORM) system. Some programming languages let you solve this problem without generating any source code. If you can resolve messages sent to an object at runtime, then you can tell that object which table its data is in, and it can decide whether any message corresponds to a column in that table. If you can add classes and methods at runtime, then you can generate all of the ORM classes when the app connects to the database.
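Both techniques can be sketched in Python, which supports each of them. This is an illustration under stated assumptions, not a real ORM: the names `Record` and `build_classes` are hypothetical, the database is SQLite, every table is assumed to have an integer `id` primary key, and table names are assumed to come from a trusted source.

```python
import sqlite3


class Record:
    """Resolves attribute lookups against a table's columns at runtime."""

    def __init__(self, connection, table, row_id):
        self._connection = connection
        self._table = table
        self._row_id = row_id
        # PRAGMA table_info returns one row per column; index 1 is the name.
        info = connection.execute(f'PRAGMA table_info("{table}")').fetchall()
        self._columns = {row[1] for row in info}

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup has failed.
        if name.startswith("_") or name not in self._columns:
            raise AttributeError(name)
        # Assumption: the table has an "id" column identifying the row.
        row = self._connection.execute(
            f'SELECT "{name}" FROM "{self._table}" WHERE id = ?',
            (self._row_id,),
        ).fetchone()
        if row is None:
            raise LookupError(f"no row {self._row_id} in {self._table}")
        return row[0]


def build_classes(connection):
    """Create one class per table at the moment the app connects."""
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    classes = {}
    for (table,) in tables:
        # The default argument binds this iteration's table name.
        def init(self, conn, row_id, _table=table):
            Record.__init__(self, conn, _table, row_id)

        class_name = table.capitalize()
        classes[class_name] = type(class_name, (Record,), {"__init__": init})
    return classes


# Hypothetical schema and data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO person (id, name, age) VALUES (1, 'Ada', 36)")

models = build_classes(conn)
ada = models["Person"](conn, 1)
print(ada.name, ada.age)
```

No source file for `Person` exists anywhere; adding a column to the table makes a new attribute available without regenerating anything. The cost is that every attribute access here issues a query, which a serious implementation would want to cache.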

The existence and applicability of such features depend very much on the environment you're targeting, but look for them and consider them before diving into writing a generator.

When the Generator Won't Be Used by A Programmer

If the target "customer" for this facility isn't going to be another developer, then a generator can often be a better choice than a full-featured programming language, despite the increase in implementation complexity.

A solution that's often explored in this context is a Domain-Specific Language (DSL), a very limited programming language that exposes grammar and features much closer to the problem that the customer understands than to computer science concepts. Many projects that I've been involved with have used DSLs, because they offer a nice trade-off between letting the customer modify the system as they see fit and avoiding complex configuration mechanisms.

Case study

The "customer" using the application doesn't need to be the end user of the finished product. On one project I worked on, I created a DSL to give to the client so that they could define achievements used in the project's gamification feature. A parser app told them about any inconsistencies in their definitions, such as missing or duplicate properties, and also generated a collection of objects that would implement the rules for those achievements in the app. It could also generate a script that connected to the app store to tell it what the achievements were.
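To give a flavor of what such a tool involves, here is a small sketch in Python. It is not the DSL from that project: the syntax, the property names (`title`, `points`, `event`, `count`), and the `Achievement` class are all invented for illustration. It shows only the two jobs described above, reporting missing or duplicate properties and producing objects that implement the rules.

```python
from dataclasses import dataclass

# Hypothetical set of properties every achievement must define.
REQUIRED = ("title", "points", "event", "count")


@dataclass(frozen=True)
class Achievement:
    identifier: str
    title: str
    points: int
    event: str
    count: int

    def is_earned(self, event_counts):
        """True once the tracked event has happened often enough."""
        return event_counts.get(self.event, 0) >= self.count


def parse(text):
    """Return (achievements, problems) for definitions in the toy DSL."""
    blocks, problems, current = [], [], None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith("achievement "):
            current = (line.split(None, 1)[1], {})
            blocks.append(current)
        elif ":" in line and current is not None:
            key, value = (part.strip() for part in line.split(":", 1))
            if key in current[1]:
                problems.append(
                    f"line {number}: duplicate property '{key}' in {current[0]}"
                )
            else:
                current[1][key] = value
        else:
            problems.append(f"line {number}: cannot parse {line!r}")

    achievements = []
    for identifier, props in blocks:
        missing = [key for key in REQUIRED if key not in props]
        if missing:
            problems.append(f"{identifier}: missing {', '.join(missing)}")
            continue
        try:
            points, count = int(props["points"]), int(props["count"])
        except ValueError:
            problems.append(f"{identifier}: points and count must be integers")
            continue
        achievements.append(
            Achievement(identifier, props["title"], points, props["event"], count)
        )
    return achievements, problems


# Hypothetical definitions with one duplicate and one missing property.
SAMPLE = """
achievement first_win
  title: First Win
  points: 10
  points: 20
  event: match_won
  count: 1

achievement marathon
  title: Marathon
  points: 50
  event: match_played
"""

achievements, problems = parse(SAMPLE)
for problem in problems:
    print(problem)
```

The parser collects problems rather than stopping at the first one, so the person writing the definitions sees every inconsistency in a single run. In this sketch a duplicated property keeps its first value, and properties outside the required set are silently ignored; both are arbitrary choices.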