08 Jun 2008
Significant White Space

The significance and formatting of whitespace in source code is a religious issue. Essentially, that means that opinions and intense personal preferences far outweigh any rational thoughts on the matter. That said, here’s my contribution to all the noise and furor.

Some of you may remember column-sensitive programming languages like RPG and FORTRAN, where particular syntactic elements have to appear in certain columns. Such fun.

Also, line endings are significant in many languages, being used to terminate a statement. Of course, you have to be careful whenever you transfer source code among systems, since there are three common end-of-line markers: bare line-feed (LF), bare carriage-return (CR), and the two-character combo CR-LF. I won’t even get into the various record-oriented (vs character-stream) text storage formats.

Unix traditionally calls line-feed newline (NL), but all these variations are often informally called newline, just because it’s easier to say than end-of-line. Note, however, that there is a real 8-bit control character named next line (NEL, decimal 133); it belongs to the C1 control set rather than to 7-bit ASCII.
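To give a feel for how many things can count as a line break, here is a minimal Python sketch. The helper name normalize_newlines is my own invention for illustration; str.splitlines is the standard string method, and it treats NEL as a line boundary along with LF, CR, and CR-LF.

```python
def normalize_newlines(text: str) -> str:
    """Convert CR-LF and bare CR line endings to bare LF."""
    # Replace the two-character form first, so its CR is not converted twice.
    return text.replace("\r\n", "\n").replace("\r", "\n")


mixed = "lf\ncr\rcrlf\r\nlast"
print(normalize_newlines(mixed).split("\n"))  # ['lf', 'cr', 'crlf', 'last']

# str.splitlines() recognizes all three markers, and NEL (0x85) as well.
print("one\x85two".splitlines())  # ['one', 'two']
```

Doing the CR-LF replacement first matters: the other order would turn each CR-LF into two line breaks.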

As a pathological case, at least one flavor of BASIC used LF-CR (in contrast to the “normal” CR-LF) to continue a logical (numbered) line onto the next physical text line.

Another pathology was that FORTRAN ignored embedded white space (other than column alignment). Whee!
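The classic illustration is a DO loop with a period typed where the comma belongs. Here is a rough Python sketch of the effect, assuming a deliberately simplified model in which every blank is just deleted (a real compiler keeps the blanks inside character constants); squeeze_blanks is a hypothetical helper, not anything from a real FORTRAN toolchain.

```python
def squeeze_blanks(statement: str) -> str:
    """Drop every blank: a simplified model of fixed-form FORTRAN scanning."""
    return statement.replace(" ", "")


loop = squeeze_blanks("DO 10 I = 1,10")
typo = squeeze_blanks("DO 10 I = 1.10")
print(loop)  # DO10I=1,10
print(typo)  # DO10I=1.10
```

With the comma, the statement is a loop. With the period, it reads as an assignment of 1.10 to a variable named DO10I, and nothing complains.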

At the other end of the line, there’s indentation.

A degenerate case of column sensitivity requires statements to begin in column one, with indentation (of any size or nature) indicating logical line continuation.
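A minimal sketch of that convention in Python, assuming a made-up rule set: a line that starts in column one begins a new statement, and any line that starts with a space or tab continues the previous one. The name join_continuations and the sample data are mine, purely for illustration.

```python
def join_continuations(lines):
    """Merge indented physical lines into the preceding column-one line."""
    logical = []
    for line in lines:
        if line[:1] in (" ", "\t") and logical:
            # Indented: this is a continuation of the current logical line.
            logical[-1] += " " + line.strip()
        else:
            # Starts in column one: this begins a new logical line.
            logical.append(line.rstrip())
    return logical


physical = [
    "Subject: significant",
    "    white space",
    "To: everyone",
]
print(join_continuations(physical))
# ['Subject: significant white space', 'To: everyone']
```

Note that the amount and kind of indentation never matters here, only whether column one is blank.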

Makefiles use a tab in column one to indicate a command (other horizontal white space is not significant). Now, I never found this to be an onerous burden, but it annoyed James Duncan Davidson so much that he created Ant:

Makefiles are inherently evil as well. Anybody who has worked on them for any time has run into the dreaded tab problem. “Is my command not executing because I have a space in front of my tab!!!” said the original author of Ant way too many times. Tools like Jam took care of this to a great degree, but still have yet another format to use and remember.

Apache Ant – Welcome

That quote (and Ant) has a slew of issues I’d like to rant about, but that’s for another day.
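For what it’s worth, the “space in front of my tab” problem is invisible in most editors but easy to detect mechanically. A rough Python sketch, assuming a much simplified view of a makefile (it looks only at the start of each physical line and ignores everything else about make’s syntax); the function name suspicious_recipe_lines is made up.

```python
def suspicious_recipe_lines(makefile_text: str):
    """Return 1-based numbers of lines that start with spaces followed by a tab."""
    flagged = []
    for number, line in enumerate(makefile_text.splitlines(), start=1):
        stripped = line.lstrip(" ")
        # Leading spaces and then a tab: it looks like a command on screen,
        # but the tab is no longer in column one.
        if stripped != line and stripped.startswith("\t"):
            flagged.append(number)
    return flagged


sample = "all:\n\techo ok\n \techo oops\n"
print(suspicious_recipe_lines(sample))  # [3]
```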

Then there’s Python:

… indentation is Python’s way of grouping statements. Python does not (yet!) provide an intelligent input line editing facility, so you have to type a tab or space(s) for each indented line. In practice you will prepare more complicated input for Python with a text editor; most text editors have an auto-indent facility. When a compound statement is entered interactively, it must be followed by a blank line to indicate completion (since the parser cannot guess when you have typed the last line). Note that each line within a basic block must be indented by the same amount.

3. An Informal Introduction to Python
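To make the grouping concrete, here is a small sketch of my own (not from the tutorial) in which two snippets differ only in the indentation of their last line. The helper run is hypothetical; exec and compile are the standard built-ins.

```python
inside = (
    "total = 0\n"
    "for n in (1, 2, 3):\n"
    "    total += n\n"
    "    total *= 2\n"  # indented: part of the loop body
)
outside = (
    "total = 0\n"
    "for n in (1, 2, 3):\n"
    "    total += n\n"
    "total *= 2\n"  # not indented: runs once, after the loop
)


def run(source: str) -> int:
    """Execute a snippet and return its final value of `total`."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


print(run(inside))   # 22
print(run(outside))  # 12

# Indentation that matches no enclosing block is rejected outright.
ragged = "if True:\n    x = 1\n  y = 2\n"
try:
    compile(ragged, "<ragged>", "exec")
except IndentationError as error:
    print("rejected:", error.msg)
```

Four invisible characters change the answer from 22 to 12, which is exactly why this is such a touchy subject.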

Probably the hottest battlefront in the indentation holy war is how to indent. Should tabs (HT) be banned, only allowing spaces (SP)? The indent increment should be two spaces! No, three! No, four! No, <arbitrary number here>!!! Arrrgh!!!! If you’re on a team, just pick something and stick with it, OK? (Insert mode lines for Emacs and Vim to make it easy.) I would suggest four or greater, for people with old eyes like mine and/or who work on hi-res screens. (I usually edit in a 60+ line window with tiny fonts.)
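As a sketch of what such mode lines might look like at the top of a Python file, assuming a team that settled on four spaces and no tabs. The settings named are standard Emacs and Vim options as far as I know, but whether they are honored depends on each person’s editor configuration (Vim mode lines can be switched off, for instance). The function clamp is only there to give the file a body.

```python
# -*- indent-tabs-mode: nil; python-indent-offset: 4 -*-
# vim: set expandtab shiftwidth=4 softtabstop=4:


def clamp(value, low, high):
    """Body indented with four spaces, matching the mode lines above."""
    if value < low:
        return low
    if value > high:
        return high
    return value


print(clamp(7, 0, 5))  # 5
```

Emacs looks for its line at the very top of the file, and Vim by default only checks the first and last few lines, so both comments belong up front.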

Finally there’s the intra- and inter-line layout. One statement per line? Column-align similar elements? White space around operators? Opening parenthesis snuggled or spaced after a function name? Then there’s that whole braces (“{}”) placement thing.

I think that the white space issues in programming languages are much like the page-layout vs semantic-markup issues in HTML: it’s a bad mistake to mix them.

Another aspect is robustness. Source code should be able to survive significant *format* mangling as it passes through various agents: email; cut-and-paste into web pages; funky editor settings (and idiosyncratic human editors); invisible-to-the-eye white space variations (HT vs SP, trailing spaces, etc.); publication in books and magazines; bad printer drivers; and so on. After all, nowadays it’s not that difficult to find an editor or utility that will reformat your code any way you please.
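The invisible variations are at least easy to make visible. Here is a minimal Python sketch, with a hypothetical helper whitespace_report and an arbitrary choice of what counts as a problem; a real team would pick its own checks.

```python
def whitespace_report(text: str):
    """Return (line number, problem) pairs for hard-to-see white space."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        # The indent is whatever run of spaces and tabs starts the line.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            problems.append((number, "mixed tabs and spaces in indent"))
        if line != line.rstrip(" \t\r"):
            problems.append((number, "trailing white space or stray CR"))
    return problems


sample = "def f():\n\t  return 1  \nprint(f())\r\n"
for entry in whitespace_report(sample):
    print(entry)
# (2, 'mixed tabs and spaces in indent')
# (2, 'trailing white space or stray CR')
# (3, 'trailing white space or stray CR')
```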

Two of my favorite mantras are:

  • Just because you can doesn’t mean you should.
  • Guidelines, not rules.

I think you should be able to format source code as freely as possible. However, that doesn’t mean you should, unless you’re entering an obfuscated code contest. Or playing code golf.

I think that rigid rules are like training wheels — they’re fine for someone just learning, but you need to take them off as soon as possible. I firmly believe that occasional “exceptions” lead to more aesthetic, easier-to-comprehend code. I’m not suggesting anarchy; I suppose that what I really have is a set of meta-rules that override my basic-layout rules.

Perhaps I should add a third mantra:

  • Don’t let edge cases dominate.

Somewhat related to WS issues: I’m amused by people who are so into counting keystrokes. Just what is the deal, anyway? To me, a few keystrokes here and there are down in the noise range of the scale. If they matter so much, I think you’re doing something wrong — particularly after I see the time and effort some people expend to avoid them. Contrast with using XML for configuration files that are typically hand-edited. That’s psychotic. But those are also rants for another day.

I’d originally intended to include some code samples, but that would make this a much longer (and more tedious to produce) post. I’ll wait and see if this attracts any attention before I go to that much trouble.

Category: programming

Site last updated 2015-01-12 @ 13:31:07; This content last updated 2008-06-08 @ 19:00:17