
Wednesday, December 28, 2005

Good Development Practices for Open-Source Developers

Don't rely on proprietary code, languages, or libraries. Open-source developers don't trust code for which they can't review the source.

Use the GNU Autotools: autoconf, autoheader, and automake. Configuration choices should be made at compile time. People building from source today expect to be able to type configure; make; make install and get a clean build. The software must be able to determine for itself any information that it may need at compile time or install time.

Test your code before release.
A good test suite allows the team to easily run regression tests before releases. Create a strong, usable test framework so that you can incrementally add tests to your software without having to train developers in the specialized intricacies of the test suite.
It is good practice to ship your code with the test suite you use yourself, and to make that suite runnable with make test. Doing so encourages confidence in your code.
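As a rough illustration of a test framework that developers can extend without special training, here is a minimal sketch using Python's standard unittest module. The function under test, normalize_whitespace, and the helper run_suite are hypothetical names invented for this example; they are not from the book.

```python
import unittest


def normalize_whitespace(text):
    """Hypothetical function under test: collapse runs of whitespace."""
    return " ".join(text.split())


class NormalizeWhitespaceTest(unittest.TestCase):
    """Adding a regression test means adding one more method here."""

    def test_collapses_runs(self):
        self.assertEqual(normalize_whitespace("a  \t b"), "a b")

    def test_strips_ends(self):
        self.assertEqual(normalize_whitespace("  a b\n"), "a b")

    def test_empty_input(self):
        self.assertEqual(normalize_whitespace(""), "")


def run_suite():
    """Run the tests above and report whether all of them passed."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(NormalizeWhitespaceTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

One possible way to wire this up is a test target in the Makefile that invokes python -m unittest, so that make test runs the whole suite.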

Sanity-check your code before release.
Use every tool available that has a reasonable chance of catching errors a human would be prone to overlook. The more of these you catch with tools, the fewer your users and you will have to contend with.
If you're writing C/C++ using GCC, test-compile with -Wall and clean up all warning messages. Run tools that look for memory leaks and other runtime errors; Electric Fence and Valgrind are two good ones available in open source.
For Python projects, the PyChecker program can be a useful check. It often catches nontrivial errors.
If you're writing Perl, check your code with perl -c (and maybe -T, if applicable). Use perl -w and 'use strict' religiously.

Spell-check your documentation and READMEs before release.

I would say most of these practices are good not only for open-source developers; all developers can benefit from them.

This is the third post with guidelines from Eric Raymond's excellent book "The Art of Unix Programming". Read also:
- Basics of the Unix Philosophy
- Design Rules for Textual Data Formats

More posts with guidelines from the book will follow next year, so stay tuned.


Monday, December 12, 2005

Design Rules for Textual Data Formats

Another set of rules from Eric Raymond's excellent "The Art of Unix Programming". Use a textual data format instead of a binary one to store or transport your data:
  • Easy for human beings to read, write, and edit without specialized tools.
  • Easy to prepare test data and to debug.
  • Future-proof your system. One specific reason is that ranges on numeric fields aren't implied by the format itself.
  • Other tools and applications can easily use your data, stimulating reuse and innovation.
Unix Textual File Format Conventions
  • One record per newline-terminated line, if possible.
  • Less than 80 characters per line, if possible.
  • Use # as an introducer for comments.
  • Support the backslash convention.
  • In one-record-per-line formats, use colon or any run of whitespace as a field separator.
  • Do not allow the distinction between tab and whitespace to be significant.
  • For complex records, use a ‘stanza’ format: multiple lines per record, with a record separator line of %%\n or %\n.
  • In stanza formats, either have one record field per line or use a record format resembling RFC 822 electronic-mail headers, with colon-terminated field-name keywords leading fields.
  • In stanza formats, support line continuation.
  • Either include a version number or design the format as self-describing chunks independent of each other.
  • Beware of floating-point round-off problems.
  • Don't bother compressing or binary-encoding just part of the file.
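To make a few of these conventions concrete, here is a minimal sketch of a reader for a one-record-per-line format with colon-separated fields, # comments, and backslash escapes. The function names split_fields and parse_records are made up for this example, and it assumes that comments occupy a whole line; a real format might also allow trailing comments.

```python
def split_fields(line, sep=":"):
    """Split one record on unescaped separators, honoring backslashes."""
    fields, current, escaped = [], [], False
    for ch in line:
        if escaped:
            current.append(ch)      # escaped character is taken literally
            escaped = False
        elif ch == "\\":
            escaped = True          # next character loses its special meaning
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    if escaped:
        current.append("\\")        # lone trailing backslash is kept as-is
    fields.append("".join(current))
    return fields


def parse_records(text):
    """Return a list of field lists, skipping blank and comment lines."""
    records = []
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue                # assumption: comments fill a whole line
        records.append(split_fields(raw))
    return records


sample = "# name:uid:home\nalice:1001:/home/alice\nbob:1002:C\\:\\\\tmp\n"
records = parse_records(sample)
# records[1] is ['bob', '1002', 'C:\\tmp']: the escaped colon stays in the field
```

Because each record is a single line of plain text, the same file can also be inspected with ordinary line-oriented tools.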
Data File Metaformats
  • DSV Format: Delimiter-Separated Values. One record per line, with colon-separated fields. Most appropriate for tabular data keyed by a name in the first field.
  • RFC 822 Format: derives from the textual format of Internet electronic mail messages. Record attributes are stored one per line, named by tokens resembling mail header-field names and terminated with a colon followed by whitespace. Field names do not contain whitespace; conventionally a dash is substituted instead. The attribute value is the entire remainder of the line, exclusive of trailing whitespace and newline. A physical line that begins with tab or whitespace is interpreted as a continuation of the current logical line. A blank line may be interpreted either as a record terminator or as an indication that unstructured text follows.
  • Cookie-Jar Format: appropriate for records that are just bags of unstructured text. It simply uses newline followed by %% as a record separator.
  • Record-Jar Format: cookie-jar record separators combined with the RFC 822 metaformat for records. It supports multiple records with a variable repertoire of explicit field names.
  • XML Format: well suited for complex data formats, though overkill for simpler ones. It is especially appropriate for formats that have a complex nested or recursive structure.
  • Windows INI Format: appropriate if your data naturally falls into its two-level organization of name-attribute pairs clustered under named records or sections.
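As an illustration of how little code such a metaformat needs, here is a minimal sketch of a record-jar reader: %% lines separate records, each field is a "Name: value" line, and a line starting with a space or tab continues the previous field. The function name parse_record_jar and the sample data are invented for this example, and the error handling is deliberately thin.

```python
def parse_record_jar(text):
    """Parse record-jar text into a list of dicts, one dict per record."""
    records, current, last_key = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":
            if current:                     # close the record in progress
                records.append(current)
            current, last_key = {}, None
        elif not line.strip():
            continue                        # assumption: blank lines are ignored
        elif line[0] in " \t" and last_key is not None:
            # continuation line: append to the previous field's value
            current[last_key] += " " + line.strip()
        else:
            key, sep, value = line.partition(":")
            if not sep:
                raise ValueError("expected 'Name: value', got %r" % line)
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:                             # final record without trailing %%
        records.append(current)
    return records


sample = (
    "Name: alpha\n"
    "Note: first line\n"
    "  continued here\n"
    "%%\n"
    "Name: beta\n"
    "Home-Dir: /srv/beta\n"
    "%%\n"
)
jar = parse_record_jar(sample)
# jar[0]["Note"] is "first line continued here"
```

Each record may carry a different set of field names, which is the "variable repertoire" the record-jar format is meant to support.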
Read more, with examples, in "The Art of Unix Programming", chapter 5: "The Importance of Being Textual".
See also "Basics of the Unix Philosophy" with general rules for good programming design.


Monday, December 05, 2005

Basics of the Unix Philosophy

From Eric Raymond's "The Art of Unix Programming" I have picked the 17 rules described as the Basics of the Unix Philosophy. For me these are also rules for writing high-quality software:

Rule of Modularity: Write simple parts connected by clean interfaces.
Rule of Clarity: Clarity is better than cleverness.
Rule of Composition: Design programs to be connected with other programs.
Rule of Separation: Separate policy from mechanism; separate interfaces from engines.
Rule of Simplicity: Design for simplicity; add complexity only where you must.
Rule of Parsimony: Write a big program only when it is clear by demonstration that nothing else will do.
Rule of Transparency: Design for visibility to make inspection and debugging easier.
Rule of Robustness: Robustness is the child of transparency and simplicity.
Rule of Representation: Fold knowledge into data, so program logic can be stupid and robust.
Rule of Least Surprise: In interface design, always do the least surprising thing.
Rule of Silence: When a program has nothing surprising to say, it should say nothing.
Rule of Repair: Repair what you can — but when you must fail, fail noisily and as soon as possible.
Rule of Economy: Programmer time is expensive; conserve it in preference to machine time.
Rule of Generation: Avoid hand-hacking; write programs to write programs when you can.
Rule of Optimization: Prototype before polishing. Get it working before you optimize it.
Rule of Diversity: Distrust all claims for one true way.
Rule of Extensibility: Design for the future, because it will be here sooner than you think.

Or even rules for living a high-quality life!

The Art of Unix Programming is indispensable reading for ALL developers, not just Unix ones.

Eric Raymond's writing is very clear, concise, transparent, and easy to read, just like the Unix coding and design styles he advocates.
