Wednesday, February 28, 2007

Password strength meter

You hate the MSN butterfly logo, right? Me too. So like the rest of the JFKBits readers and this author, you avoid anything that ever says MSN. That's fine, I think we're better for not knowing about "Shoe Trends for Spring" and "5 ways to see if you have found your other half", at least MSN-style.

But until just now, when I went to sign up for a Windows Live webcast, where they use the same account creation mechanism as for MSN, I didn't know about a cool feature in the account setup page.

As you type your chosen account password, they have Javascript code that calculates the password strength and shows a 3-level, color-coded bar graph (weak, medium, strong).



Type. Done, strong password.

Of course, we don't know how good their algorithm is for determining strength, but if it's deficient somehow that can be improved. For the instant feedback and simple visuals, I think this idea is a winner.

Monday, February 26, 2007

Parsing Windows SDDL

You heard me. Yeah, I'm talkin' to YOU!

Well, that's how the computer made me feel when I got that taste of Windows SDDL.

After blustering about doing Princeton CS assignments in Ruby last Thursday, and then considering some of the suggestions from reader comments, I found another thing to try. It's like in Alice's Restaurant when he thinks about what could happen to him, "but when we got to the police officer's station there was a third possibility that we hadn't even counted upon, and we was both immediately arrested."

Well friends, that third possibility of what to write in Ruby is parsing Windows Security Descriptor Definition Language. It turns out that despite all the whining about Windows security, Microsoft hasn't been idle, they have at least written a large number of words on the topic, and their tools generate large numbers of letters, numbers and symbols when you ask them what somebody can do with a given operating system object.

This idea of an SDDL parser was a good candidate for a first Ruby program, because it was a real application. I know how to read basic Unix permission masks, but not this line noise.

When I found this SDDL reference, I was very grateful for the reference, but thought "a program ought to be able to decipher this stuff and print a more readable version".

Well, that became my first useful (ahem) Ruby program. The program in question queries for the security descriptor of a Windows service and then prints a more readable form of the SDDL string.

First impressions of working with Ruby

The first challenge was finding documentation that I could work with. With Python, I needed a "hello world", something a little more advanced to orient me to variable and control structure syntax, and a library reference. I got started a bit with the stuff at Ruby's documentation site like Ruby in Twenty Minutes and Ruby From Other Languages. I ended up mostly using the docs that came with the Windows installer. The Ruby User's Guide fulfilled the first two requirements, and the RubyBook help (Windows help file) was a decent library reference.

This exercise helped me use classes, hashes, iterators, and of course string and array operations.

  • Pro: Control structure syntax is nice, the lack of parens in if, while, and for conditions is a refreshing economy, the regularity of end is easy to comprehend (thank you, Ada, though it doesn't make you tell it what's ending)

  • Surprise: No autoincrement (r++ must be written r+=1)

  • Pro: not having to give an empty tuple for methods that take no arguments (obj.peform() can be written obj.perform)

  • Pro: error messages were quite reasonable, including not only line numbers of the problem area but also those of parts of the call chain.

  • Con: the presence of map and similar iterators is encouraging, but if there's a way to do map that's not a method call, I don't know about it. Talk about a Kingdom of Nouns, that's just what pure object oriented will get you.

  • Con: also in the way of iterators, the parser had a need in a couple of places to iterate over a list, but in a way that map and each can't handle because handling one item required handling the next item in the list at the same time. There didn't seem to be a good way around this, e.g. pattern matching.

Let's examine the the last point. The first example of what I mean takes place at the top of the structure. At the highest level you have productions like this:

SectionList => Section | Section SectionList
Section => "D:" ACL | "S:" SACL | "O:" SID | "G:" SID

We handle this by splitting the string with a ":" separator. Then it's a bit like command-line argument processing. Based on the next token you see, you may need to consume another token following. In this case, it would be sufficient to generate an iterator that gets every other index. It seems like there should be a fairly simple way to do that, since there are iterators over indices, but I couldn't find it yet. A pattern matching facility, as in SML or Mathematica, would be the most direct way to express this though.

Things like the delete_if and compact methods on arrays are nice to have built in, but the list of similar iterating methods didn't feel very complete. For example, I was surprised there didn't seem to be any kind of filtering methods, e.g. return a new array of elements that satisfy this predicate. The compact method is just a special case of such a filtering method. I suppose you could compose collect and compact to get a similar effect, where compact would nil out elements that don't match, but that's a little weak.

I also wonder how to do something like fold. Do Ruby programmers just rely on doing that procedurally with each and accumulating into a local variable?

Well, that's the first foray into Ruby. For the dedicated, I've included the actual program output and source code.

Appendix: The output

Back to post | See source

$ /c/Apps/ruby/bin/ruby ssdlparse.rb eventlog
Security descriptor for service eventlog:

Discretionary Access Control List (DACL):
Access Control Entry: ACCESS ALLOWED
Trustee: Authenticated users
Directory Service Access Rights:
Create All Child Objects
List Contents
All Validated Writes
List Object
All Extended Rights
Read Permissions
Access Control Entry: ACCESS ALLOWED
Trustee: Power users
Directory Service Access Rights:
Create All Child Objects
List Contents
All Validated Writes
Read All Properties
List Object
All Extended Rights
Read Permissions
Access Control Entry: ACCESS ALLOWED
Trustee: Built-in administrators
Directory Service Access Rights:
Create All Child Objects
Delete All Child Objects
List Contents
All Validated Writes
Read All Properties
Write All Properties
Delete Subtree
List Object
All Extended Rights
Read Permissions
Modify Permissions
Modify Owner
Access Control Entry: ACCESS ALLOWED
Trustee: Local system
Directory Service Access Rights:
Create All Child Objects
List Contents
All Validated Writes
Read All Properties
Write All Properties
Delete Subtree
List Object
All Extended Rights
Read Permissions

The source code for sddlparse.rb

Back to post | See output

def main
service = if ARGV.length == 0 then 'eventlog' else ARGV[0] end
rawscoutput=`sc sdshow #{service}`
scoutput = rawscoutput.strip()
puts "Security descriptor for service #{service}:"
puts scoutput

parts = scoutput.split(':')
while i < parts.length
elt = parts[i]
case elt
when "D"
# Get the ACL, a pattern of one or more ACEs like '(A;;CCLCSWLOCRRC;;;AU)'
i+=1 # no i++?
elt = parts[i]
dacl ="Discretionary Access Control List (DACL)",elt)
# TODO add cases for SACL, primary group, and owner
when true
print "Unhandled case '"+elt+"'\n"

class Acl
def initialize(name,content)

def parse
puts @name+":"
# contents are one or more '(A;;CCLCSWLOCRRC;;;AU)'
# First element is empty string, others have trailing ')'
elts.delete_if { |x| x.length==0 }! { |x| x.delete ')' }

@aces = { |x| } { |x| x.parse }

class Ace
def initialize(content)

# @content is like 'A;;CCLCSWLOCRRC;;;AU'
def parse
elts = @content.split ';'
@type = elts[0]
@flags = parseOpcodes(elts[1])
@permissions = parseOpcodes(elts[2])
@objtype = elts[3]
@inheritedobjtype = elts[4]
@trustee = elts[5]

# Emit stuff

indent=" "
puts "Access Control Entry: "+AceType[@type]
if @objtype.length > 0
puts indent+"Object type: "+@objtype
if @inheritedobjtype.length > 0
puts indent+"Inherited object type: "+@inheritedobjtype
puts indent+"Trustee: "+Trustees[@trustee]
parseOpcodesSection(AceFlags, "Flags: ", indent)

"Generic Access Rights:",

"Directory Service Access Rights:",

"File Access Rights:",

"Registry Key Access Rights:",


def parseOpcodesSection(nameMap, title, indent)
rights = @permissions.collect { |x| nameMap[x] }
if rights.length > 0
puts indent+title
rights.each { |x| puts indent*2+x }

def parseOpcodes(opcodes)
# Isn't there a map-like way to do this?!
retval = []
while i < opcodes.length
i += 2


# The information for the following hashes is taken from
AceType =

AceFlags = {
"CI" => "CONTAINER INHERIT: Child objects that are containers, such as directories, inherit the ACE as an explicit ACE.",
"OI" => "OBJECT INHERIT: Child objects that are not containers inherit the ACE as an explicit ACE.",

GenericAccessRights = {

DirectoryServiceAccessRights = {
"RC" => "Read Permissions",
"SD" => "Delete",
"WD" => "Modify Permissions",
"WO" => "Modify Owner",
"RP" => "Read All Properties",
"WP" => "Write All Properties",
"CC" => "Create All Child Objects",
"DC" => "Delete All Child Objects",
"LC" => "List Contents",
"SW" => "All Validated Writes",
"LO" => "List Object",
"DT" => "Delete Subtree",
"CR" => "All Extended Rights"

FileAccessRights = {

RegistryKeyAccessRights = {
"KR" => "KEY READ",
"KW" => "KEY WRITE",

Trustees = {
"AO" => "Account operators",
"RU" => "Alias to allow previous Windows 2000",
"AN" => "Anonymous logon",
"AU" => "Authenticated users",
"BA" => "Built-in administrators",
"BG" => "Built-in guests",
"BO" => "Backup operators",
"BU" => "Built-in users",
"CA" => "Certificate server administrators",
"CG" => "Creator group",
"CO" => "Creator owner",
"DA" => "Domain administrators",
"DC" => "Domain computers",
"DD" => "Domain controllers",
"DG" => "Domain guests",
"DU" => "Domain users",
"EA" => "Enterprise administrators",
"ED" => "Enterprise domain controllers",
"WD" => "Everyone",
"PA" => "Group Policy administrators",
"IU" => "Interactively logged-on user",
"LA" => "Local administrator",
"LG" => "Local guest",
"LS" => "Local service account",
"SY" => "Local system",
"NU" => "Network logon user",
"NO" => "Network configuration operators",
"NS" => "Network service account",
"PO" => "Printer operators",
"PS" => "Personal self",
"PU" => "Power users",
"RS" => "RAS servers group",
"RD" => "Terminal server users",
"RE" => "Replicator",
"RC" => "Restricted code",
"SA" => "Schema administrators",
"SO" => "Server operators",
"SU" => "Service logon user"


Friday, February 23, 2007

TIOBE Programming Community Index

This TIOBE language ranking page, which I mentioned yesterday, is just too fascinating not to make some comment here. For some background, I found this recent post from an interesting read, one web developer's take on the trends and their implications for software development at large.

As fun as the ranking thing is, of course you do have to question how it was determined and what it really means. The sites says "the ratings are based on the world-wide availability of skilled engineers, courses and third party vendors."

There are some fun things to do with this list. One is to ask "why is Logo ranked higher than <insert name of favorite language not previously known to be so unpopular>?" I mean...where are the Logo engineers? Courses? Other than 5th grade computer time during math class, I mean? Third party Logo vendors? I know that Harlequin stopped making that commercial version of SML, is that what killed it on this list?

You can also do a little resume analysis. What languages would you put on your resume, if you thought that was a good idea, and how do they rank? If you ranked the languages you know by your expertise, how would it align with the current popularity? For me, I've used languages #1, 2, and 3 in my three professional jobs. The languages I've had the most fun using are mostly 'way down on the charts, unless you count Ada which I rather liked learning in data structures classes and which is still in the top 20.

Another fun thing is to play the functional language features game. It would go something like this: name all the features you can think of that were in functional languages fifteen years ago that are now in languages in this list's top 10. This is easier when a functional language is itself in the top 10.

And finally, for your reference, here's the list, slurped and reformulated in under ten minutes courtesy a couple of programming tools that didn't really rank high on the list at all:

  1. Java

  2. C

  3. C++

  4. PHP

  5. (Visual) Basic

  6. Perl

  7. Python

  8. C#

  9. JavaScript

  10. Ruby

  11. SAS

  12. Delphi

  13. PL/SQL

  14. ABAP

  15. D

  16. Lisp/Scheme

  17. Ada

  18. COBOL

  19. Pascal

  20. Lisp/Scheme

  21. Fortran

  22. FoxPro/xBase

  23. Awk

  24. Prolog

  25. IDL

  26. MATLAB

  27. Logo

  28. ActionScript

  29. Bash

  30. ColdFusion

  31. RPG

  32. LabView

  33. CL

  34. Smalltalk

  35. Forth

  36. REXX

  37. Maple

  38. Tcl/Tk

  39. S-lang

  40. Icon

  41. Haskell

  42. Natural

  43. VBScript

  44. Lua

  45. Q

  46. OCaml

  47. Objective-C

  48. APL

  49. Lingo

  50. ML

Thursday, February 22, 2007

Creative Programming

In some scattered spare moments I've been reading about Ruby, and starting to think about programming in it.

This is a recent goal, mentioned in All I want for Christmas. I've had the motive but not much opportunity for diving in to see what all the fuss is about. Fuss? Well, yes. Ruby is riding at the #10 spot in the TIOBE index of programming language popularity.

As I say, I haven't had the opportunity. This has to do with my house, where I'm essentially rebuilding the living room. Since I'm doing the work myself and with the help of friends, it's not the money pit exactly, but it is the time pit. I guess a well-known equation would say they're the same thing.

Anyway, I'd like to start writing programs in Ruby, but I'm stuck: what do I write? When one starts learning language after language, it'd be useful to have a stock of small but interesting or instructive applications to implement.

One interesting resource I found comes from the I'm Feeling Lucky link result from Google for "creative programming". You know, like "creative writing", only the writing is programming.

That link is Creative Programming Assignments from Princeton's CS department. Robert Sedgewick and others have put together a decent list of assignments for an "Introduction to Programming" course. They are relatively simple programs appropriate to the level, but still interesting, and best of all they cover a range of programming arenas, from algorithms to computer architecture. Here's a sampling of the titles:

  • Digital Signal Processing - Generate sound waves, apply an echo filter to an MP3 file, and plot the waves.

  • N-Body Simulation - Simulate the motion of N bodies, mutually affected by gravitational forces, in a two dimensional space.

  • Particle Collision Simulation - Simulate the motion of N colliding particles according to the laws of elastic collision. (Concepts: priority queue, event-driven simulation)

The N-Body simulation sounds a lot like the gravitational simulation Russ Olsen mentions in his post For the Joy of It, with a similar intent to mine, writing a familiar algorithm in a new language for fun.

So, as I get a few more spare moments hopefully I'll be able to try this experiment of doing some of the creative course assignments in Ruby. I guess I'll have to be careful about posting full source code, in case they want to re-target the course from Java (currently the #1 language in that TIOBE index) to Ruby in a few years.

Wednesday, February 21, 2007

XML-Driven Language Tools

In Java in XML, JFKBits reader Mark kindly pointed us to several tools of the sort I imagined in XML-Based Languages. These tools parse the "front end" syntax and produce an AST or other form of the source input in a form that is more accessible to other programs. The material on these tools put the issues and applications more succinctly than I did, so let's sample some quotes.

Why would you want to

From an FTPOnline article about JSIS, the Java Semantic Interface Specification:

Sample applications include browsing and navigation tools, code formatting and restructuring tools, source code generation tools, code coverage and test generation tools, code analysis and metric reporting tools, style and standard-compliance reporting, UML diagram and round-trip engineering tools, and interactive source code editing, to name just a few.

What's the issue

From the introduction to JavaML:

The classical plain-text representation of source code is convenient for programmers but requires parsing to uncover the deep structure of the program. While sophisticated software tools parse source code to gain access to the program's structure, many lightweight programming aids such as grep rely instead on only the lexical structure of source code.

The reference to grep is telling: early in my career I worked for two weeks on an Emacs-Lisp program that would convert the structured pseudocode we were writing to C source code. It was a successful program, since it gave everyone a head start on their C source files, but it definitely relied on grep-style (regular expression) parsing and thus had certain inherent limitations.

From the introduction to GCC-XML:

Development tools that work with programming languages benefit from their ability to understand the code with which they work at a level comparable to a compiler. C++ has become a popular and powerful language, but parsing it is a very challenging problem. This has discouraged the development of tools meant to work directly with the language.

There is one open-source C++ parser, the C++ front-end to GCC, which is currently able to deal with the language in its entirety. The purpose of the GCC-XML extension is to generate an XML description of a C++ program from GCC's internal representation. Since XML is easy to parse, other development tools will be able to work with C++ programs without the burden of a complicated C++ parser.

This is particular nice, considering the difficulty in constructing a C++ parser.

As I said in responding to Mark's comment, it would be nice if these parser modules, if you will, would become standard practice in new language development. Tool support helps acceptance of a new language, and it seems like providing programmatic access to the AST would lower the barrier to entry for tools development.

Tuesday, February 20, 2007

Java in XML

From some personal discussion with JFKBits readers about XML-Based Languages, I find there are some precedents to the idea of having a standard structured format, such as XML, for language Abstract Syntax Trees (ASTs).

One example is the PMD project:

PMD is a Java source code analyzer. It finds unused variables, empty catch blocks, unnecessary object creation, and so forth.

This is a language tool of the sort that I imagined would benefit from working with an XML-based AST.

One of the cool things PMD does is generate the Java AST as XML. PMD itself uses a Java parser written in JavaCC, but the Eclipse plugin has a "Generate Abstract Syntax Tree" menu option which makes a complete, detailed .ast file.

The discovery of this feature calls for an example.

Hello, World Java source

public class HelloWorld
public static void main(String[] args)
System.out.println("Hello, world.");

And just so we know what we're getting ourselves into,
Hello, World XML AST (see all of it)

<CompilationUnit beginColumn="1" beginLine="1" endColumn="3" endLine="7">
<TypeDeclaration beginColumn="1" beginLine="1" endColumn="1" endLine="7">
<ClassOrInterfaceDeclaration abstract="false" beginColumn="8" beginLine="1" endColumn="1" endLine="7" final="false" image="HelloWorld" interface="false" native="false" nested="false" packagePrivate="false" private="false" protected="false" public="true" static="false" strictfp="false" synchronized="false" transient="false" volatile="false">
<ClassOrInterfaceBody beginColumn="1" beginLine="2" endColumn="1" endLine="7">
<ClassOrInterfaceBodyDeclaration anonymousInnerClass="false" beginColumn="9" beginLine="3" endColumn="9" endLine="6" enumChild="false">
<MethodDeclaration abstract="false" beginColumn="23" beginLine="3" block="Block" endColumn="9" endLine="6" final="false" interfaceMember="false" methodName="main" native="false" packagePrivate="false" private="false" protected="false" public="true" resultType="ResultType" static="true" strictfp="false" synchronized="false" syntacticallyAbstract="false" syntacticallyPublic="true" transient="false" void="true" volatile="false">

The bloat factor is impressive. The 7 line, 120 byte Java source turned into 52 lines (bloat factor 7.4) and 5660 bytes (bloat factor 47). I generated the AST for another program, a very modest 70-line utility to print out details of all the network interfaces, and that was 1038 lines (B.F. 14). Should I repeat that this XML AST is intended for consumption by tools, not humans?

Appendix: Full Hello, World AST

<CompilationUnit beginColumn="1" beginLine="1" endColumn="3" endLine="7">
<TypeDeclaration beginColumn="1" beginLine="1" endColumn="1" endLine="7">
<ClassOrInterfaceDeclaration abstract="false" beginColumn="8" beginLine="1" endColumn="1" endLine="7" final="false" image="HelloWorld" interface="false" native="false" nested="false" packagePrivate="false" private="false" protected="false" public="true" static="false" strictfp="false" synchronized="false" transient="false" volatile="false">
<ClassOrInterfaceBody beginColumn="1" beginLine="2" endColumn="1" endLine="7">
<ClassOrInterfaceBodyDeclaration anonymousInnerClass="false" beginColumn="9" beginLine="3" endColumn="9" endLine="6" enumChild="false">
<MethodDeclaration abstract="false" beginColumn="23" beginLine="3" block="Block" endColumn="9" endLine="6" final="false" interfaceMember="false" methodName="main" native="false" packagePrivate="false" private="false" protected="false" public="true" resultType="ResultType" static="true" strictfp="false" synchronized="false" syntacticallyAbstract="false" syntacticallyPublic="true" transient="false" void="true" volatile="false">
<ResultType beginColumn="23" beginLine="3" endColumn="26" endLine="3" void="true"/>
<MethodDeclarator beginColumn="28" beginLine="3" endColumn="46" endLine="3" image="main" parameterCount="1">
<FormalParameters beginColumn="32" beginLine="3" endColumn="46" endLine="3" parameterCount="1">
<FormalParameter abstract="false" array="true" arrayDepth="1" beginColumn="33" beginLine="3" endColumn="45" endLine="3" final="false" native="false" packagePrivate="true" private="false" protected="false" public="false" static="false" strictfp="false" synchronized="false" transient="false" typeNode="Type" varargs="false" volatile="false">
<Type array="true" arrayDepth="1" beginColumn="33" beginLine="3" endColumn="40" endLine="3" typeImage="String">
<ReferenceType array="true" arrayDepth="1" beginColumn="33" beginLine="3" endColumn="40" endLine="3">
<ClassOrInterfaceType beginColumn="33" beginLine="3" endColumn="38" endLine="3" image="String"/>
<VariableDeclaratorId array="false" arrayDepth="0" beginColumn="42" beginLine="3" endColumn="45" endLine="3" exceptionBlockParameter="false" image="args" typeNameNode="ReferenceType" typeNode="Type"/>
<Block beginColumn="9" beginLine="4" endColumn="9" endLine="6">
<BlockStatement allocation="false" beginColumn="17" beginLine="5" endColumn="52" endLine="5">
<Statement beginColumn="17" beginLine="5" endColumn="52" endLine="5">
<StatementExpression beginColumn="17" beginLine="5" endColumn="51" endLine="5">
<PrimaryExpression beginColumn="17" beginLine="5" endColumn="51" endLine="5">
<PrimaryPrefix beginColumn="17" beginLine="5" endColumn="34" endLine="5">
<Name beginColumn="17" beginLine="5" endColumn="34" endLine="5" image="System.out.println"/>
<PrimarySuffix argumentCount="1" arguments="true" arrayDereference="false" beginColumn="35" beginLine="5" endColumn="51" endLine="5">
<Arguments argumentCount="1" beginColumn="35" beginLine="5" endColumn="51" endLine="5">
<ArgumentList beginColumn="36" beginLine="5" endColumn="50" endLine="5">
<Expression beginColumn="36" beginLine="5" endColumn="50" endLine="5">
<PrimaryExpression beginColumn="36" beginLine="5" endColumn="50" endLine="5">
<PrimaryPrefix beginColumn="36" beginLine="5" endColumn="50" endLine="5">
<Literal beginColumn="36" beginLine="5" endColumn="50" endLine="5" image="&quot;Hello, world.&quot;" stringLiteral="true"/>

Saturday, February 17, 2007

Oops! Watch which Firefox add-on you install

I just got fooled by the Firefox add-on ambiguity: there are now two add-ons, and they do different things. On one computer I have installed what's now known as the "classic" add-on for, the social bookmarking site. The classic add-on is a must-have if you want to tag a lot, because it puts a big tag button on the navigation toolbar, along with some other conveniences:

I wanted this on another computer. What I actually installed was a new add-on called " Bookmarks" by Yahoo. It was the #2 result in searching for "bookmarks" add-ons, so I thought it was "the" add-on I wanted. It's feature-rich, and I'm sure it's really useful. It also does something surprising, if you're expecting the classic add-on.

I should say it does three surprising things. The " Bookmarks" add-on:

  1. turns all your existing locally-saved Firefox bookmarks into public, server-hosted bookmarks (they can be made private if you select that option)

  2. completely replaces the Bookmarks menu with a one

  3. and, replaces the bookmarks toolbar with a bookmark toolbar

The glitch was that step #1 wasn't complete yet, so my Firefox bookmarks were nowhere to be found (except in a backup file which I wasn't yet aware of).

I make somewhat extensive use of bookmarks, and keep a lot of technical reference material organized there. I use the bookmark toolbar extensively too, so I can for example check the weather with a click. That has become more of a necessity lately, and is what I was trying to do when I noticed what the new add-on had wrought on my user interface and on my data.

Here's where we come to a bit of a shortcoming with Firefox add-ons, because there wasn't much in the obvious way of help available. The blog post "Firefox Bookmarks – A Love Hate relationship came to my rescue and described what the add-on had done and what to do about it.

While the new add-on might be nice, I wasn't ready. I uninstalled it and got the classic version. Fortunately the uninstaller kindly offered to restore my old bookmarks so I didn't have to mess with it. (Firefox keeps backups of bookmarks: in Windows, they're under %USERPROFILE%\Application Data\Mozilla\Firefox\Profiles\\bookmarkbackups.)

How could this have been avoided? Three ways.

One, I certainly could have been in less a hurry to install the add-on and read the text. Maybe that way I would have noticed things were not as I expected.

Two, Yahoo! and could have coordinated better and not released two confusing add-ons. Maybe the new add-on should have bundled both of them, explains what the two are, and lets you pick which you want. Actually I understand there is yet a third add-on which does bundle both of them, maybe that does give you this choice. I'm not about to install it to find out.

Three, the site itself could have made it easier to find the Firefox add-on. I couldn't believe how hard it was to locate the Firefox add-on, after I'd already created my account. This difficulty is how I ended up looking for add-ons through Firefox to begin with.

Thursday, February 15, 2007

Ada and Strong typing

This blast from the past, Why Programming Languages Matter by Lt. Col. John A. Hamilton Jr., U.S. Military Academy helped me remember some of the reasons I decided to like strong typing, in my formative programming years.

This article is a quick read, and has some cool pie charts of languages used within the US Department of Defense.

The choice words, for me, are found in the section "Reliability Counts":

Make no mistake, war is about killing people and destroying things. Military weapons systems are designed to be lethal and must be reliably controlled. Unreliable military software is frightening. We often hear of Ada's strong typing. An implicit-type conversion that results in a one-degree rounding error will, at a range of 40 kilometers, put ordinance 700 meters off target. In a close combat situation, a 700-meter error can result in friendly casualties. Reliability is important.

Just a few thoughts:

1. Wow: 700 meters. That puts programming in a vivid perspective: "hmm, am I forgetting something that would put high explosives 700 meters off target?"

2. The words "must be reliably controlled" leapt off the page, it seems to me that programming is nothing if not about control.

3. This application area seems to be one of the cases where the bureaucracy and removing-trust-from-the-programmer is a healthy double-check. I do wonder if these high-reliability systems tend to have very clear requirements up front, which could encourage top-down design. I get the sense that typeless languages help you in situations where it's more OK to proceed into code without a fuzzy idea of what you're trying to accomplish, and a rigid type system would really slow you down. I don't suppose it's 100% true, but I wonder if in weapons systems you have less of a problem with feature creep than with, let's say, web applications where static typechecking is often absent (Perl, Python, Ruby, with Java a notable exception).

Wednesday, February 07, 2007

XML-based Languages

Example Ant script
There's a thread on the Ant user list today which kicked off when someone expressed displeasure with the XML-based nature of Ant scripts:

Hans wrote:
What do you think about the XML format used for writing Ant scripts? I don't like it.

What about writing Ant scripts in a script language like Python or Jython instead of writing them in XML? I think it would be much more productive.

This is the topic I'd like to address: the choices of syntaxes for a language, and especially the choice of an unambiguous, easily-parsed, structured syntax like XML or Lisp, which I'll call the "model" syntax, versus a more conventional "pretty" syntax, or "front end" syntax. What is this choice? Let me explain.

When I first heard about XML, I thought it might serve a role in language implementations as a standard abstract syntax tree (AST) representation. There would be a separable parser component for the language, and the XML representation of the parsed form of the language would be standardized and published. This could make it easier to make tools like syntax-highlighting editors, because they could access the parser component. Automatic code generators could target the model syntax instead of the front end syntax because the model syntax has this unambiguous property. Why that you ask? Have you ever written a program that generates C source code using the minimum number of parentheses and braces? It's not easy. The easiest way to write this kind of auto-generated code is to wrap parens, braces, and other structural delimiters around every generated expression. Guess what? That's one of the defining characteristics of this "unambiguous structured syntax" I'm talking about.

This idea has another wrinkle: if there's a standard AST format in XML, you can write code in the language directly in the XML format, and bypass the pretty syntax altogether.

This Ant mailing list thread points out that Ant is an existing language with the model syntax, in search of a front end syntax.

You may wonder if it's kind of weird to have two syntaxes for the same language. Maybe it is. Maybe they could be considered separate languages in that case. But there is precedent for it. Mathematica has an ordinary programming syntax, and also something called FullForm. In the front end syntax, as I'm calling it, you can write an expression a^2 + b (c d + e), which has operator precedence and implicit multiplication going on. Then you can also view or write the FullForm representation of the same expression, Plus[Power[a,2],Times[b,Plus[Times[c,d],e]]]. This ability to call upon the FullForm view can give the programmer direct insights into the rules of the parser, in a way reminiscent of how WordPerfect's reveal codes function could sometimes explain mysteries of a document's formatting.

As we've mentioned, Ant has managed to grow and prosper with nothing but its cumbersome model syntax. Of course, one reason is that Ant is designed to be a build scripting environment, not a general purpose programming language. Ant scripts for 1000-file Java projects can be just a few hundred lines or less. The success of Ant though has given these scripts reason to grow, as more people make their build processes more elaborate.

So we've seen some of what the model syntax can give you. What does the front end syntax give you? It only takes a moment's thought to realize you don't want to write C code in XML, or worse yet, C++. (Any of my readers care to submit "ported" code examples? I'm sure we could have fun porting some Boost template stuff to an XML-ized C++ syntax.) In general, and without a shred of evidence, I'd venture to say the purpose of the conventional front end syntax is to provide an efficient means of expression that lines up with a considerable body of linguistic precedent including previous programming languages and notation from mathematics and other scientific studies. The problem is that this notation is usually awfully difficult to parse, and sometimes its more important to get some kind of computation going.

There are some things to learn from XML in particular when designing a front end syntax. In the Ant thread, Matt Benson offers that he is intending to design a "custom language for Ant". To this, Steve offers these thoughts about such a language design:

One problem I have with any DSL language (and that includes smartfrog) is remembering all the rules about whether to use colon or equals for assigment, whether you should terminate lines with semicolons or not, whether lists are allowed a trailing , at the end ["like","this",], what the unicode story is (or whether it just uses the local encoding), how to escape stuff, etc. All the little details that XML hands for you.

For all its ugliness (and if you think XML is bad, look at RDF-in-XML), it at least gets some things right
-tool neutral (though the way ant abuses XMLNS blurs this)
-good internationalization
-not as terse as perl

In mentioning things like ":=" vs. "=", I think Steve's comments point out the Tower of Babel effect in front end language syntax, where disagreement, a lack of convention, leads to confusion. (Speaking of Babel, notice that one of the areas lacking convention is internationalization.) Of course part of the fun of defining a pretty language syntax is expressing the world in your own favorite terms. It's making innovations like "let's use indentation for structural informational" and "let's make semicolons optional".

I think there's clearly a place for the front end syntax, the efficient way of expression, even if breaks with convention some. If people aren't having fun writing the syntax, they're not going to use it. But there's also a place for the structure and conventions of XML, or other structured model syntax.

This brings us back to the idea in the post's title, XML-based languages. It may be that in the future you follow Ant's path. You need to design some kind of language processor, and your first step is to define an XML format for it. You instantly can begin writing programs in the XML format, and can begin work on the interpreting or translating side. Maybe you'll write a compiler as an XSL file. Then once you get the breathing room, you can design your front end language and write the parser.

Or, perhaps, you'll just decide to hijack that ingenious syntax of parentheses of LISP/Scheme, the languages whose front end syntax is also a perfectly legitimate unambiguous tree-structured model syntax.

Friday, February 02, 2007

Matchfix: the obscure buddy of prefix and postfix

The title on this post says it all. This week I learned there is actually a name, "matchfix", for an operator that is called like a function. To put this in its familiar context, here's a summary of the ways a two-argument operator can be called:

op arg1 arg2

arg1 arg2 op (also known as Reverse Polish Notation)

arg1 op arg2


For reference, see the Google search 'matchfix -cricket -horse'.

It looks like the origins of "matchfix" may come from MACSYMA, the computer algebra system.

Now if the language you're designing or describing has a way to invoke operators like ordinary function calls, you know the term.