I'm trying to find a definition of Linnean Name sufficiently tight to form 
part of a computer program specification.

I want to be able to recognise and parse names like "Arthropoda" or 
"Ischnura elegans f. violacea".

Below is what I've come up with so far, in a kind of simplified 
Backus-Naur notation. There are probably some mistakes, and 
certainly some unanswered questions.
(This definition excludes abbreviated taxon names, like the "E." in E. 


Linnean Name == HighRankingTaxon [& " " & SubTaxon]

Taxon == HighRankingTaxon |

HighRankingTaxon == CapitalLetter [& LowercaseString]

SubTaxon == ElaborateTaxon [& " " & SubTaxon]

ElaborateTaxon == LowRankingTaxon |
                  "(" & Taxon & ")" |
                  RankIndicator & " " & LowRankingTaxon 

RankIndicator == "f." |
                 "form" [& "a"] |
                 "var." |
                 "ssp." |
                 "race" |
                 <lots of others - what are they?>

LowRankingTaxon == [NumericString & "-" & ] [LowercaseString & "-" & ] 


Any taxon with a rank of species or below is a LowRankingTaxon, and thus 
written all in lower case.
Any taxon with a rank of genus or above is a HighRankingTaxon, written 
with an initial upper case letter.
All HighRankingTaxa have unique names.
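
To make the rules above concrete, here is a rough Python sketch of a 
recogniser for this grammar. It is only an illustration, not a finished 
implementation. The names parse_linnean_name, HIGH, LOW and 
RANK_INDICATORS are made up for the example. Two things are assumptions 
on my part, because the corresponding rules above are incomplete: a 
LowRankingTaxon is taken to be an optional numeric prefix followed by 
hyphen-separated lowercase parts, and a parenthesised Taxon is allowed to 
be either high-ranking or low-ranking.

```python
import re

# Rank indicators listed in the grammar; the list is known to be incomplete.
RANK_INDICATORS = {"f.", "form", "forma", "var.", "ssp.", "race"}

# HighRankingTaxon: one capital letter, then optional lowercase letters.
HIGH = re.compile(r"[A-Z][a-z]*")

# ASSUMPTION: LowRankingTaxon is an optional numeric prefix ("7-") followed
# by one or more hyphen-separated lowercase parts.
LOW = re.compile(r"(?:[0-9]+-)?[a-z]+(?:-[a-z]+)*")


def parse_linnean_name(text):
    """Return a list of (kind, value) pairs, or None if text does not match."""
    tokens = text.split(" ")
    if not HIGH.fullmatch(tokens[0]):
        return None
    parts = [("high", tokens[0])]
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in RANK_INDICATORS and nxt is not None and LOW.fullmatch(nxt):
            # RankIndicator & " " & LowRankingTaxon
            parts.append(("rank", tok))
            parts.append(("low", nxt))
            i += 2
        elif len(tok) > 2 and tok[0] == "(" and tok[-1] == ")":
            # "(" & Taxon & ")"  (ASSUMPTION: Taxon is high or low ranking)
            inner = tok[1:-1]
            if HIGH.fullmatch(inner):
                parts.append(("paren-high", inner))
            elif LOW.fullmatch(inner):
                parts.append(("paren-low", inner))
            else:
                return None
            i += 1
        elif LOW.fullmatch(tok):
            # Plain LowRankingTaxon
            parts.append(("low", tok))
            i += 1
        else:
            return None
    return parts
```

With this sketch, parse_linnean_name("Ischnura elegans f. violacea") 
yields one high-ranking part, one low-ranking part, a rank indicator and 
another low-ranking part, while a name starting with a lowercase letter 
is rejected. Note that a word such as "form" is treated as a rank 
indicator only when a low-ranking name follows it; otherwise it is read 
as an ordinary low-ranking name.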


1) Can you understand this?
2) Is it correct?
3) What other rank indicators are used?
4) Under what circumstances is a taxon put in parentheses?
5) Should a subgenus, or any other taxon between genus & species, be 
written with a capital or lowercase initial?
6) Can any HighRankingTaxon start with a non-alpha character?
7) Do any taxa contain characters other than numbers, "-" & letters?
8) Have I used the word "taxon" correctly, to refer to the *name* of an 
organism (or group), rather than the organism itself? If not, could you 
suggest better terminology?

