DOS program MARKER.UUE

Foteos Macrides MACRIDES at WFEB2.BITNET
Sun Nov 17 20:57:00 EST 1991


This note concerns Dan Davison's and Rob Harper's recent postings about the
quixotic campaign (8-) to deal with corruptions of carets and tildes during
transfers of files in the direction *from* Internet *to* BITNET *at* INTERBIT
gateways.  This is the *only* condition for which the corruption occurs.  If
it happens in the course of your getting EMBL-UUENCODEd files, and you can't
decode them, get the *LATEST VERSION* of the EMBL-UUDECODEr and it *will*
decode the corrupted files without problems.  Skip the rest of this message if
that's the only problem of concern to you.

The following does concern you if this character: ^
        appears on your screen as a lowercase Greek pi, or as a lowercase 'a'
        with an umlaut over it, or as something weird, instead of as a
        caret ("hat"),
and if this character: ~
        appears as a caret instead of as a tilde ("not" symbol).

****Re: Dan's message:
>Foteos Macrides said:
>> >       So why couldn't I decode it until after I did an E3 -> 5E
>> > global search and replace?  Note that carets are not corrupted
>> > when crossing a gateway in the BITNET -> Internet direction, or
>> > when travelling entirely down a BITNET chain or entirely via
>> > Internet from EMBL, only when going across from Internet to
>> > BITNET.
>
>I bet the Internet->BITNET corruption is gateway dependent. That's one
>reason the Gene-Server deposits its bits directly onto BITNET courtesy
>of some serious magic here at the UH central mail machine.

        I thought I had the latest version of the EMBL-UUDECODEr, but didn't.
That's why I couldn't decode it with the corruptions present (I sent a follow
up message about this, which apparently didn't get to you).

        The magic at the UH central mail machine is simply its use of PMDF to
place outbound mail directly on BITNET or Internet (depending on the domain
field of the address), thereby *avoiding* the need to cross an *INTERBIT*
gateway.  It turns out that Rob's central mail machine also uses this
mechanism for outbound wide-area network mail, is directly connected to
*both* BITNET *and* Internet, and therefore *also* avoids crossings of its
outbound mail through INTERBIT gateways.  Sites which are using MX instead of
PMDF *and* have direct BITNET and Internet connections also avoid transfer
through the INTERBIT gateways and will not have this problem.  I've been told
that GIVEME also works like PMDF and MX, but I don't know anything about that
wide-area network mailer.

>> Your server is on the Internet
>
>Umm, kinda. See above. It's transported via SMTP in ASCII to the UH
>mail gateway which sits on both the Internet and BITNET. From there it
>is transported by the most direct route unless the user specified
>something else, like foo at bar.bitnet@cunyvm.cuny.edu, which forwards it
>to CUNY for delivery to BITNET.  That's my current understanding,
>anyway.
>
>> so it should also get UUENCODEd files from EMBL without corruption.
>
>Yes, this has been checked repeatedly.

        Again, this is a function of the mailing software that is installed at
UH (and EMBL), not local modifications of it.  EMBL presently is directly
connected to *both* BITNET and Internet, and similarly sends out its stuff in
a way which avoids crossing an INTERBIT gateway.

        The important point here, which at risk of long-windedness I want to
be sure gets across, was w.r.t. your *incoming* mail at UH.  When EMBL becomes
Internet-only next month, you should make *sure* it continues its Email
mirroring from EMBL to an *Internet* address (or that you use an FTP/TELNET
mechanism).  Else, the files will cross an *INTERBIT* gateway in the direction
*from* Internet *to* BITNET, and the carets and tildes will start being
corrupted.

        Genbank.bio.net is presently Internet-only.  It cannot send Email to
its BITNET subscribers, nor to the BITNET LISTSERVers for relay to their
BIOSCI subscribers, without crossing an INTERBIT gateway in the direction that
corrupts carets and tildes.  I'm sending this to BIO-SOFT at genbank.bio.net, so
my test at the top will have the corruptions for these subscribers.

        And again (and again, and again 8-), when EMBL becomes Internet-only,
it *also* (at least "in theory") will have *no way* to send Email to BITNET
addresses without crossing a wide-area INTERBIT gateway in the direction that
corrupts carets and tildes.

>> Rainer exchanged the "historic" file with Rob via
>> Internet, and it was posted to software/BIO-SOFT via genbank.bio.net, so no
>> corruption would be expected to that point.  Internet nodes which SHOULD have
>> experienced a corruption are those subscribed to BIO+SOFT via
>> LISTSERV at IRLEARN.BITNET, because the posting had to cross from Internet to
>> BITNET (at STANFORD or CUNY) going from genbank.bio.net to the LISTSERVer.  Is
>> that so, and could subscribers on such nodes decode the file without doing an
>> E3 -> 5E global search and replace?
>
>Ah, you're close to something here. Someone somewhere in this latter
>chain is doing a double EBCDIC-ASCII conversion, maybe. One gateway
>does the 1966 version, another the later.  But that still doesn't
>explain how the *table* got changed differently from the *body* of the
>uuencoded file.

        Hope the info higher up clarifies this.  The table didn't get
corrupted differently from the body of the uuencoded file.  I simply tried to
decode with what was not, in fact, a current version of Rainer's decoder.  My
analysis of where the corruptions should have happened and who should have
received the corrupted file was completely accurate (and my test at the top
should confirm this again).
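
        For anyone stuck with a corrupted file and no current decoder, the
global search and replace discussed above can be sketched in a few lines of
Python.  This is only an illustration, not part of the EMBL tools.  The byte
value E3 is taken from the "E3 -> 5E" replacement quoted above; the 5E -> 7E
step is an assumption inferred from the symptom described at the top (tildes
showing up as carets).  The function name is made up for the example, and the
sketch is only meaningful for a file that crossed the gateway exactly once.

```python
# Hypothetical repair helper; the byte values are assumptions (see the
# paragraph above), not a documented description of any gateway's table.
#   0xE3 -> 0x5E : corrupted caret back to '^'  (the quoted "E3 -> 5E" fix)
#   0x5E -> 0x7E : caret that was originally a tilde back to '~' (inferred)
REPAIR_TABLE = bytes.maketrans(b"\xe3\x5e", b"\x5e\x7e")


def repair_gateway_damage(data: bytes) -> bytes:
    """Undo the caret/tilde substitutions in one pass.

    translate() maps every byte through the table simultaneously, so a
    caret restored from 0xE3 is not then turned into a tilde, which is
    what could happen with two successive search-and-replace passes run
    in the wrong order.
    """
    return data.translate(REPAIR_TABLE)


if __name__ == "__main__":
    damaged = b"a\xe3b^c"
    print(repair_gateway_damage(damaged))
```

Systems that show the corrupted caret as something other than byte E3 (the
'a' with an umlaut, for instance) would need a different source byte in the
table, and a file that was never corrupted must not be run through it at all.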

>>         I also just tried getting an EMBL .UUE file via Email from your
>> gene-server, and the carets were not corrupted, but you're using
>> PMDF to put it locally onto a BITNET chain (i.e., it didn't cross an
>> external Internet -> BITNET gateway getting TO your server from
>> EMBL, and it similarly didn't cross one to get here FROM your
>> server), so that didn't turn out to be a test of the hypothesis
>
>But you have nicely figured out why the Gene-Server talks to the
>BITNET so well...
>
>> (it's the EBCDIC/ASCII mapping at the external gateways that
>> appears to have a bug, which has existed for so long that it's accepted as a
>> "characteristic" of the gateways).
>
>I just said that. Oh. You said it first, I should read the whole
>message, sorry.

        Hopefully, it's been said enough times now that all of us who need to
understand this really do.  It's actually simple, once you do understand it,
but it took me four months, cuz I'm just the WFEB's closest approximation to a
computer whizz -- *far* from a real one!!!

>Now, Fote, Rob, Rainer, Don, Dave, any ideas of how to test this
>electronically? I could get a disk from Fote with the suspect files, I
>suppose. I would really like to track this down.

        Rob and I did some additional testing.  Time to switch to his message.

****Re: Rob's message:
>*>Now, Fote, Rob, Rainer, Don, Dave, any ideas of how to test this
>*>electronically? I could get a disk from Fote with the suspect files, I
>*>suppose. I would really like to track this down.
>
>        There are other ways of encoding files; "atob" has been tried
>        but I have heard that it gags on certain systems. The RED
>        TRICKLE servers use both UUE and UXX encoding. I have used both
>        and they have not given any trouble. Indeed the UXX code is highly
>        recommended for transferring files to the UK. With UXX there are
>        no obscure characters used like tilde or caret that get trashed
>        at gateways. The trouble is that we have "educated" biologists
>        to use the standard "EMBL-UUE" tools, and any new coding system
>        might be difficult to introduce.
>
>        I could easily see people trying to use UUE to decode UXX files
>        and then complaining when things did not work.
>
>        As a small experiment I sent a UXX file to Fotes, and did not give
>        him a hint as to what it was, and did not provide him with any tools
>        to decode the file with. I was surprised that within a couple of days
>        he had cracked it, starting from scratch. I expect he cracked it
>        since he was interested in the subject, but I wonder how many other
>        scientists would go to the trouble of hunting down the tools to get
>        the job done.
>
>        Rainer has done some good work to produce a decoder that works very
>        well over the network, and since it is the "standard" I think we
>        should stick with that. However in the MSDOS world Richard Marks
>        has a very nice decoder that will do either UUE or UXX.
>
>        I think I have the source code for UXX on a VAX, and I have
>        promised to send it to Fotes... if he has the time and the energy
>        then perhaps he could incorporate it into the EMBL.src to make
>        both options available.

        I "solved" Rob's test by giving up on our BITNET-only node and using a
modem to get on a colleague's account at another institution which is on the
Internet.  There, I asked Archie for the locations of XX decoders and had 29
hits, but none of them were for VMS or MACs.  I got an MSDOS/Unix version and
did a "quicky" port to VMS (not in a way which preserves portability).

        Once I got the "hang" of it, I looked at Rainer's decoder.  It would
be relatively "easy" to add a routine for determining whether a file is UU or
XX encoded, and then to do the appropriate decoding, while preserving the
portability to all the major platforms.  Users would then not even need to
know, let alone "think" about, the nature of the encoding.
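
        One possible way to tell the two encodings apart is to look at which
characters the encoded lines use.  The Python sketch below is not Rainer's
code and makes several assumptions: the function name guess_encoding is
hypothetical, the file has ordinary "begin"/"end" lines, UU encoding uses the
ASCII characters from space through backquote, and XX encoding uses only "+",
"-", digits, and letters.  It does not handle the EMBL table line or
characters already damaged at a gateway.

```python
# Assumed alphabets for the two encodings (see the paragraph above).
UU_ALPHABET = set(chr(c) for c in range(0x20, 0x61))  # space .. backquote
XX_ALPHABET = set(
    "+-0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
)


def guess_encoding(text: str) -> str:
    """Return "uu", "xx", or "unknown" for the first encoded block in text."""
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Skip mail headers and anything else before the "begin" line.
            if line.startswith("begin "):
                in_body = True
            continue
        if line.strip() == "end":
            break
        chars = set(line)
        fits_uu = chars <= UU_ALPHABET
        fits_xx = chars <= XX_ALPHABET
        if fits_uu and not fits_xx:
            return "uu"  # punctuation such as '#' or '%' never occurs in XX
        if fits_xx and not fits_uu:
            return "xx"  # lowercase letters never occur in UU
        # Line fits both alphabets (or neither); keep looking.
    return "unknown"


if __name__ == "__main__":
    print(guess_encoding("begin 644 cat.txt\n#0V%T\n`\nend\n"))
    print(guess_encoding("begin 644 cat.txt\n1Eq3o\n+\nend\n"))
```

A decoder could call such a check once per file and then dispatch to the
matching routine, falling back to an error message when the answer is
"unknown".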

        Adding a qualifier and routines to Rainer's encoder so that it
optionally does UU *or* XX encoding doesn't look terribly difficult either.
Of course, you never know how hard something is until you actually try to do
it, so I put it on my list of things to try during my so-called Christmas
vacation.

        There is no urgency about this for EMBL-UUENCODEd files, if one makes
sure one has the current version of the EMBL-UUDECODEr.

        The problem will only occur if files which have carets or tildes
(e.g., as Unix shells and AWK/GAWK scripts might) are Emailed through an
INTERBIT gateway in the direction from Internet to BITNET *without* having
encoded them with the EMBL-UUENCODEr.  Under the latter circumstances (which
I've


