sequence analysis software

frist at ccu.umanitoba.ca
Fri Jun 19 16:37:11 EST 1992


Since there has been discussion on bionet.software on the merits and
shortcomings of different sequence analysis packages, and on whether to go
with PCs or workstations, I thought I'd share my own experience,
particularly for those who need a low-cost system.
------------------------------------------------------------------------
                               B  I  R  C  H

BIRCH (BIological Research Computing Hierarchy) is a collection of sequence
analysis tools and databases installed on the Sun/Unix system at the
University of Manitoba. All software and databases are available at no
charge by FTP over the Internet. To help the user 'put it all together', I
have written the BIRCH USER'S GUIDE, which is described below. Even though
much of the GUIDE contains information specific to our own site, it
could serve as a useful starting point for setting up a comparable system
at your institution. The PostScript file for the GUIDE (birch.ps.Z) 
is available, along with the FSAP and XYLEM packages, by anonymous FTP to
ccu.umanitoba.ca in the directory pub/psgendb.
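
For anyone who would rather script the transfer than type it at an ftp
prompt, the retrieval described above can be sketched in Python with the
standard ftplib module. The host, directory and file name are the ones given
above; the function name fetch_guide and its ftp_factory argument are my own
inventions for illustration, and this is only a minimal sketch that assumes
the server still answers at this address. Note that birch.ps.Z is a
compress(1) file, so it still has to be run through uncompress before it can
be printed.

```python
from ftplib import FTP
from pathlib import Path

HOST = "ccu.umanitoba.ca"      # host named in the post
REMOTE_DIR = "pub/psgendb"     # directory named in the post
GUIDE = "birch.ps.Z"           # PostScript file for the GUIDE


def fetch_guide(dest_dir=".", filename=GUIDE, ftp_factory=FTP):
    """Download one file from the BIRCH directory by anonymous FTP.

    ftp_factory is a hypothetical hook (not part of BIRCH) that lets a
    stand-in object replace the real network connection.
    """
    dest = Path(dest_dir) / filename
    ftp = ftp_factory(HOST)            # open the control connection
    try:
        ftp.login()                    # no arguments means anonymous login
        ftp.cwd(REMOTE_DIR)
        with open(dest, "wb") as out:
            # .Z files are binary, so use a binary-mode transfer
            ftp.retrbinary("RETR " + filename, out.write)
    finally:
        ftp.close()
    return dest
```

Calling fetch_guide() with no arguments would save birch.ps.Z in the current
directory; other files in the same directory can be requested by passing a
different filename.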

WHAT BIRCH HAS
==============

Databases: GenBank, PIR, VecBase, LiMB
Software Packages:
  FSAP: general sequence analysis
  Fasta: database searches and more
  XYLEM: sequence database management and manipulation
  MBCRR: MASE sequence editor, similarity searches
  PIMA, PLSEARCH: Pattern-induced multiple alignment
  CLUSTALV: Multiple alignment
  READSEQ: translates sequence formats
  PRIMER: primer design software
  PHYLIP: phylogeny construction

  and more, as the need arises

WHAT BIRCH DOESN'T HAVE
=======================
Up to now, we have concentrated on line-mode programs, because they have
virtually no special hardware requirements. At present, users would still
have to use their PCs for graphics-oriented tasks, such as drawing plasmid
maps.

The increasing availability of X-windows- and PostScript-compatible
devices makes it more realistic to provide users with a broader
range of software, without having to write a device driver for every
device.  Our first step in this direction is the integration of
most of the BIRCH programs into the GDE (Genetic Data Environment) of
Steven Smith.  Since GDE already incorporates more than half of the
programs mentioned above, this is not a big problem. The larger problem
now is convincing people to get network connections and X-terminals.

WHAT IT TAKES TO SET IT UP
==========================

1. Sun/Unix system, with probably at least 300Mb of free disk space
   (Most of the software described here will work in other Unix systems)

2. Internet/FTP access

3. Somebody knowledgeable in computers who is willing to put in the time
   to install programs and databases and to obtain updates as they become
   available. The initial installation is the tough part, although many
   packages have makefiles that automate the process. Often, all you have
   to do is to set environment variables or paths in header files.
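
To make that last point concrete: a program can look up the location of a
database from an environment variable, so that the installer sets the
location once instead of editing every program. The sketch below is in
Python purely for illustration. The variable name GENBANK_DIR, the fallback
path and the helper database_path are all hypothetical; none of them is
taken from BIRCH or from any of the packages listed above.

```python
import os
from pathlib import Path

# Hypothetical fallback, used only when the variable is not set.
DEFAULT_DIR = "/usr/local/db/genbank"


def database_path(filename, var="GENBANK_DIR", environ=None):
    """Return the full path of a database file.

    The directory is read from the environment variable named by `var`
    (GENBANK_DIR is an invented name), falling back to DEFAULT_DIR.
    Passing `environ` lets a plain dict stand in for the real environment.
    """
    env = os.environ if environ is None else environ
    return Path(env.get(var, DEFAULT_DIR)) / filename
```

With the variable set to, say, /data/gb, a request for plants.seq resolves to
/data/gb/plants.seq; moving the database later means changing one variable
rather than every program that reads it.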

WHAT THE USER NEEDS TO HAVE
===========================

1. A networked terminal - Since everything works in linemode, hardware
   compatibility is not a problem. Basically any VT100 or similar terminal
   or terminal emulator on a PC can be used.

2. A little bit of time. The BIRCH Manual defines a 'minimal subset' of
   Unix knowledge that the user needs in order to work in the system. This
   subset consists of:

   a core of about 15 Unix commands
   a text editor (e.g. vi, emacs)
   a mailer
   a newsreader

That's it. For the simple investment of a couple of hours to read a 
high-school-level book on Unix, the user can do virtually everything
he/she needs to do.

It is nice that many packages have user-friendly menu-driven systems.
I'm all for it whenever possible. The main problem is that NO ONE
PACKAGE, no matter how good or comprehensive, will do everything
you need to do, or do well every task that it does do. (GDE, by making it
easy to plug in new programs, is one answer to this dilemma.)

Given that, there is always a need to bring in additional programs and
data as they become available. Once you bring in anything beyond your one
grand, do-all package, the user is right back where he/she started, and has
to learn something about the operating system. People who wish to use
computers are going to have to resign themselves to learning a small number
of commands. The good news is that, once you learn this set, you can get
by with very little else! (i.e. you don't have to become a computer guru.)

THE BIRCH USER'S GUIDE: A ROAD MAP TO THE SYSTEM
================================================
Recognizing that the user is faced with learning both an operating system
and programs, I have written the BIRCH USER'S GUIDE. The table of contents
shown below illustrates the topical organization of the GUIDE:

                                  TABLE OF CONTENTS


INTRODUCTION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   1

I.  A TOUR THROUGH BIRCH. . . . . . . . . . . . . . . . . . . . . . . . . .   3
      I.1  Hierarchical directory structure . . . . . . . . . . . . . . . .   3
      I.2  Managing a sequencing project. . . . . . . . . . . . . . . . . .   4
      I.3  General analysis: FSAP . . . . . . . . . . . . . . . . . . . . .   4
      I.4  Databases. . . . . . . . . . . . . . . . . . . . . . . . . . . .   6
            I.4.1  Description of databases . . . . . . . . . . . . . . . .   6
            I.4.2  Searching for and retrieving data from databases . . . .   6
            I.4.3  Database subset management: XYLEM. . . . . . . . . . . .   7
      I.5  Phylogeny construction . . . . . . . . . . . . . . . . . . . . .   8
      I.6  Network resources. . . . . . . . . . . . . . . . . . . . . . . .   8

II.  GETTING STARTED. . . . . . . . . . . . . . . . . . . . . . . . . . . .  11
      II.1  What is an operating system, and why do we need to know how to use
            it? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  11
      II.2  What you need to learn. . . . . . . . . . . . . . . . . . . . .  11
            II.2.1  The core commands . . . . . . . . . . . . . . . . . . .  11
            II.2.2  The vi text editor. . . . . . . . . . . . . . . . . . .  12
            II.2.3  File organization . . . . . . . . . . . . . . . . . . .  13
      II.3  Connecting to the system. . . . . . . . . . . . . . . . . . . .  14
            II.3.1  Ethernet connection . . . . . . . . . . . . . . . . . .  14
            II.3.2  Connecting to UMNET by modem or wire. . . . . . . . . .  15
            II.3.3  Campus network resources. . . . . . . . . . . . . . . .  16
      II.4  Setting up your account . . . . . . . . . . . . . . . . . . . .  17

III.  HOW TO DO THINGS. . . . . . . . . . . . . . . . . . . . . . . . . . .  19
      III.1  General guidelines . . . . . . . . . . . . . . . . . . . . . .  19
            III.1.1  Environment variables: shorthand ways of specifying
                  directories . . . . . . . . . . . . . . . . . . . . . . .  19
            III.1.2  How to find a program. . . . . . . . . . . . . . . . .  19
            III.1.3  Finding and printing documentation . . . . . . . . . .  19
            III.1.4  Executing programs . . . . . . . . . . . . . . . . . .
            III.1.5  File formats . . . . . . . . . . . . . . . . . . . . .  21
            III.1.6  Citing programs and data in publications . . . . . . .  22
      III.2  Finding and retrieving sequences . . . . . . . . . . . . . . .  22
            III.2.1  Retrieving sequences by name or accession number. . .  22
            III.2.2  Keyword searches . . . . . . . . . . . . . . . . . . .  23
            III.2.3  What if fetch can't find your sequence?. . . . . . . .  25
            III.2.4  Creating your own database subsets . . . . . . . . . .  26
      III.3  Searching databases. . . . . . . . . . . . . . . . . . . . . .  26
            III.3.1  General guidelines for database searching. . . . . . .  26
            III.3.2  Automated searches of GenBank: search and dsearch. . .  27
            III.3.3  Direct use of fasta, tfasta and ssearch lets you customize your
                  search parameters . . . . . . . . . . . . . . . . . . . .  28
            III.3.4  Evaluating the significance of sequence similarities .  29
      III.4  Extracting and Manipulating Sequence Features. . . . . . . . .  30
            III.4.1  The DDBJ/EMBL/GenBank FEATURES Language. . . . . . . .  30
            III.4.2  features: a menu-driven user-interface to getob. . . .  31
            III.4.3  Extracting a Feature from a GenBank Entry. . . . . . .  34
            III.4.4  Extracting One or More Features from Groups of Entries  35
            III.4.5  Ext


