Homology search by e-mail is available at flat-netserv@smlab.eg.gunma-u.ac.jp

Sanzo Miyazawa smiyazaw at smlab.eg.gunma-u.ac.jp
Mon Jul 29 03:38:58 EST 1991





Pearson & Lipman's fasta local homology search is now available by
e-mail, in addition to search and retrieval of database entries, at
flat-netserv at smlab.eg.gunma-u.ac.jp.

See below for details.



---------------------------------------------------------------------------

		FLAT DB E-Mail Network Server

					Sanzo Miyazawa
					Gunma Univ., Faculty of Technology
					FAX:   +81 277 40 1026
					Phone: +81 277 22 3181 ext. 262

E-mail address for the server:	flat-netserv at smlab.eg.gunma-u.ac.jp
E-mail address for inquiries:	sanzo.miyazawa at smlab.eg.gunma-u.ac.jp
				   or smiyazaw at smlab.eg.gunma-u.ac.jp

The following commands are available:
  - Commands must be written in the body of the mail, not in the "Subject:" field.
  - Command names and other arguments are case sensitive unless otherwise specified.
  - Output may be limited to 2400 lines.
  - Character strings may be given as regular expressions.


man command
	Output a manual for the command; only available for some commands.


		1. Search/Retrieval Commands

scandir db-name [options] 'keyword[|keyword...]' ['keyword[|keyword...]'] ...
	Scan directory files of the "db-name" database to find "keywords"
	and output entry names and their definitions; keywords should be
	given as regular expressions, that is,
		key-1|key-2 key-3	means "(key-1 or key-2) and key-3"
	Options:
	-i	case insensitive
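
The keyword rule above ("or" inside one argument, "and" between
arguments) can be illustrated with a minimal Python sketch. It only
approximates the behaviour using Python's own "re" module; the server's
regular expression dialect may differ, and the function name and the
directory lines are hypothetical.

```python
import re

def matches_keywords(definition, keyword_args, ignore_case=False):
    """Return True if every keyword argument matches the definition line.

    Each argument is one regular expression, so 'a|b' means "a or b";
    separate arguments are combined with "and".
    """
    flags = re.IGNORECASE if ignore_case else 0
    return all(re.search(arg, definition, flags) for arg in keyword_args)

# Hypothetical directory lines (entry name followed by a definition).
directory = [
    "ENTRY1  Human oncogene, hypothetical definition",
    "ENTRY2  Mouse oncogene, hypothetical definition",
    "ENTRY3  Human insulin gene, hypothetical definition",
]

# Similar in spirit to: scandir gb -i 'oncogene' 'human'
hits = [line for line in directory
        if matches_keywords(line, ["oncogene", "human"], ignore_case=True)]
print(hits)
```

Without ignore_case=True the pattern 'human' would not match "Human",
which mirrors the role of the -i option.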

scanjou db-name ['journal'] ['vol:'['page[-page]']] ['(year)']
	Scan journal index files of the "db-name" database to find specified
	journals and output journal names and corresponding entry names.
	Journal names in the command line are not case sensitive.

scanaut db-name 'Last-name,[First.Middle-Initial.]' ...
	Scan author index files of the "db-name" database to find specified
	author names and output author names and corresponding entry names
	with their definitions.
	Author names in the command line are not case sensitive.

scanacc db-name '#acc' ...
	Scan accession number index files of the "db-name" database to find
	specified accession numbers and output corresponding entry names
	with their definitions.

scandb db-name [-1] [-o] {['entry'...]|[-a '#acc'...]}
	Scan the "db-name" database to find specified entries or accession
	numbers and output those entries.
	Options:
	-1	'Entry' or '#acc' may specify multiple entries in the DB.
	-o	The order of arguments is not significant; the order of entries
		output may not be in the order specified in the command line.

  Available databases which may be specified in scan commands:
    db-name = gb | embl | ddbj | gp | swiss | pir | prf

	gb or genbank:	GenBank DNA database
			Regular release and new entries which are updated
			twice a day.
	embl:		EMBL DNA database; regular release + new entries
	ddbj:		DDBJ DNA database; regular release + new entries
			It is included in the GenBank and EMBL DBs.
	gp or genpept:	GenBank Gene Product Database;
			protein database translated from GenBank DNA database
	swiss:		SwissProt protein database
	pir:		PIR protein database
	prf:		Protein Research Foundation peptide database

	Command names and other arguments are case sensitive unless otherwise specified.
	Output may be limited to 2400 lines.

  Examples:
    Commands must be written in the body of the mail, not in the "Subject:" field.
    "|" is the vertical bar character, not ";".

	scandir gb -i 'oncogene' 'human'		# oncogene and human
	scanjou	gb 'J. Biochem.' '107:316-323' '(1990)'	# case insensitive
	scanjou	gb 'J. Biochem.' '107:' 		# vol. 107 
	scanjou	gb 'J. Biochem.' '(1990)'		# 1990 issues
	scanjou	gb '(1990)'				# all 1990 issues
	scanaut	gb 'Miyazawa,S.'			# case insensitive
	scanacc gb 'M11391' 'd00611'			# case insensitive
	scandb gb 'AGMERLTR1' 'musbas'			# entries
	scandb gb -a 'M11391' 'd00611'			# accession numbers
	scandb gb 'ECO.*'				# all ECO.*

	scandir gb -i ' e.*coli' | scandb gb		# try to collect E. coli
	scanjou gb 'J. Biochem.' '(1991)' | scandb gb


		2.  Commands for Homology Search

tmpfile filename
	Create a temporary file named "filename"; it must be used as follows:
		tmpfile seq-1 <<'*** END ***'
		...
		...
		*** END ***
	In the example above, the file "seq-1" contains the lines preceding
	'*** END ***'. The output of another command can also be piped into
	tmpfile:

		scandb gb hcemle | tmpfile hcemle

	In this case, the entry HCEMLE is retrieved into a file named hcemle.

	The sequence file formats supported by the following commands are
		GenBank, EMBL, PIR, SwissProt, and PRF formats,
	as well as the simple format shown below.
		> title		# Title; mandatory
		....		# Sequence in one letter representation;
				# case insensitive; numbers are ignored.
		//		# This line is optional.
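
As an illustration of the simple format, here is a small Python sketch
of a reader for one record. It is not the server's parser; the function
name is hypothetical, and two details are assumptions: the sequence is
normalised to upper case, and every non-letter character (not only
digits and blanks) is dropped.

```python
def parse_simple_format(text):
    """Parse one record: '> title' line, sequence lines, optional '//'."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].lstrip().startswith(">"):
        raise ValueError("the title line starting with '>' is mandatory")
    title = lines[0].lstrip()[1:].strip()
    residues = []
    for line in lines[1:]:
        if line.strip() == "//":      # optional terminator line
            break
        # Keep letters only: position numbers and blanks are ignored.
        residues.extend(ch.upper() for ch in line if ch.isalpha())
    return title, "".join(residues)

record = "> aa_seq\n 1 G D V E K\n31 N L H\n//\n"
print(parse_simple_format(record))
```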

fasta	[-o #scores_to_be_printed ] [ -c cutoff ] test_seq. database [ktup]
	Fasta (v.1.3) search of Pearson & Lipman for local homology.
	  Ex.	fasta -o 40 -c 1 test_seq. $GB/gbpri.seq 3
	See manual; man fasta
	See	Pearson, W. R. and Lipman, D. J. "Improved Tools for
		Biological Sequence Analysis", Proc. Natl. Acad. Sci. USA
		85:2444-2448 (1988).

tfasta	[-o #scores_to_be_printed ] [ -c cutoff ] test_seq. database [ktup]
	Fasta (v.1.3) search of Pearson & Lipman by comparing test_seq. of
	amino acids with a DNA database translated into amino acid sequences.
	  Ex.	tfasta -o 40 -c 1 test_seq. $EMBL/emblmam.seq 1
	See manual; man tfasta

lfasta	[ -c cutoff ] test_seq. target_seq. [ktup]
	Local homology search of Pearson & Lipman.
	  Ex.	lfasta -c 1 test_seq. target_seq. 1
	See manual; man lfasta

RDF2	[ -c cutoff ] test_seq. shuffled_seq. [ktup] [#shuffle]
	Evaluate statistical significance of sequence matching;
	modified Pearson & Lipman's rdf2 with lfasta alignment.
	See manual; man rdf2
	
RDF2G	[ -c cutoff ] test_seq. shuffled_seq. [ktup] [#shuffle]
	RDF2 with local shuffle; modified Pearson & Lipman's rdf2g.

RDF2W	[ -c cutoff ] test_seq. shuffled_seq. [ktup] [#shuffle] [window_size]
	RDF2 with optimal score calculated by using a global
	alignment routine; modified Pearson & Lipman's rdf2.

RDF2WG	[ -c cutoff ] test_seq. shuffled_seq. [ktup] [#shuffle] [window_size]
	RDF2 with local shuffle and optimal score calculated by
	using a global alignment routine.
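
The idea behind the RDF2 family, comparing the observed score with
scores obtained against shuffled versions of the second sequence, can
be sketched in Python as follows. This is a toy illustration only: the
scoring function (longest common substring) is a stand-in and not the
fasta or lfasta score, all function names are hypothetical, and the
"window" parameter merely illustrates a local shuffle; it is not
claimed to correspond to the server's window_size argument.

```python
import random
import statistics

def toy_score(a, b):
    """Toy similarity: length of the longest common substring."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def shuffle_sequence(seq, rng, window=None):
    """Shuffle the whole sequence, or only within consecutive windows."""
    if window is None:
        window = len(seq) or 1
    out = []
    for i in range(0, len(seq), window):
        chunk = list(seq[i:i + window])
        rng.shuffle(chunk)
        out.extend(chunk)
    return "".join(out)

def z_value(test_seq, target_seq, n_shuffles=100, window=None, seed=0):
    """(observed - mean of shuffled scores) / their standard deviation."""
    rng = random.Random(seed)
    observed = toy_score(test_seq, target_seq)
    scores = [toy_score(test_seq, shuffle_sequence(target_seq, rng, window))
              for _ in range(n_shuffles)]
    sd = statistics.pstdev(scores)
    if sd == 0:
        return None               # undefined when all shuffled scores agree
    return (observed - statistics.mean(scores)) / sd

seq = "GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTG"
print(z_value(seq, seq, n_shuffles=50))
```

A large z value means the observed score lies far above what the
shuffled sequences, which keep the same composition, typically reach.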


  Databases which can be specified in "database" arguments above:
    @gb or @genbank	All sequence files of GenBank including new entries
    @embl		All sequence files of EMBL including new entries
    @ddbj		All sequence files of DDBJ including new entries
    @pir		All sequence files of PIR
    @swiss		All sequence files of SwissProt
    @prf		All sequence files of PRF
    @gp or @genpept	All sequence files of GenPept
   or a sequence file of each taxonomical division:
    $GB/gbbct.seq, gbinv.seq, gbmam.seq, gborg.seq, gbphg.seq, gbpln.seq,
        gbpri.seq, gbrna.seq, gbrod.seq, gbsyn.seq, gbuna.seq, gbvrl.seq,
        gbvrt.seq
    $GBNEW/gbnew.seq
    $EMBL/emblfun.seq, emblinv.seq, emblmam.seq, emblorg.seq, emblphg.seq,
	  emblpln.seq, emblpri.seq, emblpro.seq, emblrod.seq, emblsyn.seq,
	  embluna.seq, emblvrl.seq, emblvrt.seq
    $EMBLNEW/emblnew.seq
    $DDBJ/ddbj.seq
    $DDBJNEW/ddbjnew.seq
    $GENPEPT/gp.seq
    $PIR/pir1.seq, pir2.seq, pir3.seq
    $SWISS/swiss.prot
    $PRF/prf.seq

    For details, use the "set" command to see the environment variables that are defined.

  Examples: Multiple commands may be included in a mail.

	tmpfile seq-1 <<'*** END ***'		# create seq-1 file
	> seq-1
	atcg ATCG gcta
	*** END ***
	fasta -o 60 seq-1 $GB/gbpri.seq 3 
	fasta -o 60 seq-1 $GB/gbmam.seq 3

	tmpfile db <<"*** END ***"		# double quotation in this case 
	$EMBL/emblpri.seq
	$EMBL/emblmam.seq
	$EMBL/emblrod.seq
	$EMBL/emblvrt.seq
	*** END ***
	fasta -o 40 seq-1 @db 6		# search over files written in "db"
	
	tmpfile aa_seq	<<'*** END ***'
	> aa_seq
	 1 G D V E K G K K I F I M K C S Q C H T V E K G G K H K T G P
	31 N L H G L F G R K T G
	*** END ***
	fasta -o 20 -c 1 aa_seq @pir 1		# @pir is a predefined database.
	tfasta -o 20 -c 1 aa_seq @embl 1	# @embl is a predefined database.
	tfasta -o 20 -c 1 aa_seq $GB/gbbct.seq 1

	scandb pir ccsp | tmpfile ccsp.seq
	scandb pir CCTW5T | tmpfile cctw5t.seq
	RDF2 -c 1 ccsp.seq cctw5t.seq 1 100 

------------

	Please note, however, that a fasta search over a whole DNA database
	takes a long time.
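
For those who prefer to prepare requests with a script, the following
Python sketch builds a mail whose body contains a tmpfile here-document
followed by a fasta command, as in the examples above. It uses only the
standard library; the helper names, the sender address and the subject
text are hypothetical, and actually sending the message (for example
with smtplib) is left out.

```python
from email.message import EmailMessage

SERVER = "flat-netserv@smlab.eg.gunma-u.ac.jp"

def tmpfile_block(name, lines, terminator="*** END ***"):
    """Return a 'tmpfile' here-document that stores `lines` under `name`."""
    if any(line.strip() == terminator for line in lines):
        raise ValueError("content must not contain the terminator line")
    return "\n".join([f"tmpfile {name} <<'{terminator}'", *lines, terminator])

def build_request(sender, commands):
    """Put all commands into the mail body, never into the Subject."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = SERVER
    msg["Subject"] = "homology search"    # commands do not go here
    msg.set_content("\n".join(commands) + "\n")
    return msg

request = build_request(
    "user@example.org",                   # hypothetical sender
    [
        tmpfile_block("seq-1", ["> seq-1", "atcg ATCG gcta"]),
        "fasta -o 60 seq-1 $GB/gbpri.seq 3",
    ],
)
print(request.get_content())
```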


