RESPONSES TO NAASC LETTER

Joe Ecker jecker at ATGENOME.BIO.UPENN.EDU
Tue May 31 16:38:08 EST 1994


Dear Colleagues,

Following the model of C. Somerville, and in order to stimulate further
discussion, I provide below a list of responses to the NAASC letter 
requesting your input about genome-related Arabidopsis research. 

For those of you who feel uncomfortable posting your comments directly to
the network, please send your response to me and indicate that you want it
posted as an "anonymous" response. I'll strip off the header and post it to
the network (in batches).

Regards,

Joe Ecker



|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

RESPONSE # 1

>1. Do you think that we are ready to begin some level of directed genome 
>sequencing in the US?  AFTER 'FINISHING' CDNAS
>
>2. How important is genome sequencing in terms of funding priorities (vs. 
>placing cDNAs on the map, completion of the physical map, adding more 
>PCR-based markers to the map, etc.)?  PLACING CDNAS AND COMPLETING THE
>PHYSICAL MAP SHOULD BE COMPLETED FIRST
>
>3. Who should support systematic genome sequencing if it is a big-$ effort? 
>USDA
>
>4. What impact on Arabidopsis research will be incurred if sequencing does not
>begin today (in 2 years; in 5 years,  in 10 years)?  IT NEEDS TO BE BEGUN W/IN
>3-5 YEARS
>
>5. What type of organizational model for genome sequencing would you support: 
>sequencing centers vs. individual interested labs?  I THINK THAT THERE SHOULD
>BE ROOM FOR BOTH.
>                        
>6. What quality standards would you expect for the sequence: high or low  
>accuracy (high accuracy = higher cost)? FAIRLY HIGH, UNLESS THE COMPUTER
>PROGRAMS USED FOR ANALYSIS ARE ABLE TO MAKE UP FOR THE HIGHER LEVEL OF ERRORS
>                                                                              
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

RESPONSE # 2  

I don't think a good case can be made for directed 
sequencing yet.  Higher priority should be a higher 
density genetic map (composed mostly of ESTs and 
PCR-able SSLPs) and "completion" of the physical map.  
Directed sequencing, if attempted, should definitely 
*not* be done like the European yeast project, where it's 
apparently just a way to get a bunch of money for your 
lab without actually having to sequence very much (e.g. 
the 50 or so labs that were required to generate the mere 
300kb of chromosome III).  The worm guys seem to have 
been doing things pretty close to "right" all along.

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

RESPONSE #3


It seems unlikely that the stated goal of complete sequence by the year 2000
will be reached.  The sequence of yeast will most likely be finished by 1997,
and, given that Arabidopsis is at least 5 times larger, and that little
progress has been made thus far, it seems that revision of the target date is
in order.

1.  Presumably the community is interested in obtaining the DNA sequence
because that is one avenue to determining a gene's function.  However, if we
invest all our resources in determining sequence, we will have the sequence of
the genes (open reading frames), but we won't know what they do. I would favor
a combination of approaches that simultaneously improve our ability to assess
gene function and that add to the sequence data base.  As technology improves,
sequencing will become more cost effective.  Perhaps, with the limited
resources available, it is best at this time to delay an all-out effort to
sequence the genome until advances in technology make sequencing more
affordable.

2.  Mutations are an important and powerful tool that plays a key role in
understanding gene functions.  Placing those mutations on a map is an
effective method for determining the number of genes involved, and for
performing additional genetic analysis such as analyzing suppressors.  In
addition, a high resolution genetic map allows map-based cloning.

I would like to see methods developed that would allow a researcher to map a
mutation (in a few days' work) to a small region corresponding to a single
clone (perhaps a YAC that has been placed on the map).  Thus, in a few days
one could move from a genetic defect to a piece of DNA.  A map of 1 cM
resolution would correspond to a marker approximately every 200kb, which is
about the size of the YACs.  To do this, we need to simultaneously do two
things:  1)  Get a complete physical map 2) Get a high-resolution and
easy-to-use genetic map - there is some debate over which kind of marker would
be most useful, but general agreement that PCR is faster than Southerns.

If, as an additional part of this, each cDNA that is contained on each YAC
clone had also been identified (and, HOPEFULLY, those cDNAs were on clones with
Agrobacterium borders) one could easily move from a genetic defect to a
complementing cDNA.  

Finally, I would like to see (perhaps through a commercial supplier) a blot
with all of the overlapping YAC clones that correspond to a complete physical
map.  This would allow those who have cloned a gene by homology (using PCR,
etc) to hybridize their gene to the blot, and thus place the gene on the map. 
In some cases, these genes will correspond to existing mutations.  Thus, it is
important to have dense PHYSICAL and GENETIC maps.

Because sequencing would be more expensive to develop than the technologies
described above, I would prefer that we obtain these things first, and then do
large-scale sequencing later.

3.  Can NSF support this?  It is a big $ effort - no question about it.  In
fact, cost estimates for this could be obtained from NIH and other
organizations currently funding genome projects.  When the NIH genome effort
was established, assurance was given that the genome money would not be drawn
from pools that fund basic research.  Could NSF offer the same deal? or is
this an either/or situation?

4.  Right now, we probably waste some money by sequencing genes in individual
labs (perhaps some numbers could be obtained - how many person-hours per gene,
cost of supplies, number of genes being sequenced per year).  If the genome
were sequenced, this expense would be eliminated.  As time goes on, the cost
of this piecemeal sequencing will increase.  So, we should sequence the
genome as soon as we can, BUT, not at the expense of providing the
infrastructure that will allow us to assess function (i.e., genetic and
physical maps as described above).

5.  I would strongly support doing this as cheaply as possible.  Because
sequencing technology is relying more and more on automation, the only cost
effective way to do this is through a sequencing center.  It is better if
individual labs are supported to do basic research and to tackle biological
problems than to be bogged down in production sequencing.

6.  We need an accuracy high enough that we don't miss a significant number of
open reading frames.  Given that our resources are probably more limited than
in the other model systems, I don't think we should push for a higher accuracy
than they have, and could probably settle for a lower one (90-95%??).  Also,
because of the introns, we will need to have some information in order to
understand the sequence we get out of this project.  Thus, before any major
sequencing effort, we should have the cDNAs mapped to physical clones.

7.  Realistically, what are the odds of getting an agency other than NSF to
fund this?  Could we make a case with NIH?  Are the funds available from NSF??
How about DOE or USDA?  If funds could be obtained both to sequence the
genome and to obtain physical maps, genetic maps, cDNA maps, etc., that would
be great!  Perhaps the steering committee could contact other funding agencies
to make a case for sequencing Arabidopsis??  Any efforts along those lines
would be much appreciated by the community as a whole.  

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

RESPONSE # 4

In article <9405311819.AA01254 at atgenome.bio.upenn.edu> you write:
>Plan for the Multinational Coordinated Arabidopsis thaliana Genome Research 
>Project" (NSF document 90-80),  the mission of this project is to "identify 
>all of the genes by any means and to determine the complete sequence of the 
>Arabidopsis thaliana genome before the year 2000". 

This seems quite overambitious. That means that over the next five
years there needs to be an average of 20MB sequenced per year.  That's
almost two Saccharomyces genomes a year; note that S.c. is not
expected to be completed for another two years. Averaging
20MB/year would take a really big-money project. Ask NCHGR what
they spend on Waterston's sequencing center per year and multiply that
by at least twenty.
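
The arithmetic behind the 20MB/year figure can be checked with a short
sketch. The genome sizes here are assumptions for illustration only: roughly
100 megabases for Arabidopsis (the size implied by 20MB/year over five years)
and roughly 12 megabases for S. cerevisiae. The names are hypothetical.

```python
# Back-of-the-envelope check of the required sequencing rate.
# All inputs are assumed round numbers, not measured values.
GENOME_MB = 100.0        # assumed Arabidopsis genome size (5 years x 20 MB/year)
YEARS_REMAINING = 5      # "over the next five years"
YEAST_GENOME_MB = 12.0   # assumed approximate S. cerevisiae genome size


def required_rate(genome_mb, years):
    """Average megabases per year needed to finish within `years`."""
    return genome_mb / years


rate = required_rate(GENOME_MB, YEARS_REMAINING)
yeast_equivalents = rate / YEAST_GENOME_MB

print(f"required rate: {rate:.0f} MB/year")
print(f"yeast genomes per year: {yeast_equivalents:.1f}")
```

Under these assumptions the required rate is about 1.7 yeast genomes per
year, consistent with "almost two" above.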

>1. Do you think that we are ready to begin some level of directed genome 
>sequencing in the US?  

I think there should be some genome sequencing. However what is needed
is more technology development not necessarily more production
sequencing.

>2. How important is genome sequencing in terms of funding priorities (vs. 
>placing cDNAs on the map, completion of the physical map, adding more 
>PCR-based markers to the map, etc.)?

Completion of the physical map should be a very high priority. Placing
cDNAs on the map and adding more markers second. Directed genome
sequencing of interesting areas would be third.

>3. Who should support systematic genome sequencing if it is a big-$ effort?

This is a political question.

>4. What impact on Arabidopsis research will be incurred if sequencing does not
>begin today (in 2 years; in 5 years,  in 10 years)?

If a massive mapping of cDNAs starts, then the impact of the lack of
genomic sequencing will be small. The physical map is also very
important, as a physical map is needed for genome sequencing. Because
money is apparently not readily available for genome sequencing of
Arabidopsis in the US, the question is moot.

>5. What type of organizational model for genome sequencing would you support: 
>sequencing centers vs. individual interested labs?

Both, but it depends on the scale. I do not believe a 20MB/year rate
can be done by a collaboration of labs like the EU yeast project.

>6. What quality standards would you expect for the sequence: high or low  
>accuracy (high accuracy = higher cost)?

Do the cDNAs at high quality, and then the genome sequence at low quality.

>7. ANY SPECIFIC OR GENERAL COMMENTS THAT YOU WOULD LIKE TO MAKE!

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||


RESPONSE #5

1. Do you think that we are ready to begin some level of directed genome 
sequencing in the US?  

No.  Do detailed map first (linked YACs, SSLPs, CAPS etc.)

2. How important is genome sequencing in terms of funding priorities (vs. 
placing cDNAs on the map, completion of the physical map, adding more 
PCR-based markers to the map, etc.)?

more important than mapping cDNAs
less important than completion of physical map (#1 priority) and adding more
PCR-based markers

3. Who should support systematic genome sequencing if it is a big-$ effort?

Fed budget line item

4. What impact on Arabidopsis research will be incurred if sequencing does not
begin today (in 2 years; in 5 years,  in 10 years)?

if not today:  none
if not in 2 years:  serious disadvantage relative to other organisms
if not in 5 years:  we should work on some other organism then


5. What type of organizational model for genome sequencing would you support:
sequencing centers vs. individual interested labs?

sequencing centers
                        
6. What quality standards would you expect for the sequence: high or low  
accuracy (high accuracy = higher cost)?

high accuracy
                        			                                      
7. ANY SPECIFIC OR GENERAL COMMENTS THAT YOU WOULD LIKE TO MAKE!

I find genomic sequencing much more important than cDNA sequencing.  First,
the complete genomic sequence would be the ultimate map; second, searching
for sequence similarities should work fine with genomic sequence, even without
sophisticated coding sequence finders.

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

RESPONSE # 6

I do not think that a large-scale genome-sequencing project for Arabidopsis is
appropriate at the moment. First, Arabidopsis does not have a sophisticated
genetic infrastructure comparable to either yeast or C. elegans (e.g., it is not
possible to do gene targeting or gene replacement, to obtain intragenic or
extragenic suppressors in an efficient way, to prove conclusively that a
mutational phenotype is null by constructing a deficiency-over-mutant-allele
heterozygote, or even to target a tag to the gene, although this latter goal
may be achieved soon). Thus, a complete sequence will lead to a lot of
information that is of dubious value, because neither straight nor reverse
genetics is yet capable of telling us what many of these sequences do in terms
of function. Second, as a result of this blind sequencing project, funds will
be diverted from more interesting, and useful, genetic experiments conducted in
smaller labs to larger laboratories geared towards technology. I am not opposed
to sequencing the genome per se, but I am opposed to going for a total genome
sequencing without at least an equivalent amount of funds being directed
towards genetic analysis of the biological processes.

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||


