Defining Growth Conditions

Les Klimczak klimczak at xyrisco.com
Wed Nov 10 13:38:22 EST 1999


On Wed, 10 Nov 1999 10:00:11   J-Antoni Rafalski wrote:
>I am a bit concerned that the approach taken seems to create more and more
>narrow categories that would be relevant to only very few users and will be
>empty in most data base records.
	This would be the case if the DB were kept in a flat file. In a modern
relational DB, such as Oracle, you can relegate less important
attributes to peripheral tables, which are filled only when data are
entered, so that there are no "empty records".
	In the simplest case, there can be a special table for rarely used
values. Such a table could be joined to retrieve those values with a
query such as:

SELECT ... WHERE ... AND experiment.id = categories.experiment_id AND
categories.name = 'moon phase' AND categories.value = 'full moon';
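
	To make the peripheral-table idea concrete, here is a minimal sketch
using Python's built-in sqlite3 module as a stand-in for Oracle. Only the
names experiment.id, categories.experiment_id, categories.name and
categories.value come from the query above; the species and tissue
columns, the sample rows, and the helper functions are assumptions made
up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory DB with a core table and a peripheral table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Core table: only the common, fundamental attributes (assumed columns).
        CREATE TABLE experiment (
            id      INTEGER PRIMARY KEY,
            species TEXT NOT NULL,
            tissue  TEXT NOT NULL
        );
        -- Peripheral table: one row per rarely used attribute actually entered.
        CREATE TABLE categories (
            experiment_id INTEGER NOT NULL REFERENCES experiment(id),
            name          TEXT NOT NULL,
            value         TEXT NOT NULL,
            PRIMARY KEY (experiment_id, name)
        );
    """)
    # Hypothetical sample data: three experiments, one rare attribute.
    conn.executemany(
        "INSERT INTO experiment (id, species, tissue) VALUES (?, ?, ?)",
        [
            (1, "Arabidopsis thaliana", "leaf"),
            (2, "Arabidopsis thaliana", "root"),
            (3, "Arabidopsis thaliana", "leaf"),
        ],
    )
    conn.execute(
        "INSERT INTO categories (experiment_id, name, value) VALUES (?, ?, ?)",
        (2, "moon phase", "full moon"),
    )
    conn.commit()
    return conn


def experiments_with(conn, name, value):
    """Return ids of experiments carrying the given rare attribute value."""
    rows = conn.execute(
        """
        SELECT experiment.id
        FROM experiment, categories
        WHERE experiment.id = categories.experiment_id
          AND categories.name = ?
          AND categories.value = ?
        ORDER BY experiment.id
        """,
        (name, value),
    ).fetchall()
    return [row[0] for row in rows]
```

	Experiments 1 and 3 have no row at all in categories, so nothing is
stored for an attribute that was never recorded.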

	Of course this is just an example, and more complex "object-type"
attributes could also be modeled with a little effort.
	This would still allow lots of potentially useful information to be
captured and queried by those who need it, even if they represent a
relatively small segment of the community :-).
	However, I agree that the central part of the DB schema should be very
simple and excessive "objectification" a la ACeDB should be avoided.

>Moreover, there will always be categories
>and conditions that remain undefined. I would favor defining the most
>common and fundamental categories (species, tissue, etc) as indexed fields,
>and then relegate the minor categories to a text field called
>"description", or whatever. This is the approach we (=Stan Luck) are
>taking.
	Text fields are a really bad idea. They would make the values stored in
them poorly defined and would significantly degrade the quality of the
data (see Genbank's "description", "notes", and other text monsters) and
the ability to access anything in them computationally.
	Please remember that with the vast amounts of data in an expression
profiling DB, it would not make much sense to "browse" text fields, so
the computational way of querying well-defined attributes is the only
way to go.
	In this context, I would like to stress the importance of nomenclature
standards and of appropriate constraints enforcing them. Such constraints
would prevent ad hoc descriptors and/or wrong data types from being
entered into the DB, which would again degrade data quality, as we see
in Genbank.
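
	As a rough illustration of such constraints, the sketch below (again
Python's sqlite3 standing in for Oracle) rejects descriptors that are not
in a controlled vocabulary and values of the wrong type. The vocabulary
table, the temperature_c column, and the try_insert helper are all
hypothetical names chosen for this example, not part of any proposed
schema.

```python
import sqlite3


def build_constrained_db():
    """Create an in-memory DB whose constraints enforce a vocabulary."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE experiment (
            id            INTEGER PRIMARY KEY,
            -- Type constraint: only numeric temperatures are accepted.
            temperature_c REAL NOT NULL
                CHECK (typeof(temperature_c) IN ('integer', 'real'))
        );
        -- Controlled vocabulary: the only (name, value) pairs allowed.
        CREATE TABLE vocabulary (
            name  TEXT NOT NULL,
            value TEXT NOT NULL,
            PRIMARY KEY (name, value)
        );
        CREATE TABLE categories (
            experiment_id INTEGER NOT NULL REFERENCES experiment(id),
            name          TEXT NOT NULL,
            value         TEXT NOT NULL,
            FOREIGN KEY (name, value) REFERENCES vocabulary(name, value)
        );
        INSERT INTO vocabulary (name, value)
            VALUES ('moon phase', 'full moon'), ('moon phase', 'new moon');
        INSERT INTO experiment (id, temperature_c) VALUES (1, 22.5);
    """)
    return conn


def try_insert(conn, sql, params):
    """Run an INSERT; return True if accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False
```

	With this schema, inserting ('moon phase', 'full moon') for
experiment 1 succeeds, while an ad hoc value such as 'kinda full' or a
temperature of 'warm' is refused by the database itself rather than
left for a curator to find later.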

	Les Klimczak

Akkadix Databases and Data Mining

>Antoni Rafalski
>DuPont AgBiotech Genomics
>
>
>David Finkelstein <finkel at genome.stanford.edu> on 11/09/99 12:58:07 PM
>
>To:   arabidopsis newsgroup <arab-gen at net.bio.net>
>cc:   plantarrays at fafner.Stanford.EDU (bcc: J-Antoni Rafalski/AE/DuPont)
>Subject:  Defining Growth Conditions
>

--
---------------------------------------------------------------------
Les Klimczak, Dr. rer. nat.  	| Tel.:  (858) 646-8241
Senior Scientist             	| FAX:   (858) 625-0158
Akkadix Corporation, S.160   	| mailto:klimczak at akkadix.com
11099 North Torrey Pines Road	| http://www.akkadix.com/lkl.html
La Jolla, CA 92037           	|
---------------------------------------------------------------------




