I have some questions about genome sequencing of Arabidopsis T-DNA insertion lines.
We have sequenced some T-DNA (pSKI015) insertion lines of Arabidopsis using
Illumina paired-end sequencing. Using Bowtie2, we aligned all the reads to
a reference that combines the TAIR10 genome with the T-DNA sequence of
pSKI015. We found reads that aligned partly to the T-DNA and partly to the
Arabidopsis genome, and used them to identify the insertion sites.
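To make the insertion-site step concrete, here is a minimal sketch of one
way to pull candidate sites out of the SAM output. It is an illustration
under stated assumptions, not the exact script we ran: it only looks at
read pairs with one mate on the T-DNA and the other on a chromosome, it
does not handle soft-clipped reads that span the junction itself, and the
contig name "pSKI015", the function name and the 1 kb bin size are all
hypothetical choices.

```python
from collections import Counter

# Hypothetical name of the T-DNA record in the combined reference FASTA.
TDNA_CONTIG = "pSKI015"


def junction_candidates(sam_lines, tdna=TDNA_CONTIG, bin_size=1000):
    """Count genomic reads whose mate aligned to the T-DNA contig.

    Returns a Counter keyed by (chromosome, bin_start), where bin_start is
    the 0-based start of the bin containing the genomic read's POS.
    """
    hits = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        rname, pos, rnext = fields[2], int(fields[3]), fields[6]
        if flag & 0x4 or flag & 0x8:      # read or its mate is unmapped
            continue
        if flag & 0x900:                  # secondary or supplementary record
            continue
        if rnext == "=":                  # "=" means mate is on the same contig
            rnext = rname
        # Keep the genomic mate of a pair whose other mate is on the T-DNA.
        if rname != tdna and rnext == tdna:
            bin_start = (pos - 1) // bin_size * bin_size
            hits[(rname, bin_start)] += 1
    return hits
```

Bins with many such pairs would be candidate insertion sites; the exact
breakpoint still has to be confirmed from the reads that cross it.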
Because we already knew that the phenotype of some mutant lines was not
related to the T-DNA insertion, we also used the Bowtie2 alignments to
identify the SNPs and INDELs in the mutants.
For example, because the background of these mutants is col-7, we first
compared the genome of col-7 to col-0 and called the SNPs and INDELs in
col-7. Then we compared the genome of mutant1 to col-0 and called the SNPs
and INDELs in mutant1. Finally, SNPs or INDELs found in mutant1 but not in
col-7 were taken as the true mutant-specific SNPs and INDELs.
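The subtraction step can be sketched roughly as below. This is a
simplified illustration rather than our actual pipeline: it assumes the
calls are available as VCF text lines, treats a variant as the tuple
(CHROM, POS, REF, ALT), applies the same QUAL cutoff to both samples, and
uses hypothetical function names.

```python
def load_variants(vcf_lines, min_qual=30.0):
    """Return the set of (chrom, pos, ref, alt) calls with QUAL >= min_qual."""
    variants = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):   # blank or header line
            continue
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        if qual == "." or float(qual) < min_qual:      # missing or low QUAL
            continue
        for allele in alt.split(","):                  # split multi-allelic sites
            variants.add((chrom, int(pos), ref, allele))
    return variants


def mutant_specific(mutant_lines, background_lines, min_qual=30.0):
    """Variants called in the mutant but absent from the background line."""
    mutant = load_variants(mutant_lines, min_qual)
    background = load_variants(background_lines, min_qual)
    return sorted(mutant - background)
```

Note that with this exact-match rule, the same INDEL written with a
different position or allele representation in the two files would be
counted as mutant-specific.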
Q1: Is this method normally used? Or is there a better method to find SNPs
and INDELs in these mutants?
Q2: To our surprise, we found over 6000 SNPs and INDELs on Chr5 between
col-0 and col-7, and over 7000 SNPs and INDELs on Chr5 between col-0 and
mutant1, with QUAL >= 30. Is it normal to find so many SNPs and INDELs? We
did not expect such a high number because, to our knowledge, col-0 and
col-7 are very closely related. The number of SNPs and INDELs found only in
mutant1 but not in col-7 is over 1000. T-DNA insertion mutagenesis of
col-7 should not induce that many mutations. Hence, is it possible that we
did something wrong?
Department of Plant Biology
DOE Plant Research Laboratory
Michigan State University