From cavanaug from ncbi.nlm.nih.gov Tue Dec 18 18:39:40 2007
From: cavanaug from ncbi.nlm.nih.gov (Cavanaugh, Mark (NIH/NLM/NCBI) [E])
Date: Tue Dec 18 18:39:49 2007
Subject: [Genbank-bb] GenBank 163.0 Close-of-Data
Message-ID: <7F40ACD22B0A23448C4E8755E5875FE70F00465A@NIHCESMLBX8.nih.gov>

Greetings GenBank Users,

Close-of-data for the upcoming GenBank Release 163.0 occurred on
Monday December 11 at approximately 1:30am EST. The subsequently
generated GenBank Incremental Update files nc1217.aso, nc1217.flat,
etc. contain data through the close.

Note: Release processing often does not begin until sometime during
business hours on the close date. As a result, a number of sequence
records processed *after* 1:30am are likely to be present in the
GenBank 163.0 release files, even though they are "post-close".
Similarly, the first GenBank Incremental Update that is generated
after the close date is likely to contain a number of sequence
records that are unchanged, compared to their appearance in the
release files.

Our apologies for the lack of advance notice about the close date.

Mark Cavanaugh
GenBank
NCBI/NLM/NIH/HHS

From cavanaug from ncbi.nlm.nih.gov Tue Dec 18 18:42:12 2007
From: cavanaug from ncbi.nlm.nih.gov (Cavanaugh, Mark (NIH/NLM/NCBI) [E])
Date: Tue Dec 18 18:42:20 2007
Subject: [Genbank-bb] GenBank 163.0 Close-of-Data
In-Reply-To: <7F40ACD22B0A23448C4E8755E5875FE70F00465A@NIHCESMLBX8.nih.gov>
References: <7F40ACD22B0A23448C4E8755E5875FE70F00465A@NIHCESMLBX8.nih.gov>
Message-ID: <7F40ACD22B0A23448C4E8755E5875FE70F00465B@NIHCESMLBX8.nih.gov>

> From: Cavanaugh, Mark (NIH/NLM/NCBI) [E]
> Sent: Tuesday, December 18, 2007 6:40 PM
> To: genbankb@magpie.bio.indiana.edu
> Subject: [Genbank-bb] GenBank 163.0 Close-of-Data
>
> Greetings GenBank Users,
>
> Close-of-data for the upcoming GenBank Release 163.0 occurred
> on Monday December 11 at approximately 1:30am EST.
     ^^^^^^^^^^^^^^^^^^  Correction: Monday December 17
> The subsequently generated GenBank Incremental Update files
> nc1217.aso, nc1217.flat, etc. contain data through the close.

[snip]

Mark Cavanaugh
GenBank
NCBI/NLM/NIH/HHS

From cavanaug from ncbi.nlm.nih.gov Sat Dec 22 16:56:45 2007
From: cavanaug from ncbi.nlm.nih.gov (Mark Cavanaugh)
Date: Sat Dec 22 16:57:09 2007
Subject: [Genbank-bb] GenBank Release 163.0 Now Available
Message-ID: <200712222156.lBMLujRP017463@hyperion.ncbi.nlm.nih.gov>

Greetings GenBank Users,

GenBank Release 163.0 is now available via FTP from the National
Center for Biotechnology Information (NCBI):

  Ftp Site           Directory   Contents
  ----------------   ---------   ---------------------------------------
  ftp.ncbi.nih.gov   genbank     GenBank Release 163.0 flatfiles
                     ncbi-asn1   ASN.1 data used to create Release 163.0

Close-of-data for GenBank 163.0 occurred on 12/17/2007. Uncompressed,
the Release 163.0 flatfiles require roughly 314 GB (sequence files
only) or 335 GB (including the 'short directory', 'index' and the
*.txt files). The ASN.1 data require approximately 289 GB.

Recent statistics for non-WGS, non-CON sequences:

  Release   Date       Base Pairs      Entries

  162       Oct 2007   81563399765     77632813
  163       Dec 2007   83874179730     80388382

Recent statistics for WGS sequences:

  Release   Date       Base Pairs      Entries

  162       Oct 2007   102003045298    25354041
  163       Dec 2007   106505691578    26177471

During the 59 days between the close dates for GenBank Releases 162.0
and 163.0, the non-WGS/non-CON portion of GenBank grew by
2,310,779,965 basepairs and by 2,755,569 sequence records. During
that same period, 655,660 records were updated. An average of about
57,800 non-WGS records were added and/or updated per day.

Between releases 162.0 and 163.0, the WGS portion of GenBank grew by
4,502,646,280 basepairs and 823,430 sequence records.

For additional release information, see the README files in either of
the directories mentioned above, and the release notes (gbrel.txt) in
the genbank directory.
Sections 1.3 and 1.4 of the release notes (Changes in Release 163.0
and Upcoming Changes) have been appended below.

** Important Note **

GenBank 'index' files are now provided without any EST content, and
without most GSS content. See Section 1.3.9 of the release notes for
further details. NCBI is considering ceasing support for the index
files, so we encourage affected users to review that section and
provide feedback.

Release 163.0 data, and subsequent updates, are available now via
NCBI's Entrez and Blast services.

As a general guideline, we suggest first transferring the GenBank
release notes (gbrel.txt) whenever a release is being obtained. Check
to make sure that the date and release number in the header of the
release notes are current (eg: December 15 2007, 163.0). If they are
not, interrupt the remaining transfers and then request assistance
from the NCBI Service Desk.

A comprehensive check of the headers of all release files after your
transfers are complete is also suggested. Here's how one might go
about this on a Unix platform, using csh/tcsh :

        set files = `ls gb*.*`
        foreach i ($files)
                head -10 $i | grep Release
        end

Or, if the files are compressed, perhaps:

        gzcat $i | head -10 | grep Release

If you encounter problems while ftp'ing or uncompressing Release
163.0, please send email outlining your difficulties to:

        info@ncbi.nlm.nih.gov

Mark Cavanaugh, Vladimir Alekseyev, Michael Kimelman
GenBank
NCBI/NLM/NIH/HHS
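
For sites without csh/tcsh, the header check described above could
also be sketched in Python. This is only an illustration, not an NCBI
tool: the function names release_lines and check_release are
hypothetical, and the sketch assumes (as the csh loop does) that a
line containing the word "Release" appears within the first 10 lines
of each file and that compressed files end in ".gz".

```python
import gzip
from itertools import islice
from pathlib import Path


def release_lines(path, max_lines=10):
    """Return the lines among the first `max_lines` that contain 'Release'.

    Mirrors `head -10 file | grep Release`; files ending in .gz are
    read through gzip (assumption: gzip compression, as with gzcat).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="ascii", errors="replace") as handle:
        return [line.strip() for line in islice(handle, max_lines)
                if "Release" in line]


def check_release(directory, expected="163.0", pattern="gb*.*"):
    """Map each matching file name to True if its header mentions `expected`."""
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        results[path.name] = any(expected in line
                                 for line in release_lines(path))
    return results
```

Calling check_release(".") in the download directory returns a
dictionary keyed by file name; any entry mapped to False marks a file
whose header does not mention the expected release number and is
worth inspecting by hand.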