Project Summary

Complete Genomic Sequence of Six Shiga-toxigenic E. coli (STEC) Non-O157 Strains

Principal Investigator(s):
James L. Bono, Michael L. Clawson, and Timothy P. L. Smith
U.S. Department of Agriculture, Agricultural Research Service
Completion Date:
May 2008



Over 100 different serotypes of Shiga-toxigenic E. coli have been reported to cause disease in humans.  In North America, approximately half of human cases are caused by serotype O157:H7, with the remaining cases caused by non-O157 serotypes.  The complete genomic sequences of two STEC O157:H7 isolates have provided a wealth of new information about the genomic content of this pathogen; however, little is known about the genomic content of non-O157 STECs.  We proposed to completely sequence the genomes of representative strains from six of the most frequently reported non-O157 serotypes that cause disease in humans (O26, O111, O103, O121, O45, and O145).  These serotypes account for 70% of non-O157 STEC human cases.  This approach may allow us to identify genomic variation specific to each non-O157 STEC serotype.  Our genomic comparisons should provide substantial insight into the pathogenicity of the different STEC serotypes and may highlight candidate markers for assays that distinguish the STEC O26, O111, O103, O121, O45, and O145 serotypes.

The stated objectives for this work were to: 

  • Collect the complete genomic sequences of six STEC non-O157 strains of the following serotypes: O26, O111, O103, O121, O45, and O145.
  • Identify small-scale (single nucleotide polymorphism, SNP) and larger-scale (gene deletion/duplication) polymorphisms whose distribution differs between strains and that can be used for serotype-specific assay development.
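The second objective amounts to finding markers whose presence differs between the six genomes. The following is a minimal, hypothetical sketch of that filtering step, not the project's actual pipeline: it assumes each marker (a SNP allele or a gene) has already been called as present or absent in each genome, and the marker names and calls are invented placeholders. With only one genome per serotype, a marker found in exactly one genome is only a candidate; confirming true serotype specificity would likely require screening additional strains.

```python
"""Hypothetical sketch: pick candidate serotype-specific markers.

Assumes a presence/absence call (True/False) for each marker in each of the
six sequenced genomes. Marker names and calls are invented placeholders,
not project data.
"""

SEROTYPES = ["O26", "O111", "O103", "O121", "O45", "O145"]


def serotype_specific_markers(calls: dict[str, dict[str, bool]]) -> dict[str, list[str]]:
    """Return, for each serotype, the markers present in that genome only.

    calls maps marker name -> {serotype: present?}. With one genome per
    serotype, a marker is a candidate for a serotype if it is present in
    exactly that genome and absent from the other five.
    """
    specific: dict[str, list[str]] = {s: [] for s in SEROTYPES}
    for marker, presence in calls.items():
        # Serotypes whose genome carries this marker (missing calls count as absent).
        present_in = [s for s in SEROTYPES if presence.get(s, False)]
        if len(present_in) == 1:
            specific[present_in[0]].append(marker)
    return specific


if __name__ == "__main__":
    # Invented calls for three hypothetical markers.
    example = {
        "marker_A": {s: s == "O26" for s in SEROTYPES},              # only in O26
        "marker_B": {s: s in ("O111", "O145") for s in SEROTYPES},  # shared by two
        "marker_C": {s: s == "O121" for s in SEROTYPES},             # only in O121
    }
    for serotype, markers in serotype_specific_markers(example).items():
        print(serotype, markers)
```

Markers shared by two or more genomes (such as marker_B above) are dropped here, although they could still help in a multi-marker assay.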


Shiga toxin-producing E. coli O26, O111, O103, O121, O45, and O145 isolates were chosen from our diverse collection of E. coli serotypes housed at the U.S. Meat Animal Research Center (USMARC).  Each isolate was genetically and phenotypically characterized as STEC of its respective serotype.  DNA was extracted from each isolate using the Qiagen Genomic-tip 100/G DNA purification column (Qiagen Inc., Valencia, CA).  Genomic DNA purity and concentration were determined by A260/A280 spectrophotometry and agarose gel electrophoresis.  Three to five micrograms of DNA from each isolate were either nebulized or hydrosheared and used to create random and paired-end libraries, respectively, for sequencing on the 454 Genome Sequencer FLX (Roche Applied Science, Indianapolis, IN).  DNA sequence was analyzed using the GS Reference Mapper and GS De Novo Assembler software (Roche Applied Science, Indianapolis, IN) and other DNA analysis software.
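The spectrophotometric check above rests on simple arithmetic. The sketch below illustrates it under stated assumptions: it uses the common conversion that an A260 of 1.0 (1 cm path) corresponds to roughly 50 ng/uL of double-stranded DNA, and the rule of thumb that an A260/A280 ratio near 1.8 indicates relatively pure DNA. The absorbance readings and dilution factor are hypothetical, not measurements from this project.

```python
"""Sketch of A260/A280 arithmetic for double-stranded DNA.

Assumes 1.0 A260 unit (1 cm path) ~ 50 ng/uL dsDNA. Readings below are
hypothetical, not project data.
"""

DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Approximate dsDNA concentration of the undiluted stock, in ng/uL."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; values near 1.8 are usually taken as relatively pure DNA."""
    return a260 / a280


def volume_for_mass(target_ug: float, conc_ng_per_ul: float) -> float:
    """Volume of stock (uL) that contains target_ug micrograms of DNA."""
    return target_ug * 1000.0 / conc_ng_per_ul


if __name__ == "__main__":
    # Hypothetical reading of a 1:10 dilution of a genomic DNA prep.
    a260, a280 = 0.40, 0.22
    conc = dsdna_concentration(a260, dilution_factor=10)  # about 200 ng/uL
    print(f"concentration: {conc:.0f} ng/uL")
    print(f"A260/A280: {purity_ratio(a260, a280):.2f}")
    # Volume needed for the 3 to 5 ug library input range.
    for ug in (3.0, 5.0):
        print(f"{ug:.0f} ug -> {volume_for_mass(ug, conc):.1f} uL")
```

Absorbance alone does not reveal whether high-molecular-weight DNA is intact, which is why agarose gel electrophoresis is used alongside it.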


We have obtained approximately 20X sequencing coverage for the STEC non-O157 genomes (low = 17.2X, high = 23.1X).  The sequencing coverage for each genome indicates that over 95% of the genome has been sequenced.  Comparing the six genomes for polymorphisms that are informative for each serotype is expected to yield serotype-specific assays.
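The link between fold coverage and the fraction of a genome sequenced can be illustrated with the idealized Lander-Waterman (Poisson) model, in which the expected fraction covered by at least one read is 1 - e^(-c) for mean coverage c. This is a minimal sketch under stated assumptions, not the method used to derive the 95% figure: it assumes reads land uniformly at random, and the genome size and total sequenced bases are hypothetical round numbers chosen to give about 20X.

```python
"""Sketch of fold-coverage arithmetic and the idealized Lander-Waterman
estimate of the fraction of a genome covered by at least one read.

Assumes reads fall uniformly at random (Poisson model). Genome size and
total bases are hypothetical, not project data.
"""

import math


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Mean depth c = total sequenced bases / genome size."""
    return total_bases / genome_size


def expected_fraction_covered(coverage: float) -> float:
    """Poisson/Lander-Waterman estimate of the fraction covered: 1 - e^(-c)."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    genome_size = 5.5e6   # hypothetical STEC-sized genome, in bp
    total_bases = 110e6   # hypothetical total bases sequenced
    c = fold_coverage(total_bases, genome_size)
    print(f"coverage: {c:.1f}X")
    for depth in (1.0, 3.0, 17.2, 23.1):
        print(f"{depth:>5.1f}X -> {expected_fraction_covered(depth):.6f}")
```

Under this idealized model, about 3X already gives roughly 95% expected coverage, so 17.2X to 23.1X would be expected to exceed 95% comfortably. In practice, repeats and sequencing bias leave gaps that additional depth alone may not close, so the model is best read as an upper bound.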


The project has resulted in the first reference genomic sequence for each of these serotypes.  These reference genomes are expected to lead to new DNA-based tests specific for Shiga-toxigenic E. coli (STEC) serotypes O26, O111, O103, O121, O45, and O145.