Project Summary

Genetic Variation between Shiga-toxigenic E. coli (STEC) O157 Strains from Human Clinical and Beef Origin

Principal Investigator(s):
Michael L. Clawson, James L. Bono, and Tommy L. Wheeler
U.S. Department of Agriculture, Agricultural Research Service
Completion Date:
May 2008



STEC O157 remains a perennial and costly food safety issue and public health concern to the beef industry at the production, processing, wholesale, distribution, retail and consumer levels. While STEC O157 strains were originally described by microbiologists as “clonal” (i.e., genetically and phenotypically invariant), an ever-growing body of emerging scientific and epidemiologic evidence suggests that this is probably not the case. Rather, there appears to be significant, frequent and informative (i.e., discriminatory) variation between and among STEC O157 field isolates in both strain phenotypic properties and in fine-resolution genomic structure. These strain differences may be manifested by the diverse ecological niches occupied, by variability in time and dose required to initiate and maintain gastro-intestinal tract infections in livestock, by capacity to survive on and to adhere (or not) to animal hides or muscle tissue (meat), and by the variable competence of strains to infect and cause disease (or not) in people exposed to STEC O157. 

A growing body of recent scientific evidence suggests that human clinical STEC O157 strains represent only a genetic and phenotypic subset of the STEC O157 populations found in animals and the ambient environment. Unfortunately, comprehensive STEC O157 characterization, especially genetic and genomic characterization, has focused almost exclusively on human clinical isolates. For example, STEC O157:H7 human clinical strains EDL 933 (from a 1982 U.S. hamburger index outbreak) and Sakai (from the 1996 Sakai City, Japan radish sprout outbreak) are the only two STEC O157 strains that have been completely sequenced to date. New, relatively low-cost contract DNA sequencing systems have become available in recent years. One such system, Comparative Genome Sequencing (CGS), developed by NimbleGen Systems, Inc. (Madison, WI), rapidly surveys entire microbial genomes (compared to a completely sequenced reference strain) to identify the locations of single nucleotide polymorphisms (SNPs).

The stated objectives for this work were: 

  • To define optimal and judicious selection criteria for, and then assemble, a diverse panel of 10 to 20 isolates from beef and beef cattle environments (e.g., feces, carcass, hide, feedlot soil origin isolates) 
  • To utilize Comparative Genome Sequencing (CGS) to survey the entire microbial genome of at least 10 STEC O157 beef diversity panel strains (compared to Sakai strain of STEC O157 as the reference sequence) and then identify DNA polymorphisms in each strain. 
  • To combine and integrate the new DNA polymorphism data generated from the bovine diversity panel with the existing corresponding data from human STEC O157 isolates (Zhang et al., 2006) to develop a generation human-animal DNA polymorphism-based STEC O157 typing system.

E. coli O157:H7 isolates for the isolate panels were chosen from our collection at USMARC.  Each isolate was genetically and phenotypically characterized as STEC O157:H7.  DNA was extracted from each isolate using the Qiagen genomic tip 100/G (Qiagen Inc., Valencia, CA).  Genomic DNA purity and concentration were determined by 260/280 spectrophotometry and agarose gel electrophoresis.  Equivalent amounts of DNA from bovine or human isolates were used to generate DNA pools of STEC O157:H7 of bovine and human origin, respectively.  Three to five micrograms of DNA from each DNA pool was nebulized and used to create a library for sequencing using the 454 Genome Sequencer FLX (Roche Applied Science, Indianapolis, IN).  DNA sequence was analyzed using the GS Reference Mapper and GS De Novo Assembler software (Roche Applied Science, Indianapolis, IN) and other DNA analysis software.  SNPs are being scored with a system that uses primer-oligonucleotide base extension (PROBE), nano-liter dispensing of extension products onto silicon chips (Sequenom, Inc.) and fully automated mass spectrometric analysis using a MALDI-TOF MS (Bruker-Sequenom, Inc.).


We have identified 13,026 putative SNPs through 1.35X sequencing coverage of 99 bovine STEC O157:H7 isolates.  While most of these SNPs are real and informative for genotyping, some may turn out to be artifacts generated by the bioinformatics software used for this project.  We do not need additional funds for this project; however, project completion requires 1) independent verification of SNPs from the bovine STEC O157:H7 isolates, 2) sequence completion of our human clinical STEC O157:H7 isolates, 3) assay verification of any novel SNPs discovered in the human clinical STEC O157:H7 isolates, and 4) selection of a minimal set of SNPs that contain maximal information for fingerprinting STEC O157:H7 of human and/or bovine origin by genotype.
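One way to picture step 4 is as a set-cover style search over a table of strain genotypes.  The sketch below is only an illustration under stated assumptions: the greedy heuristic, the function name select_minimal_snps, and the four-strain toy panel are all hypothetical and are not the selection method or data used in this project.  A greedy search also does not guarantee the smallest possible set; it simply adds, at each step, the SNP that separates the most strain pairs not yet distinguished.

```python
from itertools import combinations


def count_separated(genotypes, pairs, snp_index):
    """Number of strain pairs whose alleles differ at the given SNP."""
    return sum(genotypes[a][snp_index] != genotypes[b][snp_index] for a, b in pairs)


def select_minimal_snps(genotypes):
    """Greedily choose SNP indices until every distinguishable strain pair differs.

    genotypes: dict of strain name -> string of alleles, one character per SNP.
    Returns a list of 0-based SNP indices in the order they were chosen.
    """
    strains = list(genotypes)
    n_snps = len(genotypes[strains[0]])
    unresolved = set(combinations(strains, 2))  # pairs not yet told apart
    chosen = []
    while unresolved:
        best = max(range(n_snps), key=lambda i: count_separated(genotypes, unresolved, i))
        if count_separated(genotypes, unresolved, best) == 0:
            break  # remaining pairs are identical at every SNP
        chosen.append(best)
        unresolved = {(a, b) for a, b in unresolved
                      if genotypes[a][best] == genotypes[b][best]}
    return chosen


# Hypothetical toy panel: 4 strains genotyped at 4 SNP positions.
toy_panel = {
    "strain_A": "AACG",
    "strain_B": "AATG",
    "strain_C": "GACG",
    "strain_D": "GATG",
}
selected = select_minimal_snps(toy_panel)
print(selected)
```

In this toy panel the second and fourth SNPs are identical in every strain, so they carry no typing information and are never selected.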

While the remaining work is underway and results are forthcoming, we clearly have a high probability of success in achieving the goals of this project.  STEC O157:H7 was once considered virtually clonal.  Contrary to this notion, we have found one DNA polymorphism for every 434 bases of the genome in STEC O157:H7.  This level of genetic variation works strongly in our favor in terms of developing a genotyping platform for distinguishing individual strains of STEC O157:H7.
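The density figure can be related to the putative SNP count with simple arithmetic.  The report does not state the genome length used to derive it, so the short sketch below only back-calculates the span implied by the two reported numbers (13,026 putative SNPs and one polymorphism per 434 bases); the variable names are illustrative.

```python
putative_snps = 13026    # reported count of putative SNPs
bases_per_snp = 434      # reported spacing: one polymorphism per 434 bases

# Sequence span implied by the two reported figures (not a stated genome size).
implied_span = putative_snps * bases_per_snp

# The same density expressed as SNPs per kilobase.
snps_per_kb = 1000 / bases_per_snp

print(implied_span)           # 5653284
print(round(snps_per_kb, 2))  # 2.3
```

The implied span of roughly 5.65 million bases is of the same order as a complete E. coli O157:H7 genome, which is consistent with the density being calculated genome-wide.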

It is also clear that our high-risk decision to switch SNP detection methods during this study has paid off.  By switching from the CGS method to 454 GS FLX pyrosequencing, we were able to include over ten times the number of STEC O157:H7 strains originally proposed.  This large increase in samples has given great power to the study.  Additionally, our DNA pooling approach, which the 454 GS FLX specialists advised against as too risky, worked spectacularly well and has greatly improved our ability to detect SNPs and determine allele frequencies.  Consequently, we are moving into the final stages of this project extremely well positioned to achieve our objectives.
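The reasoning behind estimating allele frequencies from a pool can be sketched as follows.  Assuming each isolate contributes an equal amount of DNA and reads are sampled without bias, the fraction of reads carrying an allele at a site approximates the fraction of isolates carrying it.  The function name and the read counts below are invented for illustration and are not results from this study.

```python
def pooled_allele_frequency(variant_reads, total_reads):
    """Estimate an allele's frequency in a DNA pool from read counts at one site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must be between 0 and total_reads")
    return variant_reads / total_reads


# Hypothetical site: 12 of 40 mapped reads carry the variant allele.
freq = pooled_allele_frequency(12, 40)

# In an equal-mass pool of 99 isolates, this corresponds to about 30 isolates.
estimated_carriers = round(freq * 99)
print(freq, estimated_carriers)
```

Estimates of this kind become less reliable when few reads cover a site, which is one reason candidate SNPs from pooled data are normally confirmed by an independent assay.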


The scientists identified thousands of differences between the DNA sequences of 190 E. coli STEC O157:H7 isolates.  These genetic differences, also known as single nucleotide polymorphisms (SNPs), may be instrumental in either replacing or complementing current typing methods for E. coli STEC O157:H7.  The scientists are currently developing an SNP-based typing method to 1) fingerprint the genetic diversity of E. coli STEC O157:H7, 2) characterize strains associated with human outbreaks, and 3) detect and trace E. coli STEC O157:H7 isolates.