German outbreak strains: Alignment-free whole-genome phylogeny suggested that EAEC 55989 is a close relative but not a direct parent
Introduction:
The accuracy of SNP-based whole-genome phylogeny reconstruction depends heavily on the quality of the sequence alignment, which suffers particularly when genomes are poorly assembled. Alignment-free methods may therefore provide additional insight. Here, we constructed a whole-genome phylogeny of 9 outbreak isolates together with existing E. coli genomes using the alignment-free feature frequency profile (FFP) method (Sims et al. 2009).
Datasets:
1. Genome sequences of 30 E. coli isolates from NCBI, and
2. Genome sequences of 9 outbreak isolates: TY2482 from BGI, LB226692 from Life Technologies, 5 isolates from HPA, and 2 isolates from Göttingen.
Method:
1. Genome sequences were converted into an RY (purine/pyrimidine)-coded form.
2. Overlapping features (l-mers with l = 24) were counted across each whole genome.
3. A feature and its reverse complement were treated as equivalent.
4. Only core features, i.e. those present in all 39 isolates, were retained.
5. Features occurring more than 3 times in any of the isolates were removed.
6. Simple cumulative distances were calculated among all feature states in an unordered manner.
7. A neighbor-joining tree was plotted with the resulting distance matrix using MEGA5.
For details, please refer to Sims & Kim (2011).
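The feature-extraction and distance steps above can be illustrated with a minimal Python sketch. This is not the FFP software itself: the function names, the decision to skip windows containing ambiguous bases, and in particular the reading of step 6 as "count the core features whose occurrence count differs between two genomes" are our assumptions for illustration only. The resulting matrix would still need to be passed to a tree-building program for step 7.

```python
from collections import Counter
from itertools import combinations

# Step 1: purines (A, G) -> R, pyrimidines (C, T) -> Y.
RY = str.maketrans("ACGT", "RYRY")
# In RY space, complementing swaps R and Y.
RY_COMP = str.maketrans("RY", "YR")


def ry_encode(seq):
    """Return the RY-coded form of a nucleotide sequence."""
    return seq.upper().translate(RY)


def canonical(feature):
    """Step 3: map a feature and its reverse complement to one representative."""
    rc = feature.translate(RY_COMP)[::-1]
    return min(feature, rc)


def count_features(seq, l=24):
    """Step 2: count overlapping l-mers over an RY-coded genome."""
    ry = ry_encode(seq)
    counts = Counter()
    for i in range(len(ry) - l + 1):
        f = ry[i:i + l]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if set(f) <= {"R", "Y"}:
            counts[canonical(f)] += 1
    return counts


def core_features(profiles, max_count=3):
    """Steps 4-5: keep features present in every genome and
    occurring at most max_count times in each of them."""
    core = set.intersection(*(set(p) for p in profiles.values()))
    return sorted(
        f for f in core
        if all(p[f] <= max_count for p in profiles.values())
    )


def unordered_distance(p, q, features):
    """Step 6 (assumed reading): each feature whose count state differs
    contributes 1, regardless of how large the difference is."""
    return sum(1 for f in features if p[f] != q[f])


def distance_matrix(profiles, features):
    """Pairwise distances between all genomes, as a nested dict."""
    names = sorted(profiles)
    d = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d[a][b] = d[b][a] = unordered_distance(profiles[a], profiles[b], features)
    return names, d
```

In use, one would build `profiles = {name: count_features(genome_sequence) for ...}` over all 39 genomes, call `core_features(profiles)`, and then `distance_matrix(profiles, features)`. For real genomes, a dictionary of 24-mers per genome is memory-hungry, so this sketch is meant to show the logic rather than to scale.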
Results:
The following is the phylogenetic tree produced using the alignment-free method described above. Outbreak isolates are highlighted in red.
Our tree generally agrees with the one reported by Konrad Paszkiewicz & Kat Holt, which was built using a SNP-based approach. Both trees show high similarity among the outbreak isolates, and both place 55989 as the most closely related isolate sequenced thus far. Note, however, that the genetic distance between 55989 and the outbreak isolates is greater in our FFP tree than in the SNP-based tree.
Linear phylogenetic inference in bacteria can be problematic because of horizontally transferred elements. Kat Holt tried to remove obviously horizontally transferred DNA by discarding SNP calls in genes annotated with keywords such as "phage" and "transposase". We instead removed features likely to be associated with mobile or repetitive DNA by filtering out high-frequency features, and used an unordered character state model to reduce the effect of lateral transfer signal. Both approaches appear to work well, given that the two trees largely agree.
Concluding remarks:
To sum up, using an alignment-free phylogenetic approach, we further confirm that 55989 is the isolate most closely related to the outbreak isolates among those sequenced thus far, again indicating an EAEC origin for the outbreak isolates. However, the genetic distance suggests that 55989 is not a direct parent of the outbreak isolates.
References:
Sims GE, Jun S-R, Wu GA, Kim S-H (2009) Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. PNAS 106:2677-2682.
Sims GE, Kim S-H (2011) Whole-genome phylogeny of Escherichia coli/Shigella group by feature frequency profiles (FFPs). PNAS 108:8329-8334.
Prepared by Simon CHEUNG, Wenyan NONG, Lei LI & H.S. KWAN