An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data
Metadata
Document Title
An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data
Author
Disratthakit A., Toyo-oka L., Thawong P., Paiboonsiri P., Wichukjinda N., Ajawatanawong P., Thipkrua N., Suthum K., Palittapongarnpim P., Tokunaga K., Mahasirimongkol S.
Affiliations
Department of Medical Sciences, Ministry of Public Health, Nonthaburi, 11000, Thailand; Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-8654, Japan; Department of Microbiology, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand; The Office of Diseases Prevention and Control 5, Department of Diseases Control, Ministry of Public Health, Ratchaburi, 70000, Thailand; National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Ministry of Science and Technology, Pathum Thani, 12120, Thailand; Genome Medical Science Project (Toyama), National Center for Global Health and Medicine, Tokyo, 162-8655, Japan
Type
Article
Source Title
Infection, Genetics and Evolution
ISSN
1567-1348
Year
2020
Volume
79
Open Access
All Open Access, Bronze
Publisher
Elsevier B.V.
DOI
10.1016/j.meegid.2019.104152
Abstract
Whole-genome sequencing (WGS) data allow Mycobacterium tuberculosis (Mtb) clusters to be inferred using a pairwise genetic distance threshold of ≤12 single nucleotide polymorphisms (SNPs). However, discrepancies in SNP counts, and therefore in measured genetic distances, are a major concern when WGS data from different next-generation sequencing (NGS) platforms are combined. We performed SNP variant calling on WGS data from 9 multidrug-resistant tuberculosis (MDR-TB) isolates, 3 extensively drug-resistant tuberculosis (XDR-TB) isolates and the standard M. tuberculosis strain H37Rv, sequenced on an Illumina NextSeq500 and an Ion Torrent PGM. Variant calls were obtained with four common variant calling workflows: Genome Analysis Toolkit (GATK) HaplotypeCaller (GATK-VCF workflow), GATK HaplotypeCaller followed by GenotypeGVCFs (GATK-GVCF workflow), SAMtools, and VarScan 2. Cross-platform pairwise SNP differences, minimum spanning networks and average nucleotide identity (ANI) were analysed to assess the performance of each variant calling tool. Cross-platform pairwise SNP differences were smallest with the GVCF workflow, ranging from 2 to 14 SNPs, and largest with VarScan 2, ranging from 7 to 158 SNPs. When SNP data from NextSeq500 and PGM were compared, the GVCF workflow gave the highest ANI, 99.7% for MDR-TB and 99.0% for XDR-TB isolates, whereas the other workflows gave lower ANI values, ranging from 98.6% to 95.1%. We therefore suggest that the GVCF workflow is the best-performing variant calling approach for minimizing cross-platform pairwise SNP differences. © 2019 Elsevier B.V.
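The cluster definition used in the abstract (isolates within a pairwise distance of ≤12 SNPs) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes each isolate's variant calls have already been reduced to a mapping from H37Rv reference position to alternate base, treats any uncalled position as carrying the reference allele, and groups isolates by single linkage (connected components). The function names and toy data are hypothetical.

```python
from itertools import combinations

# Threshold from the abstract: isolates whose pairwise distance is <= 12 SNPs
# are placed in the same putative cluster.
SNP_THRESHOLD = 12


def snp_distance(a: dict[int, str], b: dict[int, str]) -> int:
    """Count reference positions at which two isolates carry different alleles.

    Each isolate is a (hypothetical) mapping of reference position -> ALT base.
    Positions absent from a mapping are assumed to carry the reference allele;
    real pipelines must also handle missing or low-quality calls.
    """
    positions = a.keys() | b.keys()
    return sum(1 for pos in positions if a.get(pos) != b.get(pos))


def pairwise_distances(isolates: dict[str, dict[int, str]]) -> dict[tuple[str, str], int]:
    """Distance for every unordered pair of isolate names."""
    return {
        (x, y): snp_distance(isolates[x], isolates[y])
        for x, y in combinations(sorted(isolates), 2)
    }


def clusters(isolates: dict[str, dict[int, str]], threshold: int = SNP_THRESHOLD) -> list[list[str]]:
    """Single-linkage clustering via union-find over pairs within `threshold` SNPs."""
    parent = {name: name for name in isolates}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (x, y), d in pairwise_distances(isolates).items():
        if d <= threshold:
            parent[find(x)] = find(y)

    groups: dict[str, list[str]] = {}
    for name in isolates:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy data (hypothetical positions and alleles, not from the study).
isolates = {
    "A": {100: "T", 200: "G"},
    "B": {100: "T", 200: "G", 300: "C"},
    "C": {pos: "A" for pos in range(1000, 1020)},
}

print(pairwise_distances(isolates))  # {('A', 'B'): 1, ('A', 'C'): 22, ('B', 'C'): 23}
print(clusters(isolates))            # [['A', 'B'], ['C']]
```

Applied to the NextSeq500 and PGM call sets of the same isolate, `snp_distance` gives the kind of cross-platform discrepancy the abstract reports. Ideally that value would be 0, and any non-zero value can shift isolates across the 12-SNP threshold.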
License
CC BY or CC BY-NC-ND
Rights
Author
Publication Source
Scopus