Bioinformatics Pipeline for Human Papillomavirus Short Read Genomic Sequences Classification Using Support Vector Machine
Abstract
We recently developed a test based on the Agilent SureSelect target enrichment system that captures genomic fragments from 191 human papillomavirus (HPV) types for Illumina sequencing. This enriched whole genome sequencing (eWGS) assay provides an approach to identify all HPV types in a sample. Here we present a machine learning algorithm that calls HPV types from the eWGS output. The algorithm, based on the support vector machine (SVM) technique, was trained on eWGS data from 122 control samples with known HPV types. The new algorithm demonstrated good performance in HPV type detection for designed samples containing 25 or more HPV plasmid copies per sample. We compared the HPV typing results produced by the new algorithm for 261 residual epidemiologic samples with those delivered by the standard HPV Linear Array (LA) assay. The agreement between the methods (97.4%) was substantial (kappa = 0.783). However, the new algorithm additionally identified 428 instances of HPV types that the LA assay cannot detect by design. Overall, we have demonstrated that the bioinformatics pipeline is an accurate tool for calling HPV types from data generated by eWGS processing of DNA fragments extracted from control and epidemiological samples.
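The abstract reports both a percent agreement and a kappa statistic for the comparison with the LA assay. As a minimal sketch of how the two quantities relate, the Python below computes observed agreement and Cohen's kappa from a 2x2 table of paired calls. The function name `agreement_and_kappa` and the counts are hypothetical and chosen only for illustration; the abstract does not give the underlying contingency table, and this is not the authors' code.

```python
def agreement_and_kappa(both_pos, a_only, b_only, both_neg):
    """Return (observed agreement, Cohen's kappa) for a 2x2 table.

    both_pos: pairs called positive by both assays
    a_only:   pairs called positive by assay A only
    b_only:   pairs called positive by assay B only
    both_neg: pairs called negative by both assays
    """
    total = both_pos + a_only + b_only + both_neg
    if total == 0:
        raise ValueError("the table is empty")

    # Fraction of pairs on which the two assays give the same call.
    observed = (both_pos + both_neg) / total

    # Agreement expected by chance, from each assay's marginal positive rate.
    a_pos = (both_pos + a_only) / total
    b_pos = (both_pos + b_only) / total
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts, NOT taken from the paper.
observed, kappa = agreement_and_kappa(both_pos=20, a_only=5, b_only=5, both_neg=70)
print(f"agreement = {observed:.3f}, kappa = {kappa:.3f}")
```

Because kappa discounts agreement expected by chance, it is usually lower than the raw percent agreement when most calls are negative for both assays, which is the typical situation when many HPV types are tested per sample.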