Yilin Wu, Eric Zander, Andrew Ardeleanu, Ryan Singleton, Barnabas Bede


Molecular marker-based glioblastoma (GBM) subclassification is emerging as a key factor in personalized GBM treatment planning. Multiple genetic alterations, including methylation status and mutations, have been proposed as a basis for GBM subclassification. RNA sequencing (RNA-Seq)-based molecular profiling of GBM is widely implemented and readily quantifiable. Machine learning (ML) algorithms have been reported as an applicable method for consistently subgrouping GBM. In this study, we systematically evaluated the applicability of commonly used ML algorithms on The Cancer Genome Atlas Glioblastoma Multiforme (TCGA-GBM) dataset and cross-validated the results in the Chinese Glioma Genome Atlas (CGGA) dataset. The ML algorithms studied include Binomial and Multinomial Logistic Regression, Linear Discriminant Analysis, Decision Trees, K-Nearest Neighbors, Gaussian Naive Bayes, Support Vector Machines, Gradient Boosting, Voting Ensemble, and Multi-Layer Perceptron. RNA-Seq data for 44 biomarkers were passed through the algorithms for performance evaluation. We found that Support Vector Machines, Multi-Layer Perceptrons, and Voting Ensembles are best equipped to assign GBM samples to the correct molecular subgroups without histological studies.
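
The comparison described above can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' pipeline: the data are synthetic stand-ins generated with `make_classification`, the number of subgroups (`N_SUBGROUPS = 4`) is an assumption, and the hyperparameters and the composition of the voting ensemble are illustrative choices. Only the count of 44 features and the three classifier families come from the abstract. A held-out split stands in for evaluation on an independent cohort.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_BIOMARKERS = 44  # number of biomarkers stated in the abstract
N_SUBGROUPS = 4    # assumption: the abstract does not state the subgroup count

# Synthetic stand-in for an expression matrix (samples x biomarkers).
X, y = make_classification(
    n_samples=400,
    n_features=N_BIOMARKERS,
    n_informative=12,
    n_redundant=0,
    n_classes=N_SUBGROUPS,
    n_clusters_per_class=1,
    class_sep=2.0,
    random_state=0,
)

# Hold out part of the data to mimic evaluation on a separate cohort.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Each model standardizes features first, since SVMs, MLPs, and k-NN
# are all sensitive to feature scale.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=0))
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
)
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Hard voting: each member casts one vote per sample and the majority wins.
# VotingClassifier fits its own clones, so the members above stay independent.
vote = VotingClassifier(
    estimators=[("svm", svm), ("mlp", mlp), ("knn", knn)],
    voting="hard",
)

models = {
    "Support Vector Machine": svm,
    "Multi-Layer Perceptron": mlp,
    "Voting Ensemble": vote,
}

# Fit each model on the training split and record held-out accuracy.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)

for name, accuracy in results.items():
    print(f"{name}: held-out accuracy = {accuracy:.3f}")
```

With real data, `X` would be replaced by the biomarker expression matrix and `y` by the subgroup labels, and the held-out split by a separate validation cohort. Accuracy on synthetic data says nothing about performance on GBM samples.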