- Letter to the Editor
- Open access
Machine learning-aided risk stratification in Philadelphia chromosome-positive acute lymphoblastic leukemia
Biomarker Research volume 9, Article number: 13 (2021)
Abstract
Using the eXtreme Gradient Boosting algorithm, an optimized gradient boosting machine learning library, we established a model to predict events in Philadelphia chromosome-positive acute lymphoblastic leukemia. The model was constructed using a training set (80%) and its predictions were tested using a test set (20%). According to the feature importance score, BCR-ABL lineage, polymerase chain reaction value, age, and white blood cell count were identified as important features. These features were also confirmed by the permutation feature importance for the prediction using the test set. Both event-free survival and overall survival were clearly stratified according to risk groups categorized using these features: 80 and 100% in low risk (two or fewer factors), 42 and 47% in intermediate risk (three factors), and 0 and 10% in high risk (four factors) at 4 years. Machine learning-aided analysis was able to identify clinically useful prognostic factors using data from a relatively small number of patients.
To the Editor
Several prognostic factors for Philadelphia chromosome-positive acute lymphoblastic leukemia (Ph + ALL) have been identified, such as minimal residual disease (MRD) [1, 2], chromosomal abnormalities [3], and genetic lesions [4]. However, further exploration is needed to identify the high-risk group in Ph + ALL. The eXtreme Gradient Boosting (XGBoost) algorithm has drawn attention as an interpretable machine learning model [5], and is considered to be useful for identifying new prognostic factors for Ph + ALL.
XGBoost model
Using a dataset of 59 adult Ph + ALL patients [6], we attempted to identify further risk factors using the XGBoost model [7] (Tables S1 and S2). When the trained model was applied to the test set, the mean accuracy was 0.67, and the macro-average precision, recall, and F1-scores were 0.71, 0.78, and 0.66, respectively. The cross-validated accuracy was 0.66 (standard deviation 0.072). The area under the receiver operating characteristic curve (AUC) of the test set was 0.76.
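For readers less familiar with the macro-averaged indices reported above, the following is a minimal sketch of how accuracy and macro-average precision, recall, and F1-score can be computed from true and predicted labels. It is not the code used in this study; the function name `macro_metrics` and the label coding (1 = event, 0 = no event) are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def macro_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging computes each index per class and then takes the
    unweighted mean, so the minority class counts as much as the majority.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    classes = sorted(set(y_true) | set(y_pred))
    precisions: List[float] = []
    recalls: List[float] = []
    f1_scores: List[float] = []

    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives

        # Undefined ratios (zero denominator) are reported as 0.0 here.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n_classes = len(classes)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1_scores) / n_classes,
    }
```

Because the per-class values are averaged without weighting, the macro-average recall can exceed the overall accuracy when the less frequent class is predicted well, which is the pattern seen in the indices above.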
In multivariate analysis using the conventional Cox model, BCR-ABL lineage and age were identified as significant risk factors [6]. According to the feature importance score, two more factors, polymerase chain reaction (PCR) value and white blood cell (WBC) count, were also identified as important features. The XGBoost decision tree used these four factors as nodes, which suggested that these four features were important for the model construction (Fig. 1a and b). There were no strong correlations between the features: the absolute value of the correlation coefficients was between 0.016 (BCR-ABL value and PCR value) and 0.27 (PCR value and WBC count). The mean variance inflation factor for checking multicollinearity between WBC count and another feature was 1.06 (range 1.01–1.09). The permutation feature importance, which indicates how much the prediction using the test set depended on each feature, also showed that PCR value, age, and BCR-ABL lineage were important features (Fig. 1c). The AUC, sensitivity, and specificity were 0.77 [standard error (SE) 0.06], 0.59, and 0.89 when using parameters identified in the XGBoost model, and 0.72 (SE 0.06), 0.50, and 0.81 when using those identified in the conventional Cox model. In the XGBoost model for predicting an event within 2 years from diagnosis, BCR-ABL lineage, PCR value, age, and WBC count were also identified as important features according to the feature importance score (Fig. S1A). The permutation feature importance also identified these four features as important (Fig. S1B).
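The idea behind permutation feature importance can be illustrated with a short sketch: the values of one feature are shuffled across patients, and the resulting drop in accuracy shows how much the predictions depended on that feature. This is a generic, simplified version and not the implementation used in this study; the toy classifier `toy_model`, the helper names, and the default of 20 repeats are hypothetical choices made for illustration.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows for which the model predicts the true label."""
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)


def permutation_importance(
    model: Model,
    X: Sequence[Row],
    y: Sequence[int],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances: List[float] = []
    for j in range(len(X[0])):
        drops: List[float] = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Row) -> int:
    """Hypothetical classifier: predicts an event from the first feature only."""
    return 1 if row[0] >= 65 else 0
```

A feature that the model never uses yields an importance of exactly zero, because shuffling it leaves every prediction unchanged. Unlike the tree-based feature importance score, this quantity is evaluated on held-out data, which is why the two measures complement each other.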
Survival stratification
Based on the index of dichotomy in the XGBoost decision tree, we considered the following four features as risk factors: uni-lineage Ph leukemia (uni-Ph), a BCR-ABL PCR value ≥ 14,500 copies/μg RNA, age ≥ 65 years, and WBC count ≥ 5300/μl. The cohort was divided into three risk groups according to the number of risk factors: low-risk group (Low; two or fewer factors), intermediate-risk group (Int; three factors), and high-risk group (High; four factors) (Table S3). The event-free survival (EFS) and overall survival (OS) were compared among the three risk groups using conventional statistical techniques (Table S4). The EFS and OS at 4 years were 80 and 100% in Low, 42 and 47% in Int, and 0 and 10% in High, respectively (Fig. 2). The same trend was also confirmed in the stratification using only the test set: EFS at 4 years was 100% in Low, 80% (20–97%) in Int, and 0% in High (P = 0.046).
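The grouping rule described above can be written as a short sketch. The cut-offs are those stated in the text; the `Patient` class and its field names are hypothetical and are used only to make the example self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical record holding the four features at diagnosis."""
    uni_lineage: bool    # True for uni-lineage Ph leukemia (uni-Ph)
    bcr_abl_pcr: float   # BCR-ABL PCR value, copies/μg RNA
    age: int             # years
    wbc: float           # WBC count, /μl


def count_risk_factors(p: Patient) -> int:
    """Count how many of the four risk factors are present."""
    return sum([
        p.uni_lineage,
        p.bcr_abl_pcr >= 14500,
        p.age >= 65,
        p.wbc >= 5300,
    ])


def risk_group(p: Patient) -> str:
    """Map the number of risk factors to Low, Int, or High."""
    n = count_risk_factors(p)
    if n <= 2:
        return "Low"
    if n == 3:
        return "Int"
    return "High"
```

For example, a 70-year-old patient with uni-Ph, a PCR value of 20,000 copies/μg RNA, and a WBC count of 5000/μl has three risk factors and falls into the Int group.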
Discussion
The advantage of extracting risk factors using machine learning is that it can reduce the influence of artificial variable selection that can occur in conventional statistical analyses. In addition, new factors that go unnoticed by humans may be extracted. In this study, the PCR value of BCR-ABL was identified as an important feature. The PCR value of BCR-ABL is considered important for monitoring MRD in Ph + ALL [2, 8, 9, 10, 11], so the PCR value at diagnosis is not commonly treated as a risk factor in conventional analyses. It is interesting that such a new factor was identified as being useful for prognostic stratification.
In this study, the XGBoost algorithm extracted clinically valid features from a small dataset comprising 59 cases. Since the small number of cases was one of the major limitations of this study, additional confirmation is required to validate the methodology. Although the difference in predictive indices between the conventional and machine learning-aided methods was small, the results suggested that the new parameters could contribute to improving each index.
Availability of data and materials
All data generated or analysed during this study are included in this published article.
Change history
05 March 2021
The original article has been revised to fix URL and ESM errors.
Abbreviations
- AUC: area under the receiver operating characteristic curve
- EFS: event-free survival
- High: high-risk group
- Int: intermediate-risk group
- Low: low-risk group
- MRD: minimal residual disease
- OS: overall survival
- Ph + ALL: Philadelphia chromosome-positive acute lymphoblastic leukemia
- PCR: polymerase chain reaction
- SE: standard error
- uni-Ph: uni-lineage Ph leukemia
- WBC: white blood cell
- XGBoost: eXtreme Gradient Boosting
References
Yoon JH, Yhim HY, Kwak JY, Ahn JS, Yang DH, Lee JJ, Kim SJ, Kim JS, Park SJ, Choi CW, et al. Minimal residual disease-based effect and long-term outcome of first-line dasatinib combined with chemotherapy for adult Philadelphia chromosome-positive acute lymphoblastic leukemia. Ann Oncol. 2016;27:1081–8.
Nishiwaki S, Imai K, Mizuta S, Kanamori H, Ohashi K, Fukuda T, Onishi Y, Takahashi S, Uchida N, Eto T, et al. Impact of MRD and TKI on allogeneic hematopoietic cell transplantation for Ph+ALL: a study from the adult ALL WG of the JSHCT. Bone Marrow Transplant. 2016;51:43–50.
Short NJ, Kantarjian HM, Sasaki K, Ravandi F, Ko H, Cameron Yin C, Garcia-Manero G, Cortes JE, Garris R, O'Brien SM, et al. Poor outcomes associated with +der(22)t(9;22) and −9/9p in patients with Philadelphia chromosome-positive acute lymphoblastic leukemia receiving chemotherapy plus a tyrosine kinase inhibitor. Am J Hematol. 2017;92:238–43.
Fielding AK. Curing Ph+ ALL: assessing the relative contributions of chemotherapy, TKIs, and allogeneic stem cell transplant. Hematology Am Soc Hematol Educ Program. 2019;2019:24–9.
Yan L, Zhang H-T, Goncalves J, Xiao Y, Wang M, Guo Y, Sun C, Tang X, Jing L, Zhang M, et al. An interpretable mortality prediction model for COVID-19 patients. Nature Machine Intelligence. 2020;2:283–8.
Nishiwaki S, Kim J, Ito M, Maeda M, Okuno Y, Koyama D, Ozawa Y, Gunji M, Osaki M, Kitamura K, et al. Multi-lineage BCR-ABL expression in Philadelphia chromosome-positive acute lymphoblastic leukemia is associated with improved prognosis but no specific molecular features. Front Oncol. 2020;10:586567.
Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. arXiv:1603.02754v3 [cs.LG]. 2016.
Ravandi F, Jorgensen JL, Thomas DA, O'Brien S, Garris R, Faderl S, Huang X, Wen S, Burger JA, Ferrajoli A, et al. Detection of MRD may predict the outcome of patients with Philadelphia chromosome-positive ALL treated with tyrosine kinase inhibitors plus chemotherapy. Blood. 2013;122:1214–21.
Litzow MR. Should anyone with Philadelphia chromosome-positive ALL who is negative for minimal residual disease receive a hematopoietic stem cell transplant in first remission? Best Pract Res Clin Haematol. 2016;29:345–50.
Yoon JH, Min GJ, Park SS, Jeon YW, Lee SE, Cho BS, Eom KS, Kim YJ, Kim HJ, Min CK, et al. Minimal residual disease-based long-term efficacy of reduced-intensity conditioning versus myeloablative conditioning for adult Philadelphia-positive acute lymphoblastic leukemia. Cancer. 2019;125:873–83.
Pfeifer H, Cazzaniga G, van der Velden VHJ, Cayuela JM, Schafer B, Spinelli O, Akiki S, Avigad S, Bendit I, Borg K, et al. Standardisation and consensus guidelines for minimal residual disease assessment in Philadelphia-positive acute lymphoblastic leukemia (Ph + ALL) by real-time quantitative reverse transcriptase PCR of e1a2 BCR-ABL1. Leukemia. 2019;33:1910–22.
Acknowledgements
Not applicable.
Funding
This study was supported in part by JSPS KAKENHI Grant Number JP 20 K08730 and a research grant from The Hori Sciences and Arts Foundation.
Author information
Contributions
S.N., I.S., and H.K. designed the research and interpreted the data; D.K., Y.O., M.O., and Y.I. collected specimens and provided the patient data; S.N. performed the statistical analyses and wrote the manuscript. All authors reviewed and approved the final draft.
Ethics declarations
Ethics approval and consent to participate
Each hospital’s institutional review board approved the study. Written informed consent was obtained upon treatment and sample collection.
Consent for publication
Not applicable.
Competing interests
H.K. received research funding from Chugai Pharmaceutical Co., Ltd., Kyowa Hakko Kirin Co., Ltd., Zenyaku Kogyo Co., Ltd., FUJIFILM Corporation, Daiichi Sankyo Co., Ltd., Astellas Pharma Inc., Otsuka Pharmaceutical Co., Ltd., Nippon Shinyaku Co., Ltd., Eisai Co., Ltd., Pfizer Japan Inc., Takeda Pharmaceutical Co., Ltd., Novartis Pharma K.K., Sumitomo Dainippon Pharma Co., Ltd., Sanofi K.K., and Celgene Corporation, consulting fees from Astellas Pharma Inc., Amgen Astellas Bio Pharma K.K., and Daiichi Sankyo Co., Ltd., and honoraria from Bristol-Myers Squibb, Astellas Pharma Inc., and Novartis Pharma K.K. These companies are not directly involved in any part of this study. The remaining authors declare no competing financial interests.
Supplementary Information
Additional file 1:
Fig. S1. Important feature for event within 2 years. (A) The feature importance score. (B) The permutation feature importance.
Additional file 2.
Supplementary methods.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Nishiwaki, S., Sugiura, I., Koyama, D. et al. Machine learning-aided risk stratification in Philadelphia chromosome-positive acute lymphoblastic leukemia. Biomark Res 9, 13 (2021). https://doi.org/10.1186/s40364-021-00268-x