- Open Access
A genome wide association study on Newfoundland colorectal cancer patients’ survival outcomes
Biomarker Researchvolume 3, Article number: 6 (2015)
In this study we performed genome-wide association studies to identify candidate SNPs that may predict the risk of disease outcome in colorectal cancer.
Patient cohort consisted of 505 unrelated patients with Caucasian ancestry. Germline DNA samples were genotyped using the Illumina® human Omni-1quad SNP chip. Associations of SNPs with overall and disease free survivals were examined primarily for 431 patients with microsatellite instability-low (MSI-L) or stable (MSS) colorectal tumors using Cox proportional hazards method adjusting for clinical covariates. Bootstrap method was applied for internal validation of results. As exploratory analyses, association analyses for the colon (n = 334) and rectal (n = 171) cancer patients were also performed.
As a result, there was no SNP that reached the genomewide significance levels (p < 5x10−8) in any of the analyses. A small number of genetic markers (n = 10) showed nominal associations (p <10−6) for MSS/MSI-L, colon, or rectal cancer patient groups. These markers were located in two non-coding RNA genes or intergenic regions and none were amino acid substituting polymorphisms. Bootstrap analysis for the MSS/MSI-L cohort data suggested the robustness of the observed nominal associations.
Likely due to small number of patients, our study did not identify an acceptable level of association of SNPs with outcome in MSS/MSI-L, colon, or rectal cancer patients. A number of SNPs with sub-optimal p-values were, however, identified; these loci may be promising and examined in other larger-sized patient cohorts.
One of the current interests of medicine is to first identify and then integrate new prognostic markers in prediction models that can help distinguish cancer patients with different risks of disease outcome after diagnosis. Genetic sequence variations, such as single nucleotide polymorphisms (SNPs), may have biological roles in modifying the outcome risk and are the focus of the many current prognostic studies. Among many genetic approaches applied, the genomewide SNP survival association studies are considered useful as they examine a large number of genetic markers scattered along the DNA covering the majority of the genomic regions. Several studies applied this approach to identify genetic markers with prognostic associations in cancer such as pancreatic , esophageal , breast , and lung  cancers.
Colorectal cancer is a common malignancy . Mortality rates of this disease have been decreasing in many countries including Asian , European , and North American countries (Canada  and the USA ). Yet even with this improvement in patient survival, the 5-year survival rate for this disease is estimated to be 62-64% in North America [10,11]. In other parts of the world, such as in India and Eastern European countries, these rates are lower . In Canada, one of the highest incidence rates of colorectal cancer is observed in the Newfoundland population . This population is also characterized by a high incidence of familial colorectal cancer  and by one of the lowest cancer survival rates in Canada .
The disease stage remains as the most important indicator of prognosis of colorectal cancer patients. Research reported in the literature suggests that other factors may also modify the prognosis, such as age at diagnosis, tumor location , vascular/lymphatic invasion , and molecular features such as the microsatellite instability (MSI) status [17,18].
MSI-high (MSI-H) is observed in almost 15% of the colon cancers and is characterized by inactivation of DNA mismatch repair genes either by germ line mutations in MLH1, MSH2, MSH6, and PMS2 genes (inherited colon cancer syndrome known as the Lynch syndrome; ) or by the somatic promoter hypermethylation of the MLH1 gene (sporadic colorectal cancer with MSI-H ). Patients with the MSI-H tumors have better survival rates than the patients with MSI-low (MSI-L) or microsatellite stable (MSS) tumors and the chromosomal instability (CIN) + tumors [17,18]. In addition, MSI-H tumor phenotype is rarely observed in rectal cancers  and colon and rectal cancers also differ from each other in terms of other molecular alterations, risk of recurrence, and treatment approaches [15,22,23].
Identification of genetic predictors of disease outcomes in cancer patients is a promising research aim. Till now, studies aiming to test the potential of polymorphisms as prognostic markers in colorectal cancer have been restricted to the candidate gene or pathway approaches. In this study, for the first time we performed a genomewide survival association study for patients with MSS or MSI-L tumors (MSS/MSI-L; n = 431). In addition, due to the differences between the colon and rectal cancers, as an exploratory analysis we investigated the prognostic associations of the same genetic markers in the rectal (n = 171) and colon cancer (n = 334) patients separately.
The baseline demographic, clinical and pathological data for the patients in the MSS/MSI-L, colon, and rectal cancer patient groups are shown in Table 1. The MSS/MSI-L patient group was the largest (n = 431), followed by colon (n = 334) and rectal (n = 171) patient groups. The number of events for overall survival (i.e. death) was 158, 105 and 65 in MSS/MSI-L, colon and rectal groups and for disease free survival (i.e. recurrence, metastasis or death) was 184, 121 and 79 in the same groups, respectively. For the entire cohort (n = 505), the 5-year and 10-year survival probabilities were 79.0% and 46.4% for OS and the 5-year and 10-year disease-free probabilities were 68.5% and 49.1%, respectively.
The Quantile-Quantile (Q-Q) plots that were drawn for each sub-group and each outcome are shown in the Additional file 1. These plots show that the models on the genome-wide scan satisfy the expected distribution and suggest the appropriateness of the multivariate model settings.
As a result of the statistical analysis, possibly due to the limited sample size, no association with genome-wide association significance levels (p < 5.0E-08) was detected in models constructed. But, ten SNPs showed suggestive associations with OS and DFS times (p < 1.0E-06), which is the nominal significance cut-off level in our study (Table 2). Manhattan plots for the patient cohorts and outcomes are shown in Additional file 1. For the interested readers, the list of associations detected at a p-value less than 1.0E-05 can be found in Additional file 2.
In the MSS/MSI-L patient group association study, there were two SNPs in the DFS model that achieved the nominal-significance level (Table 2). The HRs obtained as a result of the bootstrap method demonstrated the robustness of the results from the original association analysis (Additional file 3). One of these SNPs (rs6720296) was a frequent variant (MAF: 40%) and based on the information in the dbSNP database , was located in a non-coding RNA gene (long intergenic non-protein coding RNA 1121; LINC01121). The second marker (rs1407508) was a relatively infrequent SNP (MAF: 5.8%) located in a non-coding region on chromosome 9.
In the colon patient group analysis, two SNPs showed nominally-significant signals in OS or DFS models (Table 2). These SNPs were located in non-coding regions along the chromosome 20 and 14.
Among all groups, the rectal cancer patient group was the one with the largest group of SNPs identified, the majority of which were intergenic. In this patient cohort, two SNPs in OS model and four SNPs in DFS model had p-value lower than the cut-off level (p < 1.0x10−6) (Table 2). Interestingly, one intergenic SNP was associated with both OS and DFS in this patient group (rs6854845). Among the six SNPs, rs17057166 was the only one located in a gene (non-coding RNA; AC011343.1).
None of the SNPs in Table 2 were amino acid changing SNPs (non-synonymous). As of February 2015, there was no publication in PUBMED about these SNPs. A search at the Regulome DB database  returned information for five of the polymorphisms (rs17026425, rs1407508, rs17280262, rs17057166, and rs6854845). According to these results, rs17026425 (Regulome DB score: 3a) is located within a transcription factor (such as JUND) binding site, yet it is “relatively less likely to affect protein binding”. For rs1407508 (Regulome DB score: 4), rs17280262, rs17057166, and rs6854845 (Regulome DB scores: 5), Regulome DB suggests that there is minimal or no evidence of proteins binding to the DNA sequences where these SNPs are located.
An additional information related to the genes for the SNPs with p-values less than 1.0x10−5 is shown in the Additional file 4.
In this study, we aimed to investigate the associations of a large number of SNPs (n = 729,737) with overall or disease free survival times in Caucasian colorectal cancer patients from Newfoundland, Canada. Our primary aim was to investigate the associations of SNPs with the disease outcome in the patients with MSI-L or MSS tumors (n = 431) as the outcome risks for this group of patients and the patients with the MSI-H tumors are significantly different from each other [17,18]. In addition, due to the differences between the colon and rectal cancer patients (for example the relatively decreased recurrence risk as well as the higher number of MSI-H tumors in colon cancer patients compared to the rectal cancer patients [15,21]), as an exploratory analysis, separate analyses for the colon (n = 334) and rectal (n = 171) cancer patients were also performed. Of note, due to the small number of the patients with MSI-H tumors (n = 53), we have not attempted the statistical analyses in this group.
The main result of this study is that none of the genetic markers investigated reached the genomewide significance levels (5.0E-08) in either overall or disease free survival analyses in any of the patient groups investigated. Thus, we were not able to identify a genetic marker with a strong association with the risk of main clinical outcomes (i.e. death, local or distant metastases) in colorectal cancer. This can be interpreted as that none of the genetic markers investigated are related to the survival outcomes of interest. Alternatively, this can be also due to the fact that the sample sizes of the patient cohorts investigated in this study were small and thus our study power was limited to detect possible associations (see Methods). Considering this study power issue, in this manuscript we present and discuss the SNPs with p-values (<1.0E-06) higher than the genome-wide significance levels as potentially promising genetic markers (Table 2). We suggest that these markers may be promising and should be investigated in larger-sized cohorts to test whether they are associated with the colorectal cancer prognosis. Further research may also be performed to test whether the two non-coding RNA genes (AC011343.1 and LINC01121) shown in Table 2 have prognostic roles in colorectal cancer. In addition, while there is currently no literature report about the SNPs in Table 2 showing their biological functions or relation to health and disease, Regulome DB  data suggest that one of the SNPs, rs17026425, is located within the binding site of JUN/JUND transcription factors. Mammalian JUN family of transcriptional regulators includes c-JUN, JUNB, and JUND with important roles in cell proliferation and carcinogenesis (reviewed in ). According to our results, rs17026425 polymorphism was nominally associated with overall survival in rectal cancer sub-group (Table 2). Further studies on rs17026425 polymorphism may test its potential binding to the JUN family of transcription factors, its potential role in variable JUN function, and contribution to the rectal cancer formation and progression.
In conclusion, we performed genomewide SNP survival association studies in MSS/MSI-L, colon and rectal cancer patients. A limitation of this study is that all three patient cohorts investigated were characterized by small samples sizes, thus we cannot confidently conclude whether the investigated genetic markers indeed have no prognostic associations in colorectal cancer. However, this study also generated a small set of SNPs that may be an interest for other investigators in their future analyses.
Patients registered at the Newfoundland Colorectal Cancer Registry (NFCCR) were investigated in the present study. Characteristics of the NFCCR patient cohort were described earlier [13,27]. Briefly, between 1999–2003, a total of 750 participants were recruited to NFCCR. Informed consent was obtained from the patients or their family members. Recurrence, metastasis or vital status information for patients were collected till 2010 as described in Negandhi et al. . Tumor characteristics (i.e. MSI status) were determined as explained by Woods et al. . Among 750 patients, 736 stage I-IV patients had the prognostic data collected during the follow-up period.
Genotyping and quality control (QC)
Germline DNA was isolated from patients’ blood samples. Initially, a total of 539 patients with available prognostic data and germline DNA were subject to whole-genome SNP genotyping using the Illumina® Omni1-Quad human SNP genotyping platform (service provider: Centrillion Biosciences, USA).
QC analyses on the genetic data were performed using PLINK v1.07  and Eigensoft 4.2 . We excluded 1) one subject because of mismatching sex information; 2) 129,172 SNPs with high missing genotype data (above 5%); 3) 21 individuals who were first, second, or third degree relatives (based on the identity by state PI_HAT score >0.125), and 4) one subject with extreme value of heterozygosity (out of 6 standard deviations). Additionally, Multidimensional-scaling (MDS) method was used to identify individuals with diversity in ancestry. Comparing with the Hapmap III points in the MDS plot, six outliers that are far away from Caucasian cluster were removed from the study (Additional file 5). Lastly, 275,285 SNPs and 320 SNPs were excluded because of failure in HWE test (p < 1.0E-08) and low minor allele frequency (MAF < 0.05), respectively.
Principal component analysis was undertaken with the EIGENSTRAT package after QC filtering (Additional file 5). The first two principal components were incorporated into the models. Additional analyses for population stratification were undertaken with each of the genetic marker adjusting the estimated principal components. As a result, five population outliers were identified and excluded from the analysis.
After these analyses, a total of 505 subjects and 729,737 SNPs were included in the final analysis. The adequacy of the probability distribution and possibility of differential genotyping were formally evaluated using the Q-Q plots of –log10(p-values) (Additional file 1).
We examined the associations of SNPs with two major clinical outcomes: overall survival (OS) and disease free survival (DFS). We define OS as the date from diagnosis till the date of death from any cause or the date of last follow-up, and DFS as the date from diagnosis till the date of local recurrence, metastasis or death from any cause, or the date of last follow-up. OS or DFS status was available for 504 patients.
Cox proportional hazard models were applied based on a genetic additive model. Hazard Ratio (HR) estimates and corresponding 95% confidence intervals (CIs) were calculated using R statistical programming language . The association between genetic markers and survival outcomes, OS and DFS, adjusting for the top two principal components and other confounding factors (gender and stage for MSI-L or MSS sub-group, gender, stage and MSI for colon and rectal sub-groups), was tested using Cox proportional hazard model.
Clinical factors, as potential confounding factors in the genetic analysis, were evaluated firstly using univariate analysis. Among all the clinical factors: gender, stage, MSI were found to be significantly associated with OS (p-values: 0.0151, 2.2E-16, 5.2E-05, respectively); and gender, site (colon/rectum), stage, MSI were shown to be significantly associated with DFS (p-values: 0.0139, 0.0344, 3.3E-16, 1.2E-04, respectively). These clinical variables were used to build up baseline multivariable models for OS and DFS separately in each sub-cohort.
In order to present the genetic association with the outcome variable in more detail, MSS/MSI-L patient, colon cancer patients and rectal cancer patient groups were analysed separately using Cox proportional hazards model. Since two outcome measures (overall and disease free survivals) were examined in MSS/MSI-L, colon and rectum cancer patient cohorts, a total of six models were constructed. While for the genome-wide screening the genetic markers were modeled using additive genetic models, dominant, recessive, and co-dominant genetic models were also applied as sensitivity analysis to assess the genetic inheritance effect. For each scenario, two types of models were explored, namely crude, and comprehensive models: for the crude model, we adjusted by principal components only; for the comprehensive model, we adjusted by principal components and the baseline model factors. In this report we show the comprehensive models.
In order to validate the results obtained from the MSS/MSI-L cohort association study, we applied bootstrap resample method. Based on the original data set, 200 bootstrap samples data sets were generated using sampling with replacement algorithm. For each of these bootstrap data sets, we applied Cox proportional hazards regression model on the SNPs with significance levels p < 1.0x10−5 identified from the original analysis and calculated the corresponding HRs of the genetic effect. Overall, 200 HRs were provided for each SNP and bootstrap confidence intervals were created.
We computed the study power for the MSS-MSI-L cohort (n = 431) using PASS 13 (NCSS, LLC. Kaysville, Utah, USA). Assuming a SNP in linkage disequilibrium, LD, (D’ = 1) with a risk allele frequency 0.3, we have 0.76 power to detect nominal significant association at p < 1.0x10−6 under a dominant model with strong effect size of HR 2.0 in the MSS-MSI-L cohort. To detect an association with the same assumptions and at p < 5E-08 significance level, the statistical power is reduced to 0.56. With a moderate effect size of HR 1.5, the power to detect genome wide significant association (p < 5E-08) is very low (0.08).
Minor allele frequency
Newfoundland Colorectal Cancer Registry
Principal component analysis
Single nucleotide polymorphism
Wu C, Kraft P, Stolzenberg-Solomon R, Steplowski E, Brotzman M, Xu M, et al. Genome-wide association study of survival in patients with pancreatic adenocarcinoma. Gut. 2014;63(1):152–60.
Wu C, Li D, Jia W, Hu Z, Zhou Y, Yu D, et al. Genome-wide association study identifies common variants in SLC39A6 associated with length of survival in esophageal squamous-cell carcinoma. Nat Genet. 2013;45(6):632–8.
Azzato EM, Pharoah PD, Harrington P, Easton DF, Greenberg D, Caporaso NE, et al. A genome-wide association study of prognosis in breast cancer. Cancer Epidemiol Biomarkers Prev. 2010;19(4):1140–3.
Huang YT, Heist RS, Chirieac LR, Lin X, Skaug V, Zienolddiny S, et al. Genome-wide analysis of survival in early-stage non-small-cell lung cancer. J Clin Oncol. 2009;27(16):2660–7.
Ferlay J, Shin HR, Bray F, Forman D, Mathers C, Parkin DM. Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer. 2010;127(12):2893–917.
Shin A, Jung KW, Won YJ. Colorectal cancer mortality in Hong Kong of China, Japan, South Korea, and Singapore. World J Gastroenterol. 2013;19(7):979–83.
Malvezzi M, Bertuccio P, Levi F, La Vecchia C, Negri E. European cancer mortality predictions for the year 2014. Ann Oncol. 2014;25(8):1650–6.
Kachuri L, De P, Ellison LF, Semenciw R, Advisory Committee on Canadian Cancer Statistics. Cancer incidence, mortality and survival trends in Canada, 1970–2007. Chronic Dis Inj Can. 2013;33(2):69–80.
Stewart SL, King JB, Thompson TD, Friedman C, Wingo PA. Cancer mortality surveillance-United States, 1990–2000. MMWR Surveill Summ. 2004;53(3):1–108.
Marrett LD, De P, Airia P, Dryer D, Steering Committee of Canadian Cancer Statistics 2008. Cancer in Canada in 2008. CMAJ. 2008;179(11):1163–70.
Jemal A, Siegel R, Ward E, Hao Y, Xu J, Murray T, et al. Cancer statistics, 2008. CA Cancer J Clin. 2008;58(2):71–96.
Parkin DM, Bray F, Ferlay J, Pisani P. Global cancer statistics, 2002. CA Cancer J Clin. 2005;55(2):74–108.
Green RC, Green JS, Buehler SK, Robb JD, Daftary D, Gallinger S, et al. Very high incidence of familial colorectal cancer in Newfoundland: a comparison with Ontario and 13 other population-based studies. Fam Cancer. 2007;6(1):53–62.
Ugnat AM, Xie L, Semenciw R, Waters C, Mao Y. Survival patterns for the top four cancers in Canada: the effects of age, region and period. Eur J Cancer Prev. 2005;14(2):91–100.
Li FY, Lai MD. Colorectal cancer, one entity or three. J Zhejiang Univ Sci B. 2009;10(3):219–29.
Lim SB, Yu CS, Jang SJ, Kim TW, Kim JH, Kim JC. Prognostic significance of lymphovascular invasion in sporadic colorectal cancer. Dis Colon Rectum. 2010;53(4):377–84.
Popat S, Hubner R, Houlston RS. Systematic review of microsatellite instability and colorectal cancer prognosis. J Clin Oncol. 2005;23(3):609–18.
Walther A, Houlston R, Tomlinson I. Association between chromosomal instability and prognosis in colorectal cancer: A meta-analysis. Gut. 2008;57(7):941–50.
Lucci-Cordisco E, Zito I, Gensini F, Genuardi M. Hereditary nonpolyposis colorectal cancer and related conditions. Am J Med Genet A. 2003;122A(4):325–34.
Veigl ML, Kasturi L, Olechnowicz J, Ma AH, Lutterbaugh JD, Periyasamy S, et al. Biallelic inactivation of hMLH1 by epigenetic gene silencing, a novel mechanism causing human MSI cancers. Proc Natl Acad Sci U S A. 1998;95(15):8698–702.
Nilbert M, Planck M, Fernebro E, Borg A, Johnson A. Microsatellite instability is rare in rectal carcinomas and signifies hereditary cancer. Eur J Cancer. 1999;35(6):942–5.
Zlobec I, Bihl MP, Schwarb H, Terracciano L, Lugli A. Clinicopathological and protein characterization of BRAF- and K-RAS-mutated colorectal cancer and implications for prognosis. Int J Cancer. 2010;127(2):367–80.
Fransen K, Klintenas M, Osterstrom A, Dimberg J, Monstein HJ, Soderkvist P. Mutation analysis of the BRAF, ARAF and RAF-1 genes in human colorectal adenocarcinomas. Carcinogenesis. 2004;25(4):527–33.
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29(1):308–11.
Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22(9):1790–7.
Mechta-Grigoriou F, Gerald D, Yaniv M. The mammalian Jun proteins: redundancy and specificity. Oncogene. 2001;20(19):2378–89.
Woods MO, Younghusband HB, Parfrey PS, Gallinger S, McLaughlin J, Dicks E, et al. The genetic basis of colorectal cancer in a population-based incident cohort with a high rate of familial disease. Gut. 2010;59(10):1369–77.
Negandhi AA, Hyde A, Dicks E, Pollett W, Younghusband BH, Parfrey P, et al. MTHFR Glu429Ala and ERCC5 His46His polymorphisms are associated with prognosis in colorectal cancer patients: analysis of two independent cohorts from Newfoundland. PLoS One. 2013;8(4):e61469.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38(8):904–9.
Core Team R: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing 2013, Vienna, Austria :http://www.R-project.org.
This study would not be possible without the patients participating in NFCCR. We thank Ms. Andrea Kavanagh for her help with the prognostic and clinical database and Ms. Michelle Simms with her help with the DNA samples. Authors also thank other NFCCR personnel and investigators who contributed to the registry over the many years.
The authors sincerely thank the following funding agencies: Medical Research Fund-Cox Award-2010 (to RG and SS) by the Faculty of Medicine, Memorial University; Research and Development Corporation of Newfoundland (RDC; leverage fund to WX, RG, PP, SS: contract number: 5404.1201.102), Canadian Institute of Health Research (CIHR; RPP-operating fund to WX, RG, PP, SS; FRN: 110045), CIHR fund for the Colorectal Cancer Interdisciplinary Health Research Team at the University of Toronto and Memorial University, the National Cancer Institute of Canada (grants 18223 and 18226) and the Atlantic Innovation Fund for the Interdisciplinary Research Team in Human Genetics. The funding sources had no involvement in the study design; in the collection, analysis or interpretation of data; in the writing of the report; or in the decision to submit the paper for publication.
The authors declare that they have no competing interests.
WX led the statistical analysis of the data, interpreted the study results, and drafted the manuscript, JX performed the statistical analyses and helped draft the manuscript, KS helped perform the QC measures on the genetic data, ED helped collect the clinicopathological and prognostic data of the study cohort, JC helped collect the patients and determine familial risk status, PP led the collection of patients and patient data including the genomewide SNP genotype data, RG helped collection of patients and patient data including the genomewide SNP genotype data, SS conceived and led the study, helped obtain the genomewide SNP genotype data, helped interpret the study results, drafted and submitted the manuscript. All authors approved the final form of the manuscript.
Q-Q and Manhattan plots. a) OS analysis in MSS/MSI-L patient sub-cohort (Inflation factor: 1.024), b) DFS analysis in MSS/MSI-L patient sub-cohort (Inflation factor: 0.971), c) OS analysis in colon cancer patient sub-cohort (Inflation factor: 1.015), d) DFS analysis in colon cancer patient sub-cohort (Inflation factor: 0.958), e) OS analysis in rectal cancer patient sub-cohort (Inflation factor: 1.042), f) DFS analysis in rectal cancer patient sub-cohort (Inflation factor: 1.005).
SNPs identified from six models with significance values of p < 10 -5 .
Hazard ratios obtained by the bootstrap method for the MSS/MSI-L patient cohort.
Information on the genic locations of the genetic polymorphisms with p < 1.0x10 -5 .
Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) analysis results for the patient cohort.