Item

Validating a gene expression signature of invasive ductal carcinoma of the breast and detecting key genes using neural networks

Samarasinghe, Sandhya
Kulasiri, Don
Date
2009-07
Type
Conference Contribution - published
Abstract
Breast cancer is one of the leading causes of death in women worldwide. It is a complex disease that is hard to diagnose accurately because its subtypes are difficult to distinguish. The most common subtype is Invasive Ductal Carcinoma (IDC), a cancer of the ductal cells that line the milk ducts in the breast. An in-depth understanding of the genetic basis of IDC can help treat it more effectively. Microarray-based gene expression analysis is breaking new ground in the accurate diagnosis of diseases, including cancer. Microarray experiments measure the expression levels of thousands of genes in cells or tissues of interest, and the results are analysed to identify a small set of genes that constitutes the gene signature of a particular disease. Most studies of breast cancer gene expression compare cancer subtypes; very few have compared gene expression between matched cancerous and healthy breast tissues (Turashvili et al., 2007). The studies that have compared different subtypes show little agreement on the gene signatures (Turashvili et al., 2007; Zhao et al., 2004; Sorlie et al., 2001). It is therefore highly beneficial to further assess the validity of genes identified as differentially expressed, in order to boost confidence in their usefulness in medical applications including diagnosis, prognosis and drug development. In this study, the validity of differentially expressed genes from a carefully conducted experiment on breast tissues affected by Invasive Ductal Carcinoma (IDC) and matched healthy tissues is assessed using neural networks and statistical methods. The data were obtained from the NCBI database, where they were deposited by Turashvili et al. (2007) from their experiments on breast cancer. The original authors extracted a 326-gene signature for IDC using statistical methods.
In our study, the ability of this gene set to discriminate the disease state from the healthy state is investigated and validated using two independent datasets. Our visual and qualitative exploration using self-organizing maps (SOM), followed by statistical tests, indicated that the validation data supported 80% of the original gene signature. A further SOM analysis showed that the original gene set is able to classify patients as being healthy or having IDC. The original gene set was optimally clustered by SOM/Ward clustering into two classes based on the correlation of the genes' expression patterns. The two classes and the genes in them were supported by 60% of the validation data. As an alternative, PCA was used to determine genes with correlated expression in the original gene signature; 4 PCs accounted for 86% of the variation in the data, with the first 2 PCs accounting for around 70%. The 100 most important genes in PC1 and PC2 provided 52% support for the two SOM classes, with PC1 dominating class 1 and PC2 dominating class 2. Genes in the two SOM classes that were validated by independent data were used in conjunction with PC1 and PC2 to extract highly influential genes from the top 6%, 18% and 57% of the original genes represented by PC1 and PC2. These key genes may prove to be the most crucial in distinguishing ductal tumor from healthy tissue. Four new genes were among the key genes and may shed more light on the disease mechanism. In the next stage of our study, the key genes and the overall set of validated genes may provide further support for understanding or refining the genetic networks that these genes are part of.
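The PCA step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an expression matrix laid out as samples by genes, ranks genes by the absolute value of their loading on each component, and runs on synthetic data. The function name `pca_gene_ranking`, the matrix layout, and all numbers below are assumptions made for illustration only.

```python
import numpy as np


def pca_gene_ranking(expr, n_components=2, top_k=100):
    """Return explained-variance ratios and top-loading gene indices per PC.

    expr is assumed to be a (samples x genes) array; this layout is an
    assumption of the sketch, not something stated in the abstract.
    """
    # Centre each gene (column) so the SVD corresponds to PCA.
    centered = expr - expr.mean(axis=0)
    # Rows of vt are the principal directions in gene space.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2
    explained = variances / variances.sum()
    top_genes = []
    for i in range(n_components):
        # Rank genes by absolute loading on component i, largest first.
        order = np.argsort(-np.abs(vt[i]))
        top_genes.append(order[:top_k])
    return explained[:n_components], top_genes


# Synthetic stand-in data: 20 samples, 326 genes (the signature size),
# with two hypothetical latent factors driving two groups of 50 genes.
rng = np.random.default_rng(0)
n_samples, n_genes = 20, 326
factor1 = rng.normal(scale=3.0, size=(n_samples, 1))
factor2 = rng.normal(scale=1.0, size=(n_samples, 1))
expr = rng.normal(scale=0.1, size=(n_samples, n_genes))
expr[:, :50] += factor1
expr[:, 50:100] += factor2

explained, top_genes = pca_gene_ranking(expr, n_components=2, top_k=100)
print("Explained variance ratio of PC1 and PC2:", explained)
print("First ten top-loading genes on PC1:", top_genes[0][:10])
```

On real data, `expr` would be replaced by the measured expression values for the signature genes, and the cumulative sum of the explained-variance ratios would indicate how many components are worth retaining.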
Rights
Copyright © The Authors. The responsibility for the contents of this paper rests upon the authors and not on the Modelling and Simulation Society of Australia and New Zealand Inc.