Computational framework for early detection of breast cancer

Al Yousef, A.

Publication

Computational framework for early detection of breast cancer

Authors

Al Yousef, A.

Date

2013

Type

Thesis

Collections

Doctoral (PhD) Theses
Centre for Advanced Computational Solutions

Abstract

Breast cancer is the second leading cause of cancer death in women worldwide, after lung cancer, and many of these lives could be saved by early detection. One way to achieve this is to improve the diagnostic accuracy of present Computer Aided Diagnosis (CAD) systems for breast cancer, which use both clinical and biological data. To that end, the thesis examined and evaluated both the clinical and the biological data used in present breast cancer CAD systems. The results were then applied to the early detection of breast cancer in women from a low-income country, Jordan, where the breast cancer incidence (32%) ranks among the highest in the world.

In the first, clinical part of the study, we identified a new mass feature related to mass shape, called Central Regularity Degree (CRD), from ultrasound images. CRD was then used for classification along with five other powerful mass features: one geometric feature, the Depth-Width ratio (DW); two morphological features, shape and margin; blood flow; and age. Four classifiers were compared: Artificial Neural Networks (ANN), K Nearest Neighbour (KNN), Nearest Centroid (NC) and Linear Discriminant Analysis (LDA). ANN gave the best performance, with classification accuracy improving from 81.8% to 95.5% after adding CRD. Overall, adding CRD improved the diagnostic accuracy of the CAD by 14%, a significant improvement.

The second focus of the study was biological data. The aim was to enhance the diagnostic accuracy of CADs that use gene expression profiling of peripheral blood cells by introducing a novel feature selection method that combines a Bi-biological filter with Best First Search with Support Vector Machine (BFS-SVM). The Bi-biological filter consists of two biological filters: the first finds the biomarkers shared between two cancer subsets, and the second eliminates the healthy biomarkers from the shared ones.
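
The two-stage filter described above can be read as two set operations. The following is a minimal sketch under that reading, not the thesis implementation: it assumes that a candidate biomarker list has already been derived for each cancer subset and for the healthy samples (the abstract does not say how), and the function name and gene identifiers are hypothetical placeholders.

```python
from typing import Iterable, Set


def bi_biological_filter(
    cancer_subset_a: Iterable[str],
    cancer_subset_b: Iterable[str],
    healthy: Iterable[str],
) -> Set[str]:
    """Illustrative two-stage filter (hypothetical helper, not from the thesis)."""
    # Stage 1: keep only biomarkers shared between the two cancer subsets.
    shared = set(cancer_subset_a) & set(cancer_subset_b)
    # Stage 2: eliminate biomarkers that also appear in the healthy list.
    return shared - set(healthy)


# Placeholder identifiers, not real genes or real results.
subset_a = {"g1", "g2", "g3", "g4"}
subset_b = {"g2", "g3", "g4", "g5"}
healthy = {"g3", "g9"}

selected = bi_biological_filter(subset_a, subset_b, healthy)
print(sorted(selected))  # ['g2', 'g4']
```

In the workflow the abstract describes, the output of such a filter would then be passed to the BFS-SVM wrapper, which narrows it further.
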
The study evaluated the diagnostic accuracy of three classifiers, Artificial Neural Network (ANN), Support Vector Machine (SVM) and Linear Discriminant Analysis (LDA), with 5-fold cross-validation. The input to the system was 121 samples (67 malignant and 54 benign cases). The Bi-biological filter selected 415 genes as mRNA biomarkers, and BFS-SVM selected 13 of these 415 genes for the classification of breast cancer. ANN was the superior classifier, with 93.2% classification accuracy, a 14% improvement over the original study (Aaroe et al., 2010).

The third focus of the study was female patients in Jordan, a low-income country that ranks high in breast cancer incidence. We used the Bi-biological filter and the BFS-SVM wrapper to analyse 56 blood serum samples in order to detect circulating breast cancer miRNA biomarkers in Jordanian women and to use them to improve the diagnostic accuracy of circulating-miRNA-based breast cancer CADs. The Bi-biological filter selected 74 miRNAs as breast cancer biomarkers, and BFS-SVM selected 7 of these 74 for breast cancer classification. SVM was the superior classifier, with 98.2% classification accuracy, a 12% improvement compared with Schrauder et al. (2012) and 7% compared with Hu et al. (2012).
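
For readers unfamiliar with the evaluation protocol, k-fold cross-validation can be sketched as follows. This is an illustrative outline only: the helper names are hypothetical, the folds are contiguous and unshuffled for simplicity (in practice the samples would normally be shuffled or stratified by class), and `fit_predict` stands in for whichever classifier is being evaluated.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds; return (train, test) index pairs."""
    base, extra = divmod(n_samples, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        splits.append((train, test))
    return splits


def cross_val_accuracy(
    X: Sequence,
    y: Sequence,
    fit_predict: Callable[[list, list, list], list],
    k: int = 5,
) -> float:
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for train, test in k_fold_indices(len(X), k):
        preds = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X)
```

With 121 samples and k = 5, this split produces one test fold of 25 samples and four of 24, so every sample is used for testing exactly once.
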