This analysis aims to observe which features are most helpful in predicting malignant or benign cancer and to see general trends that may aid us in model selection and hyper parameter selection.
Breast cancer is the most common form of cancer in women, and invasive ductal carcinoma (IDC) is the most common form of breast cancer. Accurately identifying and categorizing breast cancer subtypes is an important clinical task, and automated methods can be used to save time and reduce error. The goal of this script is to identify IDC when it is present in otherwise unlabeled histopathology images. The dataset consists of approximately five thousand 50x50 pixel RGB digital images of H&E-stained breast histopathology samples that are labeled as either IDC or non-IDC. These numpy arrays are small patches that were extracted from digital images of breast tissue samples. The breast tissue contains many cells but only some of them are cancerous. Patches that are labeled "1" contain cells that are characteristic of invasive ductal carcinoma. For more information about the data, see https://www.ncbi.nlm.nih.gov/pubmed/27563488 and http://spie.org/Publications/Proceedings/Paper/10.1117/12.2043872.
This project is a part of research on Breast Cancer Diagnosis with Machine Learning algorithm using data-driven approaches. The final outcomes of the research were later published at an IEEE Conference and added to IEEE Xplore Digital Library.
CNN model for classification of breast histology images "benign" or "malignant" using keras and TensorFlow as backend. Attained a classification accuracy of 85%.