International Journal of Pure & Applied Bioscience (IJPAB)
Year : 2018, Volume : 6, Issue : 2
First page : (37) Last page : (46)
Article doi: http://dx.doi.org/10.18782/2320-7051.6308
Mohamed A. Mahfouz*
*PhD, Department of Computer and Systems Engineering, Faculty of Engineering, Alexandria University, Egypt
*Corresponding Author E-mail: m.a.mahfouz@gmail.com
Received: 5.03.2018 | Revised: 10.04.2018 | Accepted: 14.04.2018
ABSTRACT
Clustering is a key step in gene expression analysis. The BIRCH algorithm clusters data points efficiently, incrementally, and dynamically. However, the original BIRCH algorithm is limited to the Euclidean distance measure. Euclidean distance is not suitable for gene expression clustering because it is sensitive to scaling and to differences in average expression level, whereas correlation is not. This paper proposes an extended BIRCH algorithm (ExBIRCH) based on average Pearson correlation on a normalized gene expression dataset. An adaptive possibilistic clustering algorithm is applied directly to the resulting sub-clusters, which are represented by their clustering feature (CF) vectors. The proposed algorithm inherits BIRCH's ability to provide a compact model representation. As in BIRCH, several clustering algorithms can be applied to the leaf nodes of the output tree; however, the proposed possibilistic paradigm strongly rejects outliers and can handle overlap between clusters. In addition, using average correlation instead of a cluster center helps discover non-convex clusters. An experimental study shows that the proposed algorithm generates higher-quality clusters, in terms of three assessment measures, than existing algorithms for clustering gene expression data.
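The two ideas in the abstract, that correlation ignores scale and average level while Euclidean distance does not, and that average correlation can be tracked from a compact summary, can be illustrated with a minimal sketch. It assumes each expression profile is normalized to zero mean and unit length, so that Pearson correlation reduces to a dot product. The function names and the (N, linear sum) summary below are illustrative assumptions, not the paper's actual CF definition.

```python
import math
from itertools import combinations

def normalize(profile):
    """Shift to zero mean and scale to unit Euclidean length.
    Assumes the profile is not constant (otherwise the norm is zero)."""
    mean = sum(profile) / len(profile)
    centered = [v - mean for v in profile]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]

def pearson(x, y):
    """Pearson correlation as the dot product of normalized profiles."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))

def euclidean(x, y):
    """Plain Euclidean distance between two raw profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cf_summary(profiles):
    """Hypothetical CF-like summary: (N, linear sum of normalized profiles)."""
    normed = [normalize(p) for p in profiles]
    linear_sum = [sum(column) for column in zip(*normed)]
    return len(normed), linear_sum

def avg_correlation_from_cf(n, linear_sum):
    """Average pairwise correlation from the summary alone.
    For unit-length rows, ||LS||^2 = N + sum over ordered pairs of x_i . x_j."""
    squared_norm = sum(v * v for v in linear_sum)
    return (squared_norm - n) / (n * (n - 1))

def avg_correlation_brute_force(profiles):
    """Reference value: mean Pearson correlation over all distinct pairs."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)

# Made-up profiles for illustration only.
gene_a = [1.0, 2.0, 4.0, 3.0, 5.0]
gene_b = [3.0 * v + 10.0 for v in gene_a]  # same shape, different scale and level
gene_c = [2.0, 1.0, 3.0, 5.0, 4.0]
cluster = [gene_a, gene_b, gene_c]

n, linear_sum = cf_summary(cluster)
print("correlation(a, b):", pearson(gene_a, gene_b))
print("euclidean(a, b):  ", euclidean(gene_a, gene_b))
print("avg corr from CF: ", avg_correlation_from_cf(n, linear_sum))
print("avg corr brute:   ", avg_correlation_brute_force(cluster))
```

Under this normalization the summary is additive, so merging two sub-clusters only requires adding their counts and linear sums, which is what makes a BIRCH-style incremental tree plausible with a correlation-based criterion.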
Key words: Bioinformatics, Gene expression analysis, Hierarchical Clustering, Possibilistic
Journal doi: http://dx.doi.org/10.18782
Cite this article: Mahfouz, M.A., ExBIRCH: Scalable Non-Centroid BIRCH-like Algorithm for Clustering Gene Expression Data based on Average Correlation, Int. J. Pure App. Biosci. 6(2): 37-46 (2018). doi: http://dx.doi.org/10.18782/2320-7051.6308