r - How to run the Elbow method in parallel to find an appropriate number of clusters (k)
The "data.clustering" data frame has size 943x2:

    > head(data.clustering)
      age gender
    2   2      1
    3   6      2
    4   2      1
    5   2      1
    6   6      2
    7   6      1

I find the k value using the elbow method:
    library(GMD)  # provides css.hclust() and elbow.batch()

    elbow.k <- function(mydata) {
      ## determine a "good" k using the elbow method
      dist.obj <- dist(mydata)
      hclust.obj <- hclust(dist.obj)
      css.obj <- css.hclust(dist.obj, hclust.obj)
      elbow.obj <- elbow.batch(css.obj)
      # print(elbow.obj)
      k <- elbow.obj$k
      return(k)
    }

    # find the k value and time the call
    start.time <- Sys.time()
    k.clusters <- elbow.k(data.clustering)
    end.time <- Sys.time()
    # difftime() with explicit units keeps the printed number in seconds
    elapsed <- as.numeric(difftime(end.time, start.time, units = "secs"))
    cat('Time to find k using the elbow method is', elapsed, 'seconds with k value:', k.clusters)

The time is large:

    Time to find k using the elbow method is 24.01472 seconds with k value: 10
Can I use parallel processing in R to reduce the running time of the elbow method? Thanks a lot.
You can use the parallel package in R. You must make the packages and variables you need available in the workers' environment, using clusterEvalQ() and clusterExport(). I think the code looks like the one below:

    library(parallel)
    library(GMD)  # provides css.hclust() and elbow.batch()

    elbow.k <- function(mydata) {
      ## determine a "good" k using the elbow method
      dist.obj <- dist(mydata)
      hclust.obj <- hclust(dist.obj)
      css.obj <- css.hclust(dist.obj, hclust.obj)
      elbow.obj <- elbow.batch(css.obj)
      # print(elbow.obj)
      k <- elbow.obj$k
      return(k)
    }

    # find the k value
    no_cores <- detectCores()
    cl <- makeCluster(no_cores)
    clusterEvalQ(cl, library(GMD))  # load the package on every worker
    # copy the variables and functions the workers need into their environment
    clusterExport(cl, c("elbow.k", "data.clustering"))
    start.time <- Sys.time()
    k.clusters <- parSapply(cl, 1, function(x) elbow.k(data.clustering))  # or parLapply, which returns a list
    end.time <- Sys.time()
    elapsed <- as.numeric(difftime(end.time, start.time, units = "secs"))
    cat('Time to find k using the elbow method is', elapsed, 'seconds with k value:', k.clusters)
    stopCluster(cl)
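A speedup from the parallel package depends on having several independent tasks to hand out to the workers. One way to get that, sketched here under stated assumptions and not taken from the code above, is to split the work by candidate k. The sketch swaps in base R kmeans() and a simple second-difference elbow heuristic instead of GMD's css.hclust() and elbow.batch(). The names toy.data, wss.for.k and elbow.k.simple are hypothetical, and the data is synthetic.

```r
library(parallel)

# Synthetic stand-in data (hypothetical): three well-separated 2-D blobs.
set.seed(42)
toy.data <- rbind(
  cbind(rnorm(100, 0), rnorm(100, 0)),
  cbind(rnorm(100, 8), rnorm(100, 0)),
  cbind(rnorm(100, 4), rnorm(100, 7))
)

# Total within-cluster sum of squares for one candidate k.
wss.for.k <- function(k, mydata) {
  set.seed(1)  # fixed seed so every run gives the same result
  stats::kmeans(mydata, centers = k, nstart = 10)$tot.withinss
}

k.values <- 1:8
cl <- makeCluster(2)  # two workers; detectCores() could be used instead
# Each k is an independent task, so the work is split across the workers.
# The data is passed as an argument, so no clusterExport() is needed here.
wss <- parSapply(cl, k.values, wss.for.k, mydata = toy.data)
stopCluster(cl)

# Simple elbow heuristic (an assumption, not GMD's elbow.batch()):
# choose the k where the curve bends most, i.e. the largest second difference.
elbow.k.simple <- k.values[which.max(diff(wss, differences = 2)) + 1]

print(data.frame(k = k.values, wss = wss))
cat("suggested k:", elbow.k.simple, "\n")
```

Whether this is faster than the serial version depends on the data size and on the cost of starting the workers, so it is worth timing both.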