How to fit and score machine learning models in a Java/JVM-based application


Could you please guide me on how to create and execute machine learning/statistical models (regression, decision trees, k-means clustering, naive Bayes, scorecards, linear/logistic regression, GBM, GLM, etc.) in a Java/JVM-based application in production?

We have an ETL-style, Java-based product in which one can do the data preparation steps for machine learning: data ingestion (JDBC, files, HDFS, NoSQL, etc.), joins, aggregations, and so on, all of which are required for feature engineering. We now want to add analytics capabilities using machine learning/statistical modeling.

Right now we use the JPMML-Evaluator to score models that were created in PMML format using R, Python, or KNIME. This needs three separate, unconnected steps:
1- Do the data preparation in our Java/JVM application and save the sample data (training and test) to a CSV file or a database.
2- Create the machine learning model in R, Python, or KNIME and export it in PMML 4.2 format.
3- Import/deploy the PMML in our Java-based application and use the JPMML evaluator to execute it in production.
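For step 1, the hand-off of training and test samples to the external modeling tool could look roughly like the sketch below. It is only an illustration using the Java standard library: the class name `TrainTestSplit`, the column names, the file names, and the 80/20 ratio are all assumptions, not part of our product. A fixed seed keeps the split reproducible between runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class TrainTestSplit {

    // Shuffles a copy of the rows with a fixed seed and cuts it into [train, test].
    static List<List<String>> split(List<String> rows, double trainFraction, long seed) {
        List<String> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<String> train = new ArrayList<>(shuffled.subList(0, cut));
        List<String> test = new ArrayList<>(shuffled.subList(cut, shuffled.size()));
        return List.of(train, test);
    }

    // Writes the header line followed by the data rows.
    static void writeCsv(Path file, String header, List<String> rows) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(header);
        lines.addAll(rows);
        Files.write(file, lines);
    }

    public static void main(String[] args) throws IOException {
        String header = "age,income,label"; // hypothetical columns
        List<String> rows = List.of(
                "25,30000,0", "41,52000,1", "33,41000,0", "58,76000,1", "29,35000,0");
        List<List<String>> parts = split(rows, 0.8, 42L);
        writeCsv(Path.of("train.csv"), header, parts.get(0)); // hypothetical file names
        writeCsv(Path.of("test.csv"), header, parts.get(1));
        System.out.println("train=" + parts.get(0).size() + " test=" + parts.get(1).size());
    }
}
```

In a real pipeline the rows would come from the prepared data set rather than a hard-coded list, and values containing commas or quotes would need proper CSV escaping.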

I am sure this is a common problem for machine learning in production, where Java is preferred over Python or R. Please suggest better approaches to create and execute Python/scikit-learn-based machine learning models in a JVM-based application.

Here is what I have thought of so far to achieve steps #2 and #3 more seamlessly in a JVM-based application, without compromising performance and usability:

1- Call a Java program that internally calls a Python scikit-learn script (under the hood) to create the model in PMML, and then use the JPMML evaluator. The user can then pretend to be in a single JVM-based application (better usability). I am not sure about the limitations and shortcomings of using PMML, since not all features are supported in jpmml-sklearn.
2- Call a Java program that internally calls a Python script, with model creation and execution happening in an external Python environment. The serialized model and the results go to a file/CSV or an in-memory DB (or a cache such as Hazelcast), from which the parent Java application fetches them. I have researched this and I can't use Jython for executing scikit-learn models.
3- Can I use JEP (Embed Python in Java) to embed CPython in the JVM? Has anybody tried it with scikit-learn models?
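Options 1 and 2 both come down to the JVM launching an external Python process and waiting for it to finish. A minimal sketch of that launch with `java.lang.ProcessBuilder` is below. The class name `PythonTrainerLauncher`, the script name `train_model.py`, and its two positional arguments are hypothetical; the real script would define its own interface and would have to write the model file itself.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class PythonTrainerLauncher {

    // Builds the command line. The script and its argument order are assumptions.
    static List<String> buildCommand(String pythonExe, Path script, Path trainCsv, Path modelOut) {
        return List.of(pythonExe, script.toString(), trainCsv.toString(), modelOut.toString());
    }

    // Runs the external process and returns its exit code, or -1 if it timed out.
    static int run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // forward the script's stdout/stderr to the JVM's console
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            return -1;
        }
        return process.exitValue();
    }

    public static void main(String[] args) throws Exception {
        List<String> command = buildCommand(
                "python",                    // must be on the PATH of the JVM process
                Path.of("train_model.py"),   // hypothetical training script
                Path.of("train.csv"),
                Path.of("model.pmml"));
        System.out.println("Exit code: " + run(command, 600));
    }
}
```

Passing the command as a list of separate arguments avoids shell quoting differences, which matters when both Windows and non-Windows platforms have to be supported. After a zero exit code the Java side would pick up the output file (option 1) or the stored results (option 2).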

Alternatively, should I explore Mahout or Weka, which are Java-based machine learning libraries, for use in the JVM-based application? (I need to support both Windows and non-Windows platforms.)

I am also exploring H2O.ai, which is Java-based. Has anybody tried it?

If you have ETL with an HDFS backend, I would suggest deploying Spark on your cluster and using Spark's MLlib machine learning algorithms. It supports the methods you mentioned above.

Do you mind giving some context on the size (rows, columns, type) of the data you plan to work with? Java is not the recommended go-to language for ML, but Scala compiles to JVM bytecode and has a syntax similar to Java's (in addition to having a Java API).

If you're producing a proof of concept, Java is fine, but if you're planning on working with big data, it doesn't scale well.

