Using json4s-native library in a Spark cluster
I am trying to process data in an AWS EMR Spark cluster. For that, I have a Scala application that reads raw JSON data from S3 and parses it into a Map[String, Any] with Scala's native scala.util.parsing.json.JSON library and its parseFull method.
I then have a recursive function that flattens the nested JSON (so that the Map[String, Any] does not contain Maps inside it), and I want to convert the result back to a JSON-formatted string in order to create a Spark DataFrame object.
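The flattening step works roughly like this (a minimal sketch, not my exact code; the function name and the dot-joined key convention are illustrative):

```scala
// Recursively flatten a nested Map[String, Any] so it contains no inner Maps.
// Nested keys are joined with "." (e.g. "b" -> Map("c" -> 2) becomes "b.c" -> 2).
def flatten(m: Map[String, Any], prefix: String = ""): Map[String, Any] =
  m.flatMap {
    case (k, v: Map[_, _]) =>
      // Recurse into the nested map, prefixing its keys with the parent key
      flatten(v.asInstanceOf[Map[String, Any]], prefix + k + ".")
    case (k, v) =>
      Map(prefix + k -> v)
  }
```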
For serializing the Map object to a JSON string, I found this solution by Mohit.
Unfortunately, I had problems with the org.json4s.native library in IntelliJ: it said it could not resolve the dependency. (In hindsight, I know the problem was that I did not refresh the project after updating the .sbt file with the correct dependency. Now the json4s-native library and its functions work in IntelliJ.)
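For context, my build.sbt contains roughly the following (a sketch; the exact json4s version number is an assumption picked for Scala 2.10 / Spark 1.6 compatibility):

```scala
// build.sbt fragment (illustrative versions)
scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"    % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-sql"     % "1.6.0" % "provided",
  "org.json4s"       %% "json4s-native" % "3.2.10"
)
```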
So at first, I used org.json4s.jackson.Json instead. The line

Json(DefaultFormats).write(m)

resulted in a string in which integer numbers were converted to doubles, which is not correct.
So I got IntelliJ working with the json4s-native library, and the result converted the numbers correctly.
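The difference I care about can be illustrated with a minimal, hand-rolled serializer (pure Scala, not json4s, just to show the output I expect: integers must stay integers in the string):

```scala
// Minimal sketch: render a flat Map[String, Any] as a JSON object string.
// Non-string values use their toString, so Int 1 stays "1", not "1.0".
def toJsonString(m: Map[String, Any]): String =
  m.map {
    case (k, v: String) => "\"" + k + "\":\"" + v + "\""
    case (k, v)         => "\"" + k + "\":" + v.toString
  }.mkString("{", ",", "}")
```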
Now, however, I am having problems using the library in the Spark cluster. If I build the .jar locally, upload the file to S3, copy it to the EMR cluster and run spark-submit, I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/json4s/native/Json$
And when I try to import the library in the spark-shell, the response is that the dependency cannot be resolved.
I tried this suggestion by Eli Leszczynski of manually placing the json4s-native jar file on the cluster, but the suggested /home/hadoop/lib does not work: /home/hadoop is an empty folder when I log in at hadoop@blabla.compute.amazonaws.com.
I found that libraries may be found in the /usr/lib or /usr/lib/hadoop/lib folders, so I copied the jar file there, but my script still failed.
So, how can I use the org.json4s.native library in an Amazon EMR Spark cluster?
(The Spark version is 1.6, and I am using Scala version 2.10.5, which is compatible with that Spark version.)