Using json4s-native library in a Spark cluster
I am trying to process data in an AWS EMR Spark cluster. For that, I have a Scala application that reads raw JSON data from S3 and parses it into a Map[String, Any] with Scala's native scala.util.parsing.json.JSON library and its parseFull method.
I then have a recursive function that flattens the nested JSON (so that the Map[String, Any] does not contain Maps inside it), and I want to convert the result back to a JSON-formatted string in order to create a Spark DataFrame object.
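The flattening step works roughly like this (a minimal sketch, not my exact code; the function name and the dot-joined key convention are illustrative):

```scala
// Recursively flatten a nested Map[String, Any] so it contains no inner Maps.
// Nested keys are joined with "." (e.g. "b" -> Map("c" -> 2) becomes "b.c" -> 2).
def flatten(m: Map[String, Any], prefix: String = ""): Map[String, Any] =
  m.flatMap {
    case (k, v: Map[_, _]) =>
      // Recurse into the nested map, prefixing its keys with the parent key
      flatten(v.asInstanceOf[Map[String, Any]], prefix + k + ".")
    case (k, v) =>
      Map(prefix + k -> v)
  }
```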
For serializing the Map object to a JSON string, I found this solution by Mohit.
Unfortunately, I had problems with the org.json4s.native library in IntelliJ: it said it could not resolve the dependency. (In hindsight, I know the problem was that I did not refresh the project after updating the .sbt file with the correct dependency. Now the json4s-native library and its functions work in IntelliJ.)
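For context, my build.sbt contains roughly the following (a sketch; the exact json4s version number is an assumption picked for Scala 2.10 / Spark 1.6 compatibility):

```scala
// build.sbt fragment (illustrative versions)
scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"    % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-sql"     % "1.6.0" % "provided",
  "org.json4s"       %% "json4s-native" % "3.2.10"
)
```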
So at first, I used org.json4s.jackson.Json instead. The line

Json(DefaultFormats).write(m)

resulted in a string in which integer numbers were converted to doubles, which is not correct.
So I got IntelliJ working with the json4s-native library, and the result converted the numbers correctly.
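The difference I care about can be illustrated with a minimal, hand-rolled serializer (pure Scala, not json4s, just to show the output I expect: integers must stay integers in the string):

```scala
// Minimal sketch: render a flat Map[String, Any] as a JSON object string.
// Non-string values use their toString, so Int 1 stays "1", not "1.0".
def toJsonString(m: Map[String, Any]): String =
  m.map {
    case (k, v: String) => "\"" + k + "\":\"" + v + "\""
    case (k, v)         => "\"" + k + "\":" + v.toString
  }.mkString("{", ",", "}")
```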
Now, however, I am having problems using the library in the Spark cluster. If I build the .jar locally, upload the file to S3, copy it to the EMR cluster and run spark-submit, I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/json4s/native/Json$
And when I try to import the library in the spark-shell, the response is that the dependency cannot be resolved.
I tried this suggestion by Eli Leszczynski of manually placing the json4s-native jar file on the cluster, but the suggested /home/hadoop/lib does not work: /home/hadoop is an empty folder when I log in at hadoop@blabla.compute.amazonaws.com.
I found that libraries may be found in the /usr/lib or /usr/lib/hadoop/lib folders, so I copied the jar file there, but my script still failed.
So, how can I use the org.json4s.native library in an Amazon EMR Spark cluster?
(The Spark version is 1.6, and I am using Scala version 2.10.5, which is compatible with that Spark version.)