scala - Sample Spark Program
Hi, I'm learning Spark and Scala and have a scenario where I need to come up with Spark/Scala code.
Input file:
name  attr1  attr2  attr3
john  y      n      n
smith n      y      n
Expected output:
john  attr1  y
john  attr2  n
john  attr3  n
smith attr1  n
...
I know how to do this in MapReduce: for each line, take the name separately, iterate through the attr values, and emit output as (name, attrX, y/n). In Scala/Spark it is a bit confusing. Can someone help me?
Assuming you know the number of input attributes and that they are separated by \t, you can do something like this.
In Java:
// Load the data file
JavaRDD<String> file = jsc.textFile(path);

// Build a header RDD
JavaRDD<String> header = jsc.parallelize(Arrays.asList(file.first()));

// Subtract the header to keep only the real data
JavaRDD<String> data = file.subtract(header);

// Create the Row RDD
JavaRDD<Row> rowRDD = data.flatMap(new FlatMapFunction<String, Row>() {
    private static final long serialVersionUID = 1L;

    @Override
    public Iterable<Row> call(String line) throws Exception {
        String[] strs = line.split("\t");
        Row r1 = RowFactory.create(strs[0], "attr1", strs[1]);
        Row r2 = RowFactory.create(strs[0], "attr2", strs[2]);
        Row r3 = RowFactory.create(strs[0], "attr3", strs[3]);
        return Arrays.asList(r1, r2, r3);
    }
});

// Schema for the DataFrame
StructType schema = new StructType()
    .add("name", DataTypes.StringType)
    .add("attr", DataTypes.StringType)
    .add("value", DataTypes.StringType);

DataFrame df = sqlContext.createDataFrame(rowRDD, schema);
df.show();
Here is the output:
+-----+-----+-----+
| name| attr|value|
+-----+-----+-----+
|smith|attr1|    n|
|smith|attr2|    y|
|smith|attr3|    n|
| john|attr1|    y|
| john|attr2|    n|
| john|attr3|    n|
+-----+-----+-----+
Scala and Java are similar, so it should be straightforward to translate this to Scala.
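Since the question asks for Scala, here is a minimal sketch of the same approach in Scala (my own untested translation, written against the same Spark 1.x DataFrame API the Java code uses; the path value is a placeholder, and a simple filter on the header line stands in for the subtract step):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val conf = new SparkConf().setAppName("AttrToRows")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

val path = "attrs.tsv" // placeholder for the real input location

// Load the file and drop the header line
val file = sc.textFile(path)
val header = file.first()
val data = file.filter(_ != header)

// Emit one Row per (name, attrX, value), just like the MapReduce emit step
val rowRDD = data.flatMap { line =>
  val strs = line.split("\t")
  Seq(
    Row(strs(0), "attr1", strs(1)),
    Row(strs(0), "attr2", strs(2)),
    Row(strs(0), "attr3", strs(3))
  )
}

// Schema for the resulting DataFrame
val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("attr", StringType),
  StructField("value", StringType)
))

val df = sqlContext.createDataFrame(rowRDD, schema)
df.show()

If you prefer to skip building the schema by hand, you could also flatMap to (name, attr, value) tuples and call toDF("name", "attr", "value") after importing sqlContext.implicits._.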