scala - Sample Spark Program -


Hi, I'm learning Spark and Scala, and I have one scenario for which I need to come up with Spark/Scala code.

Input file:

name    attr1   attr2   attr3
john    y       n       n
smith   n       y       n

Expected output:

john    attr1   y
john    attr2   n
john    attr3   n
smith   attr1   n
...
...

I know how to do this in map-reduce: for each line, take the name separately, iterate through the attr values, and emit output as (name, attrX, y/n). In Scala and Spark it is a bit confusing. Can anyone help me?

Assuming you know the number of input attributes, and the input attributes are separated by \t, you can do something like this.

In Java:

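import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Spark 1.x API: jsc is a JavaSparkContext, sqlContext is a SQLContext.
// In Spark 2.x and later, call() returns Iterator<Row> and DataFrame is Dataset<Row>.

// Load the data file
JavaRDD<String> file = jsc.textFile(path);

// Build the header RDD
JavaRDD<String> header = jsc.parallelize(Arrays.asList(file.first()));

// Subtract the header to get the real data
JavaRDD<String> data = file.subtract(header);

// Create the row RDD: one (name, attr, value) row per attribute
JavaRDD<Row> rowRDD = data.flatMap(new FlatMapFunction<String, Row>() {
    private static final long serialVersionUID = 1L;

    @Override
    public Iterable<Row> call(String line) throws Exception {
        String[] strs = line.split("\t");
        Row r1 = RowFactory.create(strs[0], "attr1", strs[1]);
        Row r2 = RowFactory.create(strs[0], "attr2", strs[2]);
        Row r3 = RowFactory.create(strs[0], "attr3", strs[3]);
        return Arrays.asList(r1, r2, r3);
    }
});

// Schema for the DataFrame
StructType schema = new StructType().add("name", DataTypes.StringType)
                                    .add("attr", DataTypes.StringType)
                                    .add("value", DataTypes.StringType);

DataFrame df = sqlContext.createDataFrame(rowRDD, schema);
df.show();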

Here is the output:

+-----+-----+-----+
| name| attr|value|
+-----+-----+-----+
|smith|attr1|    n|
|smith|attr2|    y|
|smith|attr3|    n|
| john|attr1|    y|
| john|attr2|    n|
| john|attr3|    n|
+-----+-----+-----+
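The flatMap body in the answer hard-codes attr1 through attr3. As a rough sketch only, the same per-line logic can be written in plain Java (no Spark) so that the attribute names come from the header line instead. The class name `Unpivot` and the method `unpivot` are hypothetical names introduced for illustration, and the sketch assumes every line is tab-separated with the name in the first column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original answer: plain-Java version
// of the per-line logic, with attribute names read from the header.
class Unpivot {

    // Turns a header line plus data lines (all tab-separated) into
    // one {name, attr, value} triple per attribute column.
    static List<String[]> unpivot(List<String> lines) {
        List<String[]> out = new ArrayList<>();
        if (lines.isEmpty()) {
            return out;
        }
        // Assumption: the first line is the header (name, attr1, attr2, ...).
        String[] header = lines.get(0).split("\t");
        for (String line : lines.subList(1, lines.size())) {
            String[] cols = line.split("\t");
            // Column 0 is the name; each remaining column is one attribute value.
            for (int i = 1; i < cols.length && i < header.length; i++) {
                out.add(new String[] { cols[0], header[i], cols[i] });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "name\tattr1\tattr2\tattr3",
            "john\ty\tn\tn",
            "smith\tn\ty\tn");
        // Prints one "name attr value" line per triple, in input order.
        for (String[] row : unpivot(lines)) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

Inside Spark, the inner loop would presumably go into the flatMap function, with the header array captured from `file.first()` before the transformation runs.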

Scala and Java are similar, so it should be easy to translate this to Scala.

