java - Understanding Apache Spark filter transformation behavior


I have a list of items in a JavaRDD; every item is a date (a Java Calendar). Now I want to filter the dates that are less than a given date. This is my code:

Main

    import java.util.ArrayList;
    import java.util.Calendar;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("date comparison test")
            .setMaster("local[4]").set("spark.executor.memory", "1g");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Initializes the filter date to 01/01/2016 @ 10:00:00
        Calendar filterDate = Calendar.getInstance();
        filterDate.clear();
        filterDate.setTimeInMillis(1451642400000L);

        // Initializes an array of 40 calendars, each date
        // 1 hour later than the previous one, starting from
        // 01/01/2016 @ 08:00:00
        ArrayList<Calendar> calendarArray = new ArrayList<>();
        // Milliseconds corresponding to 01/01/2016 @ 08:00:00
        long initial = 1451635200000L;
        for (int i = 0; i < 40; ++i) {
            Calendar one = Calendar.getInstance();
            one.clear();
            one.setTimeInMillis(initial);
            calendarArray.add(one);
            initial += 3600000;
        }

        JavaRDD<Calendar> rdd = sc.parallelize(calendarArray);
        JavaRDD<Calendar> rddFiltered = rdd.filter(new FilterTest(filterDate));
        System.out.println("rdd size " + rddFiltered.count());
        sc.close();
    }

FilterTest code

    import java.util.Calendar;
    import org.apache.spark.api.java.function.Function;

    public class FilterTest implements Function<Calendar, Boolean> {

        private static final long serialVersionUID = -3134317182912968444L;
        private final Calendar filteringDate;

        public FilterTest(Calendar filteringDate) {
            super();
            this.filteringDate = filteringDate;
        }

        @Override
        public Boolean call(Calendar arg0) throws Exception {
            // getStandardFormattedDate prints the date in a given format
            System.out.println(TimeUtils.getStandardFormattedDate(arg0) + " - " + TimeUtils.getStandardFormattedDate(filteringDate));
            // Keep only the dates that are not before the filtering date
            if (arg0.before(filteringDate)) {
                return false;
            } else {
                return true;
            }
        }
    }

What I can't understand is the output I get. It seems that the fixed Calendar I pass as a parameter for the comparison changes (for example, when it shows Sat, 01 Jan 2016 22:00:00).

Output

    Sat, 01 Jan 2016 08:00:00 - Fri, 01 Jan 2016 10:00:00
    Sat, 01 Jan 2016 08:00:00 - Fri, 01 Jan 2016 10:00:00
    Fri, 01 Jan 2016 08:00:00 - Fri, 01 Jan 2016 10:00:00
    Sat, 02 Jan 2016 15:00:00 - Fri, 01 Jan 2016 09:00:00
    Fri, 01 Jan 2016 09:00:00 - Fri, 01 Jan 2016 10:00:00
    Sat, 02 Jan 2016 15:00:00 - Fri, 01 Jan 2016 09:00:00
    Fri, 01 Jan 2016 10:00:00 - Fri, 01 Jan 2016 10:00:00
    Sat, 02 Jan 2016 20:00:00 - Fri, 01 Jan 2016 10:00:00
    Fri, 01 Jan 2016 10:00:00 - Fri, 01 Jan 2016 10:00:00
    Fri, 01 Jan 2016 10:00:00 - Fri, 01 Jan 2016 10:00:00
    Fri, 01 Jan 2016 10:00:00 - Fri, 02 Jan 2016 07:00:00
    Sat, 02 Jan 2016 17:00:00 - Fri, 01 Jan 2016 10:00:00
    Fri, 01 Jan 2016 11:00:00 - Fri, 01 Jan 2016 10:00:00
    Sat, 02 Jan 2016 21:00:00 - Fri, 01 Jan 2016 10:00:00
    Fri, 01 Jan 2016 10:00:00 - Sat, 01 Jan 2016 22:00:00
    Fri, 01 Jan 2016 22:00:00 - Fri, 01 Jan 2016 10:00:00
    Sat, 01 Jan 2016 22:00:00 - Fri, 01 Jan 2016 12:00:00
    Fri, 01 Jan 2016 23:00:00 - Fri, 01 Jan 2016 10:00:00
    Fri, 01 Jan 2016 23:00:00 - Fri, 01 Jan 2016 10:00:00
    Fri, 01 Jan 2016 23:00:00 - Fri, 01 Jan 2016 10:00:00
    Fri, 01 Jan 2016 10:00:00 - Fri, 02 Jan 2016 00:00:00
    Sat, 02 Jan 2016 00:00:00 - Fri, 01 Jan 2016 19:00:00
    Fri, 01 Jan 2016 13:00:00 - Fri, 02 Jan 2016 10:00:00
    Sat, 01 Jan 2016 10:00:00 - Sat, 01 Jan 2016 14:00:00
    Sat, 02 Jan 2016 01:00:00 - Fri, 01 Jan 2016 10:00:00
    Fri, 01 Jan 2016 10:00:00 - Fri, 02 Jan 2016 11:00:00
    Sat, 01 Jan 2016 14:00:00 - Fri, 02 Jan 2016 11:00:00
    Sat, 02 Jan 2016 11:00:00 - Fri, 01 Jan 2016 15:00:00
    Fri, 01 Jan 2016 15:00:00 - Sat, 02 Jan 2016 10:00:00
    Sat, 02 Jan 2016 02:00:00 - Fri, 01 Jan 2016 10:00:00
    Sat, 02 Jan 2016 15:00:00 - Sat, 02 Jan 2016 10:00:00
    Fri, 01 Jan 2016 10:00:00 - Fri, 01 Jan 2016 10:00:00
    Fri, 01 Jan 2016 10:00:00 - Fri, 01 Jan 2016 10:00:00
    Fri, 01 Jan 2016 12:00:00 - Fri, 01 Jan 2016 22:00:00
    Sat, 01 Jan 2016 17:00:00 - Fri, 02 Jan 2016 10:00:00
    Fri, 01 Jan 2016 22:00:00 - Fri, 01 Jan 2016 10:00:00
    Sat, 02 Jan 2016 13:00:00 - Fri, 01 Jan 2016 10:00:00
    Sat, 01 Jan 2016 10:00:00 - Fri, 01 Jan 2016 10:00:00
    Sat, 02 Jan 2016 23:00:00 - Fri, 01 Jan 2016 10:00:00

What is happening to this variable while the computation is distributed? The result is apparently correct, but I'm having trouble debugging the code in a more complex situation.
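One way to check whether the Calendar itself changes, or only its printed form, is to log the raw epoch milliseconds next to a string produced by a thread-safe formatter. The sketch below is not part of the original code. It assumes, without having seen `TimeUtils`, that `getStandardFormattedDate` might rely on a shared `java.text.SimpleDateFormat` (which the JDK documents as not synchronized) and that `local[4]` runs the filter on several threads at once. `DebugFormat` and `describe` are hypothetical names, and the pattern and the UTC zone are illustrative choices.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Calendar;
import java.util.Locale;

public class DebugFormat {
    // DateTimeFormatter is immutable and thread-safe, unlike SimpleDateFormat.
    // Pattern and zone are assumptions chosen to resemble the output above.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss", Locale.ENGLISH)
                         .withZone(ZoneOffset.UTC);

    // Hypothetical helper: returns the raw millis plus a formatted view
    // of the same instant, so the two can be compared in the log.
    public static String describe(Calendar c) {
        long millis = c.getTimeInMillis();
        return millis + " (" + FORMAT.format(Instant.ofEpochMilli(millis)) + ")";
    }

    public static void main(String[] args) {
        Calendar filterDate = Calendar.getInstance();
        filterDate.clear();
        filterDate.setTimeInMillis(1451642400000L);
        // Prints: 1451642400000 (Fri, 01 Jan 2016 10:00:00)
        System.out.println(describe(filterDate));
    }
}
```

Calling `DebugFormat.describe(arg0)` and `DebugFormat.describe(filteringDate)` inside `call` would show whether the millisecond values stay constant. If they do, while the text from `TimeUtils` still varies, the Calendar is intact and the formatting step is the more likely suspect.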

