Output file is getting generated on slave machine in Apache Spark
I am facing an issue while running a Spark Java program that reads a file, does some manipulation, and generates an output file at a given path. Everything works fine when the master and slaves are on the same machine, i.e. in standalone-cluster mode. The problem started when I deployed the same program in a multi-machine, multi-node cluster setup: the master runs at x.x.x.102 and the slave runs at x.x.x.104. Both master and slave have shared SSH keys and are reachable from each other.
Initially the slave was not able to read the input file, and I came to know that I need to call sc.addFile() before sc.textFile(). That solved the issue. But now I see the output being generated on the slave machine in a _temporary folder under the output path, i.e. /tmp/emi/_temporary/0/task-xxxx/part-00000, whereas in local cluster mode it works fine and generates the output file in /tmp/emi/part-00000.
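For reference, this is a minimal sketch of the pattern I am using; the class name and the input path are placeholders:

    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkFiles;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ReadSharedInput {
        public static void main(String[] args) {
            JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("emi"));

            // Ship the input file from the driver to every node in the cluster.
            sc.addFile("/path/on/driver/input.txt");

            // Read the node-local copy that addFile() distributed.
            JavaRDD<String> lines = sc.textFile(SparkFiles.get("input.txt"));
            System.out.println("lines: " + lines.count());

            sc.stop();
        }
    }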
I came to know that I need to use SparkFiles.get(), but I am not able to understand how and where to use this method.
Till now I am using:

    DataFrame dataObj = ...;
    dataObj.javaRDD().coalesce(1).saveAsTextFile("file:/tmp/emi");
Can someone please let me know how to call SparkFiles.get()?
In short, how can I tell the slave to create the output file on the machine where the driver is running?
Please help. Thanks a lot in advance.
There is nothing unexpected here. Each worker writes its own part of the data separately. Using the file scheme means that data is written to a file in a file system that is local from the worker's perspective.
Regarding SparkFiles, it is not applicable in this particular case. SparkFiles can be used to distribute common files to the worker machines, but it does not deal with results.
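To illustrate what SparkFiles is actually for, here is a sketch where a side file is distributed once and each task resolves its own executor-local copy; sc and lines stand for an existing context and RDD, and the lookup file is hypothetical:

    import org.apache.spark.SparkFiles;
    import org.apache.spark.api.java.JavaRDD;

    // Distribute a common side file (a hypothetical lookup table) to all nodes.
    sc.addFile("/path/on/driver/lookup.txt");

    JavaRDD<String> tagged = lines.map(line -> {
        // Each task resolves the copy local to its own executor,
        // not a path on the driver.
        String localLookup = SparkFiles.get("lookup.txt");
        return line + "\t" + localLookup;
    });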
If for some reason you want to perform writes on the machine used to run the driver code, you'll have to fetch the data to the driver machine first (either with collect, which requires enough memory to fit all the data, or toLocalIterator, which collects one partition at a time and requires multiple jobs) and then use standard tools to write the results to the local file system. In general though, writing to the driver is not good practice and most of the time it is simply useless.
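A hedged sketch of that approach using toLocalIterator, assuming dataObj is the DataFrame from the question and that the /tmp/emi directory already exists on the driver machine:

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.Iterator;
    import org.apache.spark.sql.Row;

    // toLocalIterator() pulls one partition at a time, so only a single
    // partition has to fit in driver memory (collect() would need them all).
    try (PrintWriter out = new PrintWriter("/tmp/emi/part-00000")) {
        Iterator<Row> rows = dataObj.javaRDD().toLocalIterator();
        while (rows.hasNext()) {
            // The mkString() formatting is an assumption; adjust as needed.
            out.println(rows.next().mkString(","));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }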