Run Scala Program with Spark on Hadoop
I have created a Scala program that searches for a word in a text file. I wrote the source in Scala Eclipse and, after compiling, built a jar with sbt / sbt assembly. When I run the .jar with Spark in local mode, it runs correctly. Now I want to try running the program with Spark on Hadoop; I have 1 master and 2 worker machines. Do I have to change the code? And what command do I run from the shell of the master? I have created a bucket and put the text file into Hadoop.
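The jar is built with sbt-assembly; the build.sbt is essentially the following (the Scala, Spark and plugin versions here are only examples, not necessarily the exact ones in my project):

name := "wordcount"

version := "1.0"

// example versions, adjust to the ones actually used
scalaVersion := "2.12.18"

// "provided" keeps Spark out of the assembly jar, since spark-submit supplies it at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.3.2" % "provided"

// project/plugins.sbt additionally contains, for example:
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.1")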
This is the code:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import scala.io.StdIn

object WordCount {
  def main(args: Array[String]): Unit = {
    // set up the Spark context
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // read the word to search for from standard input
    print("Enter a word in the holy bible: ")
    val word = StdIn.readLine()
    println("You entered " + word)

    // load the text file, split it into words and keep only the matches
    val input = sc.textFile("bible.txt")
    val splitLines = input.flatMap(line => line.split(" "))
                          .filter(x => x.equals(word))

    println("The word " + word + " appears " + splitLines.count() + " times")
  }
}
Thanks all.
Just change the following line,
val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
to
val conf = new SparkConf().setAppName("WordCount")
This way you do not have to modify the code whenever you want to switch between local mode and cluster mode. The master option can instead be passed via the spark-submit command, as follows,
spark-submit --class WordCount --master <master-url> wordcount.jar
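For example, if the cluster uses the standalone Spark master on its default port 7077 (the host name below is a placeholder), the command from the master's shell would look like this:

spark-submit --class WordCount --master spark://<master-host>:7077 wordcount.jar

If Spark runs on top of Hadoop YARN instead, pass --master yarn (optionally with --deploy-mode cluster).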
And if you want to run the program locally, use the following command,
spark-submit --class WordCount --master local[*] wordcount.jar
Here is the list of master options you can set while running the application: local, local[K] and local[*] for local mode, spark://HOST:PORT for a standalone cluster, mesos://HOST:PORT for Mesos, and yarn for Hadoop YARN.
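One more thing to check, since the question says the text file has been put into Hadoop: when the job runs on the cluster, the path given to textFile should point at HDFS rather than the local file system. A minimal sketch, assuming a placeholder NameNode host, port and path (replace them with your own):

// hypothetical HDFS location of the input file; host, port and path are placeholders
val input = sc.textFile("hdfs://<namenode-host>:9000/user/<user>/bible.txt")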