Monday, November 30, 2015

Spark tests

  1. Spark + MariaDB test (a standalone script sketch follows after this list)
    1. SPARK_CLASSPATH=mysql-connector-java-5.1.34-bin.jar bin/pyspark
    2. df = sqlContext.load(source="jdbc", url="jdbc:mysql://10.0.2.a/test?user=root&password=yourpassword", dbtable="r_input_day")
    3. df.first()
  2. Spark + Elasticsearch test (a sketch for unpacking the results also follows after the list)
    1. SPARK_CLASSPATH=/data/elasticsearch-hadoop-2.1.0.Beta4/dist/elasticsearch-hadoop-2.1.0.Beta4.jar ./bin/pyspark
    2. conf = {"es.resource":"sec-team/access-report"}
    3. rdd = sc.newAPIHadoopRDD("org.elasticsearch.hadoop.mr.EsInputFormat", "org.apache.hadoop.io.NullWritable", "org.elasticsearch.hadoop.mr.LinkedMapWritable", conf=conf)
    4. rdd.first()
  3. Spark Streaming test
    1. network_wordcount.py
      1. from __future__ import print_function
        import sys
        from pyspark import SparkContext
        from pyspark.streaming import StreamingContext
        if __name__ == "__main__":
            if len(sys.argv) != 3:
                print("Usage: network_wordcount.py <hostname> <port>", file=sys.stderr)
                exit(-1)
            sc = SparkContext(appName="PythonStreamingNetworkWordCount")
            ssc = StreamingContext(sc, 1)  # 1-second micro-batches
            lines = ssc.socketTextStream(sys.argv[1], int(sys.argv[2]))  # lines from the TCP source
            counts = lines.flatMap(lambda line: line.split(" "))\
                          .map(lambda word: (word, 1))\
                          .reduceByKey(lambda a, b: a + b)  # word count per batch
            counts.pprint()  # print a sample of each batch's counts
            ssc.start()
            ssc.awaitTermination()
    2. nc -lk 9999 (run first, in a separate terminal, to serve text on port 9999)
    3. spark-submit network_wordcount.py localhost 9999 (then type lines into the nc terminal and watch the counts print)
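
For reference, here is the MariaDB step as a single self-contained script (a minimal sketch, assuming the Spark 1.3 SQLContext API used above; <host> and the credentials are placeholders for the values redacted in step 1):

    # jdbc_test.py -- run with:
    #   SPARK_CLASSPATH=mysql-connector-java-5.1.34-bin.jar bin/spark-submit jdbc_test.py
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="MariaDBJdbcTest")
    sqlContext = SQLContext(sc)

    # same jdbc load as step 1.2; fill in your own host and password
    df = sqlContext.load(
        source="jdbc",
        url="jdbc:mysql://<host>/test?user=root&password=yourpassword",
        dbtable="r_input_day",
    )
    print(df.first())  # pull one row to confirm the connection works
    sc.stop()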
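
And a short follow-on for the Elasticsearch test: with the classes used in step 2.3, each RDD element comes back to PySpark as a (document id, field dict) pair, so the documents can be unpacked with a map (a sketch, assuming the same sec-team/access-report index and jar as above):

    # es_test.py -- run with the elasticsearch-hadoop jar on SPARK_CLASSPATH as in step 2.1
    from pyspark import SparkContext

    sc = SparkContext(appName="EsInputTest")
    conf = {"es.resource": "sec-team/access-report"}  # index/type to read

    rdd = sc.newAPIHadoopRDD(
        "org.elasticsearch.hadoop.mr.EsInputFormat",
        "org.apache.hadoop.io.NullWritable",
        "org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )

    docs = rdd.map(lambda kv: kv[1])  # drop the doc ids, keep the field dicts
    print(docs.first())
    sc.stop()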
