Monday, December 4, 2017

Spark and Jupyter

  • Download
  • Install
    • chmod +x Anaconda2-5.0.1-Linux-x86_64.sh
    • sudo bash Anaconda2-5.0.1-Linux-x86_64.sh
  • Secure
    • source .bashrc
    • jupyter notebook --generate-config
    • jupyter notebook password
  • IP
    • vim .jupyter/jupyter_notebook_config.py
    • #c.NotebookApp.ip = 'localhost' → c.NotebookApp.ip = '*'
  • PySpark
    • vim .bashrc
    • Add the following contents
      export PYSPARK_PYTHON=/usr/bin/python
      export PYSPARK_DRIVER_PYTHON=/usr/bin/python
      export SPARK_HOME=/usr/hdp/current/spark2-client/
      export PATH=$PATH:/usr/hdp/current/spark2-client/bin
      export PYSPARK_DRIVER_PYTHON=jupyter
      export PYSPARK_DRIVER_PYTHON_OPTS=notebook
    • sudo su -
    • mkdir -p /usr/local/share/jupyter/kernels/pyspark
    • chown -R test_user:test_user /usr/local/share/jupyter/
    • exit
    • vim /usr/local/share/jupyter/kernels/pyspark/kernel.json
    • Add the following contents
      {
       "display_name": "PySpark",
       "language": "python",
       "argv": [ "/home/test_user/anaconda2/bin/python", "-m", "ipykernel",
        "-f", "{connection_file}" ],
       "env": {
        "SPARK_HOME": "/usr/hdp/current/spark2-client/",
        "PYSPARK_PYTHON": "/home/test_user/anaconda2/bin/python",
        "PYTHONPATH": "/usr/hdp/current/spark2-client/python/:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip",
        "PYTHONSTARTUP": "/usr/hdp/current/spark2-client/python/pyspark/shell.py",
        "PYSPARK_SUBMIT_ARGS": "--master yarn pyspark-shell"
       }
      }
  • Run
    • jupyter notebook
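
Typing kernel.json by hand makes it easy to drop a colon or comma, and Jupyter will not list a kernel whose spec is not valid JSON. As a rough sketch, the same file can be generated from Python so that the json module handles the punctuation. The paths are the ones used in the steps above; the helper names build_kernel_spec and write_kernel_spec are made up for this example, and the dry run writes to a temporary directory rather than the real kernels folder.

```python
import json
import os
import tempfile

# Paths copied from the steps above; adjust them for your own cluster.
SPARK_HOME = "/usr/hdp/current/spark2-client/"
ANACONDA_PYTHON = "/home/test_user/anaconda2/bin/python"
PY4J_ZIP = SPARK_HOME + "python/lib/py4j-0.10.4-src.zip"


def build_kernel_spec():
    """Return the PySpark kernel spec as a plain dict (hypothetical helper)."""
    return {
        "display_name": "PySpark",
        "language": "python",
        "argv": [ANACONDA_PYTHON, "-m", "ipykernel", "-f", "{connection_file}"],
        "env": {
            "SPARK_HOME": SPARK_HOME,
            "PYSPARK_PYTHON": ANACONDA_PYTHON,
            "PYTHONPATH": SPARK_HOME + "python/:" + PY4J_ZIP,
            "PYTHONSTARTUP": SPARK_HOME + "python/pyspark/shell.py",
            "PYSPARK_SUBMIT_ARGS": "--master yarn pyspark-shell",
        },
    }


def write_kernel_spec(kernel_dir):
    """Write kernel.json into kernel_dir and return the file path."""
    if not os.path.isdir(kernel_dir):
        os.makedirs(kernel_dir)
    path = os.path.join(kernel_dir, "kernel.json")
    with open(path, "w") as handle:
        json.dump(build_kernel_spec(), handle, indent=1, sort_keys=True)
    return path


if __name__ == "__main__":
    # Dry run: write to a temporary directory first, then point this at
    # /usr/local/share/jupyter/kernels/pyspark once the output looks right.
    target = os.path.join(tempfile.mkdtemp(), "pyspark")
    print(write_kernel_spec(target))
```

Once the notebook is running, picking the PySpark kernel and evaluating sc in the first cell is a quick check: if shell.py was picked up through PYTHONSTARTUP as intended, a SparkContext should already be defined.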