pySpark with conda integration gives the error "pyspark is not recognized"
0 votes
/ 17 February 2020

Steps taken: installed Java, Python, Spark and Anaconda, and set the PATH for each. But running pyspark at the command prompt does not launch Jupyter Notebook.

I get the following error:

"'pyspark' is not recognized as an internal or external command, operable program or batch file."

1 Answer

1 vote
/ 17 February 2020
    Follow these steps:
    Install JAVA
    1.Download Python
    Python 3.x
    https://www.python.org/downloads/

    2.Set Path
    If you selected the "Add Python to PATH" option during installation, you don't have to set the path manually.
    3.Verify whether Python is installed
    a)
    Cmd>python -V
    b)
    Open the Python terminal by typing the "python" command at the prompt.
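The same check can be made from inside the interpreter itself, which confirms that the `python` found on your PATH is really a 3.x build:

```python
# Quick sanity check from inside the interpreter: confirms that the
# interpreter found on PATH is Python 3.x.
import sys

print(sys.version.split()[0])        # the version string, e.g. "3.8.x"
assert sys.version_info.major == 3   # fails on a stray Python 2 install
```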

    Install Spark
   Verify whether PySpark is installed:-
   ===================================================
   Cmd>pyspark

   It will open the pyspark shell, i.e. an interactive Python shell.
   The interactive shell lets you write Python applications line by line.

   First PySpark Application:-
   ===================================================
   We can write a PySpark application in 2 modes. They are:
   1.Interactive --PySpark shell
   2.Batch Application---IDEs --Integrated Development Environments
                    (Jupyter Notebook, PyCharm, etc.)

   How to develop your first PySpark application in interactive mode:
   ===================================================
   e.g. load a local file, count the number of rows and print the data

   Cmd>pyspark
   --> it will open the pyspark shell
   --> the shell creates a SparkContext with the variable name "sc"
   --> SparkContext is a predefined class; it is required to write a Spark application
   >>>sc
   <SparkContext master=local[*] appName=PySparkShell>

    ANACONDA Installation:
    ============================================
    Jupyter Notebook installation

    1.Download Anaconda
    https://www.anaconda.com/distribution/

    2.Install Anaconda
    Double-click the .exe file and choose all the default options.
    3.Set the Path variable (this is optional when you select "add to PATH environment" at the time of installation)
    4.Start Anaconda and open Jupyter
    Configuring PySpark with Jupyter Notebook:-
    ============================================
    1.Python or Anaconda (with Jupyter Notebook) must be installed.
    2.PySpark must be installed.
    How to open PySpark:
    ==================
    Cmd>pyspark
    How to make PySpark start Jupyter Notebook:
    ==========================
    We can start Jupyter Notebook in two ways. They are:

    1.Start Anaconda Navigator--->Launch Jupyter Notebook
    2.Open a command prompt and type
   Cmd>jupyter notebook
Here we write the Python application
Set Environment Variables:-
=========================
PYSPARK_DRIVER_PYTHON=jupyter
PYSPARK_DRIVER_PYTHON_OPTS=notebook
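On Windows these can be set at the prompt like this (a sketch: `set` affects only the current cmd session; use `setx` or the System Properties dialog to make the change permanent):

```shell
:: Windows cmd - applies to the current session only.
set PYSPARK_DRIVER_PYTHON=jupyter
set PYSPARK_DRIVER_PYTHON_OPTS=notebook
:: With both variables set, "pyspark" now opens Jupyter Notebook
:: instead of the plain interactive shell.
pyspark
```

This is the step that links `pyspark` to Jupyter Notebook, which is what the question asks for.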
...