Running Jupyter Notebook on AWS Lambda - PullRequest
07 August 2020

I'm trying to run a Jupyter notebook on AWS Lambda. I created a layer with all the dependencies. The notebook itself is simple: it reads a CSV file from Amazon S3 and plots the data as a bar chart. My Lambda function downloads the .ipynb file and executes the notebook with papermill. I'm not sure why execution fails with a "module not found" error for boto3.
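Since the layer is supposed to contain all the dependencies, one way to check where each module actually comes from (the layer under /opt/python, or the AWS SDK that Lambda's Python runtimes bundle) is to ask the import system before executing the notebook. A minimal sketch, meant to be called from inside the handler; `module_origin` is a hypothetical helper name:

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Built-in modules report "built-in" here instead of a file path
    return spec.origin


if __name__ == "__main__":
    # In the Lambda handler, log these to see which directory each package is found in
    for mod in ("boto3", "papermill", "pandas"):
        print(mod, "->", module_origin(mod))
```

A path under /opt/python means the package ships in the layer; any other location means the handler process is picking it up from somewhere the notebook kernel may not search.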

import json
import sys
import os
import boto3
# papermill to execute notebook
import papermill as pm
import pandas as pd
import logging
import matplotlib.pyplot as plt

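# sys.path changes apply only to this (handler) process
sys.path.append("/opt/bin")
sys.path.append("/opt/python")
# Set PYTHONPATH once: a second assignment would replace the first
os.environ["PYTHONPATH"] = os.pathsep.join(['/var/task', '/opt/python/'])
# matplotlib needs a writeable config directory; only /tmp is writeable on Lambda
os.environ["MPLCONFIGDIR"] = '/tmp/'
# ipython needs a writeable directory
os.environ["IPYTHONDIR"] = '/tmp/ipythondir'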
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
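    s3 = boto3.resource('s3')
    # Fetch the notebook into /tmp, the only writeable location
    s3.meta.client.download_file('test-boto', 'testing.ipynb', '/tmp/test.ipynb')
    # Run every cell; papermill raises PapermillExecutionError if a cell fails
    pm.execute_notebook('/tmp/test.ipynb', '/tmp/juptest_output.ipynb', kernel_name='python3')
    # Upload the executed notebook with the same low-level client
    s3.meta.client.upload_file('/tmp/juptest_output.ipynb', 'test-boto', 'temp/juptest_output.ipynb')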
    logger.info(event)
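The handler uses two different mechanisms: `sys.path.append` changes the import path of the handler process only, while variables written to `os.environ` are inherited by child processes started afterwards. Assuming papermill's `python3` kernel is launched as a separate interpreter that inherits the handler's environment (the usual jupyter_client behavior), only `PYTHONPATH` reaches it. A minimal sketch demonstrating the difference with two throwaway temporary directories; `child_sys_path` is a hypothetical helper:

```python
import os
import subprocess
import sys
import tempfile


def child_sys_path(env=None):
    """Start a fresh interpreter (as a kernel would be started) and return its sys.path."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    appended = tempfile.mkdtemp()
    via_env = tempfile.mkdtemp()
    sys.path.append(appended)  # visible in this process only
    paths = child_sys_path(dict(os.environ, PYTHONPATH=via_env))
    print(appended in paths)  # False: sys.path edits are not inherited
    print(via_env in paths)   # True: the child reads PYTHONPATH at startup
```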

Error output:

START RequestId: c4da3406-c829-4f99-9fbf-b231a0d3dc06 Version: $LATEST
[INFO]  2020-08-07T17:55:16.602Z    c4da3406-c829-4f99-9fbf-b231a0d3dc06    Input Notebook:  /tmp/test.ipynb
[INFO]  2020-08-07T17:55:16.603Z    c4da3406-c829-4f99-9fbf-b231a0d3dc06    Output Notebook: /tmp/juptest_output.ipynb

Executing:   0%|          | 0/15 [00:00<?, ?cell/s]
[INFO]   2020-08-07T17:55:17.311Z    c4da3406-c829-4f99-9fbf-b231a0d3dc06    Executing notebook with kernel: python3
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k

Executing:   7%|▋         | 1/15 [00:01<00:14,  1.06s/cell]
Executing:   7%|▋         | 1/15 [00:01<00:20,  1.46s/cell]
[ERROR] PapermillExecutionError: 
---------------------------------------------------------------------------
Exception encountered at "In [1]":
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-9c332490c231> in <module>
      1 import pandas as pd
      2 import os
----> 3 import boto3
      4 import matplotlib.pyplot as plt
      5 client = boto3.client('s3')

ModuleNotFoundError: No module named 'boto3'

Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 28, in lambda_handler
    pm.execute_notebook('/tmp/test.ipynb', '/tmp/juptest_output.ipynb', kernel_name='python3')
  File "/opt/python/papermill/execute.py", line 110, in execute_notebook
    raise_for_execution_errors(nb, output_path)
  File "/opt/python/papermill/execute.py", line 222, in raise_for_execution_errors
    raise error
END RequestId: c4da3406-c829-4f99-9fbf-b231a0d3dc06
REPORT RequestId: c4da3406-c829-4f99-9fbf-b231a0d3dc06    Duration: 1624.78 ms    Billed Duration: 1700 ms    Memory Size: 3008 MB    Max Memory Used: 293 MB
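Assigning `os.environ["PYTHONPATH"]` directly replaces any value the Lambda runtime may already have set, so a kernel started afterwards searches only the paths assigned in the handler. If the runtime's own `PYTHONPATH` is what lets a fresh interpreter find boto3, overwriting it would be one possible explanation for the kernel's `ModuleNotFoundError`; that is an assumption to verify, not a confirmed diagnosis. A hedged sketch of merging instead of overwriting; `extend_pythonpath` is a hypothetical helper:

```python
import os
from typing import Mapping


def extend_pythonpath(env: Mapping[str, str], *extra: str) -> str:
    """Return a PYTHONPATH value that keeps existing entries and appends new ones once."""
    entries = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    for path in extra:
        if path not in entries:
            entries.append(path)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    # Merge the task and layer directories into whatever PYTHONPATH is already set
    os.environ["PYTHONPATH"] = extend_pythonpath(os.environ, "/var/task", "/opt/python")
    print(os.environ["PYTHONPATH"])
```

Deduplicating keeps the value stable when a warm Lambda container runs the module-level setup code again.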

Jupyter Notebook:

import pandas as pd
import os
import boto3
import matplotlib.pyplot as plt
client = boto3.client('s3')

path = 's3://test-boto/aws-costs-Owner-Month-08.csv'
monthly_owner = pd.read_csv(path)
plt.bar(monthly_owner.Owner.head(6),monthly_owner.Amount.head(6))
plt.xlabel('Owner', fontsize=15)
plt.ylabel('Amount', fontsize=15)
plt.title('AWS Monthly Cost by Owner')
plt.show()
...
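Note that `pd.read_csv` on an `s3://` URL relies on the optional s3fs package, so the layer would need it once boto3 imports correctly. The notebook also creates `client = boto3.client('s3')` without using it; one alternative is to fetch the object with that client's `get_object` and parse the body. A stdlib-only sketch, assuming `client` behaves like `boto3.client('s3')`; `read_s3_csv_rows` is a hypothetical helper:

```python
import csv
import io


def read_s3_csv_rows(client, bucket: str, key: str) -> list:
    """Fetch an S3 object with get_object and parse it as CSV into a list of dicts.

    `client` is assumed to behave like boto3.client('s3'): get_object(Bucket=..., Key=...)
    returns a dict whose "Body" has a read() method returning bytes.
    """
    response = client.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Usage inside the notebook (requires boto3 and AWS credentials):
# import boto3
# rows = read_s3_csv_rows(boto3.client('s3'), 'test-boto', 'aws-costs-Owner-Month-08.csv')
```

With pandas available, the same bytes could be passed to `pd.read_csv(io.BytesIO(...))` to get a DataFrame without going through s3fs.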