PostgreSQL on Heroku with Python does not work correctly
March 25, 2020

I am new to software development, and this is the first time I am deploying a multi-file system.

It works perfectly on my local machine, but I don't know what I am doing wrong once it is deployed.

I am trying to deploy a Twitter streaming app using the Heroku Postgres add-on, and my web application has two different files:

  • streaming.py (connects to Twitter and uses slistener.py to collect the data and save it to PostgreSQL)
  • app.py (reads the data from PostgreSQL and builds a few charts)

I declared my Procfile as:

worker: python streaming.py
web: gunicorn app:server

And apparently it is recognized correctly:

[image not preserved]

So my apps do establish a connection to Heroku PostgreSQL, but no data is saved and the table is never created, so app.py has nothing to read. The error is shown below:

2020-03-25T15:28:22.376679+00:00 app[web.1]: 10.43.182.207 - - [25/Mar/2020:15:28:22 +0000] "POST /_dash-update-component HTTP/1.1" 500 290 "https://bbb-twitter-monitor.herokuapp.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
2020-03-25T15:28:22.499681+00:00 app[web.1]: 10.16.194.154 - - [25/Mar/2020:15:28:22 +0000] "GET /_dash-component-suites/dash_core_components/async-plotlyjs.v1_8_1m1582838719.js HTTP/1.1" 200 984008 "https://bbb-twitter-monitor.herokuapp.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
2020-03-25T15:28:22.524838+00:00 heroku[router]: at=info method=GET path="/_dash-component-suites/dash_core_components/async-plotlyjs.v1_8_1m1582838719.js" host=bbb-twitter-monitor.herokuapp.com request_id=e1723441-0877-4eb1-b9d7-4c94dd0e2432 fwd="177.144.188.23" dyno=web.1 connect=1ms service=199ms status=200 bytes=984265 protocol=https
2020-03-25T15:28:32.951480+00:00 heroku[router]: at=info method=POST path="/_dash-update-component" host=bbb-twitter-monitor.herokuapp.com request_id=4b142c35-a700-4145-a475-877c92bb43e5 fwd="177.144.188.23" dyno=web.1 connect=1ms service=6ms status=500 bytes=470 protocol=https
2020-03-25T15:28:32.949989+00:00 app[web.1]: <connection object at 0x7f939478a3f0; dsn: 'user=papziqledxhges password=xxx dbname=db9vikoson7vl3 host=ec2-52-87-58-157.compute-1.amazonaws.com port=5432 sslmode=require', closed: 0>
2020-03-25T15:28:32.953543+00:00 app[web.1]: Exception on /_dash-update-component [POST]
2020-03-25T15:28:32.953544+00:00 app[web.1]: Traceback (most recent call last):
2020-03-25T15:28:32.953545+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/pandas/io/sql.py", line 1586, in execute
2020-03-25T15:28:32.953546+00:00 app[web.1]:     cur.execute(*args, **kwargs)
2020-03-25T15:28:32.953546+00:00 app[web.1]: psycopg2.errors.UndefinedTable: relation "tweet" does not exist
2020-03-25T15:28:32.953547+00:00 app[web.1]: LINE 1: SELECT * from tweet
2020-03-25T15:28:32.953547+00:00 app[web.1]:                       ^
2020-03-25T15:28:32.953548+00:00 app[web.1]: 
2020-03-25T15:28:32.953548+00:00 app[web.1]: 
2020-03-25T15:28:32.953549+00:00 app[web.1]: The above exception was the direct cause of the following exception:
2020-03-25T15:28:32.953549+00:00 app[web.1]: 
2020-03-25T15:28:32.953549+00:00 app[web.1]: Traceback (most recent call last):
2020-03-25T15:28:32.953550+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 2446, in wsgi_app
2020-03-25T15:28:32.953550+00:00 app[web.1]:     response = self.full_dispatch_request()
2020-03-25T15:28:32.953551+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1951, in full_dispatch_request
2020-03-25T15:28:32.953551+00:00 app[web.1]:     rv = self.handle_user_exception(e)
2020-03-25T15:28:32.953551+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1820, in handle_user_exception
2020-03-25T15:28:32.953552+00:00 app[web.1]:     reraise(exc_type, exc_value, tb)
2020-03-25T15:28:32.953552+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
2020-03-25T15:28:32.953552+00:00 app[web.1]:     raise value
2020-03-25T15:28:32.953553+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1949, in full_dispatch_request
2020-03-25T15:28:32.953555+00:00 app[web.1]:     rv = self.dispatch_request()
2020-03-25T15:28:32.953555+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1935, in dispatch_request
2020-03-25T15:28:32.953556+00:00 app[web.1]:     return self.view_functions[rule.endpoint](**req.view_args)
2020-03-25T15:28:32.953556+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/dash/dash.py", line 1461, in dispatch
2020-03-25T15:28:32.953557+00:00 app[web.1]:     response.set_data(self.callback_map[output]["callback"](*args))
2020-03-25T15:28:32.953557+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/dash/dash.py", line 1341, in add_context
2020-03-25T15:28:32.953558+00:00 app[web.1]:     output_value = func(*args, **kwargs)  # %% callback invoked %%
2020-03-25T15:28:32.953558+00:00 app[web.1]:   File "/app/app.py", line 437, in _update_div1
2020-03-25T15:28:32.953559+00:00 app[web.1]:     df = pd.read_sql_query("SELECT * from tweet", con)
2020-03-25T15:28:32.953559+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/pandas/io/sql.py", line 332, in read_sql_query
2020-03-25T15:28:32.953560+00:00 app[web.1]:     chunksize=chunksize,
2020-03-25T15:28:32.953560+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/pandas/io/sql.py", line 1633, in read_query
2020-03-25T15:28:32.953561+00:00 app[web.1]:     cursor = self.execute(*args)
2020-03-25T15:28:32.953561+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/pandas/io/sql.py", line 1598, in execute
2020-03-25T15:28:32.953561+00:00 app[web.1]:     raise ex from exc
2020-03-25T15:28:32.953562+00:00 app[web.1]: pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT * from tweet': relation "tweet" does not exist
2020-03-25T15:28:32.953562+00:00 app[web.1]: LINE 1: SELECT * from tweet
2020-03-25T15:28:32.953563+00:00 app[web.1]:                       ^

PostgreSQL shows that the application can connect to the database, but no rows are stored and no table is created.

[image: 2 connections Postgresql]

I have been trying to fix this and cannot find clear references that explain what I should do.

Below are the three files:

streaming.py:

from tweepy import OAuthHandler
from tweepy import API
from tweepy import Stream
from sqlalchemy import create_engine
from sqlalchemy_utils import database_exists, create_database
from urllib3.exceptions import ProtocolError
from slistener import SListener
import os
# from key_secret import consumer_key, consumer_secret
# from key_secret import access_token, access_token_secret

api_key = ''
key_secret = ''

access_token = ''
token_secret = ''


# consumer key authentication
auth = OAuthHandler(api_key, key_secret)
# access key authentication
auth.set_access_token(access_token, token_secret)
# set up the API with the authentication handler
api = API(auth)
# instantiate the SListener object
listen = SListener(api)
# instantiate the stream object
stream = Stream(auth, listen)
# set up words to hear
keywords_to_hear = ['#BBB20', "#BBB2020"]

# create an engine for the database
engine = create_engine(os.environ['DATABASE_URL'])

# if the database does not exist
if not database_exists(engine.url):
    # create a new database
    create_database(engine.url)

# begin collecting data
while True:
    # maintain the connection unless interrupted
    try:
        stream.filter(track=keywords_to_hear)
    # reconnect automatically if an error arises
    # due to an unstable network connection
    except (ProtocolError, AttributeError):
        continue
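
Note that the loop above discards every ProtocolError and AttributeError without a trace, so an AttributeError raised inside the listener could go unnoticed in the worker's log output. The following is a minimal sketch, not the author's code and not a diagnosis, of the same reconnect loop with logging added. The names `run_stream`, `delay_seconds`, and `max_attempts` are hypothetical and introduced only for illustration; `connect` stands for a call such as `lambda: stream.filter(track=keywords_to_hear)`.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("streaming")


def run_stream(connect, retry_on, delay_seconds=5.0, max_attempts=None):
    """Call connect() in a loop, logging each failure before reconnecting.

    connect       -- zero-argument callable that blocks while streaming
    retry_on      -- tuple of exception classes that should trigger a retry
    delay_seconds -- pause between attempts after a failure (hypothetical default)
    max_attempts  -- hypothetical safety limit; None retries indefinitely
    Returns the number of failed attempts.
    """
    attempts = 0
    failures = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            connect()
            log.info("stream ended normally; reconnecting")
        except retry_on:
            failures += 1
            # log.exception records the full traceback of the swallowed error
            log.exception("stream attempt %d failed", attempts)
            time.sleep(delay_seconds)
    return failures
```

With this in place, the loop in streaming.py could be replaced by `run_stream(lambda: stream.filter(track=keywords_to_hear), (ProtocolError, AttributeError))`. Exceptions not listed in `retry_on` still propagate and stop the worker, which makes them visible in the logs as well.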

slistener.py:

from tweepy import API  # needed for the API() fallback in __init__
from tweepy.streaming import StreamListener
import json
import os

import pandas as pd
from sqlalchemy import create_engine, text

DATABASE_URL = os.environ['DATABASE_URL']

# inherit from StreamListener class
class SListener(StreamListener):
    # initialize the API and a counter for the number of tweets collected
    def __init__(self, api = None, fprefix = 'streamer'):
        self.api = api or API()
        # instantiate a counter
        self.cnt = 0
        # create an engine for the database
        self.engine = create_engine(os.environ['DATABASE_URL'])

    # for each tweet streamed
    def on_status(self, status): 
        # increment the counter
        self.cnt += 1
        # parse the status object into JSON
        status_json = json.dumps(status._json)
        # convert the JSON string into dictionary
        status_data = json.loads(status_json)   

        tweet = {
            'created_at': status_data['created_at'],
            'tweet_id': status_data['id_str'],
            'id_user': status_data['user']['screen_name'],
            'text': status_data['text']}

        df = pd.DataFrame(tweet, index=[0])

        # convert the timestamp string into a datetime object
        df['created_at'] = pd.to_datetime(df.created_at)

        # push tweet into database
        df.to_sql('tweet', con=self.engine, if_exists='append', index=False)

        task = """
                DELETE FROM tweet
                WHERE created_at IN(
                    SELECT created_at
                        FROM(
                            SELECT created_at
                            FROM tweet
                            WHERE ((DATE_PART('day', now()::timestamp - created_at::timestamp) * 24 
                                                + DATE_PART('hour', now()::timestamp - created_at::timestamp)) * 60 
                                                + DATE_PART('minute', now()::timestamp - created_at::timestamp)) * 60 
                                                + DATE_PART('second', now()::timestamp - created_at::timestamp) > 360) AS tweet_del) """


        # run the cleanup inside a transaction so the DELETE is committed
        with self.engine.begin() as con:
            con.execute(text(task))
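
In the listener above, the tweet table is only created implicitly by `df.to_sql` the first time `on_status` runs, so it does not exist until at least one tweet has been received and written. A minimal sketch of creating it explicitly at startup follows. This is an illustration under assumptions, not the author's code: `ensure_tweet_table` and `CREATE_TWEET_TABLE` are hypothetical names, the column types are guesses based on the fields in the `tweet` dictionary, and the function accepts any DB-API connection so it can be tried locally with the standard-library `sqlite3` module instead of a PostgreSQL server.

```python
import sqlite3

# Column names follow the `tweet` dictionary in on_status; the types are assumptions.
CREATE_TWEET_TABLE = """
CREATE TABLE IF NOT EXISTS tweet (
    created_at TIMESTAMP,
    tweet_id   TEXT,
    id_user    TEXT,
    text       TEXT
)
"""


def ensure_tweet_table(con):
    """Create the tweet table if it is missing; safe to call repeatedly."""
    cur = con.cursor()
    try:
        cur.execute(CREATE_TWEET_TABLE)
    finally:
        cur.close()
    # commit so other connections (for example the web process) can see the table
    con.commit()
```

For a local trial, `ensure_tweet_table(sqlite3.connect(":memory:"))` is enough; on Heroku the same function could be called once with a `psycopg2.connect(os.environ['DATABASE_URL'])` connection before the stream starts.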

app.py (this is not the full code, only the connection and read attempt):

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Output, Input, State
import dash_table
import pandas as pd
import sqlite3
from sqlalchemy import create_engine
from sqlalchemy_utils import database_exists, create_database
from urllib3.exceptions import ProtocolError
import plotly_express as px
import os
import psycopg2


DATABASE_URL = os.environ['DATABASE_URL']

con = psycopg2.connect(DATABASE_URL)

df = pd.read_sql_query("SELECT * from tweet", con)



if __name__ == '__main__':
    app.run_server(debug=True)
...
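
The traceback earlier in the question comes from running `SELECT * from tweet` before the table exists. A minimal sketch of a read that tolerates this case follows. It does not address why the worker never writes anything; it only keeps the web process from failing with a 500 while the table is still missing. `read_tweets` and `missing_table_errors` are hypothetical names, and the example is written against the plain DB-API so it can be tried with the standard-library `sqlite3` module. With psycopg2, the error class to pass would be `psycopg2.errors.UndefinedTable`, the one shown in the log.

```python
import sqlite3


def read_tweets(con, missing_table_errors):
    """Return all rows of the tweet table, or [] if the table does not exist yet.

    con                  -- an open DB-API connection
    missing_table_errors -- exception class (or tuple) the driver raises for a
                            missing table, e.g. sqlite3.OperationalError here
    """
    cur = con.cursor()
    try:
        cur.execute("SELECT * FROM tweet")
        return cur.fetchall()
    except missing_table_errors:
        # PostgreSQL marks the transaction as aborted after an error,
        # so roll back before the connection is used again
        con.rollback()
        return []
    finally:
        cur.close()
```

A callback could then build its DataFrame from the returned rows and render an empty chart when the list is empty.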