Any way to correctly aggregate time series data to make a scatter plot with matplotlib / seaborn?
2 votes
/ August 4, 2020

I want to create a time series scatter plot for my data. The data has categorical columns that first need to be aggregated by group to build the plotting data, and then I want to make the scatter plot with seaborn or matplotlib. My data is a time series of product sale prices, and I want to see the price trend of each product owner at different market thresholds over time. I tried pandas.pivot_table and groupby to shape the plotting data, but I couldn't get the plot I want.

reproducible data:

Here is the example product data I used; I want to see each dealer's price dynamics for each protein type with respect to threshold.

my attempt

Here is my current attempt to aggregate the data into plotting data, but it doesn't give me the right plot. I suspect my way of aggregating the plotting data is wrong. Can anyone point out how to do this correctly to get the desired plot?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn

mydf = pd.read_csv('foo.csv')
mydf=mydf.drop(mydf.columns[0], axis=1)
mydf['expected_price'] = mydf['price']*76/mydf['threshold']

g = mydf.groupby(['dealer','protein_type'])
newdf= g.apply(lambda x: pd.Series([np.average(x['threshold'])])).unstack()

But the attempt above doesn't work, because I want plotting data for each dealer's market purchase price for the different protein_type values at different threshold values along a daily time series. I don't know the best way to handle this time series. Can anyone suggest or correct how to do this properly?

I also tried pandas.pivot_table to aggregate my data, but it still doesn't produce the plotting data.

pv_df= pd.pivot_table(mydf, index=['date'], columns=['dealer', 'protein_type', 'threshold'],values=['price'])
pv_df= pv_df.fillna(0)
pv_df.groupby(['dealer', 'protein_type', 'threshold'])['price'].unstack().reset_index()

But the attempt above still doesn't work. Also, the dates in my data are not continuous, so I assume I could plot a monthly time series instead.

my plotting attempt:

def scatterplot(x_data, y_data, x_label, y_label, title):
    fig, ax = plt.subplots()
    ax.scatter(x_data, y_data, s = 30, color = '#539caf', alpha = 0.75)

    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    fig.autofmt_xdate()

desired output:

I want either a line chart or a scatter plot where the x-axis shows the monthly time series and the y-axis shows the price of each protein_type at each threshold value, for each individual dealer. Here is an example of the kind of line chart I'd like:

(example line chart)

Answers [ 2 ]

5 votes
/ August 4, 2020

Updated with threshold

Option 1

  • This option was implemented after reviewing the results of the other option, shown below.
    • Those plots carry a lot of unexplained information and don't present the data clearly
  • To present the data clearly, each plot should contain only 3 dimensions of data (e.g. date, values, and cats) for one dealer, one threshold, and one protein_type.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import timedelta

# read the data in and parse the date column
df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])

# calculate expected price
df['expected_price'] = df.price*76/df.threshold

# set threshold as a category
df.threshold = df.threshold.astype('category')

# set the index
df = df.set_index(['date', 'dealer', 'protein_type', 'threshold'])

# form the dataframe into a long form
dfl = df.drop(columns=['destination', 'quantity']).stack().reset_index().rename(columns={'level_4': 'cats', 0: 'values'})

# plot
for pt in dfl.protein_type.unique():
    for t in dfl.threshold.unique():
        data = dfl[(dfl.protein_type == pt) & (dfl.threshold == t)]
        if not data.empty:
            utc = len(data.threshold.unique())
            f, axes = plt.subplots(nrows=utc, ncols= 2, figsize=(20, 4), squeeze=False)
            for j in range(utc):
                for i, d in enumerate(dfl.dealer.unique()):
                    data_d = data[data.dealer == d].sort_values(['cats', 'date']).reset_index(drop=True)
                    p = sns.scatterplot(x='date', y='values', data=data_d, hue='cats', ax=axes[j, i])
                    if not data_d.empty:
                        p.set_title(f'{d}\nThreshold: {t}\n{pt}')
                        p.set_xlim(data_d.date.min() - timedelta(days=60), data_d.date.max() + timedelta(days=60))
                    else:
                        p.set_title(f'{d}: No Data Available\nThreshold: {t}\n{pt}')
                    
            plt.show()

First four plots

Option 2

  • This results in 4 separate figures with threshold as a category type.
  • threshold must first be left as an int for the expected_price calculation, and then converted.
  • Note that my data does not have the extra unnamed column, so that will still need to be dropped, which is not shown in the following code.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read the data in and parse the date column
df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])

# calculate expected price
df['expected_price'] = df.price*76/df.threshold

# set threshold as a category
df.threshold = df.threshold.astype('category')

# set the index
df = df.set_index(['date', 'dealer', 'protein_type', 'threshold'])

# form the dataframe into a long form
dfl = df.drop(columns=['destination', 'quantity']).stack().reset_index().rename(columns={'level_4': 'cats', 0: 'values'})

# plot four plots with threshold
for d in dfl.dealer.unique():
    for pt in dfl.protein_type.unique():
        plt.figure(figsize=(13, 7))
        data = dfl[(dfl.protein_type == pt) & (dfl.dealer == d)]
        sns.lineplot(x='date', y='values', data=data, hue='threshold', style='cats')
        plt.yscale('log')
        plt.title(f'{d}: {pt}')
        plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)
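
The ordering noted in the bullets above (compute expected_price first, convert threshold to a category second) can be checked on a tiny frame. This is only a sketch: the two rows are copied from the sample CSV in this answer, and the `raised` flag is a name introduced here for illustration.

```python
import pandas as pd

# Two rows taken from the sample CSV in this answer
df = pd.DataFrame({
    "threshold": [50, 85],
    "price": [0.5, 1.8],
})

# While threshold is numeric, the calculation works
df["expected_price"] = df["price"] * 76 / df["threshold"]

# Convert afterwards, so seaborn treats each threshold as a discrete level
df["threshold"] = df["threshold"].astype("category")

# Arithmetic on a categorical column is expected to raise TypeError
try:
    df["price"] * 76 / df["threshold"]
    raised = False
except TypeError:
    raised = True
```

If the conversion has to happen earlier for some reason, keeping a numeric copy of the column for the calculation should work just as well.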

Original without threshold as a category

  • I don't understand what you're doing with the following:
    • newdf= g.apply(lambda x: pd.Series([np.average(x['threshold'])])).unstack()
    • I don't think this is integral to the main issue, which is plotting the data
  • First, the dataframe needs to be converted to a long format and 'destination' needs to be dropped
  • There are too many dimensions to plot on a single figure
    • x='date', y='values', hue='cats', style='dealer'
    • 'protein_type' needs to have a separate figure
    • However, the data overlaps too much to be readable with 'dealer' included, so 4 plots are required.
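
For reference, the groupby expression quoted above appears to be roughly equivalent to taking the mean threshold per dealer and protein type. The sketch below uses a handful of rows copied from the sample CSV; `mean_threshold` is a name chosen here. Note that the result has no date column, which is likely why it can't feed a time series plot.

```python
import pandas as pd

# A few rows in the shape of the sample CSV (values copied from it)
mydf = pd.DataFrame({
    "date": pd.to_datetime(
        ["2001-12-22", "2001-12-27", "2002-01-08", "2002-02-05", "2002-02-05"]
    ),
    "dealer": ["Alpha Food Corps"] * 3 + ["German Food Corps"] * 2,
    "protein_type": ["chicken", "beef", "beef", "beef", "chicken"],
    "threshold": [50, 85, 94, 80, 50],
    "price": [0.5, 1.8, 2.2, 1.9, 0.9],
})

# Mean threshold per (dealer, protein_type); 'date' is not part of the result
mean_threshold = (
    mydf.groupby(["dealer", "protein_type"])["threshold"]
        .mean()
        .unstack()
)
```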

DataFrame Setup:

  • Note that my data does not have the extra unnamed column, so that will still need to be dropped, which is not shown in the following code.
  • Use pandas.DataFrame.stack to convert the dataframe to long form

Option 1:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime  # used by plt.xlim in the plotting code below

# read the data in
df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])

# your calculation
df['expected_price'] = df['price']*76/df['threshold']

# set the index
df = df.set_index(['date', 'dealer', 'protein_type'])

# form the dataframe into a long form
dfl = df.drop(columns=['destination']).stack().reset_index().rename(columns={'level_3': 'cats', 0: 'values'})

# display(dfl.head())
        date            dealer protein_type            cats    values
0 2001-12-22  Alpha Food Corps      chicken       threshold     50.00
1 2001-12-22  Alpha Food Corps      chicken        quantity  39037.00
2 2001-12-22  Alpha Food Corps      chicken           price      0.50
3 2001-12-22  Alpha Food Corps      chicken  expected_price      0.76
4 2001-12-27  Alpha Food Corps         beef       threshold     85.00
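
The long-form reshape above can be tried on a small in-memory frame. This is a sketch under the assumption that the columns match the sample CSV; the two rows are copied from it, and `long_stack` / `long_melt` are names chosen here. It also shows that DataFrame.melt should give the same rows without relying on the automatic 'level_3' name.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2001-12-22", "2001-12-27"]),
    "dealer": ["Alpha Food Corps", "Alpha Food Corps"],
    "protein_type": ["chicken", "beef"],
    "threshold": [50, 85],
    "quantity": [39037, 35432],
    "price": [0.5, 1.8],
})
df["expected_price"] = df["price"] * 76 / df["threshold"]

id_cols = ["date", "dealer", "protein_type"]

# stack(): the remaining columns become one unnamed index level,
# which reset_index() labels 'level_3' (three id columns come first)
long_stack = (
    df.set_index(id_cols)
      .stack()
      .reset_index()
      .rename(columns={"level_3": "cats", 0: "values"})
)

# melt(): the same reshape, naming the new columns directly
long_melt = df.melt(id_vars=id_cols, var_name="cats", value_name="values")
```

The two results differ only in row order: stack keeps all measurements of one original row together, while melt groups rows by measurement name.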

Option 2: Rolling mean

df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])
df['expected_price'] = df['price']*76/df['threshold']
df = df.set_index('date')

# groupby aggregate rolling mean and stack
dfl = df.groupby(['dealer', 'protein_type'])[['expected_price', 'price']].rolling(7).mean().stack().reset_index().rename(columns={'level_3': 'cats', 0: 'values'})
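
Since the question mentions that the dates are not continuous and asks about a monthly series, another possible aggregation is a monthly mean. The following is only a sketch, not part of the original approach: the rows are copied from the sample CSV, and the `month` column and `monthly` frame are names introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2002-01-08", "2002-01-10", "2002-02-05", "2002-02-07"]),
    "dealer": ["Alpha Food Corps"] * 4,
    "protein_type": ["beef"] * 4,
    "threshold": [85, 90, 85, 85],
    "price": [1.8, 2.1, 1.8, 1.9],
})

# Map each date to the first day of its month
df["month"] = df["date"].dt.to_period("M").dt.to_timestamp()

# Mean price per dealer, protein type, threshold and month
monthly = (
    df.groupby(["dealer", "protein_type", "threshold", "month"], as_index=False)["price"]
      .mean()
)
```

The `monthly` frame can then be plotted with x='month' and y='price' in the same way as the long-form data.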

Option 1: Two Plots

  • The 'dealer' data is too similar to differentiate (price fixing, anyone?)
for pt in dfl.protein_type.unique():
    plt.figure(figsize=(9, 5))
    data = dfl[dfl.protein_type == pt]
    sns.lineplot(x='date', y='values', data=data, hue='cats', style='dealer')
    plt.xlim(datetime(2001, 11, 1), datetime(2004, 8, 1))
    plt.yscale('log')
    plt.title(pt)
    plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)

  • Even with only 'price' and 'expected_price', 'dealer' can't be determined.

Option 2: Four Plots

seaborn.FacetGrid

g = sns.FacetGrid(data=dfl, col='dealer', row='protein_type', hue='cats', height=5, aspect=1.5)
g.map(sns.lineplot, 'date', 'values').add_legend()
plt.yscale('log')
g.set_xticklabels(rotation=90)

  • The same FacetGrid call also works with dfl built from the rolling mean.

Nested Loop

  • This will produce one column of 4 figures, selected first for dealer and then protein_type.
  • Optionally, swap the order of dealer and protein_type
for d in dfl.dealer.unique():
    for pt in dfl.protein_type.unique():
        plt.figure(figsize=(10, 5))
        data = dfl[(dfl.protein_type == pt) & (dfl.dealer == d)]
        sns.lineplot(x='date', y='values', data=data, hue='cats')
        plt.xlim(datetime(2001, 11, 1), datetime(2004, 8, 1))
        plt.yscale('log')
        plt.title(f'{d}: {pt}')
        plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)
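
As a possible alternative to filtering inside two nested loops, groupby can hand back one sub-frame per dealer / protein_type pair. A minimal sketch, assuming a long-form frame shaped like dfl; the rows are copied from the sample CSV and the `panels` dict is a name introduced here.

```python
import pandas as pd

# Long-form rows shaped like dfl above (values copied from the sample CSV)
dfl = pd.DataFrame({
    "date": pd.to_datetime(["2002-02-05", "2002-02-05", "2002-02-06", "2002-02-06"]),
    "dealer": ["Alpha Food Corps", "German Food Corps",
               "German Food Corps", "Alpha Food Corps"],
    "protein_type": ["beef", "beef", "chicken", "chicken"],
    "cats": ["price"] * 4,
    "values": [1.8, 1.9, 0.9, 0.9],
})

# groupby yields one sub-frame per (dealer, protein_type) pair that actually occurs,
# so empty combinations never need to be checked for
panels = {
    f"{dealer}: {protein}": frame.sort_values("date")
    for (dealer, protein), frame in dfl.groupby(["dealer", "protein_type"])
}
```

Each value in `panels` can be passed as data= to sns.lineplot, with the key as the figure title.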

Example CSV:

date,dealer,threshold,quantity,price,protein_type,destination
2001-12-22,Alpha Food Corps,50,39037,0.5,chicken,UK
2001-12-27,Alpha Food Corps,85,35432,1.8,beef,UK
2001-12-29,Alpha Food Corps,50,32142,0.5,chicken,UK
2001-12-30,Alpha Food Corps,85,34516,1.8,beef,UK
2002-01-02,Alpha Food Corps,85,39930,1.8,beef,UK
2002-01-04,Alpha Food Corps,85,40709,1.8,beef,UK
2002-01-08,Alpha Food Corps,94,37641,2.2,beef,UK
2002-01-08,Alpha Food Corps,85,37545,1.8,beef,UK
2002-01-08,Alpha Food Corps,85,37564,1.8,beef,UK
2002-01-08,Alpha Food Corps,85,37607,1.8,beef,UK
2002-01-08,Alpha Food Corps,85,41706,1.8,beef,UK
2002-01-08,Alpha Food Corps,90,41628,2.1,beef,UK
2002-01-08,Alpha Food Corps,65,35720,0.9,chicken,UK
2002-01-09,Alpha Food Corps,94,1581,2.2,beef,UK
2002-01-09,Alpha Food Corps,85,11426,1.8,beef,UK
2002-01-09,Alpha Food Corps,85,37489,1.8,beef,UK
2002-01-09,Alpha Food Corps,90,15630,2.1,beef,UK
2002-01-09,Alpha Food Corps,80,3136,1.6,beef,UK
2002-01-10,Alpha Food Corps,85,41919,1.8,beef,UK
2002-01-10,Alpha Food Corps,90,39932,2.1,beef,UK
2002-01-10,Alpha Food Corps,90,41665,2.1,beef,UK
2002-01-10,Alpha Food Corps,90,41860,2.1,beef,UK
2002-01-10,Alpha Food Corps,65,39879,0.9,chicken,UK
2002-01-10,Alpha Food Corps,65,39884,0.9,chicken,UK
2002-01-11,Alpha Food Corps,90,37613,2.1,beef,UK
2002-01-12,Alpha Food Corps,90,41855,2.1,beef,UK
2002-01-13,Alpha Food Corps,90,37585,2.1,beef,UK
2002-01-15,Alpha Food Corps,85,41618,1.8,beef,UK
2002-01-15,Alpha Food Corps,85,41721,1.8,beef,UK
2002-01-15,Alpha Food Corps,85,41869,1.8,beef,UK
2002-01-15,Alpha Food Corps,85,41990,1.8,beef,UK
2002-01-15,Alpha Food Corps,90,41744,2.1,beef,UK
2002-01-15,Alpha Food Corps,90,41936,2.1,beef,UK
2002-01-15,Alpha Food Corps,65,41684,1.0,chicken,UK
2002-01-15,Alpha Food Corps,65,41776,1.0,chicken,UK
2002-01-16,Alpha Food Corps,94,35891,2.2,beef,UK
2002-01-16,Alpha Food Corps,85,39985,1.8,beef,UK
2002-01-16,Alpha Food Corps,85,41754,1.8,beef,UK
2002-01-16,Alpha Food Corps,85,41811,1.8,beef,UK
2002-01-16,Alpha Food Corps,90,39838,2.1,beef,UK
2002-01-16,Alpha Food Corps,80,3244,1.7,beef,UK
2002-01-17,Alpha Food Corps,94,22245,2.2,beef,UK
2002-01-17,Alpha Food Corps,85,5186,1.8,beef,UK
2002-01-17,Alpha Food Corps,90,2016,2.1,beef,UK
2002-01-17,Alpha Food Corps,90,40875,2.1,beef,UK
2002-01-17,Alpha Food Corps,65,41440,1.0,chicken,UK
2002-01-18,Alpha Food Corps,94,12525,2.2,beef,UK
2002-01-18,Alpha Food Corps,94,31325,2.2,beef,UK
2002-01-18,Alpha Food Corps,85,15486,1.8,beef,UK
2002-01-18,Alpha Food Corps,85,29992,1.8,beef,UK
2002-01-18,Alpha Food Corps,85,39938,1.8,beef,UK
2002-01-18,Alpha Food Corps,85,41777,1.8,beef,UK
2002-01-18,Alpha Food Corps,90,9475,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,9960,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,41676,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,41816,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,42036,2.1,beef,UK
2002-01-18,Alpha Food Corps,65,41673,1.0,chicken,UK
2002-01-19,Alpha Food Corps,85,19961,1.8,beef,UK
2002-01-19,Alpha Food Corps,90,19955,2.1,beef,UK
2002-01-19,Alpha Food Corps,90,40437,2.1,beef,UK
2002-01-19,Alpha Food Corps,65,41574,1.0,chicken,UK
2002-01-19,Alpha Food Corps,65,41700,1.0,chicken,UK
2002-01-20,Alpha Food Corps,94,23278,2.2,beef,UK
2002-01-20,Alpha Food Corps,85,9230,1.8,beef,UK
2002-01-20,Alpha Food Corps,85,38842,1.8,beef,UK
2002-01-20,Alpha Food Corps,90,9173,2.1,beef,UK
2002-01-20,Alpha Food Corps,90,38608,2.1,beef,UK
2002-01-20,Alpha Food Corps,50,39191,0.8,chicken,UK
2002-01-22,Alpha Food Corps,94,41741,2.2,beef,UK
2002-01-22,Alpha Food Corps,85,39879,1.8,beef,UK
2002-01-22,Alpha Food Corps,85,41683,1.8,beef,UK
2002-01-22,Alpha Food Corps,85,41958,1.8,beef,UK
2002-01-22,Alpha Food Corps,90,41833,2.1,beef,UK
2002-01-23,Alpha Food Corps,94,20294,2.2,beef,UK
2002-01-23,Alpha Food Corps,85,15553,1.8,beef,UK
2002-01-23,Alpha Food Corps,85,40753,1.8,beef,UK
2002-01-23,Alpha Food Corps,85,41740,1.8,beef,UK
2002-01-23,Alpha Food Corps,90,1892,2.1,beef,UK
2002-01-23,Alpha Food Corps,90,39850,2.1,beef,UK
2002-01-23,Alpha Food Corps,80,3231,1.7,beef,UK
2002-01-23,Alpha Food Corps,65,41415,1.1,chicken,UK
2002-01-24,Alpha Food Corps,90,35473,2.1,beef,UK
2002-01-24,Alpha Food Corps,90,41824,2.1,beef,UK
2002-01-24,Alpha Food Corps,65,41721,1.1,chicken,UK
2002-01-25,Alpha Food Corps,85,19983,1.8,beef,UK
2002-01-25,Alpha Food Corps,85,35823,1.8,beef,UK
2002-01-25,Alpha Food Corps,90,19949,2.1,beef,UK
2002-01-25,Alpha Food Corps,90,41800,2.1,beef,UK
2002-01-25,Alpha Food Corps,65,40990,1.1,chicken,UK
2002-01-26,Alpha Food Corps,90,39938,2.1,beef,UK
2002-01-26,Alpha Food Corps,90,40641,2.1,beef,UK
2002-01-26,Alpha Food Corps,90,41550,2.1,beef,UK
2002-01-27,Alpha Food Corps,94,16589,2.2,beef,UK
2002-01-27,Alpha Food Corps,85,11669,1.8,beef,UK
2002-01-27,Alpha Food Corps,90,24982,2.1,beef,UK
2002-01-27,Alpha Food Corps,65,29819,1.1,chicken,UK
2002-01-29,Alpha Food Corps,94,37516,2.2,beef,UK
2002-01-29,Alpha Food Corps,85,37378,1.8,beef,UK
2002-01-29,Alpha Food Corps,85,37535,1.8,beef,UK
2002-01-29,Alpha Food Corps,85,40174,1.8,beef,UK
2002-01-29,Alpha Food Corps,90,37831,2.1,beef,UK
2002-01-30,Alpha Food Corps,94,34435,2.2,beef,UK
2002-01-30,Alpha Food Corps,94,39640,2.2,beef,UK
2002-01-30,Alpha Food Corps,85,1619,1.8,beef,UK
2002-01-30,Alpha Food Corps,85,3058,1.8,beef,UK
2002-01-30,Alpha Food Corps,85,20929,1.8,beef,UK
2002-01-30,Alpha Food Corps,90,3641,2.1,beef,UK
2002-01-30,Alpha Food Corps,90,20974,2.1,beef,UK
2002-01-30,Alpha Food Corps,90,31160,2.1,beef,UK
2002-01-30,Alpha Food Corps,92,38189,2.3,beef,UK
2002-01-31,Alpha Food Corps,94,8804,2.2,beef,UK
2002-01-31,Alpha Food Corps,85,17398,1.8,beef,UK
2002-01-31,Alpha Food Corps,90,13963,2.1,beef,UK
2002-01-31,Alpha Food Corps,90,37673,2.1,beef,UK
2002-01-31,Alpha Food Corps,90,40330,2.1,beef,UK
2002-01-31,Alpha Food Corps,90,40511,2.2,beef,UK
2002-01-31,Alpha Food Corps,80,38290,1.9,beef,UK
2002-01-31,Alpha Food Corps,92,37193,2.3,beef,UK
2002-02-01,Alpha Food Corps,94,5011,2.2,beef,UK
2002-02-01,Alpha Food Corps,85,18783,1.8,beef,UK
2002-02-01,Alpha Food Corps,85,41827,1.8,beef,UK
2002-02-01,Alpha Food Corps,90,16394,2.1,beef,UK
2002-02-01,Alpha Food Corps,90,23013,2.1,beef,UK
2002-02-01,Alpha Food Corps,90,39923,2.1,beef,UK
2002-02-01,Alpha Food Corps,90,41417,2.1,beef,UK
2002-02-01,Alpha Food Corps,80,15592,1.7,beef,UK
2002-02-01,Alpha Food Corps,80,38364,1.9,beef,UK
2002-02-01,Alpha Food Corps,92,37605,2.3,beef,UK
2002-02-01,Alpha Food Corps,92,39234,2.3,beef,UK
2002-02-02,Alpha Food Corps,90,34578,2.1,beef,UK
2002-02-02,Alpha Food Corps,90,41661,2.1,beef,UK
2002-02-02,Alpha Food Corps,80,3157,1.7,beef,UK
2002-02-02,Alpha Food Corps,65,41272,1.2,chicken,UK
2002-02-02,Alpha Food Corps,65,41503,1.2,chicken,UK
2002-02-02,Alpha Food Corps,92,36207,2.3,beef,UK
2002-02-05,Alpha Food Corps,94,41559,2.2,beef,UK
2002-02-05,Alpha Food Corps,85,41549,1.8,beef,UK
2002-02-05,Alpha Food Corps,85,41753,1.8,beef,UK
2002-02-05,Alpha Food Corps,85,41908,1.8,beef,UK
2002-02-05,Alpha Food Corps,90,39813,2.1,beef,UK
2002-02-05,Alpha Food Corps,90,41526,2.1,beef,UK
2002-02-05,German Food Corps,80,36031,1.9,beef,UK
2002-02-05,German Food Corps,50,38538,0.9,chicken,UK
2002-02-05,Alpha Food Corps,50,38772,0.9,chicken,UK
2002-02-05,German Food Corps,50,39099,0.9,chicken,UK
2002-02-05,German Food Corps,50,39132,0.9,chicken,UK
2002-02-05,German Food Corps,50,39207,0.9,chicken,UK
2002-02-06,Alpha Food Corps,85,41947,1.8,beef,UK
2002-02-06,German Food Corps,80,37287,1.9,beef,UK
2002-02-06,Alpha Food Corps,89,43201,2.1,beef,UK
2002-02-06,German Food Corps,50,38553,0.9,chicken,UK
2002-02-06,German Food Corps,50,38837,0.9,chicken,UK
2002-02-06,Alpha Food Corps,50,38985,0.9,chicken,UK
2002-02-06,German Food Corps,65,40386,1.4,chicken,UK
2002-02-06,Alpha Food Corps,65,41851,1.2,chicken,UK
2002-02-06,Alpha Food Corps,92,38405,2.3,beef,UK
2002-02-06,German Food Corps,73,37731,1.5,chicken,UK
2002-02-07,Alpha Food Corps,85,41097,1.9,beef,UK
2002-02-07,Alpha Food Corps,90,39582,2.1,beef,UK
2002-02-07,German Food Corps,65,38832,1.4,chicken,UK
2002-02-07,German Food Corps,50,39269,0.9,chicken,UK
2002-02-07,German Food Corps,50,40129,0.9,chicken,UK
2002-02-07,German Food Corps,50,41124,0.8,chicken,UK
2002-02-07,German Food Corps,65,41739,1.2,chicken,UK
2002-02-08,Alpha Food Corps,85,20034,1.8,beef,UK
2002-02-08,German Food Corps,85,33503,1.9,beef,UK
2002-02-08,German Food Corps,85,40780,1.9,beef,UK
2002-02-08,Alpha Food Corps,90,19913,2.1,beef,UK
2002-02-08,Alpha Food Corps,90,36682,2.1,beef,UK
2002-02-08,Alpha Food Corps,90,41624,2.1,beef,UK
2002-02-08,German Food Corps,65,37503,1.4,chicken,UK
2002-02-08,German Food Corps,50,38973,0.9,chicken,UK
2002-02-08,German Food Corps,50,39069,0.9,chicken,UK
2002-02-08,German Food Corps,50,40697,0.9,chicken,UK
2002-02-08,German Food Corps,92,36103,2.3,beef,UK
2002-02-08,Alpha Food Corps,92,38278,2.3,beef,UK
2002-02-09,Alpha Food Corps,90,39842,2.1,beef,UK
2002-02-09,Alpha Food Corps,90,16553,2.3,beef,UK
2002-02-09,Alpha Food Corps,80,18739,1.9,beef,UK
2002-02-09,German Food Corps,80,36349,1.9,beef,UK
2002-02-09,German Food Corps,65,35238,1.4,chicken,UK
2002-02-09,German Food Corps,50,38391,0.9,chicken,UK
2002-02-09,Alpha Food Corps,50,38819,0.9,chicken,UK
2002-02-09,German Food Corps,50,41691,0.9,chicken,UK
2002-02-09,Alpha Food Corps,92,40245,2.3,beef,UK
2002-02-09,German Food Corps,73,37323,1.5,chicken,UK
2002-02-09,German Food Corps,90,40312,2.2,beef,UK
2002-02-10,Alpha Food Corps,90,42108,2.1,beef,UK
2002-02-10,German Food Corps,65,37831,1.4,chicken,UK
2002-02-11,Alpha Food Corps,50,38591,0.9,chicken,UK
2002-02-12,Alpha Food Corps,94,41559,2.3,beef,UK
2002-02-12,Alpha Food Corps,85,40968,1.8,beef,UK
2002-02-12,Alpha Food Corps,85,41985,1.8,beef,UK
2002-02-12,German Food Corps,50,38931,0.9,chicken,UK
2002-02-12,German Food Corps,50,38986,0.9,chicken,UK
2002-02-12,German Food Corps,92,39684,2.3,beef,UK
2002-02-12,German Food Corps,73,36619,1.5,chicken,UK
2002-02-13,Alpha Food Corps,85,41291,1.8,beef,UK
2002-02-13,Alpha Food Corps,85,41892,1.8,beef,UK
3 votes
/ August 4, 2020

On a line plot, as far as I know, you can represent only 4 dimensions:

  • x axis: you can use it for date
  • y axis: you can use it for price
  • hue: you can use it for threshold
  • line style: you can use it for dealer

But you want to account for a 5th dimension: protein_type. For this I suggest using subplots, as in the code below:

# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read dataframe
mydf = pd.read_csv('foo.csv')
mydf = mydf.drop(mydf.columns[0], axis = 1)

# convert 'date' type to datetime and sort values by threshold, then by date
mydf['date'] = pd.to_datetime(mydf['date'], format = '%m/%d/%Y')
mydf['threshold'] = mydf['threshold'].astype('category')
mydf.sort_values(['threshold', 'date'], inplace = True)

# set up subplots layout, one row for each protein_type
fig, ax = plt.subplots(nrows = len(mydf['protein_type'].unique()),
                       ncols = 1,
                       figsize = (10, 10),
                       sharex = True)

# loop over protein_type
for i, protein_type in enumerate(mydf['protein_type'].unique(), 0):

    # filter dataframe
    df_filtered = mydf[mydf['protein_type'] == protein_type]

    # set up plot
    sns.lineplot(ax = ax[i],
                 data = df_filtered,
                 x = 'date',
                 y = 'price',
                 hue = 'threshold',
                 style = 'dealer',
                 legend = 'full',
                 errorbar = None)  # seaborn >= 0.12; use ci = None on older versions

    # set up subplot title and legend
    ax[i].set_title(f'Protein type = {protein_type}')
    ax[i].legend(bbox_to_anchor = (1.02, 1), loc = 'upper left')

# adjust general layout
plt.subplots_adjust(top = 0.95,
                    right = 0.85,
                    bottom = 0.05,
                    left = 0.05,
                    hspace = 0.15)

# show the plot
plt.show()

In the plot above it can be difficult to see the differences between dealers, so you can separate them into another subplot grid, as in the code below:

# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read dataframe
mydf = pd.read_csv('foo.csv')
mydf = mydf.drop(mydf.columns[0], axis = 1)

# convert 'date' type to datetime and sort values by threshold, then by date
mydf['date'] = pd.to_datetime(mydf['date'], format = '%m/%d/%Y')
mydf['threshold'] = mydf['threshold'].astype('category')
mydf.sort_values(['threshold', 'date'], inplace = True)

# set up subplots layout, one row for each protein_type, one column for each dealer
fig, ax = plt.subplots(nrows = len(mydf['protein_type'].unique()),
                       ncols = len(mydf['dealer'].unique()),
                       figsize = (10, 10),
                       sharex = True,
                       sharey = True)

# loop over protein_type
for i, protein_type in enumerate(mydf['protein_type'].unique(), 0):

    # loop over dealer
    for j, dealer in enumerate(mydf['dealer'].unique(), 0):

        # filter dataframe
        df_filtered = mydf[(mydf['protein_type'] == protein_type) & (mydf['dealer'] == dealer)]

        # set up plot
        sns.lineplot(ax = ax[i, j],
                     data = df_filtered,
                     x = 'date',
                     y = 'price',
                     hue = 'threshold',
                     legend = 'full',
                     errorbar = None)  # seaborn >= 0.12; use ci = None on older versions

        # set up subplot title and legend
        ax[i, j].set_title(f'Protein type = {protein_type} | Dealer = {dealer}')
        ax[i, j].legend(bbox_to_anchor = (1.02, 1), loc = 'upper left')

# adjust general layout
plt.subplots_adjust(top = 0.95,
                    right = 0.9,
                    bottom = 0.05,
                    left = 0.05,
                    wspace = 0.3,
                    hspace = 0.2)

# show the plot
plt.show()

Finally, if you want to compare price with expected_price, you can use the style dimension for this task.
This requires reshaping the dataframe differently: you have to stack the price and expected_price columns into a single column. You can do this with the pd.melt function.
Check the code below as a reference:

# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read dataframe
mydf = pd.read_csv('foo.csv')
mydf = mydf.drop(mydf.columns[0], axis = 1)
mydf['expected_price'] = mydf['price']*76/mydf['threshold']

# convert 'date' type to datetime
mydf['date'] = pd.to_datetime(mydf['date'], format = '%m/%d/%Y')
mydf['threshold'] = mydf['threshold'].astype('category')

# reshape dataframe
mydf = pd.melt(frame = mydf,
               id_vars = ['date', 'dealer', 'threshold', 'quantity', 'protein_type', 'destination'],
               value_vars = ['price', 'expected_price'],
               var_name = 'price type',
               value_name = 'price value')

# sort values by threshold, then by date
mydf.sort_values(['threshold', 'date'], inplace = True)

# set up subplots layout, one row for each protein_type, one column for each dealer
fig, ax = plt.subplots(nrows = len(mydf['protein_type'].unique()),
                       ncols = len(mydf['dealer'].unique()),
                       figsize = (10, 10),
                       sharex = True,
                       sharey = True)

# loop over protein_type
for i, protein_type in enumerate(mydf['protein_type'].unique(), 0):

    # loop over dealer
    for j, dealer in enumerate(mydf['dealer'].unique(), 0):

        # filter dataframe
        df_filtered = mydf[(mydf['protein_type'] == protein_type) & (mydf['dealer'] == dealer)]

        # set up plot
        sns.lineplot(ax = ax[i, j],
                     data = df_filtered,
                     x = 'date',
                     y = 'price value',
                     hue = 'threshold',
                     style = 'price type',
                     legend = 'full',
                     errorbar = None)  # seaborn >= 0.12; use ci = None on older versions

        # set up subplot title and legend
        ax[i, j].set_title(f'Protein type = {protein_type} | Dealer = {dealer}')
        ax[i, j].legend(bbox_to_anchor = (1.02, 1), loc = 'upper left')

# adjust general layout
plt.subplots_adjust(top = 0.95,
                    right = 0.9,
                    bottom = 0.05,
                    left = 0.05,
                    wspace = 0.3,
                    hspace = 0.2)

# show the plot
plt.show()

...