I have a Python script that reads a CSV file into a pandas DataFrame and then plots a histogram with matplotlib. The first task works correctly: reading from and writing to the CSV file.
The fields of the CSV file are: "date", "user_loc", "message", "full_name", "country", "country_code", "predictions", "word count"
BUT the plotting task fails with the error shown below.
Error:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-37-5bc3925ff988> in <module>
1 #plot word count distribution for both positive and negative sentiment
----> 2 x= tweet_preds["word count"][tweet_preds.predictions ==1]
3 y= tweet_preds["word count"][tweet_preds.predictions ==0]
4 plt.figure(figsize=(12,6))
5 plt.xlim(0,45)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
Code:
# create the dataframe
# column names
col_names = ["date","user_loc","followers","friends","message","bbox_coords",
"full_name","country","country_code","place_type"]
#read csv
df_twtr = pd.read_csv(r"F:\AIenv\sentiment_analysis\paul_ryan_twitter.csv",names = col_names)
#drop rows with missing values, reset the index, then check head
df_twtr=df_twtr.dropna()
df_twtr = df_twtr.reset_index(drop=True)
df_twtr.head()
# run predictions on twitter data
tweet_preds = model_NB.predict(df_twtr['message'])
# append predictions to dataframe
df_tweet_preds = df_twtr.copy()
df_tweet_preds['predictions'] = tweet_preds
df_tweet_preds.shape
df_tweet_preds = pd.DataFrame(df_tweet_preds,columns = ["date","user_loc","message","full_name","country","country_code","predictions","word count"])
df_tweet_preds = df_tweet_preds.drop(["user_loc","country","country_code"],axis=1)
df_tweet_preds_to_csv = df_tweet_preds.to_csv(r'F:\AIenv\sentiment_analysis\export_dataframe.csv', index = False, header=True)
#plot word count distribution for both positive and negative sentiment
x= tweet_preds["word count"][tweet_preds.predictions ==1]
y= tweet_preds["word count"][tweet_preds.predictions ==0]
plt.figure(figsize=(12,6))
plt.xlim(0,45)
plt.xlabel("word count")
plt.ylabel("frequency")
g = plt.hist([x,y],color=["r","b"],alpha=0.5,label=["positive","negative"])
plt.legend(loc="upper right")
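
A minimal sketch of what seems to be going wrong, under two assumptions that the code above suggests but does not confirm: that `model_NB.predict` returns a plain NumPy array, so `tweet_preds` is an array rather than a DataFrame, and that the "word count" column is never actually computed (selecting it through `columns=` would then only create an empty column). The three sample messages and labels below are hypothetical stand-ins for the real CSV data and model output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real frame comes from the CSV in the question.
df_tweet_preds = pd.DataFrame({
    "message": ["great speech today", "bad idea", "not good at all really"],
})

# Assumed stand-in for model_NB.predict(...): a plain NumPy array of 0/1 labels.
tweet_preds = np.array([1, 0, 0])

# Indexing the array with a column name reproduces the IndexError from the traceback.
try:
    tweet_preds["word count"]
    raised = False
except IndexError:
    raised = True

# Work on the DataFrame instead, and compute the word count explicitly.
df_tweet_preds["predictions"] = tweet_preds
df_tweet_preds["word count"] = df_tweet_preds["message"].str.split().str.len()

# Select the word counts per predicted class with boolean masks on the DataFrame.
x = df_tweet_preds.loc[df_tweet_preds["predictions"] == 1, "word count"]
y = df_tweet_preds.loc[df_tweet_preds["predictions"] == 0, "word count"]
```

If those assumptions hold, `x` and `y` built this way from `df_tweet_preds` can be passed to `plt.hist([x, y], ...)` exactly as in the plotting code above.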