pd.rename key KeyError: 'New_Name' - PullRequest
0 votes
/ 07 July 2019

Edit 12/07/19: The problem was not actually with pd.rename. I had not returned the pandas DataFrame from the function, so the renamed column did not appear when printing, i.e.

def change_column_names(as_pandas, old_name, new_name):
    as_pandas.rename(columns={old_name: new_name}, inplace=True)
    return as_pandas  # <- this was missing
  • Please see the comments below from the users who found this error for me.

Alternatively, you can continue reading.

The data can be downloaded from this link, but I have added a sample dataset. The file is not formatted as a typical CSV file, and I believe this may have been part of the assessment and is related to the Hidden Decision Tree article. I have included part of the code because it deals with the text-file format issues mentioned above and lets the user rename a column.

The problem arose when I tried to write the rename function:

def change_column_names(as_pandas, old_name, new_name):
    as_pandas.rename(columns={old_name: new_name}, inplace=True)

However, it seems to work when I set the column names inside the rename function.

def change_column_names(as_pandas):
    as_pandas.rename(columns={'Unique Pageviews': 'Page_Views'}, inplace=True)
    return as_pandas
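
A possible reason the parameterised version looked broken, stated here as an assumption rather than something the post confirms: by default `DataFrame.rename` silently ignores labels that are not present, so passing a placeholder such as 'Old_Name' changes nothing, and a KeyError only shows up later when 'New_Name' is looked up. The sketch below uses a one-row frame built from the sample data to show this, together with the `errors="raise"` option (available from pandas 0.25) that reports a missing label immediately.

```python
import pandas as pd

# One row borrowed from the sample dataset, for illustration only.
df = pd.DataFrame({"Title": ["A Tough Calculus Question"],
                   "Unique Pageviews": [937]})

# A label that does not exist is silently ignored (default errors="ignore").
unchanged = df.rename(columns={"Old_Name": "New_Name"})
print(list(unchanged.columns))

# With errors="raise" the missing label surfaces as a KeyError right away.
missing_label_error = None
try:
    df.rename(columns={"Old_Name": "New_Name"}, errors="raise")
except KeyError as exc:
    missing_label_error = exc
    print("KeyError:", exc)

# Renaming a column that does exist works as expected.
renamed = df.rename(columns={"Unique Pageviews": "Page_Views"}, errors="raise")
print(list(renamed.columns))
```

Passing `errors="raise"` while developing is one way to catch a misspelled or placeholder column name at the point of the rename rather than further down the pipeline.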

Sample dataset

Title   URL Date    Unique Pageviews
oupUrl=tutorials    18-Apr-15   5608
"An Exclusive Interview with Data Expert, John Bottega" http://www.datasciencecentral.com/forum/topics/an-exclusive-interview-with-data-expert-john-bottega?groupUrl=announcements  10-Jun-14   360
Announcing Composable Analytics http://www.datasciencecentral.com/forum/topics/announcing-composable-analytics  15-Jun-14   367
Announcing the release of Spark 1.5 http://www.datasciencecentral.com/forum/topics/announcing-the-release-of-spark-1-5  12-Sep-15   156
Are Extreme Weather Events More Frequent? The Data Science Answer   http://www.datasciencecentral.com/forum/topics/are-extreme-weather-events-more-frequent-the-data-science-answer 5-Oct-15    204
Are you interested in joining the University of California for an empiricalstudy on 'Big Data'? http://www.datasciencecentral.com/forum/topics/are-you-interested-in-joining-the-university-of-california-for-an    7-Feb-13    204
Are you smart enough to work at Google? http://www.datasciencecentral.com/forum/topics/are-you-smart-enough-to-work-at-google   11-Oct-15   3625
"As a software engineer, what's the best skill set to have for the next 5-10years?" http://www.datasciencecentral.com/forum/topics/as-a-software-engineer-what-s-the-best-skill-set-to-have-for-the-    12-Feb-16   2815
A Statistician's View on Big Data and Data Science (Updated)    http://www.datasciencecentral.com/forum/topics/a-statistician-s-view-on-big-data-and-data-science-updated-1 21-May-14   163
A synthetic variance designed for Hadoop and big data   http://www.datasciencecentral.com/forum/topics/a-synthetic-variance-designed-for-hadoop-and-big-data?groupUrl=research  26-May-14   575
A Tough Calculus Question   http://www.datasciencecentral.com/forum/topics/a-tough-calculus-question    10-Feb-16   937
Attribution Modeling: Key Analytical Strategy to Boost Marketing ROI    http://www.datasciencecentral.com/forum/topics/attribution-modeling-key-concept 24-Oct-15   937
Audience expansion  http://www.datasciencecentral.com/forum/topics/audience-expansion   6-May-13    223
Automatic use of insights   http://www.datasciencecentral.com/forum/topics/automatic-use-of-insights    27-Aug-15   122
Average length of dissertations by higher education discipline. http://www.datasciencecentral.com/forum/topics/average-length-of-dissertations-by-higher-education-discipline   4-Jun-15    1303
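
As a hedged aside, if the file really is tab-delimited (an assumption based on the `delimiter='\t'` used in the code of this post), pandas may be able to read it directly without first rewriting it as comma-separated text. The sketch below feeds two rows of the sample through an in-memory buffer; the `sample` string stands in for the real file.

```python
import io

import pandas as pd

# Two rows from the sample dataset, joined with tabs (assumed file layout).
sample = (
    "Title\tURL\tDate\tUnique Pageviews\n"
    "A Tough Calculus Question\t"
    "http://www.datasciencecentral.com/forum/topics/a-tough-calculus-question\t"
    "10-Feb-16\t937\n"
    "Audience expansion\t"
    "http://www.datasciencecentral.com/forum/topics/audience-expansion\t"
    "6-May-13\t223\n"
)

# sep="\t" tells read_csv to split on tabs, so commas in titles are harmless.
df = pd.read_csv(io.StringIO(sample), sep="\t")
df = df.rename(columns={"Unique Pageviews": "Page_Views"})
print(df[["Title", "Page_Views"]])
```

Whether this works on the full file depends on details not shown here, such as its encoding and any malformed rows.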

This is the full code that raises the KeyError:

import csv

import pandas as pd


def change_column_names(as_pandas, old_name, new_name):
    as_pandas.rename(columns={old_name: new_name}, inplace=True)


def change_column_names(as_pandas):
    as_pandas.rename(columns={'Unique Pageviews': 'Page_Views'}, 
                               inplace=True)


def open_as_dataframe(file_name_in):
    reader = pd.read_csv(file_name_in, encoding='windows-1251')
    return reader


# Get each column of data including the heading and separate each element,
# i.e. Title, URL, Date, Page Views,
# and save to string_of_rows with comma separator for storage as a csv
# file.
def get_columns_of_data(*args):
    # Function that accept variable length arguments
    string_of_rows = str()
    num_cols = len(args)
    try:
        if num_cols > 0:
            for number, element in enumerate(args):
                if number == (num_cols - 1):
                    string_of_rows = string_of_rows + element + '\n'
                else:
                    string_of_rows = string_of_rows + element + ','

    except UnboundLocalError:
        print('Empty file \'or\' No arguments received, cannot be zero')
    return string_of_rows


def open_file(file_name):
    try:
        with open(file_name) as csv_file_in, open('HDT_data5.txt', 'w') as csv_file_out:
            csv_read = csv.reader(csv_file_in,   delimiter='\t')
            for row in csv_read:
                try:
                    row[0] = row[0].replace(',', '')
                    csv_file_out.write(get_columns_of_data(*row))
                except TypeError:
                    continue

        print("The file name '{}' was successfully opened and read".format(file_name))
    except IOError:
        print('File not found \'OR\' Not in current directory\n')



# All acronyms used in variable naming correspond to the function at time
# of return from function.
# csv_list being a list of the csv file contents, the remainder, i.e. 'st' of
# csv_list_st = split_title().
def main():
    open_file('HDTdata3.txt')
    multi_sets = open_as_dataframe('HDT_data5.txt')
    # change_column_names(multi_sets)
    change_column_names(multi_sets, 'Old_Name', 'New_Name')
    print(multi_sets)


main()

1 Answer

1 vote
/ 07 July 2019

I cleaned up your code so that it runs. You were changing the column names but not returning the result. Try the following:

import pandas as pd
import numpy as np
import math

def set_new_columns(as_pandas):
    titles_list = ['Year > 2014', 'Forum', 'Blog', 'Python', 'R',
                   'Machine_Learning', 'Data_Science', 'Data', 
                   'Analytics']
    # Append each new column at the end, initialised to 0.
    for word in titles_list:
        as_pandas.insert(len(as_pandas.columns), word, 0)

def title_length(as_pandas):
    # Insert new column header then count the number of letters in 'Title'
    as_pandas.insert(len(as_pandas.columns), 'Title_Length', 0)
    as_pandas['Title_Length'] = as_pandas['Title'].map(str).apply(len)

# Although this is a log transform, a difference logX1 - logX2 is roughly the
# relative change between X1 and X2, so it can be thought of as the
# percentage change in Page Views.
# map applies the function to every row in column 'Page_Views'.
def log_page_view(as_pandas):
    # Insert new column header
    as_pandas.insert(len(as_pandas.columns), 'Log_Page_Views', 0)
    as_pandas['Log_Page_Views'] = as_pandas['Page_Views'].map(lambda x: math.log(1 + float(x)))

def change_to_numeric(as_pandas):
    # Check for missing values then convert the column to numeric.
    # Replace in place: rebinding as_pandas would leave the caller's frame untouched.
    as_pandas.replace(r'^\s*$', np.nan, regex=True, inplace=True)
    as_pandas['Page_Views'] = pd.to_numeric(as_pandas['Page_Views'],
                                        errors='coerce')

def change_column_names(as_pandas):
    as_pandas.rename(columns={'Unique Pageviews': 'Page_Views'}, inplace=True)
    return as_pandas

def open_as_dataframe(file_name_in):
    reader = pd.read_csv(file_name_in, encoding='windows-1251')
    return reader

# Get each column of data including the heading and separate each element 
# i.e. Title, URL, Date, Page Views
# and save to string_of_rows with comma separator for storage as a csv 
# file.
def get_columns_of_data(*args):
    # Function that accept variable length arguments
    string_of_rows = str()
    num_cols = len(args)
    try:
        if num_cols > 0:
            for number, element in enumerate(args):
                if number == (num_cols - 1):
                    string_of_rows = string_of_rows + element + '\n'
                else:
                    string_of_rows = string_of_rows + element + ','

    except UnboundLocalError:
        print('Empty file \'or\' No arguments received, cannot be zero')
    return string_of_rows

def open_file(file_name):
    import csv
    try:
        with open(file_name) as csv_file_in, open('HDT_data5.txt', 'w') as csv_file_out:
            csv_read = csv.reader(csv_file_in,   delimiter='\t')
            for row in csv_read:
                try:
                    row[0] = row[0].replace(',', '')
                    csv_file_out.write(get_columns_of_data(*row))
                except TypeError:
                    continue

        print("The file name '{}' was successfully opened and read".format(file_name))
    except IOError:
        print('File not found \'OR\' Not in current directory\n')

# All acronyms used in variable naming correspond to the function at time
# of return from function.
# csv_list being a list of the csv file contents, the remainder, i.e. 'st' of
# csv_list_st = split_title().
def main():
    open_file('HDTdata3.txt')
    multi_sets = open_as_dataframe('HDT_data5.txt')
    multi_sets = change_column_names(multi_sets)
    change_to_numeric(multi_sets)
    log_page_view(multi_sets)
    title_length(multi_sets)
    set_new_columns(multi_sets)
    print(multi_sets)


main()
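
To illustrate the point about returning the result, here is a minimal sketch (the function names `rename_copy` and `rename_and_return` are hypothetical, not from the post). Without `inplace=True`, `rename` returns a new DataFrame, and rebinding the parameter inside a function does not change the object the caller holds, so the result has to be returned and reassigned.

```python
import pandas as pd


def rename_copy(frame):
    # rename() returns a new DataFrame here; assigning it to the local
    # name 'frame' does not affect the caller's object.
    frame = frame.rename(columns={"Unique Pageviews": "Page_Views"})


def rename_and_return(frame):
    # Returning the new DataFrame lets the caller keep the renamed result.
    return frame.rename(columns={"Unique Pageviews": "Page_Views"})


original = pd.DataFrame({"Unique Pageviews": [937, 223]})

rename_copy(original)
after_copy = list(original.columns)  # the caller's columns are unchanged

result = rename_and_return(original)
print(after_copy)
print(list(result.columns))
```

With `inplace=True` the caller's frame is modified directly, so returning it is mainly a convenience that allows the `multi_sets = change_column_names(multi_sets)` style used in `main()`.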
...