These kinds of operations are easier to carry out with a library designed for tabular data such as yours. Pandas is a great example, although it can be a bit complex, especially if you don't have much experience with Python.

In any case, here is one way of achieving what (I think) you want, using pandas.

Excluding the zero values makes it a bit more complicated, hence the cryptic code:
# -*- coding: utf-8 -*-
# ^This line makes sure python is able to read some weird
# accented characters.
# Import the libraries we need
import sys
import pandas as pd
import numpy as np
# Depending on your version of Python, StringIO has to be imported
# from a different library so that the input data below can be read
# from a string. This step is only needed for this example: if your
# file is stored locally, pass its path to pd.read_csv() instead.
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
input_data = StringIO("""Number Name subject1 subject2 subject3 subject4 subject5
1234567 Jan 5 7 0 6 4
3526435 Marie 5 5 7 0 0
2230431 Kees 6 10 0 8 6
7685433 André 4 7 8 7 5
0364678 Antoinette 0 2 8 8 8
1424354 Jerôme 7 9 0 5 0
4536576 Kamal 8 0 8 7 8
1256033 Diana 0 0 0 0 0
5504657 Petra 6 6 7 0 6
9676575 Malika 0 6 0 0 8
0253756 Samira 3 8 6 7 10
""")
# Read data, specify that columns are delimited by space,
# using the sep= argument.
df = pd.read_csv(input_data, sep=" ")
# Find all columns that contain subject scores, based on their name.
# We just pick every column whose name starts with the string "subject".
subject_columns = [c for c in df.columns if c.startswith("subject")]
print(subject_columns)
# Calculate the mean score for each subject: take the sum of all scores
# for that subject, then divide it by the number of scores that are
# greater than 0, so that the 0s do not count as data points.
for subject in subject_columns:
    df["%s_mean" % subject] = float(df[subject].sum()) / float(len(df[subject].loc[df[subject] > 0]))
# Calculate the mean for each student, without 0s.
# The .replace(0, np.nan).count(axis=1) part is just a trick to find the
# number of non-zero values in each row. In short, it replaces all
# values that are 0 with NaN, so that the count() function ignores
# those values when counting the data points that are present in the
# dataset. I.e. it disregards values that are 0, so that they're
# excluded from the mean calculation. A student whose scores are all 0
# ends up with NaN, because the division is 0 / 0.
df["student_mean"] = df[subject_columns].sum(axis=1) / df[subject_columns].replace(0, np.nan).count(axis=1)
# This just configures pandas to print all columns in our dataset,
# and not truncate the print-out to fit to the screen.
pd.set_option("display.max_columns", 1000)
# Print out our final dataframe.
print(df)
The final dataset looks like this:
     Number        Name  subject1  subject2  subject3  subject4  subject5  subject1_mean  subject2_mean  subject3_mean  subject4_mean  subject5_mean  student_mean
 0  1234567         Jan         5         7         0         6         4            5.5       6.666667       7.333333       6.857143          6.875      5.500000
 1  3526435       Marie         5         5         7         0         0            5.5       6.666667       7.333333       6.857143          6.875      5.666667
 2  2230431        Kees         6        10         0         8         6            5.5       6.666667       7.333333       6.857143          6.875      7.500000
 3  7685433       André         4         7         8         7         5            5.5       6.666667       7.333333       6.857143          6.875      6.200000
 4   364678  Antoinette         0         2         8         8         8            5.5       6.666667       7.333333       6.857143          6.875      6.500000
 5  1424354      Jerôme         7         9         0         5         0            5.5       6.666667       7.333333       6.857143          6.875      7.000000
 6  4536576       Kamal         8         0         8         7         8            5.5       6.666667       7.333333       6.857143          6.875      7.750000
 7  1256033       Diana         0         0         0         0         0            5.5       6.666667       7.333333       6.857143          6.875           NaN
 8  5504657       Petra         6         6         7         0         6            5.5       6.666667       7.333333       6.857143          6.875      6.250000
 9  9676575      Malika         0         6         0         0         8            5.5       6.666667       7.333333       6.857143          6.875      7.000000
10   253756      Samira         3         8         6         7        10            5.5       6.666667       7.333333       6.857143          6.875      6.800000
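One thing to watch for in the output above: the student numbers 0364678 and 0253756 have lost their leading zero, because read_csv() parsed the Number column as integers. If those zeros matter to you, one option is to read that column as text with the dtype= argument. A minimal sketch using two of the sample rows (the names sample and df_ids are made up for this example):

```python
from io import StringIO

import pandas as pd

# Two rows from the sample data, both with numbers that start with "0".
sample = StringIO("""Number Name subject1 subject2
0364678 Antoinette 0 2
0253756 Samira 3 8
""")

# dtype={"Number": str} asks pandas to keep the column as text
# instead of converting it to an integer.
df_ids = pd.read_csv(sample, sep=" ", dtype={"Number": str})
print(df_ids["Number"].tolist())
```

The score columns are still parsed as numbers, so the mean calculations are not affected.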
Note that you need to have the pandas module installed for this to work. You will also need the numpy module.
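For what it's worth, the zero-excluding means can probably be written more compactly by replacing the zeros with NaN first, since mean() skips NaN by default. A small sketch on a three-student subset of the data, assuming Python 3 (the names scores and subject_means are mine, not part of the script above):

```python
import numpy as np
import pandas as pd

# A three-student subset of the sample data, built inline.
df = pd.DataFrame({
    "Name": ["Jan", "Marie", "Diana"],
    "subject1": [5, 5, 0],
    "subject2": [7, 5, 0],
    "subject3": [0, 7, 0],
})
subject_columns = [c for c in df.columns if c.startswith("subject")]

# Treat every 0 as "no score"; mean() then ignores those cells.
scores = df[subject_columns].replace(0, np.nan)

# One mean per subject (a Series indexed by column name).
subject_means = scores.mean()

# One mean per student; a row with no scores at all stays NaN.
df["student_mean"] = scores.mean(axis=1)

print(subject_means)
print(df)
```

Unlike the script above, this keeps the per-subject means in a separate Series rather than repeating them on every row, which may or may not be what you want.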