You can add the names parameter to read_csv to set new column names. The header lines are then read as rows with missing values, so DataFrame.dropna is added to remove them:
import pandas as pd
from io import StringIO
temp="""The pch2csv utility program
This file contains the pch2csv
$TITLE =
$SUBTITLE=
$LABEL = FX
1,0.000000E+00,3.792830E-06,-1.063093E-06
2,0.000000E+00,-1.441319E-06,4.711234E-06
3,0.000000E+00,2.950290E-06,-5.669502E-07
4,0.000000E+00,3.706791E-06,-1.094726E-06
5,0.000000E+00,3.689831E-06,-1.107476E-06
$TITLE =
$SUBTITLE=
$LABEL = FY
1,0.000000E+00,-5.878803E-06,1.127179E-06
2,0.000000E+00,2.782207E-06,-8.840886E-06
3,0.000000E+00,-1.574296E-06,3.867732E-07
4,0.000000E+00,-6.227912E-06,1.864081E-06
5,0.000000E+00,-3.113227E-05,9.339538E-06"""
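# after testing, replace StringIO(temp) with 'Dataset.csv'
df = pd.read_csv(StringIO(temp),
                 # error_bad_lines=False was removed in pandas 2.0; on_bad_lines='skip' replaces it
                 on_bad_lines='skip',
                 engine='python',
                 names=['a','b','c','d'])
# header lines have no values in b, c, d, so drop those rows
df = df.dropna(subset=['b','c','d'])
print(df)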
    a    b         c             d
0   1  0.0  0.000004 -1.063093e-06
1   2  0.0 -0.000001  4.711234e-06
2   3  0.0  0.000003 -5.669502e-07
3   4  0.0  0.000004 -1.094726e-06
4   5  0.0  0.000004 -1.107476e-06
8   1  0.0 -0.000006  1.127179e-06
9   2  0.0  0.000003 -8.840886e-06
10  3  0.0 -0.000002  3.867732e-07
11  4  0.0 -0.000006  1.864081e-06
12  5  0.0 -0.000031  9.339538e-06
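To see why dropna is needed here, below is a minimal sketch on a shortened, made-up input (the sample string and the raw and clean names are illustrative, not the real file). Header lines contain no commas, so read_csv fills their missing fields with NaN. Because the first column is read together with the header text, it holds strings until it is converted; the pd.to_numeric step is an optional addition, not part of the original solution.

```python
import pandas as pd
from io import StringIO

# Shortened, hypothetical input: two header lines and two data rows
sample = """$TITLE =
$LABEL = FX
1,0.000000E+00,3.792830E-06,-1.063093E-06
2,0.000000E+00,-1.441319E-06,4.711234E-06"""

raw = pd.read_csv(StringIO(sample), engine='python', names=['a', 'b', 'c', 'd'])

# Header lines have a single field, so b, c and d are NaN in those rows
print(raw['b'].isna().tolist())

# Keep only rows that have values in b, c and d
clean = raw.dropna(subset=['b', 'c', 'd']).copy()

# Column 'a' also contained the header text, so its values are strings;
# pd.to_numeric converts the remaining values to integers
clean['a'] = pd.to_numeric(clean['a'])
print(clean)
```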
EDIT:
To set the first column as the index and name the other columns:
# after testing, replace StringIO(temp) with 'Dataset.csv'
df = pd.read_csv(StringIO(temp),
                 # error_bad_lines=False was removed in pandas 2.0; on_bad_lines='skip' replaces it
                 on_bad_lines='skip',
                 engine='python',
                 index_col=[0],
                 names=['idx','col1','col2','col3'])
# dropna checks all columns; the first column is now the index, so it is not checked
df = df.dropna()
# to drop only rows where every value is NaN, use instead:
# df = df.dropna(how='all')
print(df)
     col1      col2          col3
idx
1     0.0  0.000004 -1.063093e-06
2     0.0 -0.000001  4.711234e-06
3     0.0  0.000003 -5.669502e-07
4     0.0  0.000004 -1.094726e-06
5     0.0  0.000004 -1.107476e-06
1     0.0 -0.000006  1.127179e-06
2     0.0  0.000003 -8.840886e-06
3     0.0 -0.000002  3.867732e-07
4     0.0 -0.000006  1.864081e-06
5     0.0 -0.000031  9.339538e-06
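The difference between dropna() and dropna(how='all') can be checked on a small made-up frame (the demo data and variable names below are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one complete row, one partial row, one empty row
demo = pd.DataFrame({'col1': [0.0, 1.0, np.nan],
                     'col2': [2.0, np.nan, np.nan]})

# Default how='any': a row is dropped if at least one value is NaN
any_dropped = demo.dropna()

# how='all': a row is dropped only if every value is NaN
all_dropped = demo.dropna(how='all')

print(len(any_dropped), len(all_dropped))
```

For this file both variants give the same result, because each header line is missing every value after the first column.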
EDIT1:
If you need to remove all columns filled only with 0:
# keep only the columns that contain at least one non-zero value
df = df.loc[:, df.ne(0).any()]
print(df)
         col2          col3
idx
1    0.000004 -1.063093e-06
2   -0.000001  4.711234e-06
3    0.000003 -5.669502e-07
4    0.000004 -1.094726e-06
5    0.000004 -1.107476e-06
1   -0.000006  1.127179e-06
2    0.000003 -8.840886e-06
3   -0.000002  3.867732e-07
4   -0.000006  1.864081e-06
5   -0.000031  9.339538e-06
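How the column filter works can be seen step by step on a small made-up frame (the demo values and the nonzero, mask and kept names are illustrative):

```python
import pandas as pd

# Hypothetical frame: col1 is all zeros, the other columns are not
demo = pd.DataFrame({'col1': [0.0, 0.0],
                     'col2': [3.0, 0.0],
                     'col3': [0.0, -1.0]})

# Elementwise comparison: True where the value is not equal to 0
nonzero = demo.ne(0)

# Reduce per column: True if the column has at least one non-zero value
mask = nonzero.any()
print(mask.tolist())

# Boolean mask on the column axis keeps only the True columns
kept = demo.loc[:, mask]
print(kept.columns.tolist())
```

Note that NaN compares as not equal to 0, so a column containing only NaN would be kept by this filter; here that does not matter because dropna has already run.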