The data contains missing values represented by "?", so you can use the na_values parameter of read_csv to convert them to NaN. The names parameter of read_csv also sets the column names from a list, so assigning them afterwards is not necessary:
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
headers = ["symboling", "normalized-losses", "make", "fuel-type", "aspiration",
"num-of-doors", "body-style", "drive-wheels", "engine-location",
"wheel-base", "length", "width", "height", "curb-weight",
"engine-type", "num-of-cylinders", "engine-size", "fuel-system",
"bore", "stroke", "compression-ratio", "hoursepower", "peak-rpm",
"city-mpg", "highway-mpg", "price"]
df = pd.read_csv(url, header=None, names=headers, na_values='?')
print(df.head(10))
symboling normalized-losses make fuel-type aspiration \
0 3 NaN alfa-romero gas std
1 3 NaN alfa-romero gas std
2 1 NaN alfa-romero gas std
3 2 164.0 audi gas std
4 2 164.0 audi gas std
5 2 NaN audi gas std
6 1 158.0 audi gas std
7 1 NaN audi gas std
8 1 158.0 audi gas turbo
9 0 NaN audi gas turbo
num-of-doors body-style drive-wheels engine-location wheel-base ... \
0 two convertible rwd front 88.6 ...
1 two convertible rwd front 88.6 ...
2 two hatchback rwd front 94.5 ...
3 four sedan fwd front 99.8 ...
4 four sedan 4wd front 99.4 ...
5 two sedan fwd front 99.8 ...
6 four sedan fwd front 105.8 ...
7 four wagon fwd front 105.8 ...
8 four sedan fwd front 105.8 ...
9 two hatchback 4wd front 99.5 ...
engine-size fuel-system bore stroke compression-ratio hoursepower \
0 130 mpfi 3.47 2.68 9.0 111.0
1 130 mpfi 3.47 2.68 9.0 111.0
2 152 mpfi 2.68 3.47 9.0 154.0
3 109 mpfi 3.19 3.40 10.0 102.0
4 136 mpfi 3.19 3.40 8.0 115.0
5 136 mpfi 3.19 3.40 8.5 110.0
6 136 mpfi 3.19 3.40 8.5 110.0
7 136 mpfi 3.19 3.40 8.5 110.0
8 131 mpfi 3.13 3.40 8.3 140.0
9 131 mpfi 3.13 3.40 7.0 160.0
peak-rpm city-mpg highway-mpg price
0 5000.0 21 27 13495.0
1 5000.0 21 27 16500.0
2 5000.0 19 26 16500.0
3 5500.0 24 30 13950.0
4 5500.0 18 22 17450.0
5 5500.0 19 25 15250.0
6 5500.0 19 25 17710.0
7 5500.0 19 25 18920.0
8 5500.0 17 20 23875.0
9 5500.0 16 22 NaN
[10 rows x 26 columns]
The "?" convention is documented in the dataset description:
https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.names
Missing Attribute Values: (denoted by "?")
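
To check the effect of na_values without downloading the file, here is a minimal sketch on a small inline sample. The three rows and the shortened column list are made up for illustration and are not taken from the dataset; only read_csv with header=None, names and na_values mirrors the call above.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the style of imports-85.data;
# the values are invented for illustration only.
sample = io.StringIO(
    "3,?,alfa-romero,13495\n"
    "2,164,audi,13950\n"
    "0,?,audi,?\n"
)
names = ["symboling", "normalized-losses", "make", "price"]

# header=None: the data has no header row; names supplies the column labels;
# na_values="?" tells the parser to treat "?" as a missing value.
df_sample = pd.read_csv(sample, header=None, names=names, na_values="?")

# Columns that contained "?" are parsed as numeric (float64) rather than strings.
print(df_sample.dtypes)

# Number of missing values per column.
print(df_sample.isna().sum())
```

If na_values is left out, the "?" entries are read as text, so columns such as normalized-losses and price would not be numeric and would need a separate replace and conversion step.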