Use str.extract
or str.replace:
df['Mileage'] = df['Mileage'].str.extract(r'(\d*\.?\d*)', expand=False).astype(float)
or
df['Mileage'] = df['Mileage'].str.replace(r'[^\d.]', '', regex=True).astype(float)
Here is an example:
>>> import pandas as pd
>>> df = pd.DataFrame([
...     ['26.6 km'],
...     ['19.67 km'],
...     ['18.2 km'],
...     ['20.77 km'],
...     ['15.2 km'],
... ], columns=['Mileage'])
>>> df['Mileage'].str.extract(r'(\d*\.?\d*)', expand=False).astype(float)
0 26.60
1 19.67
2 18.20
3 20.77
4 15.20
Name: Mileage, dtype: float64
>>> df['Mileage'].str.replace(r'[^\d.]', '', regex=True).astype(float)
0 26.60
1 19.67
2 18.20
3 20.77
4 15.20
Name: Mileage, dtype: float64
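One caveat: the pattern `(\d*\.?\d*)` can match an empty string, so a value that does not start with a number (say, `'unknown'`) would make `astype(float)` raise a `ValueError`. Below is a minimal sketch of a more forgiving variant, assuming you would rather get `NaN` for such rows. The sample frame and the stricter pattern are my own illustration, not taken from the question.

```python
import pandas as pd

# Illustrative data: two clean values, one missing, one without any number.
df = pd.DataFrame({'Mileage': ['26.6 km', '19.67 km', None, 'unknown']})

# Require at least one digit, so non-numeric rows yield no match (NaN)
# instead of an empty string.
numbers = df['Mileage'].str.extract(r'(\d+(?:\.\d+)?)', expand=False)

# errors='coerce' turns anything that still cannot be parsed into NaN.
mileage = pd.to_numeric(numbers, errors='coerce')
print(mileage)
```

Whether silently turning bad rows into `NaN` is acceptable depends on your data; if not, keep `astype(float)` so the failure is loud.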
Or, if you want to use DataFrameMapper and FunctionTransformer from sklearn_pandas:
from sklearn_pandas import DataFrameMapper, FunctionTransformer
def remove_words(val):
    return val.split(' ')[0]
mapper = DataFrameMapper([
    ('Mileage', [FunctionTransformer(remove_words)]),
], df_out=True)
print(mapper.fit_transform(df))
Mileage
0 26.6
1 19.67
2 18.2
3 20.77
4 15.2
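Note that `remove_words` only drops the unit, so the values in the output above appear to still be strings rather than floats. If sklearn_pandas is not a requirement, the same split-on-space idea can be sketched in plain pandas, with the conversion to float included. The three-row frame here is just a shortened illustration.

```python
import pandas as pd

df = pd.DataFrame({'Mileage': ['26.6 km', '19.67 km', '18.2 km']})

# Take the part before the first space, then convert it to float.
mileage = df['Mileage'].str.split(' ').str[0].astype(float)
print(mileage)
```

This assumes every value has the form "<number> <unit>"; anything else will raise on `astype(float)`.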
With sklearn.preprocessing.FunctionTransformer:
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import FunctionTransformer
import numpy as np
def remove_words(vals):
    return np.array([v[0].split(' ')[0] for v in vals])
mapper = DataFrameMapper([
    (['Mileage'], [FunctionTransformer(remove_words, validate=False)]),
], df_out=True)
print(mapper.fit_transform(df))
Mileage
0 26.6
1 19.67
2 18.2
3 20.77
4 15.2
Or use numpy.vectorize:
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import FunctionTransformer
import numpy as np
func = np.vectorize(lambda x: x.split(' ')[0])
def remove_words(vals):
    return func(vals)
mapper = DataFrameMapper([
    (['Mileage'], [FunctionTransformer(remove_words, validate=False)]),
], df_out=True)
print(mapper.fit_transform(df))
Mileage
0 26.6
1 19.67
2 18.2
3 20.77
4 15.2
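If you want the transformer to return numbers rather than strings, one option is to do the float conversion inside the vectorized function. Here is a rough sketch using only numpy; `to_float` is a hypothetical name, and the small 2-D array stands in for what the mapper would pass for the `['Mileage']` column selector.

```python
import numpy as np

# Hypothetical helper: strip the unit and convert to float in one step.
# otypes=[float] tells np.vectorize to produce a float64 array.
to_float = np.vectorize(lambda x: float(x.split(' ')[0]), otypes=[float])

# Stand-in for the (n_samples, 1) array of strings the transformer receives.
vals = np.array([['26.6 km'], ['19.67 km'], ['18.2 km']], dtype=object)

result = to_float(vals)
print(result.dtype, result.ravel())
```

Something like `FunctionTransformer(to_float, validate=False)` should then be usable in place of `remove_words`, though I have only sketched the numpy part here.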