'(?<= \$)[\d,]+'
![enter image description here](https://i.stack.imgur.com/bGUmc.png)
- This extracts any number preceded by ' $' (a space and a dollar sign), wherever it appears in the string.
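To see what the lookbehind does on a single string before applying it to a whole dataframe, here is a minimal sketch using Python's built-in `re` module. The sample string is copied from the Price data used in this answer; the variable names are only for illustration.

```python
import re

# sample string copied from the Price data used in this answer
text = '1 Per Page IRES MLS : 91 PRICE: $59,900'

# (?<= \$) is a lookbehind: the match must be preceded by a space and a literal $
# [\d,]+ then matches one or more digits or commas
pattern = r'(?<= \$)[\d,]+'

matches = re.findall(pattern, text)
print(matches)  # ['59,900']

# remove the thousands separator before converting to a number
price = int(matches[0].replace(',', ''))
print(price)  # 59900
```

The MLS number `91` is skipped because it is not preceded by ` $`.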
import pandas as pd
# setup dataframe
data = {'Date': ['6/9/2020', '6/9/2020', '6/9/2020'],
'Price': ['1 Per Page IRES MLS : 91 PRICE: $59,900', '1 Per Page IRES MLS : 906 PRICE: $350,000', '1 Per Page IRES MLS : 908 PRICE: $360,000'],
'Description': ['Beautiful Views Sold', 'Fast Seller!', ''],
'Total Concession': ['Total Concession: $2000', 'Total Concession: $5029', 'Total Concession: $9000']}
df = pd.DataFrame(data)
Date Price Description Total Concession
0 6/9/2020 1 Per Page IRES MLS : 91 PRICE: $59,900 Beautiful Views Sold Total Concession: $2000
1 6/9/2020 1 Per Page IRES MLS : 906 PRICE: $350,000 Fast Seller! Total Concession: $5029
2 6/9/2020 1 Per Page IRES MLS : 908 PRICE: $360,000 Total Concession: $9000
# extract numbers from columns
for c in df.columns:
    df[f'extracted {c}'] = df[c].str.findall(r'(?<= \$)[\d,]+').explode().str.replace(',', '')
# columns with no match, like Description, will be all NaN, so drop them
df.dropna(axis=1, inplace=True, how='all')
# output
Date Price Description Total Concession extracted Price extracted Total Concession
0 6/9/2020 1 Per Page IRES MLS : 91 PRICE: $59,900 Beautiful Views Sold Total Concession: $2000 59900 2000
1 6/9/2020 1 Per Page IRES MLS : 906 PRICE: $350,000 Fast Seller! Total Concession: $5029 350000 5029
2 6/9/2020 1 Per Page IRES MLS : 908 PRICE: $360,000 Total Concession: $9000 360000 9000
# drop or rename other columns as needed
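The extracted values are still strings at this point. As a rough sketch of the clean-up step, assuming you want numeric columns with shorter names, something like the following could work. The stand-in frame and the new column names (`price`, `concession`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# stand-in for the frame built above, reduced to the two extracted columns
df = pd.DataFrame({
    'extracted Price': ['59900', '350000', '360000'],
    'extracted Total Concession': ['2000', '5029', '9000'],
})

# convert every extracted column from string to a numeric dtype
extracted_columns = [c for c in df.columns if c.startswith('extracted ')]
df[extracted_columns] = df[extracted_columns].apply(pd.to_numeric)

# rename to shorter names (an arbitrary choice for this example)
df = df.rename(columns={'extracted Price': 'price',
                        'extracted Total Concession': 'concession'})
print(df.dtypes)
```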
Only Total Concession
'(?<=Total Concession: \$)[\d,]+'
![enter image description here](https://i.stack.imgur.com/i047E.png)
- Only numbers preceded by 'Total Concession: $' will be extracted.
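A quick way to check that the narrower lookbehind ignores prices is to run it on two sample strings with the built-in `re` module. This is only a sketch; both strings are taken from the data in this answer and the variable names are illustrative.

```python
import re

# the lookbehind now requires the full label, not just a space and $
pattern = r'(?<=Total Concession: \$)[\d,]+'

price_text = '1 Per Page IRES MLS : 906 PRICE: $350,000'
concession_text = 'A bunch Total Concession: $6,399 of random stuff'

price_matches = re.findall(pattern, price_text)
concession_matches = re.findall(pattern, concession_text)

print(price_matches)       # []
print(concession_matches)  # ['6,399']
```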
for c in df.columns:
    df[f'extracted {c}'] = df[c].str.findall(r'(?<=Total Concession: \$)[\d,]+').explode().str.replace(',', '')
df.dropna(axis=1, inplace=True, how='all')
# output
Date Price Description Total Concession extracted Total Concession
0 6/9/2020 1 Per Page IRES MLS : 91 PRICE: $59,900 Beautiful Views Sold Total Concession: $2000 2000
1 6/9/2020 1 Per Page IRES MLS : 906 PRICE: $350,000 Fast Seller! Total Concession: $5029 5029
2 6/9/2020 1 Per Page IRES MLS : 908 PRICE: $360,000 Total Concession: $9000 9000
Robust Example
# setup dataframe
data = {'Date': ['6/9/2020', '6/9/2020', '6/9/2020'],
'Price': ['1 Per Page IRES MLS : 91 PRICE: $59,900', '1 Per Page IRES MLS : 906 PRICE: $350,000', '1 Per Page IRES MLS : 908 PRICE: $360,000'],
'Description': ['Beautiful Views Sold', 'Fast Seller!', ''],
'Total Concession': ['Nothing to see here', 'Total Concession: $5029', 'Total Concession: $9000'],
'Test1': ['A bunch Total Concession: $6,399 of random stuff', 'stuff1', 'stuff2']}
df = pd.DataFrame(data)
Date Price Description Total Concession Test1
0 6/9/2020 1 Per Page IRES MLS : 91 PRICE: $59,900 Beautiful Views Sold Nothing to see here A bunch Total Concession: $6,399 of random stuff
1 6/9/2020 1 Per Page IRES MLS : 906 PRICE: $350,000 Fast Seller! Total Concession: $5029 stuff1
2 6/9/2020 1 Per Page IRES MLS : 908 PRICE: $360,000 Total Concession: $9000 stuff2
for c in df.columns:
    df[f'extracted {c}'] = df[c].str.findall(r'(?<=Total Concession: \$)[\d,]+').explode().str.replace(',', '')
df.dropna(axis=1, inplace=True, how='all')
# list of all extracted columns
extracted_columns = [x for x in df.columns if 'extracted' in x]
# sum all extracted columns
df['all concessions'] = df[extracted_columns].astype(float).sum(axis=1)
# drop the extracted columns
df.drop(columns=extracted_columns, inplace=True)
# print df
Date Price Description Total Concession Test1 all concessions
0 6/9/2020 1 Per Page IRES MLS : 91 PRICE: $59,900 Beautiful Views Sold Nothing to see here A bunch Total Concession: $6,399 of random stuff 6399.0
1 6/9/2020 1 Per Page IRES MLS : 906 PRICE: $350,000 Fast Seller! Total Concession: $5029 stuff1 5029.0
2 6/9/2020 1 Per Page IRES MLS : 908 PRICE: $360,000 Total Concession: $9000 stuff2 9000.0
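As a possible alternative to `findall` plus `explode`, the same total can be sketched with `Series.str.extract` and a capture group, which returns at most one match per cell. This is a different technique from the loop above, not a drop-in replacement: it assumes each cell contains at most one concession amount. The frame below is a reduced copy of the example data.

```python
import pandas as pd

# reduced copy of the example data: only the columns that can contain a concession
df = pd.DataFrame({
    'Total Concession': ['Nothing to see here', 'Total Concession: $5029', 'Total Concession: $9000'],
    'Test1': ['A bunch Total Concession: $6,399 of random stuff', 'stuff1', 'stuff2'],
})

# capture group instead of a lookbehind; str.extract returns the captured text
pattern = r'Total Concession: \$([\d,]+)'

# first match per cell in every column, NaN where there is no match
extracted = df.apply(lambda col: col.str.extract(pattern, expand=False))

# strip thousands separators, convert to float, and sum across columns
all_concessions = (
    extracted.apply(lambda col: col.str.replace(',', '', regex=False))
    .astype(float)
    .sum(axis=1)
)
print(all_concessions.tolist())  # [6399.0, 5029.0, 9000.0]
```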