MCVE
import pandas as pd
from io import StringIO
# The index and the brand column are separated by two or more spaces
textfile = StringIO("""
    brand
0   ARCHIMEDE PILOT
1   Seiko SRP637
2   Sinn 103
3   Orient Mako
4   Eterna Kontiki
5   Seiko SKX007
6   Boldr Odyssey
7   Bvlgari Octo
8   Aegir
9   Audemars Piguet Royal Oak Offshore""")
# Raw string avoids the invalid-escape warning for \s
df = pd.read_csv(textfile, sep=r'\s\s+', engine='python')
print("Input dataframe...\n")
print(df.to_markdown())
listcomp = ['PILOT', 'SRP637', '103', 'Mako', 'Kontiki', 'SKX007', 'Odyssey', 'Octo', 'Royal Oak Offshore']
regex = "|".join(listcomp)
# Pull the known model name out into its own column (NaN if none matches)
df['model'] = df['brand'].str.extract(f'(?P<model>{regex})', expand=False)
# regex=True is required in pandas >= 2.0; strip the space left behind
df['brand'] = df['brand'].str.replace(regex, '', regex=True).str.strip()
print("Output dataframe...\n")
print(df.to_markdown())
Output:
Input dataframe...
| | brand |
|---:|:-----------------------------------|
| 0 | ARCHIMEDE PILOT |
| 1 | Seiko SRP637 |
| 2 | Sinn 103 |
| 3 | Orient Mako |
| 4 | Eterna Kontiki |
| 5 | Seiko SKX007 |
| 6 | Boldr Odyssey |
| 7 | Bvlgari Octo |
| 8 | Aegir |
| 9 | Audemars Piguet Royal Oak Offshore |
Output dataframe...
| | brand | model |
|---:|:----------------|:-------------------|
| 0 | ARCHIMEDE | PILOT |
| 1 | Seiko | SRP637 |
| 2 | Sinn | 103 |
| 3 | Orient | Mako |
| 4 | Eterna | Kontiki |
| 5 | Seiko | SKX007 |
| 6 | Boldr | Odyssey |
| 7 | Bvlgari | Octo |
| 8 | Aegir | nan |
| 9 | Audemars Piguet | Royal Oak Offshore |
Option 1:
Use pandas to split on the first space with .str.split, then keep only the known models with where and isin. This assumes a one-word brand followed by an optional one-word model:
listcomp = ['PILOT', 'SRP637', '103', 'Mako', 'Kontiki', 'SKX007', 'Odyssey', 'Octo']
# n=1 guarantees at most two columns; set_axis no longer accepts inplace in pandas >= 2.0
df_out = df['brand'].str.split(' ', n=1, expand=True).set_axis(['brand', 'model'], axis=1)
df_out['model'] = df_out['model'].where(df_out['model'].isin(listcomp))
df_out
Output:
| | brand | model |
|---:|:----------|:--------|
| 0 | ARCHIMEDE | PILOT |
| 1 | Seiko | SRP637 |
| 2 | Sinn | 103 |
| 3 | Orient | Mako |
| 4 | Eterna | Kontiki |
| 5 | Seiko | SKX007 |
| 6 | Boldr | Odyssey |
| 7 | Bvlgari | Octo |
| 8 | Aegir | nan |
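To see why where combined with isin leaves NaN for unknown models, here is a minimal sketch on a toy Series. The values "Diver" and the names `second_word`, `known`, `mask`, and `kept` are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical second words after splitting; "Diver" is not a known model
second_word = pd.Series(["PILOT", "Diver", None, "Mako"])
known = ["PILOT", "Mako"]

# isin builds a boolean mask: True only where the value is in the list
mask = second_word.isin(known)

# where keeps values where the mask is True and puts NaN everywhere else
kept = second_word.where(mask)
print(kept)
```

Rows with no second word (None after the split) fail the isin check as well, so they end up as NaN without any special handling.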
Option 2:
Use .str.extract with named groups:
listcomp = ['PILOT', 'SRP637', '103', 'Mako', 'Kontiki', 'SKX007', 'Odyssey', 'Octo']
regex = "|".join(listcomp)
# Raw f-string so \w and \s reach the regex engine unchanged
df['brand'].str.extract(rf'(?P<brand>\w+)\s?(?P<model>{regex})?')
Output:
| | brand | model |
|---:|:----------|:--------|
| 0 | ARCHIMEDE | PILOT |
| 1 | Seiko | SRP637 |
| 2 | Sinn | 103 |
| 3 | Orient | Mako |
| 4 | Eterna | Kontiki |
| 5 | Seiko | SKX007 |
| 6 | Boldr | Odyssey |
| 7 | Bvlgari | Octo |
| 8 | Aegir | nan |
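The named groups become the column names of the extracted frame, and the optional model group yields NaN when nothing matches. A minimal sketch on three rows follows; the use of `re.escape` is an added precaution (an assumption, not part of the original answer) in case a model name ever contains regex metacharacters.

```python
import re
import pandas as pd

brands = pd.Series(["Seiko SRP637", "Sinn 103", "Aegir"])
models = ["SRP637", "103"]

# Escape each model so it is matched literally, then join into an alternation
alternation = "|".join(re.escape(m) for m in models)

# (?P<brand>...) and (?P<model>...) name the resulting columns;
# the trailing ? makes the model group optional
pattern = rf"(?P<brand>\w+)\s?(?P<model>{alternation})?"
parts = brands.str.extract(pattern)
print(parts)
```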
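Option 3 (updated for the edited question and data):
listcomp = ['PILOT', 'SRP637', '103', 'Mako', 'Kontiki', 'SKX007', 'Odyssey', 'Octo', 'Royal Oak Offshore']
regex = "|".join(listcomp)
# Extract the model first, then remove it from the brand column
df['model'] = df['brand'].str.extract(f'(?P<model>{regex})', expand=False)
# regex=True is required in pandas >= 2.0; strip the space left behind
df['brand'] = df['brand'].str.replace(regex, '', regex=True).str.strip()
df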
Output:
| | brand | model |
|---:|:----------------|:-------------------|
| 0 | ARCHIMEDE | PILOT |
| 1 | Seiko | SRP637 |
| 2 | Sinn | 103 |
| 3 | Orient | Mako |
| 4 | Eterna | Kontiki |
| 5 | Seiko | SKX007 |
| 6 | Boldr | Odyssey |
| 7 | Bvlgari | Octo |
| 8 | Aegir | nan |
| 9 | Audemars Piguet | Royal Oak Offshore |
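One thing to watch with a plain alternation is that regex tries alternatives left to right, so a shorter model name listed first can shadow a longer one. The sketch below assumes a hypothetical extra model "Royal Oak" (not in the question's list) to show the effect, and sorts the names longest-first before joining; `re.escape` is again an added precaution.

```python
import re
import pandas as pd

df = pd.DataFrame({"brand": [
    "Sinn 103",
    "Aegir",
    "Audemars Piguet Royal Oak Offshore",
    "Audemars Piguet Royal Oak",  # hypothetical row for illustration
]})
models = ["103", "Royal Oak", "Royal Oak Offshore"]

# Longest names first so "Royal Oak Offshore" is tried before "Royal Oak"
ordered = sorted(models, key=len, reverse=True)
pattern = "|".join(re.escape(m) for m in ordered)

# Extract the model, then remove it from the brand and trim leftover spaces
df["model"] = df["brand"].str.extract(f"({pattern})", expand=False)
df["brand"] = df["brand"].str.replace(pattern, "", regex=True).str.strip()
print(df)
```

With the question's actual list no model is a prefix of another, so the ordering only matters if such names are added later.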