Merge two CSV files using Python or pandas - PullRequest
September 18, 2018
**CSV file 1**

date    yearMonth   deviceCategory  channelGrouping eventCategory   Totalevents
20160719    201607  desktop Direct  _GW_Legal_RM_false  149
20160719    201607  desktop Direct  _GW_Risk_RM_false   298
20160719    201607  desktop Direct  _GW_Risk_RM_true    149
20160719    201607  desktop Direct  _GW__Product-Sign-In__  895
20160719    201607  desktop Organic Search  _GW_Legal_RM_false  149
20160719    201607  desktop Organic Search  _GW_Risk_RM_false   746
20160719    201607  desktop Organic Search  _GW__Product-Sign-In__  1342
20160719    201607  desktop Referral    _GW__Product-Sign-In__  1044
20160719    201607  mobile  Direct  _GW_Legal_RM_false  149
20160719    201607  mobile  Social  _GW_Legal_RM_false  149
20160719    201607  tablet  Direct  _GW_Legal_RM_false  149
20160720    201607  desktop Branded Paid Search _GW_Legal_RM_false  149
20160720    201607  desktop Direct  _GW_Legal_RM_false  149
20160720    201607  desktop Direct  _GW__Product-Sign-In__  746
20160720    201607  desktop Non-Branded Paid Search _GW_Legal_RM_false  149
20160720    201607  desktop Non-Branded Paid Search _GW_Risk_RM_false   149
20160720    201607  desktop Organic Search  _GW_Legal_RM_false  1939
20160720    201607  desktop Organic Search  _GW_Risk_RM_false   298

I have two CSV files that I want to merge on one common column, but the files have different numbers of rows. Is there a way to merge/join them without duplicating values?

**CSV file 2**

eventCategory   event_type
_GW_Legal_RM_false  Legal
_GW_Legal_RM_true   Legal
_GW_Legal_RM_   Legal
_GW_Risk_RM_false   Risk
_GW_Risk_RM_true    Risk
_GW_Risk_RM_    Risk
_GW__Product-Sign-In__  Sign-in

**Output.csv**

eventCategory   event_type  date    yearMonth   deviceCategory  channelGrouping Totalevents
_GW_Legal_RM_false Legal   20160719    201607  desktop Direct  149
_GW_Legal_RM_false Legal   20160719    201607  desktop Organic Search  149
_GW_Legal_RM_false Legal   20160719    201607  mobile  Direct  149
_GW_Legal_RM_false Legal   20160719    201607  mobile  Social  149
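For context on the "duplicating values" concern: a left join keeps exactly one output row per row of the left table only when every key appears at most once in the lookup table. A minimal sketch with hypothetical inline data (the repeated `_GW_Legal_RM_false` entry is invented for illustration) shows where the extra rows come from and how dropping repeated keys avoids them:

```python
import pandas as pd

# Hypothetical inline data: two event rows and a lookup table in which
# _GW_Legal_RM_false appears twice.
events = pd.DataFrame({
    "eventCategory": ["_GW_Legal_RM_false", "_GW_Risk_RM_true"],
    "Totalevents": [149, 149],
})
lookup = pd.DataFrame({
    "eventCategory": ["_GW_Legal_RM_false", "_GW_Legal_RM_false", "_GW_Risk_RM_true"],
    "event_type": ["Legal", "Legal", "Risk"],
})

# Each repeated key in the lookup produces an extra copy of the matching row.
with_dups = pd.merge(events, lookup, on="eventCategory", how="left")
print(len(with_dups))  # 3 rows from 2 input rows

# Keeping one lookup row per key gives exactly one output row per event row.
deduped = pd.merge(events, lookup.drop_duplicates("eventCategory"),
                   on="eventCategory", how="left")
print(len(deduped))  # 2
```

The different row counts of the two files are not a problem in themselves; only repeated keys in the smaller file multiply rows.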

Answers [3]

September 18, 2018

Using `map` with `set_index`:

import pandas as pd
from io import StringIO

csv1 = StringIO("""date    yearMonth   deviceCategory  channelGrouping  eventCategory   Totalevents
20160719    201607  desktop  Direct  _GW_Legal_RM_false  149
20160719    201607  desktop  Direct  _GW_Risk_RM_false   298
20160719    201607  desktop  Direct  _GW_Risk_RM_true    149
20160719    201607  desktop  Direct  _GW__Product-Sign-In__  895
20160719    201607  desktop  Organic Search  _GW_Legal_RM_false  149
20160719    201607  desktop  Organic Search  _GW_Risk_RM_false   746
20160719    201607  desktop  Organic Search  _GW__Product-Sign-In__  1342
20160719    201607  desktop  Referral    _GW__Product-Sign-In__  1044
20160719    201607  mobile  Direct  _GW_Legal_RM_false  149
20160719    201607  mobile  Social  _GW_Legal_RM_false  149
20160719    201607  tablet  Direct  _GW_Legal_RM_false  149
20160720    201607  desktop  Branded Paid Search  _GW_Legal_RM_false  149
20160720    201607  desktop  Direct  _GW_Legal_RM_false  149
20160720    201607  desktop  Direct  _GW__Product-Sign-In__  746
20160720    201607  desktop  Non-Branded Paid Search  _GW_Legal_RM_false  149
20160720    201607  desktop  Non-Branded Paid Search  _GW_Risk_RM_false   149
20160720    201607  desktop  Organic Search  _GW_Legal_RM_false  1939
20160720    201607  desktop  Organic Search  _GW_Risk_RM_false   298""")

csv2= StringIO("""eventCategory   event_type
_GW_Legal_RM_false  Legal
_GW_Legal_RM_true   Legal
_GW_Legal_RM_   Legal
_GW_Risk_RM_false   Risk
_GW_Risk_RM_true    Risk
_GW_Risk_RM_    Risk
_GW__Product-Sign-In__  Sign-in""")

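# A regex separator of two or more whitespace characters keeps values such as
# "Organic Search" in one field; regex separators are handled by the python engine.
df1 = pd.read_csv(csv1, sep=r'\s\s+', engine='python')
df2 = pd.read_csv(csv2, sep=r'\s\s+', engine='python')

# Look up each eventCategory in df2 and store the matching event_type
df1['event_type'] = df1['eventCategory'].map(df2.set_index('eventCategory')['event_type'])

print(df1)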

Output:

        date  yearMonth deviceCategory          channelGrouping           eventCategory  Totalevents event_type
0   20160719     201607        desktop                   Direct      _GW_Legal_RM_false          149      Legal
1   20160719     201607        desktop                   Direct       _GW_Risk_RM_false          298       Risk
2   20160719     201607        desktop                   Direct        _GW_Risk_RM_true          149       Risk
3   20160719     201607        desktop                   Direct  _GW__Product-Sign-In__          895    Sign-in
4   20160719     201607        desktop           Organic Search      _GW_Legal_RM_false          149      Legal
5   20160719     201607        desktop           Organic Search       _GW_Risk_RM_false          746       Risk
6   20160719     201607        desktop           Organic Search  _GW__Product-Sign-In__         1342    Sign-in
7   20160719     201607        desktop                 Referral  _GW__Product-Sign-In__         1044    Sign-in
8   20160719     201607         mobile                   Direct      _GW_Legal_RM_false          149      Legal
9   20160719     201607         mobile                   Social      _GW_Legal_RM_false          149      Legal
10  20160719     201607         tablet                   Direct      _GW_Legal_RM_false          149      Legal
11  20160720     201607        desktop      Branded Paid Search      _GW_Legal_RM_false          149      Legal
12  20160720     201607        desktop                   Direct      _GW_Legal_RM_false          149      Legal
13  20160720     201607        desktop                   Direct  _GW__Product-Sign-In__          746    Sign-in
14  20160720     201607        desktop  Non-Branded Paid Search      _GW_Legal_RM_false          149      Legal
15  20160720     201607        desktop  Non-Branded Paid Search       _GW_Risk_RM_false          149       Risk
16  20160720     201607        desktop           Organic Search      _GW_Legal_RM_false         1939      Legal
17  20160720     201607        desktop           Organic Search       _GW_Risk_RM_false          298       Risk
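With `map`, any `eventCategory` that has no entry in the lookup comes back as NaN, and the lookup Series must not repeat a key (a non-unique index makes `map` fail rather than pick one value). A minimal sketch, assuming hypothetical inline data where `_GW_New_Category_` is an invented category missing from the lookup and `"Unknown"` is an arbitrary placeholder label:

```python
import pandas as pd

# Hypothetical inline data; "_GW_New_Category_" has no entry in the lookup.
events = pd.DataFrame({
    "eventCategory": ["_GW_Legal_RM_false", "_GW_Risk_RM_true", "_GW_New_Category_"],
    "Totalevents": [149, 149, 10],
})
lookup = pd.DataFrame({
    "eventCategory": ["_GW_Legal_RM_false", "_GW_Risk_RM_true"],
    "event_type": ["Legal", "Risk"],
})

# Series indexed by eventCategory; map() uses it as a key -> value table.
mapping = lookup.set_index("eventCategory")["event_type"]

# Keys missing from the lookup map to NaN; fillna gives them an explicit label.
events["event_type"] = events["eventCategory"].map(mapping).fillna("Unknown")
print(events)
```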

September 19, 2018
import pandas as pd

df1 = pd.read_csv("csv1.csv")
df2 = pd.read_csv("csv2.csv")

# Left join: keep every row of df1 and attach the matching event_type from df2
df = pd.merge(df1, df2, on='eventCategory', how='left')

A slight modification of @FrankZhu's answer.
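`pd.merge` can also check the join for you. In the sketch below (hypothetical inline data standing in for `csv1.csv` and `csv2.csv`; `_GW_New_Category_` is invented), `validate="many_to_one"` makes pandas raise a `MergeError` if the lookup table repeats a key, and `indicator=True` adds a `_merge` column that flags rows with no match:

```python
import pandas as pd

# Hypothetical inline data standing in for csv1.csv and csv2.csv.
events = pd.DataFrame({
    "eventCategory": ["_GW_Legal_RM_false", "_GW__Product-Sign-In__", "_GW_New_Category_"],
    "Totalevents": [149, 895, 10],
})
lookup = pd.DataFrame({
    "eventCategory": ["_GW_Legal_RM_false", "_GW_Risk_RM_false", "_GW__Product-Sign-In__"],
    "event_type": ["Legal", "Risk", "Sign-in"],
})

# validate="many_to_one" raises pandas.errors.MergeError if the lookup
# repeats a key; indicator=True adds a "_merge" column describing each match.
merged = pd.merge(events, lookup, on="eventCategory", how="left",
                  validate="many_to_one", indicator=True)

# "left_only" rows had no match in the lookup table.
unmatched = merged.loc[merged["_merge"] == "left_only", "eventCategory"].tolist()
print(unmatched)  # ['_GW_New_Category_']

# Drop the helper column once it has been checked.
merged = merged.drop(columns="_merge")
```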

September 18, 2018

To expand on ALollz's answer:

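import pandas as pd

# Assumes fields are separated by two or more spaces (so values such as
# "Organic Search" stay intact); use sep='\t' instead for tab-separated files.
df1 = pd.read_csv("1.csv", sep=r'\s\s+', engine='python')
df2 = pd.read_csv("2.csv", sep=r'\s\s+', engine='python')

# pd.merge takes the two frames as separate arguments, not as a list
df = pd.merge(df1, df2, on='eventCategory', how='left')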
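The merged frame has `event_type` as its last column, while the desired Output.csv starts with `eventCategory` and `event_type`. A minimal sketch of reordering the columns and producing the CSV text, assuming a couple of hypothetical inline rows in the question's layout:

```python
import pandas as pd

# Hypothetical inline rows in the question's column layout.
events = pd.DataFrame({
    "date": [20160719, 20160719],
    "yearMonth": [201607, 201607],
    "deviceCategory": ["desktop", "mobile"],
    "channelGrouping": ["Direct", "Social"],
    "eventCategory": ["_GW_Legal_RM_false", "_GW_Legal_RM_false"],
    "Totalevents": [149, 149],
})
lookup = pd.DataFrame({"eventCategory": ["_GW_Legal_RM_false"], "event_type": ["Legal"]})

merged = pd.merge(events, lookup, on="eventCategory", how="left")

# Move the join key and its label to the front, as in the desired Output.csv.
first = ["eventCategory", "event_type"]
merged = merged[first + [c for c in merged.columns if c not in first]]

# With no path, to_csv returns the text; pass "Output.csv" to write a file.
# index=False keeps the row index out of the output.
csv_text = merged.to_csv(index=False)
print(csv_text)
```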
...