It's still not entirely clear to me what you are trying to do, but it looks like you want a case-insensitive match against a set of strings.
Here is one way to do that with Series.str.contains:
import json

with open(jsonFileName, encoding='utf-8') as jsonFile:
    jsonData = json.load(jsonFile)

# convert the series of strings into lower-case
haystack = df[0].str.lower()
for key in jsonData.keys():
    # convert the key to lower-case
    needle = key.lower()
    # create a boolean indexer of any records in the haystack containing the needle
    # (regex=False treats the needle as a literal substring, not a regular expression)
    matches = haystack.str.contains(needle, regex=False)
    # create a subset of the dataframe with only those rows
    df2 = df[matches]
    print(df2)
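As a possible alternative to lower-casing both sides, Series.str.contains also accepts case=False. The following is only a minimal sketch: the small dataframe, the `keys` list and the `results` dict are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's dataframe and JSON keys
df = pd.DataFrame({0: [
    "Berlin is the capital of Germany",
    "There is a direct flight from berlin to london",
    "Canberra is the capital of Australia",
]})
keys = ["Berlin", "London"]

results = {}
for key in keys:
    # case=False ignores case; regex=False treats the key as a literal substring
    mask = df[0].str.contains(key, case=False, regex=False)
    # keep the matching rows for this key
    results[key] = df[mask]

print(results["Berlin"])
print(results["London"])
```

Storing each subset in a dict keyed by the search term keeps the results available after the loop instead of only printing them.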
You can also use Series.apply if you need more control over the matching:
matches = haystack.apply(lambda x: needle in x)
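To illustrate the kind of customization Series.apply allows, here is a sketch that matches whole words only, so that "berlin" does not match inside "berliner". The sample strings and the word-boundary rule are assumptions added for illustration, not part of the original question.

```python
import re
import pandas as pd

# Hypothetical lower-cased haystack and needle
haystack = pd.Series([
    "berlin is the capital of germany",
    "the berliner is a pastry",
    "a flight from berlin to london",
])
needle = "berlin"

# \b restricts the match to whole words; re.escape guards regex metacharacters in the needle
pattern = re.compile(r"\b" + re.escape(needle) + r"\b")

# apply calls the predicate once per string and returns a boolean Series
matches = haystack.apply(lambda text: pattern.search(text) is not None)
print(matches.tolist())
```

Any Python predicate can go in the lambda, at the cost of running slower than the vectorized str.contains on large frames.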
Here is the full code with the sample data you provided:
import pandas as pd

# setup the sample data objects
jsonData = {
    "Berlin": "Location A",
    "London": "Location B"
}
temp_df = pd.DataFrame([
    {0: 'Canberra is the capital of Australia', 1: 'AUS', 2: 1},
    {0: 'Berlin is the capital of Germany', 1: 'GER', 2: 1},
    {0: 'London is the capital of United Kingdom', 1: 'UK', 2: 1},
    {0: 'Berlin is also the art capital of Germany', 1: 'GER', 2: 1},
    {0: 'There is a direct flight from berlin to london', 1: 'OTH', 2: 1},
    {0: 'Interstate train service are halted', 1: 'OTH', 2: 0}
])
# keep only the rows flagged with 1 in column 2 and renumber the index
df = (temp_df[temp_df[2] == 1]).reset_index(drop=True)

# convert the series of strings into lower-case
haystack = df[0].str.lower()
for key in jsonData.keys():
    # convert the key to lower-case
    needle = key.lower()
    # create a boolean indexer of any records in the haystack containing the needle
    # (regex=False treats the needle as a literal substring, not a regular expression)
    matches = haystack.str.contains(needle, regex=False)
    # create a subset of the dataframe with only those rows
    df2 = df[matches]
    print(df2)
Output (the two blocks follow the dictionary's iteration order; on Python 3.7+ the Berlin block is printed first):
                                                0    1  2
2         London is the capital of United Kingdom   UK  1
4  There is a direct flight from berlin to london  OTH  1
                                                0    1  2
1                Berlin is the capital of Germany  GER  1
3       Berlin is also the art capital of Germany  GER  1
4  There is a direct flight from berlin to london  OTH  1
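If you only need the rows that mention any of the keys, rather than one subset per key, the loop could be replaced by a single combined pattern. This is a sketch under that assumption; the names `json_data`, `pattern`, `any_match` and `matched` are chosen here for illustration, and the data repeats the filtered sample rows.

```python
import re
import pandas as pd

json_data = {"Berlin": "Location A", "London": "Location B"}
df = pd.DataFrame({0: [
    "Canberra is the capital of Australia",
    "Berlin is the capital of Germany",
    "London is the capital of United Kingdom",
    "Berlin is also the art capital of Germany",
    "There is a direct flight from berlin to london",
]})

# Escape each key so it is matched literally, then join the keys as regex alternatives
pattern = "|".join(re.escape(key) for key in json_data)

# case=False makes the regex search case-insensitive
any_match = df[0].str.contains(pattern, case=False)

# rows that mention at least one key
matched = df[any_match]
print(matched)
```

With this sample data the Canberra row is the only one left out, since it mentions neither key.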