I have a list of tuples:
dataset = [('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'), (30, 15, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]
What I want to do is keep only the entities whose numbers are unique within each tuple.
So the expected output is:
[('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT')]}), ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]
My code so far:
dataset = [('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'), (30, 15, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]
seen_values = []
clean_data = []
# loop through each sentence and dict of values
for sentence, values in dataset:
    for value in values['entities']:
        if value[0] in seen_values:
            # remove if we have seen this before
            values['entities'].remove(value)
        else:
            # add to list if we have not seen this before
            seen_values.append(value[0])
    clean_data.append((sentence, values))
print(clean_data)
This gives:
[('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT')]}), ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (624, 651, 'PRODUCT'), (30, 15, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]
Could someone please help me with this?
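One possible approach, offered as a sketch rather than a definitive answer. Three things in the code above seem to work against the expected output: `seen_values` is shared across all tuples instead of being reset for each one, only `value[0]` (the start number) is checked, and calling `remove` on a list while iterating over it makes the loop skip the element that follows the removed one. The version below assumes that "unique numbers" means neither the start nor the end number of an entity has already appeared in an earlier kept entity of the same tuple, which is the reading that matches the expected output for `(12, 19, 'PRODUCT')` and `(30, 15, 'PRODUCT')`. The function name `dedupe_entities` is made up for illustration.

```python
def dedupe_entities(dataset):
    """Return a new list in which each tuple keeps only the entities whose
    start and end numbers were not already used earlier in that same tuple.

    Assumption: numbers of a dropped entity are NOT recorded as seen.
    """
    clean_data = []
    for sentence, values in dataset:
        seen = set()   # reset for every tuple, so tuples do not affect each other
        kept = []      # build a new list instead of removing while iterating
        for start, end, label in values['entities']:
            if start in seen or end in seen:
                continue  # one of the numbers was already used in this tuple
            seen.update((start, end))
            kept.append((start, end, label))
        # copy the dict so the original dataset is left untouched
        clean_data.append((sentence, {**values, 'entities': kept}))
    return clean_data


dataset = [
    ('made of iron oxide', {'entities': [
        (12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'),
        (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),
    ('made of ferric oxide', {'entities': [
        (10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'),
        (30, 15, 'PRODUCT'), (1937, 1956, 'PRODUCT')]}),
]

clean_data = dedupe_entities(dataset)
print(clean_data)
```

Building a new `kept` list avoids the skipped-element problem entirely, and using a `set` for `seen` keeps each membership check cheap. If the intended rule is different (for example, only start numbers should be unique), the condition inside the loop is the one place that would need to change.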