I have a simple solution, though it may not be the most elegant or the fastest. In your example, you can first find the rows where a refund happens, then find who was refunded, and finally delete both the refund rows and the purchases they cancel. The code could look like this:
delete_refund <- function(transaction_matrix) {
  # rows where a refund happens (negative Gross)
  index_refund <- which(transaction_matrix[, "Gross"] < 0)
  # who received each refund (informational; the loop looks this up per row)
  refunded <- transaction_matrix[index_refund, "Receiver_email"]
  # rows of the purchases that the refunds cancel
  all_refund_purchase <- integer(0)
  for (row in index_refund) {
    # earlier purchases by the refunded customer for the same amount
    one_purchase <- which((transaction_matrix[1:row, "Gross"] ==
                             abs(transaction_matrix[row, "Gross"])) &
                            (transaction_matrix[1:row, "Sender_email"] ==
                               transaction_matrix[row, "Receiver_email"]))
    # a customer may have several refunds: skip purchases that were
    # already matched to an earlier refund
    one_purchase <- one_purchase[!(one_purchase %in% all_refund_purchase)]
    # a customer may buy several items at the same price and refund only
    # some of them, so one_purchase may hold more than one row;
    # take the most recent one
    all_refund_purchase <- c(all_refund_purchase,
                             one_purchase[length(one_purchase)])
  }
  # setdiff() keeps every row when there are no refunds; negative indexing
  # with an empty vector would return zero rows instead
  keep <- setdiff(seq_len(nrow(transaction_matrix)),
                  c(index_refund, all_refund_purchase))
  return(transaction_matrix[keep, , drop = FALSE])
}
Since no sample data was provided, I tested it on a simple example that I made up:
df <- data.frame(date = 1:4, Gross = c(30, 30, -30, 30),
                 Sender_email = c('bbb@customer.com', 'ccc@customer.com',
                                  'admin@site.com', 'bbb@customer.com'),
                 Receiver_email = c('admin@site.com', 'admin@site.com',
                                    'bbb@customer.com', 'admin@site.com'),
                 stringsAsFactors = FALSE)
  date Gross     Sender_email   Receiver_email
1    1    30 bbb@customer.com   admin@site.com
2    2    30 ccc@customer.com   admin@site.com
3    3   -30   admin@site.com bbb@customer.com
4    4    30 bbb@customer.com   admin@site.com
The result of delete_refund(df) is:
  date Gross     Sender_email Receiver_email
2    2    30 ccc@customer.com admin@site.com
4    4    30 bbb@customer.com admin@site.com
which is what the asker needs.
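Two edge cases are worth checking beyond the example above: input with no refunds at all, and a customer who buys the same amount twice but refunds only once. The following is a minimal, self-contained sketch of the same matching rule, not the code from the answer. The name `drop_refund_pairs` and the small test frames are made up for illustration. It assumes, as the example data does, that a refund is a row with negative `Gross` whose `Receiver_email` is the original `Sender_email`.

```r
# Hypothetical helper (name and layout are assumptions for illustration).
# Pairs each refund with the most recent earlier, still unmatched purchase
# of the same amount by the same customer, then drops both rows.
drop_refund_pairs <- function(tx) {
  refund_rows <- which(tx[, "Gross"] < 0)
  matched <- integer(0)
  for (row in refund_rows) {
    earlier <- seq_len(row - 1)  # empty when the refund is the first row
    candidates <- earlier[
      tx[earlier, "Gross"] == abs(tx[row, "Gross"]) &
        tx[earlier, "Sender_email"] == tx[row, "Receiver_email"]
    ]
    candidates <- setdiff(candidates, matched)
    if (length(candidates) > 0) {
      matched <- c(matched, candidates[length(candidates)])
    }
  }
  # setdiff() on row positions is safe even when nothing is dropped
  keep <- setdiff(seq_len(nrow(tx)), c(refund_rows, matched))
  tx[keep, , drop = FALSE]
}

# Made-up data: no refunds, so both rows should survive
no_refund <- data.frame(Gross = c(30, 20),
                        Sender_email = c('bbb@customer.com', 'ccc@customer.com'),
                        Receiver_email = c('admin@site.com', 'admin@site.com'),
                        stringsAsFactors = FALSE)
print(nrow(drop_refund_pairs(no_refund)))

# Made-up data: two identical purchases, one refund; one purchase should remain
partial <- data.frame(Gross = c(30, 30, -30),
                      Sender_email = c('bbb@customer.com', 'bbb@customer.com',
                                       'admin@site.com'),
                      Receiver_email = c('admin@site.com', 'admin@site.com',
                                         'bbb@customer.com'),
                      stringsAsFactors = FALSE)
print(drop_refund_pairs(partial))
```

Taking the most recent matching purchase is a design choice carried over from the approach above. If the data has an order or transaction ID linking a refund to its purchase, matching on that would be more reliable than matching on amount and email.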