Try this:
import random
from itertools import chain
import numpy as np
# groups by type 'H' and 'L', makes list, and assigns to corresponding variables
H_, L_ = df.groupby('Type')['Sentence'].agg(list)
# initialize an empty dict (a list would also work; a dict keyed by j makes the output easier to read)
dict_of_list = {}
# loop 10 times for creating each list
for j in range(10):
    # pick 48 of the 96 positions for H; L takes the remaining 48,
    # so each position contributes either its H or its L sentence, never both
    random_idx = random.sample(range(96), 48)
    H_idx = np.isin(range(96), random_idx)
    L_idx = ~H_idx
    H, L = np.array(H_)[H_idx].tolist(), np.array(L_)[L_idx].tolist()
    # shuffle each list
    H, L = random.sample(H, len(H)), random.sample(L, len(L))
    # then zip together and chain them, to make a sequence of [H, L, H, L, ...]
    pair_wise_list = list(chain(*zip(H, L)))
    # zip(*[iter(pair_wise_list)]*16) divides the entire list into sublists of 16
    # more about zip(*[iter(pair_wise_list)]*16) in the reference below
    # random.sample(list(i), len(i)) shuffles the positions of H and L within each sublist
    lst = [random.sample(list(i), len(i)) for i in zip(*[iter(pair_wise_list)]*16)]
    dict_of_list[j] = lst
I tested this on the 10 sample rows. It produces 10 lists, each containing 5 sublists of 2 sentences, one of each type. Within each of the 10 lists, every sentence appears exactly once and the two types are balanced.
>>> df
   SetNum           Sentence  Type  Index
0       1     I went to work     H      0
1       1   She went to work     L      1
2       2     I drink coffee     H      2
3       2  She drinks coffee     L      3
4       3    The desk is red     H      4
5       3  The desk is white     L      5
6       4      The TV is big     H      6
7       4    The TV is white     L      7
8       5      This is a car     H      8
9       5    This is a plane     L      9
>>> import random
>>> from itertools import chain
>>> H, L = df.groupby('Type')['Sentence'].agg(list)
>>> dict_of_list = {}
>>> for j in range(10):
...     H, L = random.sample(H, len(H)), random.sample(L, len(L))
...     pair_wise_list = list(chain(*zip(H, L)))
...     lst = [random.sample(list(i), len(i)) for i in zip(*[iter(pair_wise_list)]*2)]  # had to change to 2
...     dict_of_list[j] = lst
>>> dict_of_list
{0: [['The desk is white', 'I drink coffee'],
['This is a car', 'The TV is white'],
['She went to work', 'The TV is big'],
['I went to work', 'She drinks coffee'],
['This is a plane', 'The desk is red']],
1: [['She went to work', 'The TV is big'],
['I went to work', 'The desk is white'],
['This is a car', 'This is a plane'],
['The TV is white', 'The desk is red'],
['I drink coffee', 'She drinks coffee']],
2: [['The desk is red', 'The TV is white'],
['The TV is big', 'She drinks coffee'],
['I went to work', 'This is a plane'],
['She went to work', 'This is a car'],
['The desk is white', 'I drink coffee']],
3: [['The desk is red', 'The TV is white'],
['I drink coffee', 'She drinks coffee'],
['She went to work', 'I went to work'],
['This is a car', 'This is a plane'],
['The desk is white', 'The TV is big']],
4: [['I went to work', 'This is a plane'],
['The desk is red', 'She drinks coffee'],
['The TV is white', 'This is a car'],
['The TV is big', 'The desk is white'],
['She went to work', 'I drink coffee']],
5: [['She drinks coffee', 'This is a car'],
['She went to work', 'I went to work'],
['The desk is white', 'The TV is big'],
['I drink coffee', 'The TV is white'],
['The desk is red', 'This is a plane']],
6: [['This is a plane', 'The TV is big'],
['She drinks coffee', 'I went to work'],
['She went to work', 'This is a car'],
['I drink coffee', 'The TV is white'],
['The desk is white', 'The desk is red']],
7: [['The desk is red', 'She drinks coffee'],
['This is a car', 'The TV is white'],
['The TV is big', 'She went to work'],
['I went to work', 'The desk is white'],
['This is a plane', 'I drink coffee']],
8: [['I went to work', 'She went to work'],
['I drink coffee', 'The desk is white'],
['The TV is big', 'The TV is white'],
['The desk is red', 'She drinks coffee'],
['This is a plane', 'This is a car']],
9: [['She went to work', 'The TV is big'],
['I went to work', 'The TV is white'],
['I drink coffee', 'The desk is white'],
['This is a plane', 'This is a car'],
['She drinks coffee', 'The desk is red']]}
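To check the balance claim rather than eyeball the output, something like the following sketch could be used. It works on plain Python lists instead of the DataFrame, and `make_list` is a hypothetical helper name introduced here only for illustration; the sentences are the ten sample rows from above.

```python
import random
from collections import Counter
from itertools import chain

# The ten sample sentences, split by type (taken from the sample df).
H = ['I went to work', 'I drink coffee', 'The desk is red',
     'The TV is big', 'This is a car']
L = ['She went to work', 'She drinks coffee', 'The desk is white',
     'The TV is white', 'This is a plane']

def make_list(h_sentences, l_sentences, chunk_size=2):
    """Hypothetical helper: build one shuffled list of type-balanced sublists."""
    h = random.sample(h_sentences, len(h_sentences))
    l = random.sample(l_sentences, len(l_sentences))
    interleaved = list(chain(*zip(h, l)))            # [H, L, H, L, ...]
    chunks = zip(*[iter(interleaved)] * chunk_size)  # consecutive groups
    return [random.sample(list(c), len(c)) for c in chunks]

dict_of_list = {j: make_list(H, L) for j in range(10)}

for lst in dict_of_list.values():
    counts = Counter(chain.from_iterable(lst))
    # every sentence appears exactly once in this list
    assert set(counts) == set(H) | set(L)
    assert all(n == 1 for n in counts.values())
    # every sublist holds exactly one H sentence (and therefore one L)
    assert all(sum(s in H for s in sub) == 1 for sub in lst)
```

The one-H-one-L property of each sublist holds here only because the chunk size is even and the sequence strictly alternates H, L; an odd chunk size would break it.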
EDIT: To get the entire row content rather than just the sentence, change the first few lines to the following:
import random
from itertools import chain
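# work on a copy so the original df is left unchanged
df2 = df.copy()
# join every column of each row into a single comma-separated string
df2['joined'] = df.astype(str).agg(', '.join, axis=1)
# groups by type 'H' and 'L', makes list, and assigns to corresponding variables
H, L = df2.groupby('Type')['joined'].agg(list)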
Reference:
How does zip(*[iter(s)]*n) work in Python?
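As a quick illustration of the chunking idiom used in the code above, here is a minimal sketch on a toy list (the variable names are only for this example). The key point is that `[it] * 3` repeats the same iterator object three times, so `zip` pulls consecutive items from it for each tuple.

```python
from itertools import zip_longest

items = list(range(1, 8))  # 7 items, deliberately not a multiple of 3

it = iter(items)
# [it] * 3 is a list holding the SAME iterator three times,
# so each tuple zip builds consumes three consecutive items.
chunks = list(zip(*[it] * 3))
print(chunks)  # [(1, 2, 3), (4, 5, 6)]

# zip stops at the first exhausted input, so the leftover 7 is dropped.
# zip_longest keeps the remainder and pads it with None instead.
padded = list(zip_longest(*[iter(items)] * 3))
print(padded)  # [(1, 2, 3), (4, 5, 6), (7, None, None)]
```

With 96 interleaved sentences and a chunk size of 16 nothing is dropped, since 96 divides evenly; for other sizes the silent truncation is worth keeping in mind.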