Why not build a hash that counts how often each key occurs, and sort on that:
my %counts;
foreach my $rowref (@all_matches)
{
    $counts{lc($rowref->[4])}++;
}
@all_matches = sort { $counts{lc($b->[4])} <=> $counts{lc($a->[4])} ||
                      lc($a->[4]) cmp lc($b->[4])
                    } @all_matches;
Tested:
#!/usr/bin/env perl
use strict;
use warnings;
my @all_matches = (
    ["chpt10_2", "sent. 2", "alice", "nsubj", "animals", "protect"],
    ["chpt12_1", "sent. 54", "bob", "nsubj", "cells", "protect"],
    ["chpt25_4", "sent. 47", "carol", "nsubj", "plants", "protect"],
    ["chpt34_1", "sent. 1", "dave", "nsubj", "cells", "protect"],
    ["chpt35_1", "sent. 2", "eli", "nsubj", "cells", "protect"],
    ["chpt38_1", "sent. 1", "fred", "nsubj", "animals", "protect"],
    ["chpt54_1", "sent. 1", "greg", "nsubj", "uticle", "protect"]
);
my %counts;
foreach my $rowref (@all_matches)
{
    $counts{lc($rowref->[4])}++;
}
@all_matches = sort { $counts{lc($b->[4])} <=> $counts{lc($a->[4])} ||
                      lc($a->[4]) cmp lc($b->[4])
                    } @all_matches;
my $i = 0;
foreach my $rowref (@all_matches)
{
    $i++;
    print "$i";
    print " $_" foreach (@$rowref);
    print "\n";
}
Output:
1 chpt12_1 sent. 54 bob nsubj cells protect
2 chpt34_1 sent. 1 dave nsubj cells protect
3 chpt35_1 sent. 2 eli nsubj cells protect
4 chpt10_2 sent. 2 alice nsubj animals protect
5 chpt38_1 sent. 1 fred nsubj animals protect
6 chpt25_4 sent. 47 carol nsubj plants protect
7 chpt54_1 sent. 1 greg nsubj uticle protect
As noted in a comment, the lc operations are unnecessary for the data shown. Removing them would improve performance, as would adding a case-converted key to each array so that lc is called only once per row.
And here it is with lc used once per row (note that the case of some data values has been changed for this test):
#!/usr/bin/env perl
use strict;
use warnings;
my @all_matches = (
    [ "chpt10_2", "sent. 2", "alice", "nsubj", "animAls", "protect" ],
    [ "chpt12_1", "sent. 54", "bob", "nsubj", "celLs", "protect" ],
    [ "chpt25_4", "sent. 47", "carol", "nsubj", "plAnts", "protect" ],
    [ "chpt34_1", "sent. 1", "dave", "nsubj", "cElls", "protect" ],
    [ "chpt35_1", "sent. 2", "eli", "nsubj", "cells", "protect" ],
    [ "chpt38_1", "sent. 1", "fred", "nsubj", "Animals", "protect" ],
    [ "chpt54_1", "sent. 1", "greg", "nsubj", "uticle", "protect" ],
);
my %counts;
foreach my $rowref (@all_matches)
{
    push @$rowref, lc($rowref->[4]);
    $counts{$rowref->[6]}++;
}
@all_matches = sort { $counts{$b->[6]} <=> $counts{$a->[6]} || $a->[6] cmp $b->[6]
                    } @all_matches;
my $i = 0;
foreach my $rowref (@all_matches)
{
    $i++;
    print "$i";
    printf " %-9s", $_ foreach (@$rowref);
    print "\n";
}
Output:
1 chpt12_1  sent. 54  bob       nsubj     celLs     protect   cells
2 chpt34_1  sent. 1   dave      nsubj     cElls     protect   cells
3 chpt35_1  sent. 2   eli       nsubj     cells     protect   cells
4 chpt10_2  sent. 2   alice     nsubj     animAls   protect   animals
5 chpt38_1  sent. 1   fred      nsubj     Animals   protect   animals
6 chpt25_4  sent. 47  carol     nsubj     plAnts    protect   plants
7 chpt54_1  sent. 1   greg      nsubj     uticle    protect   uticle
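If appending a seventh element to every row is undesirable, the lowercased key can be kept alongside each row only for the duration of the sort (the idiom usually called a Schwartzian transform). This is a minimal sketch rather than part of the tested code above; the helper name sort_by_frequency and its column argument are my own, hypothetical choices.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): sorts rows by how often the
# lowercased value in column $col occurs, most frequent first, then
# alphabetically by that lowercased value. The rows themselves are not modified.
sub sort_by_frequency {
    my ($col, @rows) = @_;

    # Pair each row with its lowercased key so lc runs once per row.
    my @keyed = map { [ lc($_->[$col]), $_ ] } @rows;

    # Count occurrences of each lowercased key.
    my %counts;
    $counts{ $_->[0] }++ for @keyed;

    # Sort the pairs, then strip the temporary key again.
    return map  { $_->[1] }
           sort { $counts{ $b->[0] } <=> $counts{ $a->[0] }
                  || $a->[0] cmp $b->[0] }
           @keyed;
}

my @all_matches = (
    [ "chpt10_2", "sent. 2",  "alice", "nsubj", "animAls", "protect" ],
    [ "chpt12_1", "sent. 54", "bob",   "nsubj", "celLs",   "protect" ],
    [ "chpt25_4", "sent. 47", "carol", "nsubj", "plAnts",  "protect" ],
    [ "chpt34_1", "sent. 1",  "dave",  "nsubj", "cElls",   "protect" ],
    [ "chpt35_1", "sent. 2",  "eli",   "nsubj", "cells",   "protect" ],
    [ "chpt38_1", "sent. 1",  "fred",  "nsubj", "Animals", "protect" ],
    [ "chpt54_1", "sent. 1",  "greg",  "nsubj", "uticle",  "protect" ],
);

print join(" ", @$_), "\n" for sort_by_frequency(4, @all_matches);
```

The rows come back grouped the same way as in the output above (cells, animals, plants, uticle), but each row keeps its original six elements.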