A Perl script like this:
use strict;
use warnings;
*ARGV = *DATA; # For demo only: remove this line if you pass the input file as a parameter.
my %result;
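while (<>) {
    # Split each line on whitespace: field 0 is the gene ID, field 4 is the signature accession.
    my @list = split(/\s+/);
    push @{$result{$list[0]}}, $list[4];
}
# Sort the keys so the output order is stable across runs.
foreach my $entry (sort keys %result) {
    print "$entry Number " . join(", ", @{$result{$entry}}) . "\n";
}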
__DATA__
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 Pfam PF00512 His Kinase A (phospho-acceptor) domain 402 467 2.2E-18 T 29-06-2014 IPR003661 Signal transduction histidine kinase EnvZ-like, dimerisation/phosphoacceptor domain GO:0000155|GO:0007165|GO:0016020
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 SMART SM01079 114 316 4.1E-23 T 29-06-2014 IPR006189 CHASE
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 Pfam PF03924 CHASE domain 115 314 1.0E-40 T 29-06-2014 IPR006189 CHASE
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 PRINTS PR00344 Bacterial sensor protein C-terminal signature 602 616 9.2E-11 T 29-06-2014 IPR004358 Signal transduction histidine kinase-related protein, C-terminal GO:0016310|GO:0016772
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 PRINTS PR00344 Bacterial sensor protein C-terminal signature 637 655 9.2E-11 T 29-06-2014 IPR004358 Signal transduction histidine kinase-related protein, C-terminal GO:0016310|GO:0016772
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 PRINTS PR00344 Bacterial sensor protein C-terminal signature 620 630 9.2E-11 T 29-06-2014 IPR004358 Signal transduction histidine kinase-related protein, C-terminal GO:0016310|GO:0016772
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 ProSiteProfiles PS50110 Response regulatory domain profile. 853 990 28.209 T 29-06-2014 IPR001789 Signal transduction response regulator, receiver domain GO:0000156|GO:0000160
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 SMART SM00448 cheY-homologous receiver domain 852 986 2.9E-29 T 29-06-2014 IPR001789 Signal transduction response regulator, receiver domain GO:0000156|GO:0000160
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 Pfam PF00072 Response regulator receiver domain 854 986 8.5E-21 T 29-06-2014 IPR001789 Signal transduction response regulator, receiver domain GO:0000156|GO:0000160
prints:
CA11g10610 Number PF00512, SM01079, PF03924, PR00344, PR00344, PR00344, PS50110, SM00448, PF00072
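Note that PR00344 appears three times in the output, because the input has three PRINTS rows for it. If each accession should be listed only once, a minimal sketch along the following lines might work. The helper name group_unique_ids and the shortened demo input (four rows copied from the sample data above) are assumptions for illustration, not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: groups the fifth whitespace-separated field by the
# first one, keeping first-seen order and skipping repeated accessions.
sub group_unique_ids {
    my (@lines) = @_;
    my (%ids, %seen, @order);
    for my $line (@lines) {
        my @fields = split ' ', $line;
        next if @fields < 5;                             # skip blank or short lines
        my ($gene, $id) = @fields[0, 4];
        push @order, $gene unless exists $ids{$gene};    # remember first appearance
        push @{ $ids{$gene} }, $id unless $seen{$gene}{$id}++;
    }
    return map { "$_ Number " . join(", ", @{ $ids{$_} }) } @order;
}

# Demo input: four rows taken from the sample data, two of them PR00344.
my @demo = split /\n/, <<'END';
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 SMART SM01079 114 316 4.1E-23 T 29-06-2014 IPR006189 CHASE
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 Pfam PF03924 CHASE domain 115 314 1.0E-40 T 29-06-2014 IPR006189 CHASE
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 PRINTS PR00344 Bacterial sensor protein C-terminal signature 602 616 9.2E-11 T 29-06-2014 IPR004358 Signal transduction histidine kinase-related protein, C-terminal GO:0016310|GO:0016772
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 PRINTS PR00344 Bacterial sensor protein C-terminal signature 637 655 9.2E-11 T 29-06-2014 IPR004358 Signal transduction histidine kinase-related protein, C-terminal GO:0016310|GO:0016772
END

print "$_\n" for group_unique_ids(@demo);
```

For the four demo rows this prints CA11g10610 Number SM01079, PF03924, PR00344. To run it on a real file, replace @demo with the lines read from <>.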