geneHomology
============
id  genome_name  gene_id  homolog_genome_name  homolog_gene_id  consider_homolog
1   HomoSap      1007     MusMus               824              1
2   HomoSap      1007     MusMus               825              1
3   HomoSap      1007     MusMus               826              1
4   HomoSap      2890     EColi                2140             1
...
gene
====
genome_name  gene_id  gene_category
MusMus       823      Upregulated
MusMus       824      Downregulated
MusMus       825      Normal
MusMus       826      Normal
MusMus       827      Upregulated
EColi        2140     Normal
...
consider_homolog is an enum (0,1). genome_name and gene_id together form the primary key of gene. geneHomology is very large: about 200 million rows.
My goal is to count, for each gene in the gene table, how many homologs it has in each gene_category.
For example, using only the rows shown above, HomoSap 1007 has 2 Normal homologs and 1 Downregulated.
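To make the intended result concrete, here is a minimal sketch that loads only the rows listed above into an in-memory SQLite database and counts homologs per gene and category. SQLite stands in for MySQL purely so the example is runnable; the function name homolog_counts and the choice to store gene_id as text are assumptions for illustration, and the counts reflect just the visible rows, not the full tables.

```python
import sqlite3

# In-memory stand-in for the MySQL tables (assumption: SQLite instead of MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geneHomology (
    id INTEGER PRIMARY KEY,
    genome_name TEXT, gene_id TEXT,
    homolog_genome_name TEXT, homolog_gene_id TEXT,
    consider_homolog TEXT
);
CREATE TABLE gene (
    genome_name TEXT, gene_id TEXT, gene_category TEXT,
    PRIMARY KEY (gene_id, genome_name)
);
""")

# Only the sample rows shown in the question; the elided rows are not guessed.
conn.executemany("INSERT INTO geneHomology VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "HomoSap", "1007", "MusMus", "824", "1"),
    (2, "HomoSap", "1007", "MusMus", "825", "1"),
    (3, "HomoSap", "1007", "MusMus", "826", "1"),
    (4, "HomoSap", "2890", "EColi", "2140", "1"),
])
conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", [
    ("MusMus", "823", "Upregulated"),
    ("MusMus", "824", "Downregulated"),
    ("MusMus", "825", "Normal"),
    ("MusMus", "826", "Normal"),
    ("MusMus", "827", "Upregulated"),
    ("EColi", "2140", "Normal"),
])


def homolog_counts(conn):
    """Return (genome_name, gene_id, gene_category, count) for every gene."""
    return conn.execute("""
        SELECT a.genome_name, a.gene_id, b.gene_category, COUNT(*)
        FROM geneHomology a
        JOIN gene b
          ON a.homolog_genome_name = b.genome_name
         AND a.homolog_gene_id = b.gene_id
        WHERE a.consider_homolog = '1'
        GROUP BY a.genome_name, a.gene_id, b.gene_category
        ORDER BY a.genome_name, a.gene_id, b.gene_category
    """).fetchall()


counts = homolog_counts(conn)
for row in counts:
    print(row)
```

Each output row is one (gene, category) pair with its homolog count, which is the shape of result I am after.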
So, my query is:
SELECT a.id,a.genome_name,a.gene_id,a.homolog_genome_name,a.homolog_gene_id,COUNT(b.gene_category)
FROM geneHomology a,gene b
WHERE a.consider_homolog='1' AND a.homolog_genome_name=b.genome_name AND a.homolog_gene_id=b.gene_id
GROUP BY a.genome_name,a.gene_id,b.gene_category;
It never returns (and I have waited patiently for over an hour).
I have already indexed gene_category in gene.
I am really new to MySQL, but I have root access to the DB, so I can follow your suggestions (carefully...). I would be happy to provide any additional information.
UPDATE: This is the EXPLAIN output for the query:
+----+-------------+-------+------+-----------------------+----------------------+---------+----------------------------------------------------------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------------+----------------------+---------+----------------------------------------------------------+---------+---------------------------------+
| 1 | SIMPLE | b | ALL | PRIMARY,gene_genome | NULL | NULL | NULL | 1560695 | Using temporary; Using filesort |
| 1 | SIMPLE | a | ref | geneHomologyHit_gene | geneHomologyHit_gene | 54 | my_db_v71.b.gene_id,my_db_v71.b.genome_name | 13 | Using where |
+----+-------------+-------+------+-----------------------+----------------------+---------+----------------------------------------------------------+---------+---------------------------------+
UPDATE 2
mysql> SHOW INDEX FROM gene;
+-------+------------+--------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+--------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| gene | 0 | PRIMARY | 1 | gene_id | A | NULL | NULL | NULL | | BTREE | |
| gene | 0 | PRIMARY | 2 | genome_name | A | 1560695 | NULL | NULL | | BTREE | |
| gene | 1 | gene_organism | 1 | taxon_id | A | 392 | NULL | NULL | | BTREE | |
| gene | 1 | gene_genome | 1 | genome_name | A | 853 | NULL | NULL | | BTREE | |
| gene | 1 | gene_gene_category | 1 | gene_category | A | 5 | NULL | NULL | | BTREE | |
+-------+------------+--------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.01 sec)
UPDATE 3
mysql> SHOW INDEX FROM geneHomology;
+--------------+------------+------------------------+--------------+--------------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+------------------------+--------------+--------------------------+-----------+-------------+----------+--------+------+------------+---------+
| geneHomology | 0 | PRIMARY | 1 | id | A | 680326661 | NULL | NULL | | BTREE | |
| geneHomology | 1 | geneHomologyQuery_gene | 1 | gene_id | A | 1498516 | NULL | NULL | | BTREE | |
| geneHomology | 1 | geneHomologyQuery_gene | 2 | genome_name | A | 1505147 | NULL | NULL | | BTREE | |
| geneHomology | 1 | geneHomologyHit_gene | 1 | homolog_gene_id | A | 52332820 | NULL | NULL | | BTREE | |
| geneHomology | 1 | geneHomologyHit_gene | 2 | homolog_genome_name | A | 52332820 | NULL | NULL | | BTREE | |
+--------------+------------+------------------------+--------------+--------------------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.00 sec)
UPDATE 4: Is there a way to get only partial results, so I can check that I am getting what I want? I tried LIMIT 1000 and even LIMIT 10, but it changes nothing.
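One possible way to spot-check partial results is to restrict the query to a single gene in the WHERE clause rather than relying on LIMIT. Since the EXPLAIN output reports "Using temporary; Using filesort", LIMIT most likely only trims rows after the whole join has already been grouped. The sketch below illustrates the idea on the sample rows, again with SQLite as a stand-in for MySQL; the function name counts_for_gene is hypothetical, and whether MySQL would actually pick the (gene_id, genome_name) index on geneHomology for this filter is an assumption that EXPLAIN would need to confirm.

```python
import sqlite3

# In-memory stand-in for the MySQL tables (assumption: SQLite instead of MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geneHomology (
    id INTEGER PRIMARY KEY,
    genome_name TEXT, gene_id TEXT,
    homolog_genome_name TEXT, homolog_gene_id TEXT,
    consider_homolog TEXT
);
CREATE TABLE gene (
    genome_name TEXT, gene_id TEXT, gene_category TEXT,
    PRIMARY KEY (gene_id, genome_name)
);
-- Mirrors the existing (gene_id, genome_name) key on geneHomology.
CREATE INDEX geneHomologyQuery_gene ON geneHomology (gene_id, genome_name);
""")

# Only the sample rows shown in the question.
conn.executemany("INSERT INTO geneHomology VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "HomoSap", "1007", "MusMus", "824", "1"),
    (2, "HomoSap", "1007", "MusMus", "825", "1"),
    (3, "HomoSap", "1007", "MusMus", "826", "1"),
    (4, "HomoSap", "2890", "EColi", "2140", "1"),
])
conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", [
    ("MusMus", "823", "Upregulated"),
    ("MusMus", "824", "Downregulated"),
    ("MusMus", "825", "Normal"),
    ("MusMus", "826", "Normal"),
    ("MusMus", "827", "Upregulated"),
    ("EColi", "2140", "Normal"),
])


def counts_for_gene(conn, genome_name, gene_id):
    """Return (gene_category, count) pairs for the homologs of one gene."""
    return conn.execute("""
        SELECT b.gene_category, COUNT(*)
        FROM geneHomology a
        JOIN gene b
          ON a.homolog_genome_name = b.genome_name
         AND a.homolog_gene_id = b.gene_id
        WHERE a.gene_id = ?
          AND a.genome_name = ?
          AND a.consider_homolog = '1'
        GROUP BY b.gene_category
        ORDER BY b.gene_category
    """, (gene_id, genome_name)).fetchall()


sample = counts_for_gene(conn, "HomoSap", "1007")
print(sample)
```

Filtering on one gene keeps the set of joined rows small, so the grouping step has little to do, which is what makes this usable as a quick sanity check.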
UPDATE 5
mysql> SHOW CREATE TABLE geneHomology;
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| geneHomology | CREATE TABLE `geneHomology` (
`id` bigint(20) NOT NULL auto_increment,
`genome_name` varchar(20) NOT NULL,
`gene_id` varchar(30) NOT NULL,
`homolog_genome_name` varchar(20) NOT NULL,
`homolog_gene_id` varchar(30) NOT NULL,
`homolog_length` bigint(20) unsigned NOT NULL,
`significance` double unsigned NOT NULL,
`bit_score` double unsigned NOT NULL,
`percent_identity` double unsigned NOT NULL,
`start_match` int(10) unsigned NOT NULL,
`end_match` int(10) unsigned NOT NULL,
`start_match_percent` double unsigned NOT NULL,
`end_match_percent` double unsigned NOT NULL,
`strand` enum('+','-') default NULL,
`homolog_start_match` int(10) unsigned NOT NULL,
`homolog_end_match` int(10) unsigned NOT NULL,
`homolog_start_match_percent` double unsigned NOT NULL,
`homolog_end_match_percent` double unsigned NOT NULL,
`homolog_strand` enum('+','-') default NULL,
`consider_gene_homology` enum('0','1') NOT NULL,
`reason_not_considered` varchar(50) default NULL,
`num_hsps` int(10) unsigned NOT NULL,
`homology_type` varchar(2) NOT NULL,
PRIMARY KEY (`id`),
KEY `geneHomologygene` (`gene_id`,`genome_name`),
KEY `geneHomologyhomolog_gene` (`homolog_gene_id`,`homolog_genome_name`)
) ENGINE=MyISAM AUTO_INCREMENT=680326662 DEFAULT CHARSET=latin1 |
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SHOW CREATE TABLE gene;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| gene | CREATE TABLE `gene` (
`taxon_id` int(10) unsigned NOT NULL,
`genome_name` varchar(20) NOT NULL,
`gene_id` varchar(30) NOT NULL,
`symbol` varchar(30) default NULL,
`type` varchar(30) default NULL,
`product` varchar(300) default NULL,
`strand` enum('+','-') NOT NULL,
`start` bigint(20) unsigned NOT NULL,
`end` bigint(20) unsigned NOT NULL,
`gene_category` enum('Upregulated','Downregulated','Normal','n/a') NOT NULL,
`consider_gene` enum('0','1') NOT NULL,
`reason_not_considered` varchar(50) default NULL,
`sequence` longblob NOT NULL,
`additional_info` varchar(300) default NULL,
PRIMARY KEY (`gene_id`,`genome_name`),
KEY `gene_organism` (`taxon_id`),
KEY `gene_genome` (`genome_name`),
KEY `gene_gene_category` (`gene_category`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)