PostgreSQL полнотекстовый поиск очень медленный - PullRequest
0 голосов
/ 13 июля 2020

Мои результаты, возвращаемые при PostgreSQL полнотекстовом поиске, очень медленные, обычно 5-25 секунд для запроса.

Мой запрос выглядит следующим образом:

SELECT tp . * ОТ diff_profile tp ПРИСОЕДИНЯЙТЕСЬ к diff_profile_full_text_document AS full_text_document ON tp.id = full_text_document.profile_id ГДЕ to_tsquery ('социальные сети') @@ (full_text_document.rankings) И (ts_rank_cd (full_text_document.rankings, to_tsquery ('социальные сети'))> 0) ЗАКАЗАТЬ ts_rank_cd (full_text_document.rankings, to_tsquery ('социальные сети')) DES C ПРЕДЕЛ 1000

EXPLAIN ANALYZE:

"Limit  (cost=149011.43..149128.11 rows=1000 width=458) (actual time=25698.214..25699.022 rows=1000 loops=1)"
"  Buffers: shared hit=2016881 read=632773"
"  ->  Gather Merge  (cost=149011.43..149960.47 rows=8134 width=458) (actual time=25698.210..25759.028 rows=1000 loops=1)"
"        Workers Planned: 2"
"        Workers Launched: 2"
"        Buffers: shared hit=5903977 read=1845055"
"        ->  Sort  (cost=148011.41..148021.58 rows=4067 width=458) (actual time=25688.667..25689.008 rows=479 loops=3)"
"              Sort Key: (ts_rank_cd(full_text_document.rankings, to_tsquery('social & media'::text))) DESC"
"              Sort Method: top-N heapsort  Memory: 568kB"
"              Worker 0:  Sort Method: top-N heapsort  Memory: 572kB"
"              Worker 1:  Sort Method: top-N heapsort  Memory: 553kB"
"              Buffers: shared hit=5903977 read=1845055"
"              ->  Nested Loop  (cost=322.87..147788.42 rows=4067 width=458) (actual time=331.940..25394.498 rows=202972 loops=3)"
"                    Buffers: shared hit=5903961 read=1845055"
"                    ->  Parallel Bitmap Heap Scan on diff_profile_full_text_document full_text_document  (cost=322.31..112574.79 rows=4067 width=340) (actual time=331.745..19330.030 rows=202972 loops=3)"
"                          Recheck Cond: (to_tsquery('social & media'::text) @@ rankings)"
"                          Rows Removed by Index Recheck: 3043633"
"                          Filter: (ts_rank_cd(rankings, to_tsquery('social & media'::text)) > '0'::double precision)"
"                          Heap Blocks: exact=19968 lossy=159568"
"                          Buffers: shared hit=2583041 read=1086064"
"                          ->  Bitmap Index Scan on diff_profile_full_text_document_gin_ranking_index  (cost=0.00..319.87 rows=29282 width=0) (actual time=324.090..324.090 rows=608915 loops=1)"
"                                Index Cond: (to_tsquery('social & media'::text) @@ rankings)"
"                                Buffers: shared hit=293 read=541"
"                    ->  Index Scan using output_profile_pkey2 on diff_profile tp  (cost=0.56..8.41 rows=1 width=454) (actual time=0.015..0.015 rows=1 loops=608915)"
"                          Index Cond: (id = full_text_document.profile_id)"
"                          Buffers: shared hit=2287324 read=758991"
"Planning Time: 1.055 ms"
"Execution Time: 25759.553 ms"

Обновлено с новым запросом и увеличенным work_mem

"Nested Loop  (cost=114825.29..123517.90 rows=1000 width=454) (actual time=8006.457..8020.363 rows=1000 loops=1)"
"  Buffers: shared hit=617083 read=337225"
"  ->  Limit  (cost=114824.72..114941.40 rows=1000 width=12) (actual time=8006.399..8006.852 rows=1000 loops=1)"
"        Buffers: shared hit=614368 read=334936"
"        ->  Gather Merge  (cost=114824.72..115773.76 rows=8134 width=12) (actual time=8006.397..8031.989 rows=1000 loops=1)"
"              Workers Planned: 2"
"              Workers Launched: 2"
"              Buffers: shared hit=1683284 read=909769"
"              ->  Sort  (cost=113824.70..113834.87 rows=4067 width=12) (actual time=7997.708..7997.805 rows=796 loops=3)"
"                    Sort Key: (ts_rank_cd(full_text_document.rankings, to_tsquery('social & media'::text))) DESC"
"                    Sort Method: top-N heapsort  Memory: 123kB"
"                    Worker 0:  Sort Method: top-N heapsort  Memory: 123kB"
"                    Worker 1:  Sort Method: top-N heapsort  Memory: 123kB"
"                    Buffers: shared hit=1683284 read=909769"
"                    ->  Parallel Bitmap Heap Scan on diff_profile_full_text_document full_text_document  (cost=322.31..113601.71 rows=4067 width=12) (actual time=475.461..7884.962 rows=202972 loops=3)"
"                          Recheck Cond: (to_tsquery('social & media'::text) @@ rankings)"
"                          Filter: (ts_rank_cd(rankings, to_tsquery('social & media'::text)) > '0'::double precision)"
"                          Heap Blocks: exact=191318"
"                          Buffers: shared hit=1683272 read=909765"
"                          ->  Bitmap Index Scan on diff_profile_full_text_document_gin_ranking_index  (cost=0.00..319.87 rows=29282 width=0) (actual time=339.534..339.534 rows=608915 loops=1)"
"                                Index Cond: (to_tsquery('social & media'::text) @@ rankings)"
"                                Buffers: shared hit=293 read=541"
"  ->  Index Scan using output_profile_pkey2 on diff_profile tp  (cost=0.56..8.57 rows=1 width=454) (actual time=0.013..0.013 rows=1 loops=1000)"
"        Index Cond: (id = full_text_document.profile_id)"
"        Buffers: shared hit=2715 read=2289"
"Planning Time: 1.094 ms"
"Execution Time: 8045.787 ms"

Обновлено с таймингами ввода-вывода (64M work_mem и новый запрос)

"Nested Loop  (cost=120482.74..129175.35 rows=1000 width=454) (actual time=10603.522..10619.940 rows=1000 loops=1)"
"  Buffers: shared hit=857647 read=306515"
"  I/O Timings: read=2124.818"
"  ->  Limit  (cost=120482.18..120598.85 rows=1000 width=16) (actual time=10603.405..10603.921 rows=1000 loops=1)"
"        Buffers: shared hit=854932 read=304227"
"        I/O Timings: read=2113.927"
"        ->  Gather Merge  (cost=120482.18..122063.82 rows=13556 width=16) (actual time=10603.400..10705.650 rows=1000 loops=1)"
"              Workers Planned: 2"
"              Workers Launched: 2"
"              Buffers: shared hit=2563341 read=909745"
"              I/O Timings: read=6413.484"
"              ->  Sort  (cost=119482.15..119499.10 rows=6778 width=16) (actual time=10595.426..10595.479 rows=786 loops=3)"
"                    Sort Key: (ts_rank_cd(full_text_document.rankings, to_tsquery('social <-> media'::text))) DESC, (ts_rank_cd(full_text_document.rankings, to_tsquery('social & media'::text))) DESC"
"                    Sort Method: top-N heapsort  Memory: 122kB"
"                    Worker 0:  Sort Method: top-N heapsort  Memory: 123kB"
"                    Worker 1:  Sort Method: top-N heapsort  Memory: 122kB"
"                    Buffers: shared hit=2563341 read=909745"
"                    I/O Timings: read=6413.484"
"                    ->  Parallel Bitmap Heap Scan on diff_profile_full_text_document full_text_document  (cost=323.93..119110.52 rows=6778 width=16) (actual time=463.799..10479.477 rows=188245 loops=3)"
"                          Recheck Cond: (to_tsquery('social & media'::text) @@ rankings)"
"                          Filter: ((ts_rank_cd(rankings, to_tsquery('social <-> media'::text)) > '0'::double precision) OR (ts_rank_cd(rankings, to_tsquery('social & media'::text)) >= '0.2'::double precision))"
"                          Rows Removed by Filter: 14727"
"                          Heap Blocks: exact=175093"
"                          Buffers: shared hit=2563325 read=909745"
"                          I/O Timings: read=6413.484"
"                          ->  Bitmap Index Scan on diff_profile_full_text_document_gin_ranking_index  (cost=0.00..319.87 rows=29282 width=0) (actual time=326.170..326.170 rows=608915 loops=1)"
"                                Index Cond: (to_tsquery('social & media'::text) @@ rankings)"
"                                Buffers: shared hit=293 read=541"
"                                I/O Timings: read=6.623"
"  ->  Index Scan using output_profile_pkey2 on diff_profile tp  (cost=0.56..8.57 rows=1 width=454) (actual time=0.015..0.015 rows=1 loops=1000)"
"        Index Cond: (id = full_text_document.profile_id)"
"        Buffers: shared hit=2715 read=2288"
"        I/O Timings: read=10.891"
"Planning Time: 7.341 ms"
"Execution Time: 10722.398 ms"

Буду признателен за любую помощь в этом вопросе. Заранее спасибо.

1 Ответ

0 голосов
/ 20 июля 2020

Похоже, что каждая строка в "full_text_document" разрешается ровно в одну строку в "tp". Если известно, что это всегда так, то, поскольку WHERE и ORDER BY вообще не зависят от "tp", вы можете сначала выполнить LIMIT, а затем присоединиться.

SELECT tp.* 
FROM diff_profile tp 
JOIN (
  SELECT profile_id from diff_profile_full_text_document AS full_text_document  
  WHERE to_tsquery( 'coffee' ) @@ (full_text_document.rankings) AND (ts_rank_cd(full_text_document.rankings, to_tsquery( 'coffee' )) > 0 ) 
  ORDER BY ts_rank_cd(full_text_document.rankings, to_tsquery('coffee')) DESC 
  LIMIT 1000
) ftd ON tp.id = ftd.profile_id

Похоже, как это сбривает примерно половину времени. Чтобы go намного больше, вам, вероятно, нужно сделать ваши запросы более избирательными. Получение и упорядочивание сотен тысяч строк только для возврата 1000 из них никогда не будет очень эффективным.

...