I am using PostgreSQL 11.8. The products table has 335198 rows, which does not look like much; I think it could reach about 2 million. Some of the postgres configuration: -c work_mem=100MB -c max_parallel_workers_per_gather=6 -c max_connections=300
I want to rank the search results, but my results are grouped.
create index npdbcs_swedish_custom_index on products
using GIN(to_tsvector('pg_catalog.swedish', name||price||description||brand))
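A side note on the index expression, as a hedged sketch rather than a fix for the timing: concatenating with || yields NULL as soon as one of the columns is NULL, and without separators the last word of one column runs into the first word of the next. A variant with coalesce and spaces could look like this. The index name products_fts_coalesce_idx is hypothetical, the text columns are assumed to be text or varchar, and the WHERE clause of the query would have to use exactly the same expression for the index to be considered.

```sql
-- Hypothetical index name; same columns as the index above.
-- coalesce() keeps one NULL column from turning the whole document into NULL,
-- and the ' ' separators keep words from adjacent columns apart.
CREATE INDEX products_fts_coalesce_idx ON products
USING GIN (to_tsvector('pg_catalog.swedish',
    coalesce(name, '')        || ' ' ||
    coalesce(price::text, '') || ' ' ||
    coalesce(description, '') || ' ' ||
    coalesce(brand, '')));
```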
Now my query takes 62 seconds :( which is not good. Without ranking it takes 2.2 seconds, which is a great result, so the GIN index works correctly. How do I apply ranking to my query correctly? Please help.
If I understand correctly, most of the time is spent on ts_rank_cd and ORDER BY rank: I got the same output when I commented out ts_rank_cd in the SELECT and ORDER BY rank, and it took 2 seconds.
UPDATE: Thanks @jjanes. I changed the to_tsvector logic, and now I store this data in a separate column. This is my function with the trigger:
DROP TRIGGER IF EXISTS tsvectorupdate ON products;
DROP FUNCTION IF EXISTS products_ts_trigger;
CREATE FUNCTION products_ts_trigger() RETURNS trigger AS $$
begin
new.common_fts :=
setweight(to_tsvector('pg_catalog.swedish', coalesce(new.name,'')), 'A') ||
setweight(to_tsvector('pg_catalog.swedish', coalesce(new.description,'')), 'B') ||
setweight(to_tsvector('pg_catalog.swedish', coalesce(new.price::text,'')), 'C') ||
setweight(to_tsvector('pg_catalog.swedish', coalesce(new.brand,'')), 'D');
return new;
end
$$ LANGUAGE plpgsql;
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON products FOR EACH ROW EXECUTE FUNCTION products_ts_trigger();
I will test search performance tomorrow. Now I have a question about price, which has type numeric(10,2): is it correct to cast it to text?
This is what I got for a test product:
'1595.00':6C 'black':5A 'bärsel':2A 'matt':4A 'najell':1A,7 'original':3A
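For the price question, a quick way to see what the cast and the text search parser produce is to run the conversion on a literal. This is a minimal sketch: the value 1595.00 is taken from the test product above, and the column aliases are only illustrative.

```sql
-- numeric(10,2) keeps its two decimal places when cast to text
SELECT (1595.00::numeric(10,2))::text AS price_text;

-- the same value passed through the Swedish configuration,
-- with weight C as in the trigger function
SELECT setweight(
         to_tsvector('pg_catalog.swedish', (1595.00::numeric(10,2))::text),
         'C') AS price_vector;
```

The lexeme keeps the trailing zeros, as in the '1595.00':6C entry above, so a search term such as 1595 without :* would not be expected to match it.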
UPDATE 19_06_20: My test server has 6 CPUs and 16 GB RAM. I did this for all rows (331681 raw rows, and 114856 rows after grouping by group_identity).
SELECT COUNT(*) FROM products;
331681
SELECT COUNT(*) FROM (
SELECT COUNT(*) FROM products
GROUP BY group_identity) as sub_s
114856
Created * 10 37 * Added a new index:
CREATE INDEX common_ndpb_search_idx ON products USING GIN (common_fts);
I set work_mem = 2 GB (instead of 200 MB), and the result is 61.5 s (without common_fts) versus 49.2 s (with common_fts): a good improvement, but not enough.
I turned on track_io_timing; here are the query plans I got:
EXPLAIN (ANALYZE, BUFFERS)
SELECT
products_alias.group_identity
,(array_agg(DISTINCT products_alias.shop))[1]::TEXT AS shop
,(array_agg(DISTINCT products_alias.shop_relation_id))[1]::INTEGER AS "shopRelationId"
,jsonb_agg(DISTINCT products_alias.extras) FILTER (WHERE products_alias.extras IS NOT NULL) AS extras
,hstore(array_agg(products_alias.id::text), array_agg(products_alias.brand::text)) AS "storeBrand"
,hstore(array_agg(products_alias.id::text), array_agg(products_alias.currency::text)) AS "storeCurrency"
,hstore(array_agg(products_alias.id::text), array_agg(products_alias.price::text)) AS "storePrice"
,hstore(array_agg(products_alias.id::TEXT), array_agg(products_alias.image_url)) AS "storeImageUrl"
,hstore(array_agg(products_alias.id::TEXT), array_agg(products_alias.name)) AS "storeNames"
,hstore(array_agg(products_alias.id::text), array_agg(products_alias.extras::text)) AS "storeExtras"
,COUNT(DISTINCT uip.id) as "numberOfEntries"
,SUM(ts_rank_cd(products_alias.common_fts, to_tsquery('pg_catalog.swedish',
'Yard:*|subSkjortor:*|Skjortor:*|Barn:*|ebbe:*|ÖVERDELAR:*|till:*|barn:*'))) AS rank
FROM products products_alias
LEFT JOIN user_ip_product uip on uip.products_id = products_alias.id
LEFT JOIN product_category cp on cp.product_id = products_alias.id
WHERE products_alias.common_fts @@ to_tsquery('pg_catalog.swedish', 'Yard:*|subSkjortor:*|Skjortor:*|Barn:*|ebbe:*|ÖVERDELAR:*|till:*|barn:*')
GROUP BY products_alias.group_identity
ORDER BY
rank DESC,
"numberOfEntries" DESC
LIMIT 20
Limit (cost=146327.93..146327.98 rows=20 width=286) (actual time=49668.929..49668.937 rows=20 loops=1)
Buffers: shared hit=164403 read=50594
-> Sort (cost=146327.93..146455.42 rows=50997 width=286) (actual time=49668.928..49668.933 rows=20 loops=1)
Sort Key: (sum(ts_rank_cd(products_alias.common_fts, '''yard'':* | ''subskjort'':* | ''skjort'':* | ''barn'':* | ''ebb'':* | ''överdel'':* | ''barn'':*'::tsquery))) DESC, (count(DISTINCT uip.id)) DESC
Sort Method: top-N heapsort Memory: 490kB
Buffers: shared hit=164403 read=50594
-> GroupAggregate (cost=110836.74..144970.91 rows=50997 width=286) (actual time=1482.836..49612.157 rows=28329 loops=1)
Group Key: products_alias.group_identity
Buffers: shared hit=164403 read=50594
-> Sort (cost=110836.74..111696.38 rows=343854 width=623) (actual time=1482.182..1596.263 rows=191570 loops=1)
Sort Key: products_alias.group_identity
Sort Method: quicksort Memory: 180138kB
Buffers: shared hit=39 read=44477
-> Hash Left Join (cost=61681.49..79216.90 rows=343854 width=623) (actual time=500.548..901.555 rows=191570 loops=1)
Hash Cond: (products_alias.id = uip.products_id)
Buffers: shared hit=39 read=44477
-> Hash Right Join (cost=61668.34..77912.84 rows=343854 width=619) (actual time=500.525..850.976 rows=191570 loops=1)
Hash Cond: (cp.product_id = products_alias.id)
Buffers: shared hit=39 read=44477
-> Seq Scan on product_category cp (cost=0.00..14000.49 rows=854849 width=4) (actual time=0.024..109.998 rows=838650 loops=1)
Buffers: shared read=5452
-> Hash (cost=60000.65..60000.65 rows=133415 width=619) (actual time=498.826..498.826 rows=76368 loops=1)
Buckets: 262144 Batches: 1 Memory Usage: 50274kB
Buffers: shared hit=39 read=39025
-> Bitmap Heap Scan on products products_alias (cost=1405.97..60000.65 rows=133415 width=619) (actual time=93.217..410.644 rows=76368 loops=1)
Recheck Cond: (common_fts @@ '''yard'':* | ''subskjort'':* | ''skjort'':* | ''barn'':* | ''ebb'':* | ''överdel'':* | ''barn'':*'::tsquery)
Heap Blocks: exact=38977
Buffers: shared hit=39 read=39025
-> Bitmap Index Scan on common_ndpb_search_idx (cost=0.00..1372.61 rows=133415 width=0) (actual time=84.859..84.859 rows=76368 loops=1)
Index Cond: (common_fts @@ '''yard'':* | ''subskjort'':* | ''skjort'':* | ''barn'':* | ''ebb'':* | ''överdel'':* | ''barn'':*'::tsquery)
Buffers: shared hit=39 read=48
-> Hash (cost=11.40..11.40 rows=140 width=8) (actual time=0.005..0.005 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on user_ip_product uip (cost=0.00..11.40 rows=140 width=8) (actual time=0.004..0.004 rows=0 loops=1)
Planning Time: 0.824 ms
Execution Time: 49683.687 ms
and without ranking:
EXPLAIN (ANALYZE, BUFFERS)
SELECT
products_alias.group_identity
,(array_agg(DISTINCT products_alias.shop))[1]::TEXT AS shop
,(array_agg(DISTINCT products_alias.shop_relation_id))[1]::INTEGER AS "shopRelationId"
,jsonb_agg(DISTINCT products_alias.extras) FILTER (WHERE products_alias.extras IS NOT NULL) AS extras
,hstore(array_agg(products_alias.id::text), array_agg(products_alias.brand::text)) AS "storeBrand"
,hstore(array_agg(products_alias.id::text), array_agg(products_alias.currency::text)) AS "storeCurrency"
,hstore(array_agg(products_alias.id::text), array_agg(products_alias.price::text)) AS "storePrice"
,hstore(array_agg(products_alias.id::TEXT), array_agg(products_alias.image_url)) AS "storeImageUrl"
,hstore(array_agg(products_alias.id::TEXT), array_agg(products_alias.name)) AS "storeNames"
,hstore(array_agg(products_alias.id::text), array_agg(products_alias.extras::text)) AS "storeExtras"
,COUNT(DISTINCT uip.id) as "numberOfEntries"
FROM products products_alias
LEFT JOIN user_ip_product uip on uip.products_id = products_alias.id
LEFT JOIN product_category cp on cp.product_id = products_alias.id
WHERE products_alias.common_fts @@ to_tsquery('pg_catalog.swedish', 'Yard:*|subSkjortor:*|Skjortor:*|Barn:*|ebbe:*|ÖVERDELAR:*|till:*|barn:*')
GROUP BY products_alias.group_identity
LIMIT 20
Limit (cost=0.99..180.91 rows=20 width=282) (actual time=0.098..1.912 rows=20 loops=1)
Buffers: shared hit=577
-> GroupAggregate (cost=0.99..438439.05 rows=48737 width=282) (actual time=0.097..1.905 rows=20 loops=1)
Group Key: products_alias.group_identity
Buffers: shared hit=577
-> Nested Loop Left Join (cost=0.99..408316.57 rows=328199 width=260) (actual time=0.042..1.294 rows=107 loops=1)
Buffers: shared hit=577
-> Nested Loop Left Join (cost=0.57..309213.08 rows=127214 width=260) (actual time=0.032..1.054 rows=36 loops=1)
Buffers: shared hit=432
-> Index Scan using group_identity on products products_alias (cost=0.42..287224.67 rows=127214 width=256) (actual time=0.026..0.996 rows=36 loops=1)
Filter: (common_fts @@ '''yard'':* | ''subskjort'':* | ''skjort'':* | ''barn'':* | ''ebb'':* | ''överdel'':* | ''barn'':*'::tsquery)
Rows Removed by Filter: 327
Buffers: shared hit=396
-> Index Scan using idx_5b9c784c6c8a81a9 on user_ip_product uip (cost=0.14..0.16 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=36)
Index Cond: (products_id = products_alias.id)
Buffers: shared hit=36
-> Index Only Scan using idx_cdfc73564584665a on product_category cp (cost=0.42..0.74 rows=4 width=4) (actual time=0.004..0.005 rows=3 loops=36)
Index Cond: (product_id = products_alias.id)
Heap Fetches: 107
Buffers: shared hit=145
Planning Time: 0.612 ms
Execution Time: 2.015 ms
Any ideas on what more I can do to optimize the query, if that is possible?
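One direction that might be worth trying, as an untested sketch and not a measured fix: in the ranked plan above, the joins turn 76368 matching products into 191570 rows, and the rank function is evaluated inside the aggregate for each of those rows. Computing the rank once per matching product in a CTE (which PostgreSQL 11 always materializes) and grouping afterwards would avoid the repeated calls. The CTE name matched, the column product_rank, and the shortened query string are assumptions for illustration. The other aggregates and joins from the original query would still have to be added back, and summing per product instead of per joined row changes the rank values.

```sql
WITH matched AS (
    -- rank each matching product exactly once, before any join fans out rows
    SELECT p.id,
           p.group_identity,
           ts_rank_cd(p.common_fts, q) AS product_rank
    FROM products p,
         to_tsquery('pg_catalog.swedish', 'Yard:*|Skjortor:*|Barn:*') AS q
    WHERE p.common_fts @@ q
)
SELECT m.group_identity,
       SUM(m.product_rank) AS rank
FROM matched m
GROUP BY m.group_identity
ORDER BY rank DESC
LIMIT 20;
```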
UPDATE
After a lot of searching and analysis I decided to replace ts_rank_cd with ts_rank, and the query time dropped to 3.7 s:
EXPLAIN (ANALYZE, BUFFERS)
SELECT
products_alias.group_identity
,(array_agg(DISTINCT products_alias.shop))[1]::TEXT AS shop
,(array_agg(DISTINCT products_alias.shop_relation_id))[1]::INTEGER AS "shopRelationId"
,jsonb_agg(DISTINCT products_alias.extras) FILTER (WHERE products_alias.extras IS NOT NULL) AS extras
,hstore(array_agg(products_alias.id::text), array_agg(products_alias.brand::text)) AS "storeBrand"
,hstore(array_agg(products_alias.id::text), array_agg(products_alias.currency::text)) AS "storeCurrency"
,hstore(array_agg(products_alias.id::text), array_agg(products_alias.price::text)) AS "storePrice"
,hstore(array_agg(products_alias.id::TEXT), array_agg(products_alias.image_url)) AS "storeImageUrl"
,hstore(array_agg(products_alias.id::TEXT), array_agg(products_alias.name)) AS "storeNames"
,hstore(array_agg(products_alias.id::text), array_agg(products_alias.extras::text)) AS "storeExtras"
,COUNT(DISTINCT uip.id) as "numberOfEntries"
,SUM(ts_rank(products_alias.common_fts, to_tsquery('pg_catalog.swedish',
'Yard:*|subSkjortor:*|Skjortor:*|Barn:*|ebbe:*|ÖVERDELAR:*|till:*|barn:*'))) AS rank
FROM products products_alias
LEFT JOIN user_ip_product uip on uip.products_id = products_alias.id
LEFT JOIN product_category cp on cp.product_id = products_alias.id
WHERE products_alias.common_fts @@ to_tsquery('pg_catalog.swedish', 'Yard:*|subSkjortor:*|Skjortor:*|Barn:*|ebbe:*|ÖVERDELAR:*|till:*|barn:*')
GROUP BY products_alias.group_identity
ORDER BY
rank DESC
--products_alias.price DESC
LIMIT 20
Limit (cost=154665.62..154665.67 rows=20 width=286) (actual time=3661.952..3661.959 rows=20 loops=1)
Buffers: shared hit=302308 read=64079
-> Sort (cost=154665.62..154787.47 rows=48737 width=286) (actual time=3661.950..3661.955 rows=20 loops=1)
Sort Key: (sum(ts_rank(products_alias.common_fts, '''yard'':* | ''subskjort'':* | ''skjort'':* | ''barn'':* | ''ebb'':* | ''överdel'':* | ''barn'':*'::tsquery))) DESC
Sort Method: top-N heapsort Memory: 476kB
Buffers: shared hit=302308 read=64079
-> GroupAggregate (cost=120784.78..153368.75 rows=48737 width=286) (actual time=1840.274..3632.377 rows=28332 loops=1)
Group Key: products_alias.group_identity
Buffers: shared hit=302308 read=64079
-> Sort (cost=120784.78..121605.27 rows=328199 width=615) (actual time=1840.182..1888.089 rows=192241 loops=1)
Sort Key: products_alias.group_identity
Sort Method: quicksort Memory: 180793kB
Buffers: shared hit=131456 read=64079
-> Hash Left Join (cost=73238.20..90714.84 rows=328199 width=615) (actual time=922.587..1314.311 rows=192241 loops=1)
Hash Cond: (products_alias.id = uip.products_id)
Buffers: shared hit=131456 read=64079
-> Hash Right Join (cost=73225.05..89469.55 rows=328199 width=611) (actual time=922.572..1267.515 rows=192241 loops=1)
Hash Cond: (cp.product_id = products_alias.id)
Buffers: shared hit=131456 read=64079
-> Seq Scan on product_category cp (cost=0.00..14000.49 rows=854849 width=4) (actual time=0.011..100.653 rows=842342 loops=1)
Buffers: shared hit=212 read=5240
-> Hash (cost=71634.88..71634.88 rows=127214 width=611) (actual time=921.793..921.794 rows=76553 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 49374kB
Buffers: shared hit=131244 read=58839
-> Seq Scan on products products_alias (cost=0.00..71634.88 rows=127214 width=611) (actual time=0.096..839.409 rows=76553 loops=1)
Filter: (common_fts @@ '''yard'':* | ''subskjort'':* | ''skjort'':* | ''barn'':* | ''ebb'':* | ''överdel'':* | ''barn'':*'::tsquery)
Rows Removed by Filter: 256160
Buffers: shared hit=131244 read=58839
-> Hash (cost=11.40..11.40 rows=140 width=8) (actual time=0.003..0.003 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on user_ip_product uip (cost=0.00..11.40 rows=140 width=8) (actual time=0.002..0.002 rows=0 loops=1)
Planning Time: 0.906 ms
Execution Time: 3677.207 ms
Now my main question: how do the functions ts_rank and ts_rank_cd differ? (I have read the official docs.) Can anyone explain the difference in simple terms?
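As a starting point for comparing the two, the documentation describes ts_rank as ranking by the frequency of matching lexemes, while ts_rank_cd (cover density) additionally takes into account how close the matching lexemes are to each other and therefore needs positional information in the tsvector. A minimal sketch to observe this side by side follows. The two sample sentences and the 'simple' configuration are made up for illustration and are not from the products table.

```sql
-- Both documents contain both query terms, but at very different distances.
SELECT t.doc,
       ts_rank(to_tsvector('simple', t.doc), q)    AS rank,
       ts_rank_cd(to_tsvector('simple', t.doc), q) AS rank_cd
FROM (VALUES
        ('black shirt for kids'),
        ('black is a colour and only much later in this text comes a shirt')
     ) AS t(doc),
     to_tsquery('simple', 'black & shirt') AS q;
```

Comparing the rank and rank_cd columns for the two rows shows how much each function reacts to the distance between the matching words.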