How to optimize SQL with an array and a geography in the WHERE clause in PostgreSQL
22 September 2019

I'm new to PostgreSQL and PostGIS, so this question may be naive. I'd like to know how to optimize the query described here.

Here are the details.

PostgreSQL version: 10.10

Table columns and indexes:

testgis=# \d test;
                                  Table "public.test"
   Column   |       Type       | Collation | Nullable |            Default
------------+------------------+-----------+----------+--------------------------------
 id         | bigint           |           | not null | nextval('serial_id'::regclass)
 location   | geography        |           | not null |
 latitude   | double precision |           | not null |
 longitude  | double precision |           | not null |
 time_range | tsrange          |           |          |
 int1       | integer          |           | not null |
 int2       | integer          |           | not null |
 ids1       | bigint[]         |           |          |
 ids2       | bigint[]         |           |          |
Indexes:
    "btree_int1" btree (int1)
    "btree_int2" btree (int2)
    "gin_ids1" gin (ids1)
    "gin_ids2" gin (ids2)
    "gist_location" gist (location)
    "gist_time_range" gist (time_range)

Size information:

SELECT row_estimate,pg_size_pretty(total_bytes) AS total
    , pg_size_pretty(index_bytes) AS INDEX
    , pg_size_pretty(toast_bytes) AS toast
    , pg_size_pretty(table_bytes) AS TABLE
  FROM (
  SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
      SELECT  relname AS TABLE_NAME
              , c.reltuples AS row_estimate
              , pg_total_relation_size(c.oid) AS total_bytes
              , pg_indexes_size(c.oid) AS index_bytes
              , pg_total_relation_size(reltoastrelid) AS toast_bytes
          FROM pg_class c
          LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
          WHERE relkind = 'r' and relname='test'
  ) a
) a;

 row_estimate | total  | index |   toast    | table
--------------+--------+-------+------------+-------
       302471 | 148 MB | 80 MB | 8192 bytes | 68 MB

The location column holds a point. Rows are inserted like this:

INSERT INTO test (location,latitude,longitude,time_range,int1,int2,ids1,ids2)
VALUES
(ST_GeographyFromText('POINT(106.800382 -6.098953)'), -6.098953, 106.800382, '[2019-09-01 00:00:00, 2019-09-20 00:00:00]', 1, 2, '{100, 101}', '{50}')

Here is my SQL query and its query plan:

explain (analyze, buffers) select id
from test
where
location <-> ST_GeographyFromText('POINT(106.800382 -6.098953)') < 15000
and
(ids1 @> ARRAY[100]::bigint[] or ids2 @> ARRAY[100]::bigint[])
order by location <-> ST_GeographyFromText('POINT(106.800382 -6.098953)') limit 800;

Limit  (cost=0.28..8858.30 rows=800 width=16) (actual time=1.126..28.605 rows=800 loops=1)
   Buffers: shared hit=7408
   ->  Index Scan using gist_location on test  (cost=0.28..131730.06 rows=11897 width=16) (actual time=1.126..28.507 rows=800 loops=1)
         Order By: (location <-> '0101000020E6100000A7936C7539B35A40465D6BEF536518C0'::geography)
         Filter: (((ids1 @> '{100}'::bigint[]) OR (ids2 @> '{100}'::bigint[])) AND ((location <-> '0101000020E6100000A7936C7539B35A40465D6BEF536518C0'::geography) < '15000'::double precision))
         Rows Removed by Filter: 5840
         Buffers: shared hit=7408
 Planning time: 0.398 ms
 Execution time: 28.729 ms
(9 rows)

If I change (ids1 @> ARRAY[100]::bigint[] or ids2 @> ARRAY[100]::bigint[]) to (ids1 @> ARRAY[1]::bigint[] or ids2 @> ARRAY[1]::bigint[]) (that is, 100 becomes 1), the query plan changes to:

 Limit  (cost=8104.48..8106.48 rows=800 width=16) (actual time=10.106..10.147 rows=209 loops=1)
   Buffers: shared hit=3201
   ->  Sort  (cost=8104.48..8107.48 rows=1200 width=16) (actual time=10.105..10.123 rows=209 loops=1)
         Sort Key: ((location <-> '0101000020E6100000A7936C7539B35A40465D6BEF536518C0'::geography))
         Sort Method: quicksort  Memory: 34kB
         Buffers: shared hit=3201
         ->  Bitmap Heap Scan on test  (cost=67.67..8043.10 rows=1200 width=16) (actual time=1.691..10.032 rows=209 loops=1)
               Recheck Cond: ((ids1 @> '{1}'::bigint[]) OR (ids2 @> '{1}'::bigint[]))
               Filter: ((location <-> '0101000020E6100000A7936C7539B35A40465D6BEF536518C0'::geography) < '15000'::double precision)
               Rows Removed by Filter: 3376
               Heap Blocks: exact=3185
               Buffers: shared hit=3201
               ->  BitmapOr  (cost=67.67..67.67 rows=3609 width=0) (actual time=0.982..0.982 rows=0 loops=1)
                     Buffers: shared hit=10
                     ->  Bitmap Index Scan on gin_ids1  (cost=0.00..32.17 rows=1623 width=0) (actual time=0.622..0.622 rows=2030 loops=1)
                           Index Cond: (ids1 @> '{1}'::bigint[])
                           Buffers: shared hit=5
                     ->  Bitmap Index Scan on gin_ids2  (cost=0.00..34.90 rows=1986 width=0) (actual time=0.359..0.359 rows=1960 loops=1)
                           Index Cond: (ids2 @> '{1}'::bigint[])
                           Buffers: shared hit=5
 Planning time: 0.237 ms
 Execution time: 10.215 ms
(22 rows)

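Together with the distance condition, (ids1 @> ARRAY[100]::bigint[] or ids2 @> ARRAY[100]::bigint[]) matches 1932 rows, which is more than (ids1 @> ARRAY[1]::bigint[] or ids2 @> ARRAY[1]::bigint[]), which matches 209. Without the distance condition, id 100 matches 36489 rows: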

testgis=# select count(*)
testgis-# from test
testgis-# where
testgis-# location <-> ST_GeographyFromText('POINT(106.800382 -6.098953)') < 15000
testgis-# and
testgis-# (ids1 @> ARRAY[100]::bigint[] or ids2 @> ARRAY[100]::bigint[]);
 count
-------
  1932
(1 row)

testgis=# select count(*)
testgis-# from test
testgis-# where
testgis-# location <-> ST_GeographyFromText('POINT(106.800382 -6.098953)') < 15000
testgis-# and
testgis-# (ids1 @> ARRAY[1]::bigint[] or ids2 @> ARRAY[1]::bigint[]);
 count
-------
   209
(1 row)

testgis=# select count(*) from test where (ids1 @> ARRAY[100]::bigint[] or ids2 @> ARRAY[100]::bigint[]);
 count
-------
 36489
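
For convenience, the three counts above could also be collected in one pass. This is only a sketch, not part of the original session: the CTE name flagged and the column aliases are made up for illustration, and it assumes a PostgreSQL version with aggregate FILTER support (9.4 or later, so 10.10 qualifies). It reads the whole table, so it is meant for inspecting selectivity, not for production use.

```sql
-- Sketch: compute the three counts shown above in a single scan.
-- "flagged" and the aliases are hypothetical names used for illustration.
WITH flagged AS (
    SELECT
        (ids1 @> ARRAY[100]::bigint[] OR ids2 @> ARRAY[100]::bigint[]) AS has_100,
        (ids1 @> ARRAY[1]::bigint[]   OR ids2 @> ARRAY[1]::bigint[])   AS has_1,
        (location <-> ST_GeographyFromText('POINT(106.800382 -6.098953)') < 15000) AS in_radius
    FROM test
)
SELECT
    count(*) FILTER (WHERE has_100 AND in_radius) AS id_100_in_radius,
    count(*) FILTER (WHERE has_1 AND in_radius)   AS id_1_in_radius,
    count(*) FILTER (WHERE has_100)               AS id_100_anywhere
FROM flagged;
```

A FILTER condition that evaluates to NULL (for example when ids1 or ids2 is NULL) is treated like false, which matches how the WHERE clauses in the separate queries behave.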

So how can I optimize this SQL (lng, lat, radius, and id are variables)?

select id
from test
where
location <-> ST_GeographyFromText('POINT(<lng> <lat>)') < <radius>
and
(ids1 @> ARRAY[<id>]::bigint[] or ids2 @> ARRAY[<id>]::bigint[])
order by location <-> ST_GeographyFromText('POINT(<lng> <lat>)') limit 800;
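
To make the parameterization concrete, here is a hedged sketch of the same query as a prepared statement. It is not from the original post. The statement name nearby_ids is hypothetical, and replacing the `<->` comparison with ST_DWithin is an assumption on my part: in the plans shown here the distance comparison appears only as a Filter, while ST_DWithin on geography can normally use the GiST index for the radius test. ST_DWithin measures on the spheroid by default, so rows very close to the radius boundary may differ slightly from the `<->` version.

```sql
-- Sketch under stated assumptions; "nearby_ids" is a made-up name.
-- $1 = lng, $2 = lat, $3 = radius in metres, $4 = id
PREPARE nearby_ids (double precision, double precision, double precision, bigint) AS
SELECT id
FROM test
WHERE ST_DWithin(
          location,
          ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography,
          $3)                                     -- radius test that can use gist_location
  AND (ids1 @> ARRAY[$4] OR ids2 @> ARRAY[$4])    -- $4 is bigint, so ARRAY[$4] is bigint[]
ORDER BY location <-> ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography
LIMIT 800;

-- Same values as in the example plans above.
EXECUTE nearby_ids(106.800382, -6.098953, 15000, 100);
```

Whether this is actually faster depends on the data and the planner's estimates, so it would need to be checked with explain (analyze, buffers) for both a frequent id such as 100 and a rare one such as 1.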

After setting enable_indexscan to 0, the query plan is:

testgis=# BEGIN; SET LOCAL enable_indexscan=0; explain (analyze, buffers) select id
BEGIN
SET
testgis-# from test
testgis-# where
testgis-# location <-> ST_GeographyFromText('POINT(106.800382 -6.098953)') < 15000
testgis-# and
testgis-# (ids1 @> ARRAY[100]::bigint[] or ids2 @> ARRAY[100]::bigint[])
testgis-# order by location <-> ST_GeographyFromText('POINT(106.800382 -6.098953)') limit 800; ROLLBACK;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=15627.24..15720.58 rows=800 width=16) (actual time=40.887..43.748 rows=800 loops=1)
   Buffers: shared hit=6082
   ->  Gather Merge  (cost=15627.24..16783.95 rows=9914 width=16) (actual time=40.886..43.648 rows=800 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=6082
         ->  Sort  (cost=14627.21..14639.61 rows=4957 width=16) (actual time=24.839..24.882 rows=431 loops=3)
               Sort Key: ((location <-> '0101000020E6100000A7936C7539B35A40465D6BEF536518C0'::geography))
               Sort Method: quicksort  Memory: 100kB
               Buffers: shared hit=6082
               ->  Parallel Bitmap Heap Scan on test  (cost=359.15..14322.97 rows=4957 width=16) (actual time=3.468..24.555 rows=644 loops=3)
                     Recheck Cond: ((ids1 @> '{100}'::bigint[]) OR (ids2 @> '{100}'::bigint[]))
                     Filter: ((location <-> '0101000020E6100000A7936C7539B35A40465D6BEF536518C0'::geography) < '15000'::double precision)
                     Rows Removed by Filter: 11519
                     Heap Blocks: exact=5140
                     Buffers: shared hit=6066
                     ->  BitmapOr  (cost=359.15..359.15 rows=35893 width=0) (actual time=4.661..4.661 rows=0 loops=1)
                           Buffers: shared hit=16
                           ->  Bitmap Index Scan on gin_ids1  (cost=0.00..319.81 rows=34109 width=0) (actual time=4.163..4.163 rows=35123 loops=1)
                                 Index Cond: (ids1 @> '{100}'::bigint[])
                                 Buffers: shared hit=11
                           ->  Bitmap Index Scan on gin_ids2  (cost=0.00..33.38 rows=1785 width=0) (actual time=0.496..0.496 rows=2003 loops=1)
                                 Index Cond: (ids2 @> '{100}'::bigint[])
                                 Buffers: shared hit=5
 Planning time: 0.194 ms
 Execution time: 43.847 ms
(26 rows)

ROLLBACK
...