Консолидировать записи - PullRequest
0 голосов
/ 03 мая 2011

Я хочу объединить набор записей

(id) /     (referencedid)

1         10  
1         11  
2         11  
2         10  
3         10  
3         11  
3         12

Результат запроса должен быть

1         10  
1         11  
3         10  
3         11  
3         12  

Таким образом, поскольку id = 1 и id = 2 имеют одинаковый набор соответствующих ссылок {10,11}, они будут объединены. Но id = 3 s соответствующие ссылки не совпадают, следовательно, не будут объединены.

Что было бы хорошим способом сделать это?

Ответы [ 2 ]

1 голос
/ 03 мая 2011
Select id, referenceid
From MyTable
Where Id In (
                Select Min( Z.Id ) As Id
                From    (
                        Select Z1.id, Group_Concat( Z1.referenceid ) As signature
                        From    (
                                Select id, referenceid
                                From MyTable
                                Order By id, referenceid
                                ) As Z1
                        Group By Z1.id
                        ) As Z
                Group By Z.Signature
                )
0 голосов
/ 03 мая 2011
-- generate count of elements for each distinct id
with Counts as (
    select
        id,
        count(1) as ReferenceCount
    from
        tblReferences R
    group by
        R.id
)
-- generate every pairing of two different id's, along with
-- their counts, and how many are equivalent between the two
,Pairings as (
    select
        R1.id as id1
        ,R2.id as id2
        ,C1.ReferenceCount as count1
        ,C2.ReferenceCount as count2
        ,sum(case when R1.referenceid = R2.referenceid then 1 else 0 end) as samecount
    from
        tblReferences R1 join Counts C1 on R1.id = C1.id
    cross join
        tblReferences R2 join Counts C2 on R2.id = C2.id
    where
        R1.id < R2.id
    group by
        R1.id, C1.ReferenceCount, R2.id, C2.ReferenceCount
)
-- generate the list of ids that are safe to remove by picking
-- out any id's that have the same number of matches, and same
-- size of list, which means their reference lists are identical.
-- since id2 > id, we can safely remove id2 as a copy of id, and
-- the smallest id of which all id2 > id are copies will be left
,RemovableIds as (
    select
        distinct id2 as id
    from
        Pairings P
    where
        P.count1 = P.count2 and P.count1 = P.samecount
)
-- validate the results by just selecting to see which id's
-- will be removed.  can also include id in the query above
-- to see which id was identified as the copy

select id from RemovableIds R

-- comment out `select` above and uncomment `delete` below to
-- remove the records after verifying they are correct!

--delete from tblReferences where id in (select id from RemovableIds) R
Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...