
iShater
macrumors 604, Original poster (joined Aug 13, 2002)
Hopefully somebody has some thoughts or can point me to a location with information.

I have a fairly complex query whose results are considered duplicates when four of the columns are the same. Each duplicate row still has a unique PK ID.

I need the result set compared against itself so that only one of the duplicate rows, the one with the max PK_ID, is kept.

Any thoughts?
 
Code:
select max(id), uniq1, uniq2, uniq3, uniq4 from (...) group by uniq1, uniq2, uniq3, uniq4;
where ... is your existing query. This groups the rows by the four columns that must be unique, so you can then apply an aggregate function to each group.
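Not from the thread, but here is a quick way to sanity-check the idea with SQLite from Python (the table and column names are made up; the original inner query is not known, so a literal table stands in for it):

```python
import sqlite3

# Toy table standing in for the poster's result set: rows count as
# "duplicates" when the four uniq columns match; each row has a unique id.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, uniq1, uniq2, uniq3, uniq4)")
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [
        (1, "a", "b", "c", "d"),
        (2, "a", "b", "c", "d"),  # duplicate of row 1, higher id
        (3, "x", "y", "z", "w"),
    ],
)

# Group by the four columns and keep the greatest id in each group.
rows = conn.execute(
    "select max(id), uniq1, uniq2, uniq3, uniq4 "
    "from t group by uniq1, uniq2, uniq3, uniq4"
).fetchall()
print(sorted(rows))  # [(2, 'a', 'b', 'c', 'd'), (3, 'x', 'y', 'z', 'w')]
```

The duplicate pair collapses to the row with id 2, exactly the "max PK_ID per group" behaviour asked for.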

-Lee
 

Hi Lee,

Thanks for your reply. You just confirmed the route I ended up using. We had a crazy WHERE clause before and the performance was pretty bad. I'm hoping this will work better; we are testing it right now.

P.S. I knew you would be responding. :p
 

If you end up with some sort of self-join in a situation like this, you're probably not going to end up with an ideal solution even if it works. I have had to write things with self-joins, but once the table has more than a few hundred or thousand rows, the cartesian product of the self-join gets huge, and running any sort of aggregate against it is very expensive. If you've already filtered the data down to what you need, there's no shame in using a subquery instead of hunting for the end-all, be-all join-and-where setup. At least not in my book.
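To make that concrete (again a made-up SQLite table, since the real query isn't shown): if you need the non-grouped columns back, you can filter on the winning ids with an IN-subquery rather than self-joining the table against itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id integer primary key, uniq1, uniq2, uniq3, uniq4, payload)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?, ?)",
    [
        (1, "a", "b", "c", "d", "old"),
        (2, "a", "b", "c", "d", "new"),   # duplicate key, higher id wins
        (3, "x", "y", "z", "w", "only"),
    ],
)

# Keep only rows whose id is the max within its group of four columns.
# The grouped subquery scans each row once; no cartesian product is formed.
rows = conn.execute(
    "select id, payload from t where id in ("
    "  select max(id) from t group by uniq1, uniq2, uniq3, uniq4"
    ") order by id"
).fetchall()
print(rows)  # [(2, 'new'), (3, 'only')]
```

This recovers the full surviving rows, not just the grouped columns, without the self-join Lee is warning about.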

-Lee
 
Depending on your database and hardware (i.e. something that supports, IIRC, SQL-99, running on multi-processor hardware) you may find you get better results using the RANK function partitioned over your uniq fields, selecting only the rows where the rank is 1.

e.g.
Code:
select * from
(
  select pk, uniq1, uniq2, uniq3, uniq4, blah, bleh, meh,
    rank() over (partition by uniq1, uniq2, uniq3, uniq4 order by pk desc) as rk
  from ...
) as subq
where rk = 1

Without knowing the table structure, the other columns you want and their types, DBMS and hardware it is a little hard to be more precise.
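For what it's worth, the same pattern can be tried out in SQLite (3.25+ supports window functions, and recent Python builds bundle it); this sketch uses two invented partition columns instead of four to keep it short:

```python
import sqlite3

# Requires SQLite 3.25+ for window functions (bundled with recent Python).
conn = sqlite3.connect(":memory:")
conn.execute("create table t (pk integer primary key, uniq1, uniq2, blah)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [(1, "a", "b", "old"), (2, "a", "b", "new"), (3, "x", "y", "only")],
)

# rank() restarts at 1 inside each partition; ordering by pk desc puts the
# greatest pk first, so filtering on rk = 1 keeps one row per group.
rows = conn.execute(
    "select pk, blah from ("
    "  select pk, blah,"
    "    rank() over (partition by uniq1, uniq2 order by pk desc) as rk"
    "  from t"
    ") where rk = 1 order by pk"
).fetchall()
print(rows)  # [(2, 'new'), (3, 'only')]
```

Since pk is unique within each partition there are no ties, so rank 1 picks exactly one row per group, matching the max(id) approach.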

[edit]
Here is a link to an SQL Server-centric discussion and illustration:
http://weblogs.sqlteam.com/jeffs/archive/2007/03/28/60146.aspx
 
This is an Oracle 10g database, no information available to me about the HW. There are enough joins early on in the query that my head is spinning. :eek:
 
It sounds like something to try then.

I don't have Oracle 10, but this article suggests it supports the necessary functionality: http://www.oracle.com/technology/oramag/code/tips2005/062005.html

Because the database then knows what you are really asking, it can optimise the execution path and, where possible, distribute the aggregation for each partition. Indexes and up-to-date table statistics can also yield order-of-magnitude improvements, as can stored procedures.
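On the index point: a composite index over the grouped columns can help a lot. A small SQLite sketch (invented table, index name `idx_uniq` is made up) showing the shape of such an index for the group-by-four-columns query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, uniq1, uniq2, uniq3, uniq4)")
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [(i, i % 3, i % 5, i % 2, i % 7) for i in range(1, 501)],
)

# A composite index over the grouped columns (plus the aggregated id) can let
# the database walk the index in group order instead of sorting the table.
conn.execute("create index idx_uniq on t (uniq1, uniq2, uniq3, uniq4, id)")

rows = conn.execute(
    "select max(id), uniq1, uniq2, uniq3, uniq4 "
    "from t group by uniq1, uniq2, uniq3, uniq4"
).fetchall()
# 500 rows collapse to one survivor per distinct (uniq1..uniq4) combination.
print(len(rows))  # 210
```

Whether the optimiser actually uses such an index depends on the DBMS and its statistics, which is why running statistics matters too.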

For optimising your SQL, assuming you are on a Mac client, you could trial http://www.aquafold.com/ which has a straightforward query analyser (it's a good general tool for multiple DBMSs), or use the Oracle optimisation tools. I don't know what you are using...
 