Django

Code

Ticket #3592 (closed: fixed)

Opened 2 years ago

Last modified 6 months ago

Allowing Q Objects to select between OUTER and INNER joins (semi-intelligently)

Reported by: axiak@mit.edu Assigned to: nobody
Milestone: Component: Database layer (models, ORM)
Version: SVN Keywords: Q,QuerySet,JOINs qset_refactor,qs-rf-fixed
Cc: mir@noris.de Triage Stage: Accepted
Has patch: 0 Needs documentation: 0
Needs tests: 0 Patch needs improvement: 1

Description

I would like my Q objects to use LEFT OUTER JOIN or INNER JOIN depending on whether or not one is better than the other. e.g. if I do x & y I probably want INNER JOIN, but if I do x | y, I think the DEFAULT behavior should be an OUTER JOIN if there are any foreign key lookups.

While I do believe a complete QuerySet? Refactoring is in order, I feel as though the simplicity and the great default behavior that Q objects offer is nice to have, and that this patch will improve the default behavior tremendously.

Attachments

fixed_Q_joins.patch (2.9 kB) - added by anonymous on 02/27/07 00:32:29.
ticket-3592-joins.diff (3.5 kB) - added by mir@noris.de on 03/09/07 13:38:33.
Improved patch, replaces the older one

Change History

02/27/07 00:32:29 changed by anonymous

  • attachment fixed_Q_joins.patch added.

02/27/07 01:04:40 changed by anonymous

  • needs_better_patch changed.
  • needs_tests set to 1.
  • needs_docs changed.

02/27/07 01:16:35 changed by Simon G. <dev@simon.net.nz>

  • stage changed from Unreviewed to Design decision needed.

02/28/07 15:27:26 changed by axiak@mit.edu

  • needs_better_patch set to 1.

Some more things:

I realized that the correct way to do ORing is by use of UNION. For example, if I have three models: X, Y, and Z (each with parameters a,b,c) and both Y and Z foreign key to X: Z --> X and Y --> X. Then: (They are all in app `package')

Q(yaisnull = False) | Q(zbisnull = False) should yield the following query: (approximate) SELECT * FROM package_x

INNER JOIN "package_y" AS "package_xy" ON

package_x.id = package_xy.x_id

WHERE package_xy.a IS NOT NULL

UNION SELECT * FROM package_x

INNER JOIN "package_z" AS "package_xz" ON

package_x.id = package_xz.x_id

WHERE package_xz.b IS NOT NULL

Is this clear? Unfortunately, it would take some rewriting of django to even attempt this.

02/28/07 15:28:45 changed by anonymous

(sorry I forgot to block quote...) Some more things:

I realized that the correct way to do ORing is by use of UNION. For example, if I have three models: X, Y, and Z (each with parameters a,b,c) and both Y and Z foreign key to X: Z --> X and Y --> X. Then: (They are all in app `package')

Q(yaisnull = False) | Q(zbisnull = False) should yield the following query:

(approximate)

SELECT * FROM package_x
    INNER JOIN "package_y" AS "package_xy" ON
        package_x.id = package_xy.x_id
            WHERE package_xy.a IS NOT NULL
UNION 
SELECT * FROM package_x
    INNER JOIN "package_z" AS "package_xz" ON
        package_x.id = package_xz.x_id
            WHERE package_xz.b IS NOT NULL

Is this clear? Unfortunately, it would take some rewriting of django to even attempt this.

(follow-up: ↓ 6 ) 02/28/07 19:43:05 changed by mtredinnick

  • owner changed from adrian to mtredinnick.
  • needs_better_patch deleted.
  • has_patch deleted.
  • needs_tests deleted.
  • stage changed from Design decision needed to Accepted.

Constructing more intelligent join types is one of the goals of the QuerySet refactoring -- in fact it was one of the things that started us thinking about it. Since Q() objects are really just a way of passing information to the QuerySet construction -- and they aren't going away -- this will be fixed as part of that. I'm reluctant to be tweaking Q() objects much, since the refactoring is being worked on at the moment (although it will be post-0.96).

Removing the patch keywords, because I don't want to apply this patch (although thanks for writing it up anyway -- it looks reasonable). Let's leave Q() objects stable in the brief runup to 0.96 and make the big changes immediately afterwards.

(in reply to: ↑ 5 ) 03/02/07 21:36:56 changed by axiak@mit.edu

While I'm well aware that there is a QuerySet refactoring going on, I'm afraid that it's going to come along much too late. The fact of the matter is, the current implementation of Q OR'ing could best be described as a bug. From what I read it sounds like 0.96 is going to be the last release before a slew of backwards-incompatible changes get merged (oldforms becomes defunct, this queryset refactoring...).

This means, that if someone starts using Django, and needs to use Q objects to OR (pretty reasonable, my site is using it for our permission systems), they might come across this bug (which I think would appear pretty obscure to them if they haven't read through this code), with no where to go except maybe writing raw SQL for each of these. Might it be worth raising a NotImplementedError? for ORs that have foreign-key lookups? IMHO that's better than the current result.

This is why I posted it here. I'd at least like if someone runs into this problem to know there's at least a patch they can apply that's not fully tested.

Replying to mtredinnick:

Constructing more intelligent join types is one of the goals of the QuerySet refactoring -- in fact it was one of the things that started us thinking about it. Since Q() objects are really just a way of passing information to the QuerySet construction -- and they aren't going away -- this will be fixed as part of that. I'm reluctant to be tweaking Q() objects much, since the refactoring is being worked on at the moment (although it will be post-0.96). Removing the patch keywords, because I don't want to apply this patch (although thanks for writing it up anyway -- it looks reasonable). Let's leave Q() objects stable in the brief runup to 0.96 and make the big changes immediately afterwards.

03/08/07 11:11:45 changed by ubernostrum

#3665 was a duplicate of this.

03/09/07 13:38:33 changed by mir@noris.de

  • attachment ticket-3592-joins.diff added.

Improved patch, replaces the older one

03/09/07 14:06:29 changed by mir@noris.de

There was a strange bug in the patch: In line 736, it called self.underneath_an_or(), which is a non-callable attribute. Replacing it with self.search_for_or() worked. Tests are passed, and all tests of my application also passed.

But the patch still helped a lot to get something done right now--thanks, axiak!

Bug #1801 (a problem with joins and AND) is unrelated, in case anybody wonders.

03/09/07 14:06:55 changed by anonymous

  • cc set to mir@noris.de.

03/09/07 17:01:44 changed by mir@noris.de

Hmm ... I've played with the patch a bit, and I'm a bit disappointed. Outer joins seem too hard to optimize for the database. At least mysql 4.1 creates really stupid execution plans, and the queries are effectively unusable.

Perhaps this should be solved with UNION instead of OUTER JOIN.

(follow-up: ↓ 13 ) 03/09/07 18:17:19 changed by axiak@mit.edu

I would love to use UNION. It is the right way to do it (it is logically what is meant by |). However, it seems I cannot do that without doing enough of a rewrite as to really merit a refactoring of QuerySet?. This is meant to buy the current Q's some time...

What type data are you working with? I'm assuming you have indices for the columns you're joining on. If so, how many rows do you have that it's a problem?

Just curious--maybe there's a way to optimize this further?

03/09/07 20:19:38 changed by axiak@mit.edu

Please look at #3691 for some more features.

(in reply to: ↑ 11 ) 03/11/07 13:42:12 changed by Michael Radziej <mir@noris.de>

Replying to axiak@mit.edu:

A problem with the patch is that it turns every join below the or node into an outer join. Nested joins, with an outer join at the or node and inner joins below, would make more sense. I don't know if it would help, though. The problem is with a table (basically, an association table) with a few 10k rows. From this table, I get an outer join to another table with a few 1000 rows. But it's too complex for demonstration.

There's a problem with union: To AND two unions, you'd need INTERSECT. But mysql 4.1 doesn't have INTERSECT :-(

I cannot explore this further, since I don't have more time for it. I'll resort to manual sql, once again.

As a general rule, it's hard generate useable SQL in complex cases, and it's a lot harder to support multiple databases with this. mysql isn't famous for optimizing complex queries, so solving this in the general case might really turn out to be impossible for this ***** "database".

Sorry that I cannot offer any more help :-(

06/05/07 00:15:48 changed by mir@noris.de

  • keywords changed from Q,QuerySet,JOINs to Q,QuerySet,JOINs qset_refactor.
  • needs_better_patch set to 1.

Needs a better patch for reasons in my posting above. This will probably have to wait for the queryset refactoring.

06/05/07 10:25:44 changed by Michael Axiak <axiak@mit.edu>

AFAICT this patch will be defunct in the new queryset refactor.

09/13/07 16:38:48 changed by mtredinnick

  • keywords changed from Q,QuerySet,JOINs qset_refactor to Q,QuerySet,JOINs qset_refactor,qs-rf-fixed.

04/26/08 21:50:16 changed by mtredinnick

  • status changed from new to closed.
  • resolution set to fixed.

(In [7477]) Merged the queryset-refactor branch into trunk.

This is a big internal change, but mostly backwards compatible with existing code. Also adds a couple of new features.

Fixed #245, #1050, #1656, #1801, #2076, #2091, #2150, #2253, #2306, #2400, #2430, #2482, #2496, #2676, #2737, #2874, #2902, #2939, #3037, #3141, #3288, #3440, #3592, #3739, #4088, #4260, #4289, #4306, #4358, #4464, #4510, #4858, #5012, #5020, #5261, #5295, #5321, #5324, #5325, #5555, #5707, #5796, #5817, #5987, #6018, #6074, #6088, #6154, #6177, #6180, #6203, #6658

(follow-up: ↓ 19 ) 07/15/08 04:05:22 changed by sime

  • status changed from closed to reopened.
  • resolution deleted.

It appears that OR-ing Q objects across related tables still produces an INNER JOIN?

(in reply to: ↑ 18 ) 07/15/08 06:39:48 changed by mir

  • status changed from reopened to closed.
  • resolution set to fixed.

Replying to sime:

It appears that OR-ing Q objects across related tables still produces an INNER JOIN?

Please give an example of a query that results in an inner join when you think it should not, and reopen the ticket again.

07/15/08 11:45:55 changed by mtredinnick

Please don't reopen this ticket. Rather, pen a new ticket containing the short example that demonstrates the problem. The issue in this ticket has been fixed and if you're seeing something going strange with Q-object behaviour, that's a separate issue. Let's keep the histories nad comments for separate issues distinct, please.

07/23/08 04:29:57 changed by mir

related: #7872


Add/Change #3592 (Allowing Q Objects to select between OUTER and INNER joins (semi-intelligently))




Change Properties
Action