Django

Code

Ticket #17 (assigned)

Opened 3 years ago

Last modified 2 days ago

Metasystem optimization: Share select_related in memory

Reported by: adrian Assigned to: PhiR (accepted)
Milestone: post-1.0 Component: Database layer (models, ORM)
Version: SVN Keywords: feature caching
Cc: micsco, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com, oliver@obeattie.com, haavikko@gmail.com, matt@sanoodi.com, romme@tesall.ru Triage Stage: Accepted
Has patch: 1 Needs documentation: 1
Needs tests: 0 Patch needs improvement: 1

Description

When using select_related, each cache is stored separately in memory. For example, each Choice object here has its Poll cached, but each of those caches is a separate object in memory. It would require less memory if the caches were references to the same Poll object.

>>> from djangomodels.polls import choices
>>> choice_list = choices.get_list(poll__id__exact=90, select_related=True)
>>> id(choice_list[0]._poll_cache)
-156344020
>>> id(choice_list[1]._poll_cache)
-156344084

Attachments

patch (4.0 kB) - added by ferringb@gmail.com on 05/22/07 20:52:30.
rough stab at instance caching
patch2.diff (4.0 kB) - added by David Cramer <dcramer@gmail.com> on 05/23/07 12:36:12.
corrected an issue with pk_val not being set
instance-caching-2.patch (20.9 kB) - added by ferringb@gmail.com on 05/31/07 00:57:29.
instance-caching-v2
instance-caching-3.patch (11.2 kB) - added by Brian Harring <ferringb@gmail.com> on 06/02/07 04:36:38.
instance caching v3; add appropriate cache cleaning to Model.delete, and tweak queries delete_objects to prune the cache while it's wiping pk instances. Only potential issue I'm aware of at this point is if there are multiple pk's for a record, which is screwy behaviour for django in general.
5737_instance_caching.diff (8.6 kB) - added by Brian Rosner <brosner@gmail.com> on 07/20/07 18:56:12.
updated patch to #5737 with some minor edits
instance-caching-4.patch (12.7 kB) - added by Brian Harring <ferringb@gmail.com> on 07/20/07 19:56:39.
split against 5737; preserving test updates (thus tests pass), and xrange updates (as said, modifying 4 lines away).
instance-caching-5.patch (12.8 kB) - added by Brian Harring <ferringb@gmail.com> on 08/09/07 18:28:01.
add in a class attr able to disable instance caching for that model.
17_no_tests.diff (10.8 kB) - added by PhiR on 09/14/07 22:08:54.
cleanup and comments. Not working but comments welcome
17.diff (19.5 kB) - added by PhiR on 09/15/07 14:07:36.
added tests, and style issues

Change History

06/12/06 06:36:39 changed by Here

  • type changed from enhancement to defect.

01/20/07 17:31:53 changed by Gary Wilson <gary.wilson@gmail.com>

  • stage changed from Unreviewed to Accepted.

Barring any gotchas, I don't think anyone would argue with this.

05/22/07 18:44:01 changed by ferringb@gmail.com

  • cc set to ferringb@gmail.com.

Worth noting this also will speed things up pretty nicely for mass instance creation- Model.init

Downside to such instance caching is that currently with django, you can get away with modifying an instance and decide to not commit the changes- if it's a shared instance, that no longer flies. Have a patch locally that does caching, but had a few failures test wise.

Still any interest? If so, can dig it out and update it.

05/22/07 18:52:29 changed by SmileyChris

Can't harm to attach your patch to this ticket, even if it still needs work.

05/22/07 20:52:30 changed by ferringb@gmail.com

  • attachment patch added.

rough stab at instance caching

05/22/07 21:02:55 changed by ferringb@gmail.com

Offhand, that patch breaks a couple of tests; screws up a couple of tests that rely on checking the # of queries to see if things behaved (those tests are slightly broke imo anyways, since they're not checking *what* the queries were about, just that the db was touched N times).

Basically consists of two changes; change 1: modify Model metaclass overriding default call to do a weakref cache lookup, returning the instance if it's already available; additionally, modify the save method to update the cache for new instances post commit so that they're reused.

change 2: mangle fields.ReverseSingleRelatedObjectDescriptor? inserting a cache lookup; did it this way since bastardizing the query api didn't seem wise, but changing record_inst.foreign_key to try and use cached instances (thus avoiding hitting the db) *is* useful.

Mind you, been a while since I've done serious testing against this, but recall change #2 being particularly useful for select_related; reduction of 32s to 23s for 53k record set joining in around 20 unique foreign keys. 14.6k records with a simple display, reduction from 11.8s to 5.4s; mainly via again avoiding the overkill of select_related, and via dodging Model.init; *very* heavy function call runtime wise.

Fair warning also; works for my own uses, but definitely needs heavier testing (and unittests). Basically is a proof of concept.

05/22/07 22:03:18 changed by SmileyChris

  • needs_better_patch set to 1.
  • has_patch set to 1.
  • needs_tests set to 1.

05/23/07 01:13:24 changed by David Cramer <dcramer@gmail.com>

  • cc changed from ferringb@gmail.com to ferringb@gmail.com, dcramer@gmail.com.

We are running this live on curse-gaming.com now -- in a brief set of tests I ran we gained 30% speed and 80% more efficiency on memory use in our apps. We ran into one issue w/ a set of models doing .select_related() (at a depth of 2) but it wasn't serious for us (as we don't do 2nd depth select_related's.. i dont recommend anyone to do it) and this change is VERY important for load optimization.

05/23/07 09:30:08 changed by Brian Rosner <brosner@gmail.com>

  • cc changed from ferringb@gmail.com, dcramer@gmail.com to ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com.

05/23/07 12:36:12 changed by David Cramer <dcramer@gmail.com>

  • attachment patch2.diff added.

corrected an issue with pk_val not being set

(follow-up: ↓ 10 ) 05/23/07 13:48:45 changed by David Cramer <dcramer@gmail.com>

There is still an issue with delete objects. We are implementing patch #1796 as I was told it may help resolve this issue (and we also needed to resolve the issue it addresses).

The current problem is it's trying to .delete() it twice (at least it seems) so when the second time comes around there is no id attribute set so it throws an AssertionError?

(in reply to: ↑ 9 ) 05/29/07 12:05:47 changed by ferringb@gmail.com

re: patch2.diff, that's broke- note

self.__instance_cache__.setdefault(getattr(self, self._meta.pk.attname, self))

that setdefault's a None in place. Should be

self.__instance_cache__[getattr(self, self._meta.pk.attname)] = self

instead; Forces the cache to reuse the last saved version whenever the next request comes in.

Aside from that, patch has issues with json/xml serialization bits which basically just requires disabling instance caching for generation of the 'detached' instance that haven't yet been saved.

Replying to David Cramer <dcramer@gmail.com>:

There is still an issue with delete objects. We are implementing patch #1796 as I was told it may help resolve this issue (and we also needed to resolve the issue it addresses). The current problem is it's trying to .delete() it twice (at least it seems) so when the second time comes around there is no id attribute set so it throws an AssertionError?

Got a test case? I'm offline for the next 36 hours, but have got a local patch (will post when the system is online) passing all unittests at this point- haven't seen anything delete related in local fixing.

05/31/07 00:57:29 changed by ferringb@gmail.com

  • attachment instance-caching-2.patch added.

instance-caching-v2

05/31/07 01:39:07 changed by anonymous

  • cc changed from ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com to ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com.

06/02/07 04:36:38 changed by Brian Harring <ferringb@gmail.com>

  • attachment instance-caching-3.patch added.

instance caching v3; add appropriate cache cleaning to Model.delete, and tweak queries delete_objects to prune the cache while it's wiping pk instances. Only potential issue I'm aware of at this point is if there are multiple pk's for a record, which is screwy behaviour for django in general.

06/03/07 09:32:46 changed by anonymous

  • cc changed from ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com to ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org.

07/19/07 11:19:18 changed by anonymous

  • cc changed from ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org to ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com.

07/20/07 18:55:39 changed by Brian Rosner <brosner@gmail.com>

I have updated this patch to trunk. I made some minor changes to some naming of variables and comments. The test suite errors with 4 errors related to data from the instance being a byte-string and not unicode. I searched through all the code and could not find why. This doesn't have anything to do with how the files are being saved or accessed? Perhaps it is just some setting on my side. Once I can grab some more time I will write up some more tests for this patch to ensure it works better.

07/20/07 18:56:12 changed by Brian Rosner <brosner@gmail.com>

  • attachment 5737_instance_caching.diff added.

updated patch to #5737 with some minor edits

(follow-up: ↓ 16 ) 07/20/07 18:57:39 changed by Brian Rosner <brosner@gmail.com>

Oh and I wanted to mention that I removed some changes in areas that really don't pertain to what this ticket is trying to accomplish. While the changes were better they are not really related.

(in reply to: ↑ 15 ) 07/20/07 19:49:13 changed by ferringb@gmail.com

  • needs_better_patch deleted.

Replying to Brian Rosner <brosner@gmail.com>:

Oh and I wanted to mention that I removed some changes in areas that really don't pertain to what this ticket is trying to accomplish. While the changes were better they are not really related.

Only unrelated changes (patch wise) you removed were conversion of range in delete to xrange; those might as well stay (already tweaking that area of the file).

The tests changes that were dropped (forced conversion of a few test args up front) however *are* required- run the tests, it'll hork. They're needed- that's more a sign that the misc fields are screwed up since they're not forcing unicode conversion, but that's a seperate mess only exposed by tests.

Either way, a version of this is maintained in a bzr branch at http://pkgcore.org/~ferringb/bzr/instance-caching/ ; using it for curse these days, so I keep it reasonably within trunk (week lag is usually the most).

Related to that, please don't do misc var renames when correcting for bit rot- not trying to be possessive, moreso it's extra noise when trying to nail down exactly what was updated between patch versions, and changes the api the functionality provides when kwargs are mangled.

Finally, dropping the "patch needs improvement"; not really relevant at this point (my earlier comments have long since been fixed).

07/20/07 19:56:39 changed by Brian Harring <ferringb@gmail.com>

  • attachment instance-caching-4.patch added.

split against 5737; preserving test updates (thus tests pass), and xrange updates (as said, modifying 4 lines away).

08/09/07 18:28:01 changed by Brian Harring <ferringb@gmail.com>

  • attachment instance-caching-5.patch added.

add in a class attr able to disable instance caching for that model.

08/09/07 21:42:15 changed by Simon G. <dev@simon.net.nz>

  • needs_tests deleted.
  • stage changed from Accepted to Ready for checkin.

This looks good, tests and everything, and would be a fantastic improvment, so I'm bumping this to RFC.

09/14/07 12:21:48 changed by jacob

  • needs_better_patch set to 1.
  • stage changed from Ready for checkin to Accepted.
  • needs_docs set to 1.

This is a very good idea, but isn't quite ready for prime-time. Some things that need fixing:

  • The class attribute feels hackish, and is badly named. I'd prefer something simpler like Model.objects.select_related(cache=False) -- any reason not to do this?
  • A note needs to be added to the docs about this behavior so that people don't get bitten by stale select related stuff.
  • There's a lot of dense code here without docstrings/comments; I can barely follow what's going on. Some more explanation of what's happening would be very nice with a feature of this complexity.

09/14/07 19:48:13 changed by PhiR

The class attribute allows very fine-grained control over caching: you can select which classes will be cached. But a cache=False argument to select_related would be a nice complement.

09/14/07 20:23:20 changed by PhiR

  • owner changed from nobody to PhiR.

09/14/07 22:08:54 changed by PhiR

  • attachment 17_no_tests.diff added.

cleanup and comments. Not working but comments welcome

09/15/07 13:53:05 changed by PhiR

My patch addresses cleans up and documents the doc, and adds way more tests. It also removed unrelated changes that were present in the original patch. It's still subject to jacob's points 1 and 2, namely how to configure the cache and lack of user doc. Note that by default the cache is disabled so it's safe.

09/15/07 14:07:36 changed by PhiR

  • attachment 17.diff added.

added tests, and style issues

09/15/07 14:24:50 changed by PhiR

  • keywords set to feature caching.

09/15/07 21:04:05 changed by PhiR

  • status changed from new to assigned.

09/16/07 11:14:14 changed by ubernostrum

See also #3369, which was closed in favor of this ticket.

01/18/08 02:14:59 changed by Bastian Kleineidam <calvin@debian.org>

  • cc changed from ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com to ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org.

03/21/08 04:43:44 changed by PhiR

See #5514 for a use case (besides the performance/memory gains).

03/21/08 12:34:10 changed by david

  • cc changed from ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org to ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com.

(follow-up: ↓ 29 ) 03/21/08 14:10:04 changed by axiak

I was talking with PhiR in the IRC channel. I would just like to mention two issues:

  1. Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).
  2. Cache cleanup (garbage collection) -- Maybe we can attach a cleanup of the local cache to the response signal. This should be thought through though.

(in reply to: ↑ 28 ; follow-up: ↓ 30 ) 03/21/08 15:12:45 changed by anonymous

Replying to axiak:

I was talking with PhiR in the IRC channel. I would just like to mention two issues: 1. Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).

Each thread gets it's own view of the cache- trying to allow instance sharing across threads means that thread 1 (which is doing an edit that will ultimately fail) is screwing with the data thread 2 (which is doing a simple view rendering) uses, meaning bad mojo... and if someone suggests locking to try and duck that, I wedgie them, that's even worse mojo ;)

2. Cache cleanup (garbage collection) -- Maybe we can attach a cleanup of the local cache to the response signal. This should be thought through though.

The cache is a WeakValueDictionary?... meaning weak ref'ing of the instances. This means there is no garbage collection issues- if a cycle exists to hold it in memory, it would exist without this code anyways.

(in reply to: ↑ 29 ) 03/21/08 15:36:07 changed by axiak

Replying to anonymous:

Replying to axiak:

I was talking with PhiR in the IRC channel. I would just like to mention two issues: 1. Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).

Each thread gets it's own view of the cache- trying to allow instance sharing across threads means that thread 1 (which is doing an edit that will ultimately fail) is screwing with the data thread 2 (which is doing a simple view rendering) uses, meaning bad mojo... and if someone suggests locking to try and duck that, I wedgie them, that's even worse mojo ;)

As I understand it the storage is part of the class object, which is global, no? So currently the patch does not behave like this and this will need to be addressed.

... The cache is a WeakValueDictionary?... meaning weak ref'ing of the instances. This means there is no garbage collection issues- if a cycle exists to hold it in memory, it would exist without this code anyways.

Ah, didn't see that.

03/25/08 08:40:19 changed by PhiR

If you are interested in this ticket you should check out:

http://code.djangoproject.com/wiki/DjangoSpecifications/Core/SingleInstance

If you notice something that this page doesn't mention please add it.

07/17/08 22:50:07 changed by anonymous

  • version set to queryset-refactor.
  • milestone set to 1.0.

07/17/08 22:51:04 changed by anonymous

  • version deleted.
  • milestone deleted.

07/28/08 13:24:15 changed by oliver@obeattie.com

  • cc changed from ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com to ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com, oliver@obeattie.com.

08/03/08 02:39:52 changed by mrts

  • version set to SVN.
  • milestone set to post-1.0.

Not in scope for 1.0.

08/31/08 06:26:57 changed by haavikko

  • cc changed from ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com, oliver@obeattie.com to ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com, oliver@obeattie.com, haavikko@gmail.com.

09/09/08 06:46:41 changed by anonymous

  • cc changed from ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com, oliver@obeattie.com, haavikko@gmail.com to micsco, ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com, oliver@obeattie.com, haavikko@gmail.com.

09/18/08 16:46:58 changed by mattrussell

  • cc changed from micsco, ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com, oliver@obeattie.com, haavikko@gmail.com to micsco, ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com, oliver@obeattie.com, haavikko@gmail.com, matt@sanoodi.com.

09/23/08 19:22:17 changed by Brian Harring <ferringb@gmail.com>

  • cc changed from micsco, ferringb@gmail.com, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com, oliver@obeattie.com, haavikko@gmail.com, matt@sanoodi.com to micsco, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com, oliver@obeattie.com, haavikko@gmail.com, matt@sanoodi.com.

10/25/08 08:39:17 changed by RommeDeSerieux

  • cc changed from micsco, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com, oliver@obeattie.com, haavikko@gmail.com, matt@sanoodi.com to micsco, dcramer@gmail.com, brosner@gmail.com, paching@gmail.com, wbyoung@mcdonogh.org, jesse.lovelace@gmail.com, calvin@debian.org, larlet@gmail.com, oliver@obeattie.com, haavikko@gmail.com, matt@sanoodi.com, romme@tesall.ru.

12/02/08 05:07:10 changed by dcramer

Ive worked up a quick "SingletonModel?" piece of code. It allows you to define a model, by inheriting from SingletonModel?, as a Singleton. It works almost identically to this patch, barring the removal of some unneeded code.

http://github.com/dcramer/django-singletons/tree/master


Add/Change #17 (Metasystem optimization: Share select_related in memory)




Change Properties
Action