Trey Raymond | 27 Nov 20:11 2012
Picon

Performance problems due to incorrect many-many primary key on many tables

Hi folks,
I'm a DB engineer working for Yahoo, and we have a new product using django that I'm onboarding.  We see a variety of easily fixed issues, but one major one - there are 21 many-many tables here, yet they have auto increment primary keys.  This of course is very slow on any platform, but particularly can't use clustering on innodb (you can't, obviously, use myisam on a production system) and the performance of queries on these tables is severely degraded.  I can't onboard them properly until we fix these, but they claim the code throws errors when I do - I can't figure out what could possibly reference the field, but something does.  If you have any suggestions on an easy way (they have limited dev resources, and us DBE's can't work on code directly) to remove these references, a setting or a very simple change, I'd appreciate it...also consider this a bug report for future versions.

More info below, but thank you guys
Trey Raymond

Example, existing:
CREATE TABLE `accounts_profile_starred_groups` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `profile_id` int(10) unsigned NOT NULL,
  `group_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `profile_id` (`profile_id`,`group_id`),
  KEY `group_id_refs_id_e2f1545` (`group_id`)
) ENGINE=InnoDB;
Fixed:
CREATE TABLE `accounts_profile_starred_groups` (
  `profile_id` int(10) unsigned NOT NULL,
  `group_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`profile_id`,`group_id`),
  KEY `group_id_refs_id_e2f1545` (`group_id`)
) ENGINE=InnoDB;


The table list:

accounts_profile_starred_groups
accounts_profile_starred_review_requests
accounts_reviewrequestvisit
auth_group_permissions
auth_user_groups
auth_user_user_permissions
reviews_defaultreviewer_groups
reviews_defaultreviewer_people
reviews_defaultreviewer_repository
reviews_group_users
reviews_review_comments
reviews_review_screenshot_comments
reviews_reviewrequest_changedescs
reviews_reviewrequest_inactive_screenshots
reviews_reviewrequest_screenshots
reviews_reviewrequest_target_groups
reviews_reviewrequest_target_people
reviews_reviewrequestdraft_inactive_screenshots
reviews_reviewrequestdraft_screenshots
reviews_reviewrequestdraft_target_groups
reviews_reviewrequestdraft_target_people

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To view this discussion on the web visit https://groups.google.com/d/msg/django-developers/-/ziyEeocC-9UJ.
To post to this group, send email to django-developers-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to django-developers+unsubscribe <at> googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Anssi Kääriäinen | 28 Nov 21:56 2012
Picon

Re: Performance problems due to incorrect many-many primary key on many tables

On 27 marras, 21:11, Trey Raymond <aradapi...@...> wrote:
> Hi folks,
> I'm a DB engineer working for Yahoo, and we have a new product using django
> that I'm onboarding.  We see a variety of easily fixed issues, but one
> major one - there are 21 many-many tables here, yet they have auto
> increment primary keys.  This of course is very slow on any platform, but
> particularly can't use clustering on innodb (you can't, obviously, use
> myisam on a production system) and the performance of queries on these
> tables is severely degraded.  I can't onboard them properly until we fix
> these, but they claim the code throws errors when I do - I can't figure out
> what could possibly reference the field, but something does.  If you have
> any suggestions on an easy way (they have limited dev resources, and us
> DBE's can't work on code directly) to remove these references, a setting or
> a very simple change, I'd appreciate it...also consider this a bug report
> for future versions.

The problem is that m2m relations are using Models in the underlying
implementation, and every Django model must have a single field
primary key. I have a feeling that any hack allowing the removal of
the redundant PK from m2m relations only is going to look really ugly.
Although I am more than happy if somebody proves me wrong...

I do agree that the "id" primary key in m2m relations isn't necessary
and is bad for performance reasons for example. And that we should fix
this.

The best way forward might be to make the ORM work with composite
primary keys. But we could still keep this non-public. There has been
some work to support composite primary keys in Django. I haven't
followed the work closely, but if I understand the situation correctly
one of the biggest problems is that making all of Django composite PK
friendly in one go is somewhat daunting. Actually, some parts of
Django do not work too nicely with non-integer or modifiable primary
keys.

Of course after the ORM supports composite PKs then the next step
would be to make rest of Django support composite PKs, too, and have a
public API for composite PK models.

All in all +1 to fixing this, but I am not sure what is the best way
to achieve this. Ideas welcome...

 - Anssi

--

-- 
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers@...
To unsubscribe from this group, send email to django-developers+unsubscribe <at> googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

James Pic | 7 Dec 10:51 2012
Picon

Re: Performance problems due to incorrect many-many primary key on many tables

I'm not sbure but I think you can make an external app that overrides syncdb and creates optimised many to many tables before handing the job django's syncdb. If the tables exist, django won't create them.



On Wed, Nov 28, 2012 at 9:56 PM, Anssi Kääriäinen <anssi.kaariainen-k5wbR+HVKhg@public.gmane.org> wrote:
On 27 marras, 21:11, Trey Raymond <aradapi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi folks,
> I'm a DB engineer working for Yahoo, and we have a new product using django
> that I'm onboarding.  We see a variety of easily fixed issues, but one
> major one - there are 21 many-many tables here, yet they have auto
> increment primary keys.  This of course is very slow on any platform, but
> particularly can't use clustering on innodb (you can't, obviously, use
> myisam on a production system) and the performance of queries on these
> tables is severely degraded.  I can't onboard them properly until we fix
> these, but they claim the code throws errors when I do - I can't figure out
> what could possibly reference the field, but something does.  If you have
> any suggestions on an easy way (they have limited dev resources, and us
> DBE's can't work on code directly) to remove these references, a setting or
> a very simple change, I'd appreciate it...also consider this a bug report
> for future versions.

The problem is that m2m relations are using Models in the underlying
implementation, and every Django model must have a single field
primary key. I have a feeling that any hack allowing the removal of
the redundant PK from m2m relations only is going to look really ugly.
Although I am more than happy if somebody proves me wrong...

I do agree that the "id" primary key in m2m relations isn't necessary
and is bad for performance reasons for example. And that we should fix
this.

The best way forward might be to make the ORM work with composite
primary keys. But we could still keep this non-public. There has been
some work to support composite primary keys in Django. I haven't
followed the work closely, but if I understand the situation correctly
one of the biggest problems is that making all of Django composite PK
friendly in one go is somewhat daunting. Actually, some parts of
Django do not work too nicely with non-integer or modifiable primary
keys.

Of course after the ORM supports composite PKs then the next step
would be to make rest of Django support composite PKs, too, and have a
public API for composite PK models.

All in all +1 to fixing this, but I am not sure what is the best way
to achieve this. Ideas welcome...

 - Anssi

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To unsubscribe from this group, send email to django-developers+unsubscribe <at> googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.




--
http://yourlabs.org
Customer is king - Le client est roi - El cliente es rey.

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to django-developers+unsubscribe <at> googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
James Pic | 7 Dec 10:53 2012
Picon

Re: Performance problems due to incorrect many-many primary key on many tables

This might also work: https://github.com/simone/django-compositekey#customize-the-manytomany-intermediate-modeltable



On Fri, Dec 7, 2012 at 10:51 AM, James Pic <jamespic-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
I'm not sbure but I think you can make an external app that overrides syncdb and creates optimised many to many tables before handing the job django's syncdb. If the tables exist, django won't create them.


On Wed, Nov 28, 2012 at 9:56 PM, Anssi Kääriäinen <anssi.kaariainen-k5wbR+HVKhg@public.gmane.org> wrote:
On 27 marras, 21:11, Trey Raymond <aradapi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi folks,
> I'm a DB engineer working for Yahoo, and we have a new product using django
> that I'm onboarding.  We see a variety of easily fixed issues, but one
> major one - there are 21 many-many tables here, yet they have auto
> increment primary keys.  This of course is very slow on any platform, but
> particularly can't use clustering on innodb (you can't, obviously, use
> myisam on a production system) and the performance of queries on these
> tables is severely degraded.  I can't onboard them properly until we fix
> these, but they claim the code throws errors when I do - I can't figure out
> what could possibly reference the field, but something does.  If you have
> any suggestions on an easy way (they have limited dev resources, and us
> DBE's can't work on code directly) to remove these references, a setting or
> a very simple change, I'd appreciate it...also consider this a bug report
> for future versions.

The problem is that m2m relations are using Models in the underlying
implementation, and every Django model must have a single field
primary key. I have a feeling that any hack allowing the removal of
the redundant PK from m2m relations only is going to look really ugly.
Although I am more than happy if somebody proves me wrong...

I do agree that the "id" primary key in m2m relations isn't necessary
and is bad for performance reasons for example. And that we should fix
this.

The best way forward might be to make the ORM work with composite
primary keys. But we could still keep this non-public. There has been
some work to support composite primary keys in Django. I haven't
followed the work closely, but if I understand the situation correctly
one of the biggest problems is that making all of Django composite PK
friendly in one go is somewhat daunting. Actually, some parts of
Django do not work too nicely with non-integer or modifiable primary
keys.

Of course after the ORM supports composite PKs then the next step
would be to make rest of Django support composite PKs, too, and have a
public API for composite PK models.

All in all +1 to fixing this, but I am not sure what is the best way
to achieve this. Ideas welcome...

 - Anssi

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers <at> googlegroups.com.
To unsubscribe from this group, send email to django-developers+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.




--
http://yourlabs.org
Customer is king - Le client est roi - El cliente es rey.



--
http://yourlabs.org
Customer is king - Le client est roi - El cliente es rey.

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to django-developers+unsubscribe <at> googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Gmane