non-concurrent QuerySet.get_or_create() on Postgresql

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

non-concurrent QuerySet.get_or_create() on Postgresql

Дилян Палаузов
Hello,

currently get_or_create(params) is implemented (imprecisely) this way:

try:
   self.get(params)
except DoesNotExist:
   self._create_object_from_params(params)

This creates concurrency problem, as the object might get created by
another thread, after get(params) threw an exception and before
_create_object_from_params has started.

Luckily Postgresql lets combine the above statements into a single
structured query.  This should be a performance gain as for the DB is
faster to execute one than a two queries.

Consider this:

CREATE TABLE t (
   id SERIAL PRIMARY KEY,
   name VARCHAR(10) NOT NULL UNIQUE,
   comment VARCHAR(10));

And here comes the magic:

WITH
   to_be_inserted AS (SELECT 'nameD' as "name", 'comment13' as "comment"),
   just_inserted AS (
          INSERT INTO t (name, comment) SELECT * FROM to_be_inserted
                         WHERE NOT EXISTS(
                      SELECT * FROM t WHERE t.name='nameD')
          RETURNING *)
SELECT *, FALSE as "created" FROM t WHERE t.name='nameD' UNION ALL
SELECT *, TRUE AS "created" FROM just_inserted;

where "to_be_inserted contains" the values for the new object ('default'
parameter of get_or_create) and 'nameB' is the criterion passed to get().

Regards
   Дилян

--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/f82e01e6-0bd0-1a44-957d-1e1544593af6%40aegee.org.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: non-concurrent QuerySet.get_or_create() on Postgresql

Ran Benita
Have you drilled down to `self._create_object_from_params(params)`? It does handle this case, as follows:

try:
    with transaction.atomic(using=self.db):
        params = {k: v() if callable(v) else v for k, v in params.items()}
        obj = self.create(**params)
    return obj, True
except IntegrityError:
    exc_info = sys.exc_info()
    try:
        return self.get(**lookup), False
    except self.model.DoesNotExist:
        pass
    raise exc_info[0](exc_info[1]).with_traceback(exc_info[2])

That said, I do think that doing it in one single atomic query has advantages (for one, the above is still not atomic, though the failure case in the end is much less likely). But a complicated postgres-specific query is perhaps not worth it. If you can think of a generic method, that works with READ COMMITTED, that would be interesting. Some conditions I would personally impose are:

1. Should not increment the primary key sequence in the `get` case. This is not critical for correctness, I just like to avoid these holes :)
2. The `get` case should remain fast. In the current code, `get_or_create` is the same as `get` when the row already exists. In many (most?) scenarios this is the 99% case.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CANeXs%3D1dV73pdKbzR1ppGvKZ4O_%2BO-L_Hf9ZXn4nHBqtwvD8xg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: non-concurrent QuerySet.get_or_create() on Postgresql

Dilyan Palauzov
Hello

The problem was, that get_or_create() was called within a transaction and at the end of the transaction the INSERT caused IntegrityError, as in the meantime another transaction has finished, that inserted the same row but with a different id.  I have wrongly concluded, that repeating get_or_create() would help.  By coincidence, after inserting a second get_or_create() I have never seen the problem again.

> That said, I do think that doing it in one single atomic query has advantages (for one, the above is still not atomic, though the failure case in the end is much less likely). But a complicated postgres-specific query is perhaps not worth it. If you can think of a generic method, that works with READ COMMITTED, that would be interesting. Some conditions I would personally impose are:
>
> 1. Should not increment the primary key sequence in the `get` case. This is not critical for correctness, I just like to avoid these holes :)
What I proposed does not increment the sequence.

> 2. The `get` case should remain fast. In the current code, `get_or_create` is the same as `get` when the row already exists. In many (most?) scenarios this is the 99% case.

I think my proposal is also  quite fast, assuming that WHERE NOT EXISTS(SELECT * FROM -already fetched tabled-) is fast, and UNION ALL with with two tables, having a total of one row, is also fast.

Regards
   Dilian

On 10/12/17 19:04, Ran Benita wrote:

> Have you drilled down to `self._create_object_from_params(params)`? It does handle this case, as follows:
>
> try:
>      with transaction.atomic(using=self.db):
>          params = {k: v() if callable(v) else v for k, v in params.items()}
>          obj = self.create(**params)
>      return obj, True
> except IntegrityError:
>      exc_info = sys.exc_info()
>      try:
>          return self.get(**lookup), False
>      except self.model.DoesNotExist:
>          pass
>      raise exc_info[0](exc_info[1]).with_traceback(exc_info[2])
>
> That said, I do think that doing it in one single atomic query has advantages (for one, the above is still not atomic, though the failure case in the end is much less likely). But a complicated postgres-specific query is perhaps not worth it. If you can think of a generic method, that works with READ COMMITTED, that would be interesting. Some conditions I would personally impose are:
>
> 1. Should not increment the primary key sequence in the `get` case. This is not critical for correctness, I just like to avoid these holes :)
> 2. The `get` case should remain fast. In the current code, `get_or_create` is the same as `get` when the row already exists. In many (most?) scenarios this is the 99% case.
>

--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/bf29fda0-a6e5-2ac1-670e-772bbc2c9bd0%40aegee.org.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: non-concurrent QuerySet.get_or_create() on Postgresql

Ran Benita
The code tries to handle a scenario like the following (of course, the statements can be relatively ordered in different ways). Can you describe how the scenario which fails for you looks like? I am assuming you are using READ COMMITTED, and that the lookup fields are unique together.

THREAD1                                       THREAD2
                                              SELECT => not found
SELECT => not found                          
                                              BEGIN
                                              INSERT => success
                                              COMMIT
BEGIN
INSERT => integrity error
ROLLBACK
(enter except IntegrityError)
SELECT => ??? (should be success)

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CANeXs%3D3BGYGNX-U1oURCZy5e4VYG%2BNjOdQcPCDjc6CHgAt0faw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.