Strictly speaking, joining across two separate databases is usually
impossible. However, I'm looking at mechanisms to achieve that
functionality, and I wanted a sanity check: does what I'm doing make
sense, or is there a better way? I have one database in Postgres and a legacy
database in MSSql. I have a view in MSSql that I'd like to join with a
subset of a table in Postgres (a single VARCHAR column, actually).
This is essentially what I'm doing:
1. Use Django's ORM to get and filter data from Postgres
2. Create a temporary table in MSSql using raw SQL
3. Insert records into the temporary table (we're talking on the
order of 1k records)
4. Use raw SQL to join the MSSql view against the temp table
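To make the flow concrete, here's a minimal, runnable sketch of steps 1-4. I'm using an in-memory sqlite3 database as a stand-in for the MSSql side (and a plain Python list standing in for the keys the ORM pulls out of Postgres) so it runs anywhere; in the real task you'd get a cursor with Django's `connections["mssql"].cursor()`, use `CREATE TABLE #pg_keys (...)` for the session-scoped temp table, and your driver's parameter placeholder. All table/column names here (`pg_keys`, `legacy_view`, `key_col`) are made up for illustration.

```python
import sqlite3

def join_keys_against_view(mssql_conn, keys):
    cur = mssql_conn.cursor()
    # 2. Temporary table, visible only to this connection/session
    #    (in MSSql: CREATE TABLE #pg_keys (key_col VARCHAR(255) PRIMARY KEY)).
    cur.execute("CREATE TEMP TABLE pg_keys (key_col VARCHAR(255) PRIMARY KEY)")
    # 3. Batched parameterized inserts -- one parameter per row per
    #    statement, so the cap on items in a single IN (...) never applies.
    cur.executemany("INSERT INTO pg_keys (key_col) VALUES (?)",
                    [(k,) for k in keys])
    # 4. Join the legacy view against the temp table.
    cur.execute(
        "SELECT v.key_col, v.payload FROM legacy_view v "
        "JOIN pg_keys t ON v.key_col = t.key_col"
    )
    return cur.fetchall()

# Stand-in for the MSSql side: a view over a (here tiny) base table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (key_col TEXT, payload TEXT)")
conn.executemany("INSERT INTO big_table VALUES (?, ?)",
                 [("a", "1"), ("b", "2"), ("c", "3")])
conn.execute("CREATE VIEW legacy_view AS SELECT key_col, payload FROM big_table")

# Stand-in for step 1: the VARCHAR values filtered out of Postgres via the ORM.
keys = ["a", "c"]
rows = join_keys_against_view(conn, keys)
```

One nice property of a `#temp` table in MSSql is that it's dropped automatically when the connection closes, so the celery task doesn't need explicit cleanup on failure.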
Is there a better or smarter way to do what I'm doing?
While I'd like this to be fast, speed is not a big deal since this is
run periodically from a celery task. Mirroring is not an option. We're
talking on the order of 1B records in MSSql and 10M records in
Postgres, and both of them change periodically and independently. An
IN clause with strings is not an option because MSSql limits the
number of items in an IN clause (2300), and that limit is sometimes
exceeded depending on the filter.