3

I often manually pull in production data into my test database so I can test new code on realistic data, as well as test upgrade scenarios or repro data specific bugs. To do this, I've setup a VIEW for each production table in my test database. These views look something like this:

CREATE VIEW ProdLink.Users AS
   select * from dblink(
      'hostaddr=123.123.123.123 dbname=ProductionDB user=ROUser password=secret',
      'select * from users') as t1(userid uuid, email varchar(50), alias varchar(50), fullname varchar(50), password varchar(100));

Now, on my production database I can run:

SELECT * FROM ProdLink.Users;

And see all users on my production database. I can then do things like:

INSERT INTO Users SELECT * FROM ProdLink.Users L WHERE NOT EXISTS (select 1 from Users where Users.UserId = L.UserId);

Allowing me to pull in every user from production that doesn't already exist in test.

I have about 30 of these views to proxy the production data, however I find it somewhat hacky to have to hardcode in the production database connection info into each view.

My Question: Is there a good way to avoid hardcoding, or at least duplicating this connection info on each view? Can I use database level variables, environment variables, or anything else instead?

4
  • It's generally bad practice to link your production environment to your dev environment in this fashion. Would restoring a backup be an option? If not, possibly and etl package of some sort to refresh some pertinent data?
    – sam yi
    Commented May 15, 2012 at 20:21
  • @samyi - Restoring a backup wouldn't work too well, as it would wipe out all the schema changes in the development DB, or require me to restore then run some sort of upgrade script to bring the schema and data back up to date (and some of the changes might be rather complex). I like the idea of being able to refresh a single table at a time when I need to quickly import some data from production. However, I'd be interested in hearing why it's considered bad practice, to see if that reasoning applies to my case or if it presents any gotchas to look out for. Commented May 15, 2012 at 20:30
  • It obviously depends on the type of system you're working with but... performance implications and potential for accidental ddl script would be the two main reasons.. ie. alter table / truncate table / etc. In a lot of places, developer are not allowed access to the production environments. I think it all just depends on the business risk tolerance.
    – sam yi
    Commented May 15, 2012 at 20:41
  • @samyi - Ah yea, most excellent points. I feel that logic is most applicable to a larger company with multiple developers and a more disciplined development process. I don't have enough data nor enough site traffic to impact performance much, and I'm the only one working on the project so I probably won't accidentally delete a bunch of production data. I do have pretty well defined user accounts though, such as the user I'm connecting with doesn't have access to change anything, only read tables. Commented May 15, 2012 at 20:48

1 Answer 1

1

To avoid duplication you can store the connection strings in a table:

CREATE VIEW ProdLink.Users AS
    select * from dblink(
        (select conn_string from conn_string where conn = 'that_one'),
        'select * from users'
    ) as t1 (
        userid uuid, 
        email varchar(50), 
        alias varchar(50), 
        fullname varchar(50), 
        password varchar(100)
    );
1
  • Yea this appears to be the only way to do it, unless you want to call dblink_connect and setup a named connection before running your queries. I wish there were a way to have server-level variables. Commented May 21, 2012 at 0:10

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.