I'm working on a Python script to migrate a MySQL database into a PostgreSQL database with a different Schema (different table structures, different datatypes and so on).
I'm a sysadmin and unfortunately I don't code very often. So I'm having some doubts about this initial programming phase.
I begin with the tables that are easy to be migrate (almost the same structure), but very soon I will have to transfer tables that need more operations to be converted for compatability.
My code actually looks like this:
#!/usr/bin/python
# Script Name: database-migration_msql-psql.py
# Description: Migrate mysql database a2
# into postgresql database a3.
# Created By: phphil.
# Date: 7 Oct 2015.
#
# ------------------------------
# Import standard libraries |
# ------------------------------
#
import os
import sys
import mysql.connector
import psycopg2
from pprint import pprint
import MySQLdb
# ------------------------------
# Import internal snippets |
# ------------------------------
#
from include.db_config import *
#from include.MySQLCursorDict import *
# ------------------------------
# Open database connections |
# ------------------------------
#
# Mysql connection
try:
cnx_msql = mysql.connector.connect( host=host_mysql, user=user_mysql, passwd=pswd_mysql, db=dbna_mysql )
except mysql.connector.Error as e:
print "MYSQL: Unable to connect!", e.msg
sys.exit(1)
# Postgresql connection
try:
cnx_psql = psycopg2.connect(conn_string_psql)
except psycopg2.Error as e:
print('PSQL: Unable to connect!\n{0}').format(e)
sys.exit(1)
# Cursors initializations
cur_msql = cnx_msql.cursor(dictionary=True)
cur_psql = cnx_psql.cursor()
# ------------------------------
# A2.right > A3.permission |
# ------------------------------
#
cur_msql.execute("SELECT resource_id, user_id, group_id, right_type_id, inheritance, mac FROM rights")
for row in cur_msql:
### check if owner_id is a user or a group
if row['user_id'] == 0:
row['user_id'] = row['group_id']
else:
pass
try:
cur_psql.execute("INSERT INTO permissions (resource_id, owner_id, level, inheritance, mac) \
VALUES (%(resource_id)s, %(user_id)s, %(right_type_id)s, %(inheritance)s, %(mac)s)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
# -------------------------------
# A2.membership > A3.membership |
# -------------------------------
#
cur_msql.execute("SELECT group_id, user_id, update_date, update_user_id, status FROM memberships")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO memberships (group_id, user_id, update_date, update_member_id, approval_date, status) \
VALUES (%(group_id)s, %(user_id)s, %(update_date)s, %(update_user_id)s, %(update_date)s, %(status)s)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
# -------------------------------
# A2.user_keys > A3.user_keys |
# -------------------------------
#
cur_msql.execute("SELECT resource_id, user_key FROM user_keys")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO user_keys (resource_id, user_key) VALUES (%(resource_id)s, %(user_key)s)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
# -----------------------------------------------------
# A2.user_password_history > A3.user_password_history |
# -----------------------------------------------------
#
cur_msql.execute("SELECT resource_id, timestamp, password FROM user_password_history")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO user_password_history (resource_id, timestamp, password) \
VALUES (%(resource_id)s, %(timestamp)s, %(password)s)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
# -------------------------------
# A2.log > A3.log |
# -------------------------------
# parent_id not used in A2, took by joining resources table. refered_to & detail fields too difficult to migrate, left empty in A3
#
cur_msql.execute("SELECT l.log_id, l.resource_id, l.community_id, r.parent_id, l.timestamp, l.user_id, l.action, l.type, l.subtype \
FROM log l, resources r \
WHERE l.resource_id=r.resource_id")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO log(log_id, resource_id, community_id, parent_id, timestamp, user_id, action, type, subtype, detail, refered_to) \
VALUES(%(log_id)s, %(resource_id)s, %(community_id)s, %(parent_id)s, %(timestamp)s, %(user_id)s, %(action)s, %(type)s, %(subtype)s, NULL, DEFAULT)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
# -----------------------------------------------------
# A2.notification_settings > A3.notification_settings |
# -----------------------------------------------------
#
cur_msql.execute("SELECT resource_id, normal_medium, normal_delivery, normal_days, normal_time, high_medium, high_delivery, high_days, high_time, editable FROM notification_settings")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO notification_settings (resource_id, normal_channel, normal_delivery, normal_days, normal_time, high_channel, high_delivery, high_days, high_time, editable) \
VALUES (%(resource_id)s, %(normal medium)s, %(normal_delivery)s, %(normal_days)s, %(normal_time)s, %(high_medium)s, %(high_delivery)s, %(high_days)s, %(high_time)s, %(editable)s)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
# -----------------------------------------------------
# A2.resource_notification > A3.resource_notification |
# -----------------------------------------------------
#
cur_msql.execute("SELECT resource_id, notification_to FROM resources_notifications")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO resources_notifications (resource_id, notification_to) VALUES (%(resource_id)s, %(notification_to)s)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
################ END OF SCRIPT ################
# ---------------------------------------------
# Finalizing stuff & closing db connections |
# ---------------------------------------------
#
## Closing cursors
cur_msql.close()
cur_psql.close()
## Committing
cnx_psql.commit()
## Closing database connections
cnx_msql.close()
cnx_psql.close()
As you will notice, in each section of the script the structure is almost the same:
- Select data from a table of the source database (mysql), the result is handled by a cursor with dictionary flag (a python dictionary).
- After this, the dictionary is iterated within a for loop where possible e.g. casting fields, or adapt the table structure (see section: A2.right > A3.permission).
- And still inside the for loop, each record is inserted in the destination database.
Questions/Doubts:
Do I need to create a class in order to abstract the redundant code? Or maybe it's better to just create a function? Can someone post a short example? I have no Idea how to proceed.
In both cases I see some problems on abstracting it because the redundant code is inside a loop where I will have to do different operations depending on what table I'm iterating.I used to open and close cursors at each operation(script section), then I decided to open both cursors at the beginning of the script, and use them until the end and close them. But now I've read this and I'm confused. What's better in your opinion? One cursor for each operation, or one cursor for the whole script?