The following guidelines cover aspects of developing MySQL applications that might not be immediately obvious to developers coming from a Python background:
For security, do not hardcode the values needed to connect and
log into the database in your main script. Python has the
convention of a config.py
module, where you
can keep such values separate from the rest of your code.
Python scripts often build up and tear down large data
structures in memory, up to the limits of available RAM.
Because MySQL often deals with data sets that are many times
larger than available memory, techniques that optimize storage
space and disk I/O are especially important. For example, in
MySQL tables, you typically use numeric IDs rather than
string-based dictionary keys, so that the key values are
compact and have a predictable length. This is especially
important for columns that make up the
primary key for an
InnoDB
table, because those column values
are duplicated within each
secondary index.
Any application that accepts input must expect to handle bad data.
The bad data might be accidental, such as out-of-range values or misformatted strings. The application can use server-side checks such as unique constraints and NOT NULL constraints, to keep the bad data from ever reaching the database. On the client side, use techniques such as exception handlers to report any problems and take corrective action.
The bad data might also be deliberate, representing a
“SQL injection” attack. For example, input values
might contain quotation marks, semicolons,
%
and _
wildcard
characters and other characters significant in SQL statements.
Validate input values to make sure they have only the expected
characters. Escape any special characters that could change
the intended behavior when substituted into a SQL statement.
Never concatenate a user input value into a SQL statement
without doing validation and escaping first. Even when
accepting input generated by some other program, expect that
the other program could also be hacked and be sending you
incorrect or malicious data.
Because the result sets from SQL queries can be very large, use the appropriate method to retrieve items from the result set as you loop through them. fetchone() retrieves a single item, when you know the result set contains a single row. fetchall() retrieves all the items, when you know the result set contains a limited number of rows that can fit comfortably into memory. fetchmany() is the general-purpose method when you cannot predict the size of the result set: you keep calling it and looping through the returned items, until there are no more results to process.
Since Python already has convenient modules such as
pickle
and cPickle
to
read and write data structures on disk, the data that you
choose store in MySQL instead is likely to have special
characteristics:
Too large to all fit in memory at
one time. You use
SELECT
statements to query
only the precise items you need, and
aggregate
functions to perform calculations across multiple
items. You configure the
innodb_buffer_pool_size
option within the MySQL server to dedicate a certain
amount of RAM for caching query results.
Too complex to be represented by a single data structure. You divide the data between different SQL tables. You can recombine data from multiple tables by using a join query. You make sure that related data is kept in sync between different tables by setting up foreign key relationships.
Updated frequently, perhaps by
multiple users simultaneously. The updates
might only affect a small portion of the data, making it
wasteful to write the whole structure each time. You use
the SQL INSERT
,
UPDATE
, and
DELETE
statements to update
different items concurrently, writing only the changed
values to disk. You use
InnoDB
tables and
transactions to
keep write operations from conflicting with each other,
and to return consistent query results even as the
underlying data is being updated.
Building in MySQL best practices for performance can help your application to scale without requiring major rewrites and architectural changes. See Chapter 8, Optimization for best practices for MySQL performance. It offers guidelines and tips for SQL tuning, database design, and server configuration.
You can avoid reinventing the wheel by learning the MySQL SQL
statements for common operations: operators to use in queries,
techniques for bulk loading data, and so on. Some statements
and clauses are extensions to the basic ones defined by the
SQL standard. See
Section 13.2, “Data Manipulation Statements”,
Section 13.1, “Data Definition Statements”, and
Section 13.2.9, “SELECT
Syntax” for the main classes of statements.
Issuing SQL statements from Python typically involves declaring very long, possibly multi-line string literals. Because string literals within the SQL statements could be enclosed by single quotation, double quotation marks, or contain either of those characters, for simplicity you can use Python's triple-quoting mechanism to enclose the entire statement. For example:
'''It doesn't matter if this string contains 'single' or "double" quotes, as long as there aren't 3 in a row.'''
You can use either of the '
or
"
characters for triple-quoting multi-line
string literals.
Many of the secrets to a fast, scalable MySQL application
involve using the right syntax at the very start of your setup
procedure, in the CREATE TABLE
statements. For example, Oracle recommends the
ENGINE=INNODB
clause for most tables, and
makes it the default in MySQL 5.5 and up. Using
InnoDB
tables enables transactional
behavior that helps scalability of read-write workloads and
offers automatic crash
recovery. A follow-on recommendation is to declare a
numeric primary key
for each table, which offers the fastest way to look up values
and can act as a pointer to associated values in other tables
(a foreign key). Also
within the CREATE TABLE
statement, using the most compact column data types that meet
your application requirements helps performance and
scalability, as the database server moves large amounts of
data back and forth between memory and disk.
User Comments
Add your own comment.