Schema Changes
- Fetch by order --- If the code is fetching column # as the way to get the data, a change in the schema will cause the column numbers to readjust. This will mess up the application and bad things will happen.
- Fetch by name --- If the code is fetching column by name such as
foo
, and another table in the query adds a column foo
, the way this is handled may cause problems when trying to get the right foo
column.
Either way, a schema change can cause problems with the extraction of the data.
Further consider if a column that was being used is removed from the table. The select * from ...
still works but errors out when trying to pull the data out of the result set. If the column is specified in the query, the query will error out instead giving a clear indiciation as to what and where the problem is.
Data overhead
Some columns can have a significant amount of data associated with them. Selecting back *
will pull all the data. Yep, here's that varchar(4096)
thats on 1000 rows that you've selected back giving you an additional possible 4 megabytes of data that you're not needing, but is sent across the wire anyways.
Related to the schema change, that varchar might not exist there when you first created the table, but now its there.
Failure to convey intent
When you select back *
and get 20 columns but only need 2 of them, you are not conveying the intent of the code. When looking at the query that does a select *
one doesn't know what the important parts of it are. Can I change the query to use this other plan instead to make it faster by not including these columns? I don't know because the intent of what the query returns isn't clear.
Lets look at some SQL fiddles that explore those schema changes a bit more.
First, the initial database: http://sqlfiddle.com/#!2/a67dd/1
DDL:
create table one (oneid int, data int, twoid int);
create table two (twoid int, other int);
insert into one values (1, 42, 2);
insert into two values (2, 43);
SQL:
select * from one join two on (one.twoid = two.twoid);
And the columns you get back are oneid=1
, data=42
, twoiid=2
, and other=43
.
Now, what happens if I add a column to table one? http://sqlfiddle.com/#!2/cd0b0/1
alter table one add column other text;
update one set other = 'foo';
And my results from the same query as before are oneid=1
, data=42
, twoiid=2
, and other=foo
.
A change in one of the tables disrupts the values of a select *
and suddenly your binding of 'other' to an int is going to throw an error and you don't know why.
If instead your SQL statement was
select
one.oneid, one.data, two.twoid, two.other
from one join two on (one.twoid = two.twoid);
The change to table one would not have disrupted your data. That query runs the same before the change and after the change.