Take the 2-minute tour ×
Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

I know this has been asked here and here, but I have the same idea with a different possible implementation and I need some help.

Initially I had my blogstories table with this structure:

| Column    | Type        | Description                                    |
|-----------|-------------|------------------------------------------------|
| uid       | varchar(15) | 15 characters unique generated id              |
| title     | varchar(60) | story title                                    |
| content   | longtext    | story content                                  |
| author    | varchar(10) | id of the user that originally wrote the story |
| timestamp | int         | integer generated with microtime()             |

After I decided I wanted to implement some versioning system for every story on the blog, the first thing that came to my mind was creating a different table to hold edits; after that, I thought I could modify the existing table to hold versions instead of edits. This is the structure that came to my mind:

| Column        | Type          | Description                                       |
|------------   |-------------  |------------------------------------------------   |
| story_id      | varchar(15)   | 15 characters unique generated id                 |
| version_id    | varchar(5)    | 5 characters unique generated id                  |
| editor_id     | varchar(10)   | id of the user that commited                      |
| author_id     | varchar(10)   | id of the user that originally wrote the story    |
| timestamp     | int           | integer generated with microtime()                |
| title         | varchar(60)   | current story title                               |
| content       | longtext      | current story text                                |
| coverimg      | varchar(20)   | cover image name                                  |

The reasons why I came here:

  • The uid field of the initial table was UNIQUE in the table. Now, the story_id is not unique anymore. How should I deal with that? (I thought I could address story_id = x and then find the latest version, but that seems very resource consuming, so please give your advice)
  • author_id field value is repeating in each every row of the table. Where and how should I keep it?

Edit

The unique codes generation process is in the CreateUniqueCode function:

trait UIDFactory {
  public function CryptoRand(int $min, int $max): int {
    $range = $max - $min;
    if ($range < 1) return $min;
    $log = ceil(log($range, 2));
    $bytes = (int) ($log / 8) + 1;
    $bits = (int) $log + 1;
    $filter = (int) (1 << $bits) - 1;
    do {
        $rnd = hexdec(bin2hex(openssl_random_pseudo_bytes($bytes)));
        $rnd = $rnd & $filter;
    } while ($rnd >= $range);
    return $min + $rnd;
  }
  public function CreateUID(int $length): string {
    $token = "";
    $codeAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    $codeAlphabet.= "abcdefghijklmnopqrstuvwxyz";
    $codeAlphabet.= "0123456789";
    $max = strlen($codeAlphabet) - 1;
    for ($i=0; $i < $length; $i++) {
        $token .= $codeAlphabet[$this->CryptoRand(0, $max)];
    }
    return $token;
  }
}

The code is written in Hack, and was originally written in PHP by @Scott in his answer.

The fields author_id and editor_id can be different, because there are users with enough permissions to edit anyone's stories.

share|improve this question

migrated from codereview.stackexchange.com Aug 25 at 12:12

This question came from our site for peer programmer code reviews.

3 Answers 3

As stated in comments, I would store the current or effective version of a BlogStory in its respective table and keep all of its previous versions (or past states) in a separate (but related) BlogStoryVersion table.

In this manner, you may find this post helpful since it presents a comparable method for a similar scenario.

Business rules

In accordance with my understanding of your specifications, the following assertions are especially relevant:

  • A User writes zero-one-or-many BlogStories.
  • A BlogStory holds zero-one-or-many BlogStoryVersions.
  • A User wrote zero-one-or-many BlogStoryVersions.

Expository data model

I have derived a sample IDEF1X logical data model from the aforementioned assertions in order to illustrate the method that I will describe below. This model is shown in Figure 1.

Figure 1 Blog Post History Sample Data Model

Proposed physical structure

From the above presented data model, I have created the following demo DDL statements:

-- You should determine which are the most fitting 
-- data types and sizes for all your table columns 
-- depending on your business context characteristics.

-- As one would expect, you are free to make use of 
-- your preferred (or required) naming conventions.    

CREATE TABLE UserProfile
(
    UserId          INT      NOT NULL,
    FirstName       CHAR(30) NOT NULL,
    LastName        CHAR(30) NOT NULL,
    BirthDate       DATETIME NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    CONSTRAINT PK_UserProfile PRIMARY KEY (UserId),
    CONSTRAINT UIX_UserProfile_FirstName_LastName_and_BirthDate UNIQUE
    (
        FirstName,
        LastName,
        BirthDate
    )
);

CREATE TABLE BlogStory
(
    BlogStoryNumber INT      NOT NULL,
    Title           CHAR(60) NOT NULL,
    Content         TEXT     NOT NULL,
    CoverImageName  CHAR(30) NOT NULL,
    IsActive        BOOLEAN  NOT NULL,
    AuthorId        INT      NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    CONSTRAINT PK_BlogStory             PRIMARY KEY (BlogStoryNumber),
    CONSTRAINT UIX_BlogStory_Title      UNIQUE      (Title),
    CONSTRAINT FK_BlogStory_UserProfile FOREIGN KEY (AuthorId)
        REFERENCES UserProfile (UserId)
);

CREATE TABLE BlogStoryVersion
(
    BlogStoryNumber INT      NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    Title           CHAR(60) NOT NULL,
    Content         TEXT     NOT NULL,
    CoverImageName  CHAR(30) NOT NULL,
    IsActive        BOOLEAN  NOT NULL,
    AuthorId        INT      NOT NULL,
    UpdatedDateTime DATETIME NOT NULL,
    CONSTRAINT PK_BlogStoryVersion             PRIMARY KEY (BlogStoryNumber, CreatedDateTime),
    CONSTRAINT FK_BlogStoryVersion_BlogStory   FOREIGN KEY (BlogStoryNumber)
        REFERENCES BlogStory (BlogStoryNumber),
    CONSTRAINT FK_BlogStoryVersion_UserProfile FOREIGN KEY (AuthorId)
        REFERENCES UserProfile (UserId)
)

Suggested method description

BlogStory

As you can see in the proposed physical structure, I have defined the BlogStory PRIMARY KEY (PK) data type as INT so, in this regard, you could establish an automatic process in order to store a numeric value in such column in every row insertion. If you do not mind leaving gaps occasionally in this set of values, then you can employ the AUTO_INCREMENT attribute which is commonly used in MySQL environments.

When storing all your individual BlogStory.CreatedDateTime occurrences, you can utilize the NOW() function, which returns the date and time values that are current in the database server at the specific insertion instant. To me, this practice is decidedly more suitable and less prone to errors than the use of external routines.

If, as discussed in comments, you want to avoid the posibility of storing BlogStory.Title duplicate values, you have to set up a UNIQUE INDEX for this column. Due to the fact that a given Title may be mantained over each or several blog story versions, then a UNIQUE INDEX should not be implemented in the BlogStoryVersion table.

I included the BlogStory.IsActive BOOLEAN column in case you need to provide soft or logical deletion functionality.

BlogStoryVersion

On the other hand, the BlogStoryVersion PK is composed of BlogStoryNumber and a column named CreatedDateTime which, of course, marks the exact instant in which a BlogStory instance was inserted.

BlogStoryVersion.BlogStoryNumber, besides being part of the PK, is also set as a FOREIGN KEY (FK) that references BlogStory.BlogStoryNumber, a fact that enforces referential integrity between these two tables. In this respect, implementing an automatic generation of a BlogStoryVersion.BlogStoryNumber is not necessary because, being defined as a FK, the values inserted in this column must be retrieved from the related BlogStory.BlogStoryNumber.

The column BlogStoryVersion.UpdatedDateTime should store, as expected, the point in time when a BlogStory was modified and, as a consequence, added to the BlogStoryVersion table, therefore, you can also use the NOW() function in this situation.

The interval comprehended between BlogStoryVersion.CreatedDateTime and BlogStoryVersion.UpdatedDateTime expresses the period during which a BlogStory row was current or effective.

Considerations for a Version column

It can be useful to think of BlogStoryVersion.CreatedDateTime as the column that stores the value that represents the particular version of a BlogStory. I deem this much more beneficial than a VersionId or VersionCode, since it is user-friendlier in the sense that, as you know, people tend to be more familiar with time concepts. For instance, your blog authors or readers could refer to a story version in a way similar to the following:

  • “I want to see the specific version of blog story number 1750 that was created on 26 August 2015 at 9:30”.

Author and Editor

With this approach, you can identify who is the original AuthorId of a given blog story SELECTing the earliest version of a certain BlogStoryId FROM the BlogStoryVersion table by means of applying the MIN() function to BlogStoryVersion.CreatedDateTime.

In this way, each of the BlogStoryVersion.AuthorId values contained in all the later versions indicate, naturally, the authors of the respective version at hand, but one can also say that such values are, at the same time, denoting the role played by the involved users as editors of the original version of a BlogStory.

Database DATETIME columns format

About the DATETIME data type, yes, you are right, “MySQL retrieves and displays DATETIME values in 'YYYY-MM-DD HH:MM:SS' format”, but you can confidently store your pertaining data in this manner, and when you have to perform a query you can easily make use of the DATE and TIME functions in order to, among other things, show the concerning values in the appropriate format for your users. Or you could also carry out this kind of data formatting in your application(s) code, if necessary.

BlogStory UPDATE process implications

When a given BlogStory suffers an UPDATE, you must ensure that the corresponding values that were in effect until such UPDATE took place are inserted into the BlogStoryVersion table. In this sense, I would highly suggest implementing a single ACID Transaction so that these operations are treated as an indivisible Unit of Work.

Using a VersionId or VersionCode column

If you decide (due to business circumstances or personal preference) to use a BlogStory.VersionId or BlogStory.VersionCode column representing a particular blog story version, you have to ponder the following possibilities:

  • The VersionCode could be required to be UNIQUE in the whole BlogStory table and also in the BlogStoryVersion table. Therefore, you have to implement a carefully tested and totally reliable method in order to generate each Code.
  • Maybe the VersionCode values could be repeated in different BlogStory occurrences, but never duplicated along with the same BlogStoryNumber. E.g., you could have blog story number 3 - version 83o7c5c and, simultaneously, a blog story number 86 - version 83o7c5c and a blog story number 958 - version 83o7c5c.

The later possibility opens another alternative:

  • Keeping a VersionNumber for your BlogStories, so you could have: blog story number 23 - versions 1, 2, 3…; blog story number 650 - versions 1, 2, 3…; blog story number 2254 - versions 1, 2, 3…; etc.

Storing original and subsequent versions in a single table

If you choose to maintain all the blog story versions in the same table, you can still take advantage of many of the concepts detailed above, for example: a composite PK consisting of an INT column (BlogStoryNumber) and a DATETIME column (CreatedDateTime), the usage of server functions in order to optimize the pertinent processes, and the Author and Editor roles.

Due to the fact that by proceeding with this approach a BlogStoryNumber will be duplicated as soon as newer versions are added, an option that is very alike to those mentioned in the previous section and that you could analyze is establishing a BlogStory PK composed of the columns BlogStoryNumber and VersionCode, this way you would be able to uniquely identify each version of a blog story. And you can also try with a combination of BlogStoryNumber with a VersionNumber.

share|improve this answer
1  
This seems a very promising solution! I will check it out tomorrow running some queries and stuff like that. But until then, your solution is most likely to be the answer. Thank you very much for writing that much and that good! –  Victor Aug 26 at 21:22
    
@Victor You are welcome. –  MDCCL Sep 1 at 15:20

One option is to use Version Normal Form (vnf). The advantages include:

  • The current data and all past data reside in the same table.
  • The same query is used to retrieve current data or data that was current as of any particular date.
  • Foreign key references to versioned data work the same as for unversioned data.

An additional benefit in your case, as versioned data is uniquely identified by making the effective date (the date the change was made) part of the key, a separate version_id field is not required.

Here is an explanation for a very similar type of entity.

More details can be found in a slide presentation here and a not-quite-completed document here

share|improve this answer

Your relation

(story_id, version_id, editor_id, author_id, timestamp, title, content , coverimg)

is not in 3rd normal form. For every version of your story the author_id is the same. So you need two relations to overcome this

(story_id, author_id)
(story_id, version_id, editor_id, timestamp, title, content , coverimg)

The key of the first relation is story_id, the key of the second relation is the combined key (story_id, version_id). If you don't like combined key then you can use only version_id as key

share|improve this answer
2  
This doesn't seem to solve my problem, it just emphasizes them –  Victor Aug 25 at 18:30
    
So it doesn't even answer the query 'author_id field value is repeating in each every row of the table. Where and how should I keep it'? –  miracle173 Aug 25 at 20:44
1  
I do not really understand what your answer states. It might be because I am not a native English speaker, so could you try to explain it in more and simple words, please? –  Victor Aug 25 at 21:13
    
It means that you should avoid the repetition of the author_id number (if the story_id is equal for two rows, their author_id is equal, too) and split your table in the two tables as described in my post. So you can avoid the repetition of author_id. –  miracle173 Aug 26 at 15:27

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.