How can the Stack Overflow data in XML be converted into a MySQL dump?
It will be brilliant to have it for mysqlimport
.
How can the Stack Overflow data in XML be converted into a MySQL dump?
It will be brilliant to have it for mysqlimport
.
If I'd use PHP for the import, I wouldn't use preg_match_all but XMLReader.
The XML files all have a similar structure. Each element <row> stands for one record and the attributes correspond to a field of that record. Each attribute is present in all records. Now all you have to do is to create tables with field names that are exactly like the attribute names in the XML files and write a simple script that puts the value of an attribute in the table field with the same name.
First attempt, without error/schema handling, almost untested:
<?php
$pdo = new PDO('mysql:host=???;dbname=???', '???', '???');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// I'm using a Gentoo-patched version of PHP -> default connection charset=utf8
// You might have to take care of that.
foreach(array('users','posts','comments','votes', 'badges') as $source) {
echo $source, "\n";
$reader = new XMLReader;
$reader->open($source.'.xml') or die('!open '.$f);
$bFirst = true;
while ($reader->read()) {
if ( XMLREADER::ELEMENT===$reader->nodeType && 'row'===$reader->localName) {
if ($bFirst) {
$bFirst = false;
$params = array();
$stmt = createStmt($pdo, $reader, $source, $params);
}
$reader->moveToFirstAttribute();
do {
$params[$reader->name] = $reader->value;
} while ( $reader->moveToNextAttribute() );
$stmt->execute();
}
}
}
function createStmt($pdo, $reader, $table, &$bindArr) {
$fields=array();
$reader->moveToFirstAttribute();
do {
$fields[] = $reader->name;
} while ( $reader->moveToNextAttribute() );
$query = sprintf('INSERT INTO %s (%s) VALUES(:%s)', $table, join(',',$fields), join(',:',$fields));
echo 'query: ', $query, "\n";
$stmt = $pdo->prepare($query);
foreach($fields as $f) {
$bindArr[$f] = '';
$stmt->bindParam(':'.$f, $bindArr[$f]);
}
$reader->moveToElement();
return $stmt;
}
With the MySQL Connector/Net and System.Xml.XmlReader
it should be possible to do almost exactly the same thing with C#/.NET.
I wouldn't put too much effort in error handling, schema testing, "but what if", etc. here. If you need something beyond "just fiddling with the data" there are other tools; you could even use Hibernate for that ;-)
votes
table took the longest (8 minutes), followed by posts
(7 minutes)
I made a program to create a SQL file to load SO data into a PostgreSQL database. Reading it, it seems there are few PostgreSQL-specific things in it so you may try to port it.
Just took a look at the data.
It wouldn't be too hard to do a pattern matching preg_match_all
using PHP to get all the data and then print it out into SQL statements. (if someone hasn't already done it)