Take the 2-minute tour ×
Unix & Linux Stack Exchange is a question and answer site for users of Linux, FreeBSD and other Un*x-like operating systems.. It's 100% free, no registration required.

I've got 8 xml files and need to do some work on these records in mysql. What is a good way to get these IDs into mysql?

<?xml version="1.0" encoding="utf-8"?>
<Records>
  <Record>
    <RecordMain>
      <Provider/>
      <ProviderID>M14723</ProviderID>
      ...

I'm kind of thinking it will be some slick grepping into a text file that I can use to create insert statements into a new, simple table that only contains these xml ids. Unfortunately, I'm not that handy with the basic unix tools yet...

thanks in advance

share|improve this question
2  
MySQL has a LOAD XML command: dev.mysql.com/doc/refman/5.5/en/load-xml.html. Also shown here: stackoverflow.com/questions/13833568/…, and again here: databasejournal.com/features/mysql/… –  slm Aug 21 '14 at 13:45

1 Answer 1

grep is not the right tool for the job for loading XML. XML is a data structure that is tag oriented, so anything line oriented (like grep) will only work for a subset. Sometimes you get away with that, but you always create code that's brittle and prone to exploding messily, when someone 'upstream' generates perfectly valid XML which doesn't work because you haven't followed the spec.

So aside from 'LOAD XML' which is referenced in the comments: http://stackoverflow.com/questions/13833568/automated-way-to-convert-xml-files-to-sql-database

I will offer a Perl solution to extract data from your XML.

#!/usr/bin/env perl

use strict;
use warnings;

use XML::Twig;

my @ID_list;

sub extract_provider_ID {
   my ( $twig, $provider ) = @_;
   push ( @ID_list, $provider -> text );
}

my $twig = XML::Twig->new(
    'twig_handlers' => { 'ProviderID' => \&extract_provider_ID },
);
$twig->parsefile( 'your_file.xml'  );

print join ("\n", @ID_list );

I'm sure you can figure out the appropriate insert now that @ID_list is populated with every 'ProviderID'. Note - it ignores hierarchy - any ProviderID element is found and returned. You can require that hierarchy if you wish, by for example:

'twig_handlers' => { 'Record/RecordMain/ProviderID' => \&extract_provider_ID },
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.