
I need to read some data from PHP (WordPress) config files in my Python script. How can I parse the config data? For example, how can I get the value of $wp_version? Config example:

/**
 * The WordPress version string
 *
 * @global string $wp_version
 */
$wp_version = '3.5.1';

/**
 * Holds the WordPress DB revision, increments when changes are made to the WordPress DB schema.
 *
 * @global int $wp_db_version
 */
$wp_db_version = 22441;

/**
 * Holds the TinyMCE version
 *
 * @global string $tinymce_version
 */
$tinymce_version = '358-23224';

/**
 * Holds the required PHP version
 *
 * @global string $required_php_version
 */
$required_php_version = '5.2.4';

/**
 * Holds the required MySQL version
 *
 * @global string $required_mysql_version
 */
$required_mysql_version = '5.0';

$wp_local_package = 'en_EN';
If you have access to PHP, it may be more robust to use PHP itself to tokenise the source file and output the structure in a more Python-friendly format, using token_get_all for example. –  Anthony Sterling Jun 2 at 10:37
 
try github.com/ramen/phply –  thg435 Jun 2 at 10:39

1 Answer

Accepted answer (4 votes)

A simple variable assignment in PHP looks like $foo = 'bar';. Let's build a regex for that form, one that does not take into account things like $_GET or $foo['bar']:

  1. Start with $; it has to be escaped because $ is a regex metacharacter:
    \$
  2. In PHP the first character after $ can't be a digit; it must be a letter or an underscore. The pattern here matches only a letter, so names that start with an underscore (such as $_GET) are skipped:
    \$[a-z]
  3. Any number of letters, digits or underscores may follow:
    \$[a-z]\w*
  4. Wrap the name in parentheses to capture it:
    \$([a-z]\w*)
  5. Next comes the equals sign; to be more tolerant, make the whitespace around it optional:
    \$([a-z]\w*)\s*=\s*
  6. After that comes the value, captured in a second group, followed by a ; at the end of the line:
    \$([a-z]\w*)\s*=\s*(.*?);$
  7. Use the m modifier, which makes ^ and $ match at the start and end of each line respectively.
  8. You can then use a trimming function to get rid of the single and double quotes around the value.
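
Since the question asks about Python, the steps above can be sketched with Python's re module. This is only a minimal sketch: the names SAMPLE and parse_php_vars are made up for illustration, the sample text is a shortened copy of the config from the question, and every value comes back as a string (so 22441 is returned as '22441').

```python
import re

# Pattern built in steps 1-6; re.MULTILINE is the "m" modifier and
# re.IGNORECASE is the "i" modifier.
ASSIGNMENT = re.compile(r"\$([a-z]\w*)\s*=\s*(.*?);$", re.MULTILINE | re.IGNORECASE)

# Shortened copy of the config from the question (illustrative only).
SAMPLE = """\
/**
 * @global string $wp_version
 */
$wp_version = '3.5.1';
$wp_db_version = 22441;
$wp_local_package = 'en_EN';
"""


def parse_php_vars(source):
    """Return {name: value} for simple `$name = value;` lines (hypothetical helper)."""
    result = {}
    for name, raw in ASSIGNMENT.findall(source):
        # Step 8: trim surrounding whitespace, then single and double quotes.
        result[name] = raw.strip().strip("'\"")
    return result


if __name__ == "__main__":
    print(parse_php_vars(SAMPLE)["wp_version"])
```

The $wp_version mentioned inside the doc comment is not picked up, because no equals sign follows it.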


Note 1: This regex will fail when several assignments share one line, e.g. $fail = 'en_EN'; $fail2 = 'en_EN'; because the value group runs on to the last ; of the line.
Note 2: Don't forget to use the i modifier to make the match case-insensitive.
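
To make Note 1 concrete, here is a small Python sketch that runs the pattern on that one-line example. The second pattern, PER_STATEMENT, is not part of the answer: it is an assumed variant that stops the value at the first ; instead of anchoring to the end of the line, and it would break on values that themselves contain a semicolon.

```python
import re

# The one-line example from Note 1.
LINE = "$fail = 'en_EN'; $fail2 = 'en_EN';"

# The pattern from the answer, with the m and i modifiers.
WHOLE_LINE = re.compile(r"\$([a-z]\w*)\s*=\s*(.*?);$", re.MULTILINE | re.IGNORECASE)

# Assumed variant (not from the answer): the value may not contain ";",
# so each statement on the line is matched separately.
PER_STATEMENT = re.compile(r"\$([a-z]\w*)\s*=\s*([^;]*);", re.IGNORECASE)

whole_line_matches = WHOLE_LINE.findall(LINE)
per_statement_matches = PER_STATEMENT.findall(LINE)

if __name__ == "__main__":
    print(whole_line_matches)     # one match whose value swallows the second assignment
    print(per_statement_matches)  # one match per assignment
```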

Your regex works pretty well! But I added the quotes and removed the end-of-line anchor: \$([a-z]\w*)\s*=\s*\'(.*?)\'; –  inlanger Jun 2 at 10:27
