I am trying to parse an FDF file using PHP and regex, but I just can't get my head around regex. I am stuck on parsing the file into an array.

%FDF-1.2
%âãÏÓ
1 0 obj 
<<
/FDF 
<<
/Fields [
<<
/V ([email protected])
/T (field_email)
>> 
<<
/V (John)
/T (field_name)
>> 
<<
/V ()
/T (field_reference)
>>]
>>
>>
endobj 
trailer

<<
/Root 1 0 R
>>
%%EOF

Current function (source: http://php.net/manual/en/ref.fdf.php):

function parse2($file) {
 if (!preg_match_all("/<<\s*\/V([^>]*)>>/x", $file,$out,PREG_SET_ORDER))
         return;
 for ($i=0;$i<count($out);$i++) {
         $pattern = "<<.*/V\s*(.*)\s*/T\s*(.*)\s*>>";
         $thing = $out[$i][1];
         if (eregi($pattern,$out[$i][0],$regs)) {
                 $key = $regs[2];
                 $val = $regs[1];
                 $key = preg_replace("/^\s*\(/","",$key);
                 $key = preg_replace("/\)$/","",$key);
                 $key = preg_replace("/\\\/","",$key);
                 $val = preg_replace("/^\s*\(/","",$val);
                 $val = preg_replace("/\)$/","",$val);
                 $matches[$key] = $val;
         }
 }
 return $matches;
}

Result:

Array
(
    [field_email)
    ] => [email protected])

    [field_name)
    ] => John)

    [field_reference)
    ] => )

)

Why does it include the ) and the newline? I know this problem is trivial for someone who understands regular expressions, so help would be appreciated.

1 Answer

Accepted answer

Description

Your initial expression simply finds the entire block of text that represents each key and value set. Then, in your cleanup section, you look for a close paren that is immediately followed by the end of the string (\)$), but there are almost certainly additional characters, such as whitespace or a line break, between the close paren and the end of the string.
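
To illustrate the point, here is a minimal sketch. It assumes the captured text ends with a Windows line break ("\r\n"); the question does not show the raw bytes, so that ending and the variable names are assumptions for demonstration only.

```php
<?php
// Assumption: the captured value still carries a trailing "\r\n".
$captured = "(John)\r\n";

// Original cleanup: "\)$" needs the paren to sit directly at the end of
// the string (or just before a final "\n"), so the "\r" prevents a match.
$strict = preg_replace('/\)$/', '', $captured);

// Allowing optional whitespace before the end lets the cleanup succeed.
$lenient = preg_replace('/\)\s*$/', '', $captured);
$lenient = preg_replace('/^\s*\(/', '', $lenient);

var_dump($strict === $captured); // true: the strict pattern changed nothing
var_dump($lenient);              // the value without parens or line break
```

Under that assumption the first pattern leaves the string untouched, which would explain the stray ) and newline in the result array.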

Instead, I'd handle all of this in one operation. This expression will:

  • find the field value
    • trim the surrounding parens off
    • and place it into capture group 1
  • find the field name
    • trim the field_ prefix off
    • trim the surrounding parens off
    • and place it into capture group 2
  • require the options case-insensitive and multi-line

^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)

Example

Sample Text

%FDF-1.2
%âãÏÓ
1 0 obj 
<<
/FDF 
<<
/Fields [
<<
/V ([email protected])
/T (field_email)
>> 
<<
/V (John)
/T (field_name)
>> 
<<
/V ()
/T (field_reference)
>>]
>>
>>
endobj 
trailer

<<
/Root 1 0 R
>>
%%EOF

Matches

[0][0] = /V ([email protected])
/T (field_email)
[0][1] = [email protected]
[0][2] = email

[1][0] = /V (John)
/T (field_name)
[1][1] = John
[1][2] = name

[2][0] = /V ()
/T (field_reference)
[2][1] = 
[2][2] = reference
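
The matches above can be folded into the array the question asks for. The following is a rough sketch rather than a complete FDF parser: the function name parseFdfFields and the shortened sample string are hypothetical, and the pattern assumes each /V line is directly followed by its /T line and that values contain no ) characters.

```php
<?php
// Hypothetical helper: returns [fieldName => value] for each /V, /T pair.
function parseFdfFields(string $fdf): array
{
    // i = case-insensitive, m = "^" matches at the start of every line.
    $pattern = '/^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)/im';

    $fields = [];
    if (preg_match_all($pattern, $fdf, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Group 2 is the name without "field_", group 1 is the value.
            $fields[$m[2]] = $m[1];
        }
    }
    return $fields;
}

// Shortened sample in the same layout as the FDF shown above.
$sample = "<<\n/V (John)\n/T (field_name)\n>> \n<<\n/V ()\n/T (field_reference)\n>>]\n";
print_r(parseFdfFields($sample));
```

For a real file, the contents could be read with file_get_contents() and passed in the same way.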



Or

If you want to retain the field_ prefix, you can simply remove it from the expression, like so:

^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(([^)]*)\)

Adding the /ims flags, the regex works perfectly in PHP: preg_match_all("/^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)/ims", $file, $out, PREG_SET_ORDER); In the meantime I also found debuggex.com, which is great for debugging. –  user2413433 Aug 10 '13 at 15:05