I am having a bit of trouble removing part of a string inside a text file with PHP.

I have a big file and I need to remove part of one line of this file.

The thing is, the line is not always the same. It keeps the format, but the numbers change. Here is an example:

< /td >This is the line< /td >and this< /td >is < /td >the < /td >part< /td >want to remove< /td >Name< /td > after it keeps going < /td > a loong way < /td >

I would like to remove everything from the < /td > after the word "this" up to the < /td > after Name.

I was wondering if there is any way of making PHP delete backwards from Name until the Xth occurrence of < /td >, something like:

Delete from Name until the 4th appearance of < /td >

Hope someone can help me....

Both answers below do the trick for the sample text, but they don't work for my real code. So here is part of the real code:

... < /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i8" onclick="dm.ItClk(this,\'\');cmn.href(\'indexall.php\',\'\');" class="mn31BBMainMenuItemTD" >< table border="0" cellspacing="0" cellpadding="0" >< tr >< td class="mn31BBIconTD" > < font class="MG_Icons" > &#xe 746;< /font >< /td >< td class="mn31BBTitleTD" id="mnFE0BBC45_i8-tl" >Other_Name< /td >< td class="mn31BBArrowTD" > < /td >< /tr >< /table >< /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i3" onclick="dm.ItClk(this,\'\');cmn.href(\'index.php\',\'\');" class="mn31BBMainMenuItemTD" >< table border="0" cellspacing="0" cellpadding="0" >< tr >< td class="mn31BBIconTD" >< font class="MG_Icons" >&#xe 746;< /font >< /td >< td class="mn31BBTitleTD" id="mnFE0BBC45_i3-tl" >Name< /td >< td class="mn31BBArrowTD" > < /td >< /tr >< /table >< /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i5" onclick="dm.ItClk(this,\'\');cmn.href(\'indexd2.php\',\'\');" class...

This is only a small part of the code (it is a JavaScript menu). I put spaces in all the tags (< tr >) so that they show up here....

The text I want to delete is:

< /td >< td class="mn31BBArrowTD" > < /td >< /tr >< /table >< /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i3" onclick="dm.ItClk(this,\'\');cmn.href(\'index.php\',\'\');" class="mn31BBMainMenuItemTD" >< table border="0" cellspacing="0" cellpadding="0" >< tr >< td class="mn31BBIconTD" >< font class="MG_Icons" >&#xe 746;< /font >< /td >< td class="mn31BBTitleTD" id="mnFE0BBC45_i3-tl" >Name

Both mnFE0BBC45_i3-tl and mnFE0BBC45_i3 are not always the same; the number changes depending on the Name.

That is why I want to do this: delete everything from Name back to the 4th appearance of < /td >

The code above is invalid HTML (<td> needs an opening and closing tag). Is this intentional? –  rwacarter Jan 20 at 10:22
    
Will the word 'Name' be there in every text file? And how long is the text? –  zan Jan 20 at 10:23
    
It is intentional... it is only an example... in the real file each < /td > has its corresponding < td > –  Ebarriosjr Jan 20 at 10:24
    
The length of the text varies depending on the names of the variables that are in the middle... That is why I want to delete the text based on the occurrences of < /td > –  Ebarriosjr Jan 20 at 10:25
    
So you know what the Name is? –  zan Jan 20 at 10:26

2 Answers

Accepted answer (1 vote):

I misread the requirement at first; here is a corrected version that looks for the appropriate matches before "Name".

Between the other occurrences of "< /td >" I am only looking for alphanumeric characters and spaces. It may be necessary to add more to this character class, such as a dash or an underscore (the class currently used is [[:alnum:]\ ]+).

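<?php
$txt = '< /td >This is the line< /td >and this< /td >is the part< /td >want to remove< /td >Name< /td > after it keeps going < /td > a loong way < /td >';

// Match two "text< /td >" segments directly before "Name< /td >" and remove them
// together with "Name< /td >". Each segment may only contain alphanumerics and spaces.
$result = preg_replace('/([[:alnum:]\ ]+<\s*\/td\s*>){2}Name<\s*\/td\s*>/', '', $txt);
echo "$result \n";
?>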

Output:

< /td >This is the line< /td >and this< /td > after it keeps going < /td > a loong way < /td >
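
If the text between the closing tags can contain arbitrary markup (attributes, quotes, entities), the character class above will not be enough. The following is only a sketch, not something tested against the real file: each segment is allowed to be any run of characters that does not itself contain a closing td tag, and the number of closing tags to walk back is a parameter. The function name `removeBackFromName` and its parameters are made up for illustration. Unlike the pattern above, it removes up to and including the name and keeps the closing tag that follows it.

```php
<?php
// Hypothetical helper (name and parameters are illustrative only).
// Removes everything from the $count-th closing td tag before $name up to and
// including $name itself. The closing tag that follows $name is kept.
function removeBackFromName(string $text, string $name, int $count): string
{
    // Closing td tag, tolerating optional whitespace as in "< /td >".
    $td = '<\s*\/td\s*>';
    // One closing tag followed by any text that contains no further closing tag.
    $segment = '(?:' . $td . '(?:(?!' . $td . ').)*?)';
    // $count segments, then the name directly after a ">" and directly before a closing tag.
    $pattern = '/' . $segment . '{' . $count . '}(?<=>)'
             . preg_quote($name, '/') . '(?=' . $td . ')/s';
    $result = preg_replace($pattern, '', $text);
    // preg_replace() returns null on error (for example when the backtrack limit is hit).
    return $result ?? $text;
}

$txt = '< /td >This is the line< /td >and this< /td >is the part< /td >want to remove< /td >Name< /td > after it keeps going < /td > a loong way < /td >';
echo removeBackFromName($txt, 'Name', 3), "\n";
```

For the menu markup in the question the count would presumably be 4. On a very long line the lookahead-per-character approach can be slow, so the plain string functions may be the safer choice there.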

Edit:

Here is a little Perl script that does what you want:

#!/usr/bin/perl

use strict;
use warnings;

open(my $fh, "<", "input.txt")
    or die "cannot open < input.txt: $!";
# slurp the whole file: undefine the record separator locally, then read
my $content = do { local $/; <$fh> };
close($fh);

# The tags below are written with spaces, as in the question's sample;
# adjust them if the real file uses "<td" and "</td>".
my $anchor = ">Name<";
my $position = 0;
# find occurrences of anchor in the text
while ( ($position = index($content, $anchor, $position)) != -1 ) {
    print "anchor $anchor is at $position \n";
    # go backwards to the start tag of the anchor (has to be a td element)
    my $starttag_position = rindex($content, "< td", $position);
    if ($starttag_position == -1) {
        die("no start tag found before $anchor");
    }
    print "starttag of anchor is at $starttag_position \n";
    my $start = $starttag_position;
    # look backwards over four closing tds
    for (my $i = 0; $i < 4; $i++) {
        $start = rindex($content, "< /td >", $start - 1);
        if ($start == -1) {
            die("less than 4 tds found before $anchor");
        }
    }
    print "first td is at $start \n";
    # delete the text in between
    substr($content, $start, $starttag_position - $start, "");
    # the anchor moved left by the deleted length; continue searching after it
    $position = $start + ($position - $starttag_position) + length($anchor);
}

open(my $fout, ">", "input.new")
    or die "cannot open > input.new: $!";
print $fout $content;
close $fout;
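
Since the question asks for PHP, here is a rough PHP sketch of the same idea as the Perl script. It is an untested port under a few assumptions: the function name `removeBeforeAnchors` and its parameters are invented for this example, and the default tag strings assume the real file has no spaces inside the tags. Like the Perl script, it deletes from the 4th closing tag before the anchor up to the start tag of the cell that holds the name, so the start tag and the name stay in place; if the name should go as well, the end offset needs adjusting.

```php
<?php
// PHP sketch of the Perl script above. Function name and parameters are
// illustrative. The tag strings are parameters because the question shows
// them with extra spaces ("< /td >") while the real file may use "</td>".
function removeBeforeAnchors(
    string $content,
    string $anchor,
    string $open = '<td',
    string $close = '</td>',
    int $count = 4
): string {
    $position = 0;
    while (($position = strpos($content, $anchor, $position)) !== false) {
        // Go backwards to the start tag of the cell that holds the anchor.
        $startTag = strrpos(substr($content, 0, $position), $open);
        if ($startTag === false) {
            throw new RuntimeException("no $open found before $anchor");
        }
        // Walk backwards over $count closing tags.
        $start = $startTag;
        for ($i = 0; $i < $count; $i++) {
            $start = strrpos(substr($content, 0, $start), $close);
            if ($start === false) {
                throw new RuntimeException("fewer than $count closing tags found before $anchor");
            }
        }
        // Delete from the earliest closing tag up to (not including) the start tag.
        $content = substr($content, 0, $start) . substr($content, $startTag);
        // The anchor moved left by the deleted length; continue searching after it.
        $position = $start + ($position - $startTag) + strlen($anchor);
    }
    return $content;
}

$html = '<td>Other_Name</td><td class="arrow">&nbsp;</td></tr></table></td></tr>'
      . '<tr><td><font>x</font></td><td id="i3-tl">Name</td><td>rest</td>';
echo removeBeforeAnchors($html, '>Name<'), "\n";
```

To process the file, the content could be read with `file_get_contents('input.txt')` and the result written with `file_put_contents('input.new', ...)`, mirroring the file names used in the Perl script.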
Can you maybe try with the new code I wrote in the question? I am not able to make it work.... –  Ebarriosjr Jan 21 at 8:33
    
Okay, that's a different story. Could you post more examples of what you want to delete? Otherwise there might be many more such trials. –  nlu Jan 21 at 10:04
    
Can we use the class name "mn31BBArrowTD" somehow? –  nlu Jan 21 at 10:17
    
This is the code I use to create that part of the line: < td class="mn31BBArrowTD" >&nbsp;< /td >< /tr >< /table >< /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i'.$num.'" onclick="dm.ItClk(this,'.$e.''.$e.');cmn.href('.$e.'index'.$name.'.php'.$e.','.$e.''.$e.');" class="mn31BBMainMenuItemTD" >< table border="0" cellspacing="0" cellpadding="0" >< tr >< td class="mn31BBIconTD" >< font class="MG_Icons" >&#xe746;< /font >< /td >< td class="mn31BBTitleTD" id="mnFE0BBC45_i'.$num.'-tl" >'.$name.'< /td > The variable $name is known to me, but $num is not –  Ebarriosjr Jan 21 at 10:29
    
the variable $e is only a "\". –  Ebarriosjr Jan 21 at 10:32

Try this:

Algorithm: 1) find the first position of Name; 2) walk backwards from there to the position of the 3rd < /td >; 3) cut out the substring between those two positions.

$text_string = '< /td >This is the line< /td >and this< /td >is the part< /td >want to remove< /td >Name< /td > after it keeps going < /td > a loong way < /td >';
$needle = '< /td >';
$name_pos = strpos($text_string, 'Name');
// walk backwards from Name over three occurrences of the closing tag
$start = $name_pos;
for ($i = 0; $i < 3; $i++) {
    $start = strrpos(substr($text_string, 0, $start), $needle);
}
// remove everything from that tag up to and including Name
$result = substr_replace($text_string, '', $start, $name_pos + strlen('Name') - $start);
var_dump($result);

Output:

string(94) "< /td >This is the line< /td >and this< /td > after it keeps going < /td > a loong way < /td >"
Can you maybe try with the new code I wrote in the question? I am not able to make it work.... –  Ebarriosjr Jan 21 at 8:33
