How to delete characters in a string according to a second string?

Question

Consider these two strings:

string1 <- "GCTCCC...CTCCATGAAGTA...CTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"
string_reference <- "GCTCCC...CTCCATGAAGTATTTCTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"

How do I easily remove the dots in "string1", but only those dots that also appear at the same position in "string_reference"?

Expected output:

string1 = "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
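To make the requirement concrete, here is a small sketch (not taken from any of the answers) that lists which positions qualify. A dot is dropped only where both strings hold a dot; the dots that appear only in string1 are kept. The names `s1`, `s2`, `shared_dots` and `own_dots` are illustrative.

```r
string1 <- "GCTCCC...CTCCATGAAGTA...CTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"
string_reference <- "GCTCCC...CTCCATGAAGTATTTCTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"

# Split both strings into single characters
s1 <- strsplit(string1, "")[[1]]
s2 <- strsplit(string_reference, "")[[1]]

# Positions where both strings hold a dot: these dots should be removed
shared_dots <- which(s1 == "." & s2 == ".")
shared_dots
#[1]  7  8  9 39

# Positions where only string1 holds a dot: these dots should be kept
own_dots <- which(s1 == "." & s2 != ".")
own_dots
#[1] 22 23 24
```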

Simple loop stepping through a character at a time...
– Rob
Commented Mar 24, 2014 at 22:48
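For comparison, the character-by-character loop suggested in the comment might look like the following. This is a minimal sketch assuming both strings have the same length; the names `keep` and `result` are illustrative, and the loop fills a preallocated logical vector rather than growing a result.

```r
string1 <- "GCTCCC...CTCCATGAAGTA...CTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"
string_reference <- "GCTCCC...CTCCATGAAGTATTTCTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"

s1 <- strsplit(string1, "")[[1]]
s2 <- strsplit(string_reference, "")[[1]]
stopifnot(length(s1) == length(s2))

# Mark each position to keep; a character is dropped only when
# both strings hold a dot at that position
keep <- logical(length(s1))
for (i in seq_along(s1)) {
  keep[i] <- !(s1[i] == "." && s2[i] == ".")
}

result <- paste(s1[keep], collapse = "")
result
#[1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
```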

Simon O'Hanlon · Answer · 2014-03-25 08:23:43Z

7

I'd just use R's truly vectorised subsetting and logical comparison methods...

# Split the strings
x <- strsplit( c( string1 , string_reference ) , "" )
# Compare and remove dots from string1 when dots also appear in the reference string at the same position
paste( x[[1]][ ! (x[[2]]== "." & x[[1]] == ".") ] , collapse = "" )
#[1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
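The same subsetting idea can be wrapped in a small helper so the character to drop becomes a parameter. `drop_shared_char()` and its `char` argument are hypothetical names for illustration, and the sketch assumes `x` and `ref` are single strings of equal length.

```r
string1 <- "GCTCCC...CTCCATGAAGTA...CTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"
string_reference <- "GCTCCC...CTCCATGAAGTATTTCTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"

# Hypothetical helper: remove each occurrence of `char` in `x` that is matched
# by the same character at the same position in `ref`
drop_shared_char <- function(x, ref, char = ".") {
  chars <- strsplit(c(x, ref), "")
  stopifnot(length(chars[[1]]) == length(chars[[2]]))
  shared <- chars[[1]] == char & chars[[2]] == char
  paste(chars[[1]][!shared], collapse = "")
}

drop_shared_char(string1, string_reference)
#[1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
drop_shared_char("AC--GT", "A---GT", char = "-")
#[1] "ACGT"
```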

edited Mar 25, 2014 at 8:23

answered Mar 25, 2014 at 0:07

Simon O'Hanlon

60.1k · 15 gold badges · 145 silver badges · 188 bronze badges

2

Simon, I think the user wants to remove the dots that appear in the same position in both strings (hence remove the one at position 39, and the first set of dots as well). That said, I wouldn't bet my life on it...
– BrodieG
Commented Mar 25, 2014 at 0:13
But +1 for the simpler use of subsetting.
– BrodieG
Commented Mar 25, 2014 at 0:15
@BrodieG of course! And actually, that is exactly what my code does; I just posted the result of an old expression up there, not what my command actually did. Cheers!
– Simon O'Hanlon
Commented Mar 25, 2014 at 8:23


BrodieG · Answer · 2014-03-24 22:59:43Z

6

Similar to Robert's, but the "vectorized" version:

s1 <- unlist(strsplit(string1, ""))
s2 <- unlist(strsplit(string_reference, ""))
paste0(Filter(Negate(is.na), ifelse(s1 == s2 & s1 == ".", NA, s1)), collapse="")
# [1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"

I quote "vectorized" because the vectorization happens over the characters of your strings, not over the elements of your string vectors. This assumes each string vector has only one element; if they had multiple elements, you would have to loop through the results of strsplit.
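If the string vectors did hold several elements, one way to do that looping is `mapply()`, applying the same `ifelse()`/`Filter()` expression pair by pair. This is a sketch: `strings`, `refs` and `drop_shared_dots()` are made-up names and example data, and each string is assumed to have the same length as its reference.

```r
# Hypothetical example data: two strings, each paired with its own reference
strings <- c("AC..GT", "A.C.G.")
refs    <- c("AC..GA", "A..CG.")

# Apply the per-character comparison to one string/reference pair
drop_shared_dots <- function(s, ref) {
  s1 <- unlist(strsplit(s, ""))
  s2 <- unlist(strsplit(ref, ""))
  paste0(Filter(Negate(is.na), ifelse(s1 == s2 & s1 == ".", NA, s1)), collapse = "")
}

# Loop over the pairs; USE.NAMES = FALSE keeps the result an unnamed vector
result <- mapply(drop_shared_dots, strings, refs, USE.NAMES = FALSE)
result
#[1] "ACGT" "AC.G"
```

In the second pair the dot at position 4 survives because the reference has a "C" there.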

answered Mar 24, 2014 at 22:59

BrodieG

52.8k · 9 gold badges · 99 silver badges · 148 bronze badges

Great! paste0 is redundant since collapse = "".
– Robert Krzyzanowski
Commented Mar 25, 2014 at 3:14
@RobertKrzyzanowski, true, although not because collapse is ""; rather, it's because there is only one vector, so the sep argument (the only difference between paste and paste0) never comes into play.
– BrodieG
Commented Mar 25, 2014 at 12:51
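A quick check of the point above: `paste()` and `paste0()` differ only in `sep`, which is inserted between separate arguments, while `collapse` joins the elements of the result. With a single vector argument `sep` is never used, so both give the same string. The vector `chars` is just example data.

```r
chars <- c("A", "C", "G")

# One vector argument: sep is never used, so paste() and paste0() agree
paste(chars, collapse = "")
#[1] "ACG"
paste0(chars, collapse = "")
#[1] "ACG"

# Two arguments: sep (" " by default in paste()) now appears between them
paste("A", "C", collapse = "")
#[1] "A C"
paste0("A", "C", collapse = "")
#[1] "AC"
```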


thelatemail · Answer · 2014-03-25 01:36:00Z

5

Using intersect to find the positions where both strings have a dot:

cutpos <- do.call(intersect,
                  sapply(list(string_reference, string1), gregexpr, pattern = ".", fixed = TRUE))
paste(strsplit(string1,"",fixed=TRUE)[[1]][-cutpos],collapse="")
#[1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
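One caveat worth checking with this approach: `gregexpr()` returns `-1` when there is no match, and negative indexing with an empty vector (`x[-integer(0)]`) selects nothing rather than everything. So if the strings share no dot positions, the code above returns an empty string (or only the first character if neither string contains a dot). A guarded sketch, with `drop_shared_positions()` as a hypothetical name:

```r
string1 <- "GCTCCC...CTCCATGAAGTA...CTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"
string_reference <- "GCTCCC...CTCCATGAAGTATTTCTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"

drop_shared_positions <- function(x, ref) {
  # Dot positions in each string; gregexpr() returns -1 when there is no match
  pos_x   <- gregexpr(".", x, fixed = TRUE)[[1]]
  pos_ref <- gregexpr(".", ref, fixed = TRUE)[[1]]
  cutpos  <- intersect(pos_x[pos_x > 0], pos_ref[pos_ref > 0])
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  # Only subset when there is something to drop: x[-integer(0)] is empty
  if (length(cutpos) > 0) chars <- chars[-cutpos]
  paste(chars, collapse = "")
}

drop_shared_positions(string1, string_reference)
#[1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
drop_shared_positions("ACGT", "ACGT")
#[1] "ACGT"
```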

A small variation of the above (courtesy of @Arun):

attr(cutpos, 'match.length') <- rep(1L, length(cutpos))
attr(cutpos, 'useBytes') <- TRUE

do.call(paste0, c(regmatches(string1, list(cutpos), invert=TRUE), collapse=""))
## [1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
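The variation works because `regmatches(..., invert = TRUE)` returns the pieces of text between the matches, so presenting `cutpos` as a `gregexpr()`-style match object (which is why attributes such as `match.length` are set on it) drops exactly those characters. A minimal illustration on a made-up string:

```r
x <- "a.b.c"
m <- gregexpr(".", x, fixed = TRUE)   # dots at positions 2 and 4

# invert = TRUE returns the text between the matches, one vector per input string
pieces <- regmatches(x, m, invert = TRUE)
pieces[[1]]
#[1] "a" "b" "c"
paste0(pieces[[1]], collapse = "")
#[1] "abc"
```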

edited Mar 25, 2014 at 1:36

answered Mar 24, 2014 at 23:10

thelatemail

94.3k · 12 gold badges · 139 silver badges · 197 bronze badges


Robert Krzyzanowski · Answer · 2014-03-24 22:55:12Z

1

Step through the characters one at a time with vapply:

string1v <- strsplit(string1, "")[[1]]
string_referencev <- strsplit(string_reference, "")[[1]]
stopifnot(length(string1v) == length(string_referencev))
# Drop a character only when both strings have a dot at that position
finalstring <- paste(vapply(seq_along(string1v), function(ind) {
  if (string1v[ind] == '.' && string_referencev[ind] == '.') ''
  else string1v[ind]
}, character(1)), collapse = "")
finalstring
#[1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"

answered Mar 24, 2014 at 22:55

Robert Krzyzanowski

9,344 · 30 silver badges · 24 bronze badges
