Regex getting numbers from string

Question

I have a string

$test = 'xyz45sd2-32d34-sd23-456562.abc.com'

The objective is to obtain $1=23 and $2=45, i.e. an equal number of digits on each side of the - in the part that is followed by a dot (.).

I have tried the following:

$test =~ s/.*(\d+)-(\d+).*//;

But

$1 matches: 3

$2 matches: 456562

Will they always be two digits, or do you mean that it's a variable number of digits before and after the dash? If it's always two digits, you could just use s/.*(\d{2})-(\d{2}).*//
@GreatBigBore - The number of digits is variable, but I want the same number of digits on both sides, e.g. if the left side of '-' has 5 digits but the right side of '-' has 3 digits, then I need to match 3 digits.
What do you expect if the right side has fewer digits than the left side? For example, in the string xyz45sd2-32d34-sd23-1d3f5b.abc.com? Would you expect 3 and 1?
@Samveen - Yes, because the sequence that starts at the digit 1 is the one that ends with the dot. So I want 3 and 1.

Raghuram · Answer 1 · 2013-08-07 05:06:15Z


You can try this regex

if($test =~ m/(\S+)-(\S+)-([a-z]*)(\d+)-(\d\d)(\d+).*/)
{
    print $4,"|",$5;
}

I assume that you need only the first 2 digits from 456562.


Alec · Answer 2 · 2013-08-07 05:17:27Z


perl -e '"xyz45sd2-32d34-sd23-456562.abc.com" =~ /(\d{2})-(\d{2})\d*(?=\.)/; print "$1\n$2\n"'


user2214806 · Answer 3 · 2013-08-07 05:44:04Z

This other question confirms that a regex cannot count: "How to match word where count of characters same".

Building upon GreatBigBore's idea: if there is an upper bound on the count, you could try the or operator |, listing the longest count first:

(\d{3})-(\d{3})|(\d{2})-(\d{2})|(\d{1})-(\d{1})

This only meets your requirement of finding a match; depending on the matched count, the captures land in different groups. Only the first alternative places them in $1 and $2.

However, if you concatenate the captures as $1$3$5 and $2$4$6, you effectively get the two strings you were looking for.
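A minimal sketch of the alternation idea, for illustration only. The helper name equal_digits_bounded is hypothetical, the upper bound of 3 digits is taken from the pattern above, and the lookahead (?=[^-]*\z) is an added assumption that the pair of interest sits around the last dash in the string:

```perl
use strict;
use warnings;

# Hypothetical helper: tries 3, then 2, then 1 digits on each side of a dash.
# ASSUMPTION: the lookahead (?=[^-]*\z) restricts the match to the last dash.
sub equal_digits_bounded {
    my ($str) = @_;
    return unless $str =~ /(?:(\d{3})-(\d{3})|(\d{2})-(\d{2})|(\d)-(\d))(?=[^-]*\z)/;
    # Exactly one alternative matched: its two groups are defined, the other
    # four are undef, so joining the defined ones recovers the pair.
    my $left  = join '', grep { defined $_ } $1, $3, $5;
    my $right = join '', grep { defined $_ } $2, $4, $6;
    return ($left, $right);
}

my ($l, $r) = equal_digits_bounded('xyz45sd2-32d34-sd23-456562.abc.com');
print "$l\t$r\n";    # prints 23 and 45, separated by a tab
```

Note that with a bound of 3, a pair that shares more than 3 digits is truncated to 3 on each side, so the bound has to be chosen to cover the longest expected run.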

Another idea is to work iteratively: repeat the search on the string with an increasing count, (\d{1})-(\d{1}), (\d{2})-(\d{2}), ..., until the match fails.

A binary search over the count comes to mind, making it O(log N), with N being the upper limit on the capture length.
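Both variants could be sketched roughly as follows. The helper names and the $max parameter are hypothetical, and the lookahead (?=[^-]*\z) is again an added assumption that pins the match to the last dash. The binary search relies on the observation that if a count of n matches, every smaller count matches too:

```perl
use strict;
use warnings;

# Hypothetical helper: grow $n until (\d{$n})-(\d{$n}) stops matching.
# ASSUMPTION: the lookahead restricts the match to the last dash.
sub equal_digits_iterative {
    my ($str) = @_;
    my @best;
    for (my $n = 1; ; $n++) {
        last unless $str =~ /(\d{$n})-(\d{$n})(?=[^-]*\z)/;
        @best = ($1, $2);    # remember the longest successful match so far
    }
    return @best;
}

# Hypothetical helper: binary search for the largest matching count,
# assuming the true count does not exceed $max.
sub equal_digits_bsearch {
    my ($str, $max) = @_;
    my ($lo, $hi) = (0, $max);    # the answer always lies in [$lo, $hi]
    my @best;
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);    # round up so $mid >= 1
        if ($str =~ /(\d{$mid})-(\d{$mid})(?=[^-]*\z)/) {
            ($lo, @best) = ($mid, $1, $2);
        } else {
            $hi = $mid - 1;
        }
    }
    return @best;
}

my $test = 'xyz45sd2-32d34-sd23-456562.abc.com';
print join("\t", equal_digits_iterative($test)), "\n";
print join("\t", equal_digits_bsearch($test, 8)), "\n";
```

The linear loop needs no upper bound, while the binary search needs one but tries fewer patterns when the bound is large.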

Samveen · Answer 4 · 2013-08-07 11:46:34Z

Theoretical answer

Short answer:

What you're looking for is not possible using regular expressions.

Long Answer:

Regular expressions (as their name suggests) are a compact representation of regular languages (Type-3 grammars in the Chomsky hierarchy).

What you're looking for is not possible using regular expressions because you're trying to write an expression that maintains some kind of count (some contextual information other than beginning and end). This kind of behavior cannot be modelled by a DFA (or, in fact, by any finite automaton). A language is regular exactly when there exists a DFA that accepts it. Since this kind of contextual information cannot be modelled by a DFA, the language is not regular, and so you cannot write a regular expression for your problem.

Practical Solution

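my ($lhs,$rhs) = $test =~ /^[^-]+-[^-]+-([^-]+)-([^-.]+)\S+/;
# Alternatively (and faster), use split instead of the regex above:
# my (undef,undef,$lhs,$rhs) = split /-/, $test;

# The rest is common, no matter how $lhs and $rhs are extracted.
# Reverse the left side so that index 0 is the character nearest the dash.
my @left = reverse split //, $lhs;
my @right = split //, $rhs;

# Count how many positions hold a digit on both sides.
my $i;
for($i=0; exists($left[$i]) and exists($right[$i]) and $left[$i] =~ /\d/ and $right[$i] =~ /\d/ ; ++$i){}

# $i is now the count; step back to the last matching index.
--$i;
$lhs= join "", reverse @left[0..$i];
$rhs= join "", @right[0..$i];

print $lhs, "\t", $rhs, "\n";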
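A more compact variant of the same idea, given as a sketch rather than as the answer's method: capture the digits that end the left side and the digits that start the right side, then trim both to the shorter length. The helper name equal_digits is hypothetical, and the code assumes the pair of interest surrounds the last dash in the string:

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: trailing digits before the last dash and leading
# digits after it, both cut down to the shorter of the two lengths.
# ASSUMPTION: the pair of interest is around the last dash in the string.
sub equal_digits {
    my ($str) = @_;
    my ($left, $right) = $str =~ /(\d*)-(\d*)[^-]*\z/ or return;
    my $n = min(length $left, length $right);
    # Keep the last $n digits on the left and the first $n on the right.
    return (substr($left, length($left) - $n), substr($right, 0, $n));
}

print join("\t", equal_digits('xyz45sd2-32d34-sd23-456562.abc.com')), "\n";
print join("\t", equal_digits('xyz45sd2-32d34-sd23-1d3f5b.abc.com')), "\n";
```

For the two strings discussed in the comments this prints 23 and 45, then 3 and 1. If either side has no digits next to the dash, both returned strings are empty.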

