Sign up ×
Programming Puzzles & Code Golf Stack Exchange is a question and answer site for programming puzzle enthusiasts and code golfers. It's 100% free, no registration required.

I just discovered this website and was amazed. Somehow this challenge popped into my mind: Provided a very long list of words; Write or describe an algorithm which finds a regex to match them all.

Ex.: A very long list of Dates, model-numbers, CD-Codes, ISBNs et cetera.

I hope it's not too big or does not belong here. Please refer to downvote in that case but write a comment and I will delete this.

Edit: Sorry at my clumsy attempt. I fear there is not a very sharp goal. Of course .* is not what I look for :) Here are some specifications:

  • It has to match all words
  • It has to group characters and find out what kind of characters are in that group, e.g. identify that an american date always consits of one to four numbers, then a slash etc.
  • It does not need to be able to identify transverses etc
  • There is a huge list of not-to-match-examples available as well

Winner is the one who can write a code which does what I described for one of my examples (or a similar one). In case that no code is committed, the winner is the one who can convince me, that the described algorithm could work.

share|improve this question

closed as off-topic by Martin Büttner, marinus, chilemagic, NinjaBearMonkey, Doorknob Nov 21 '14 at 23:35

This question appears to be off-topic. The users who voted to close gave this specific reason:

  • "Questions without an objective primary winning criterion are off-topic, as they make it impossible to indisputably decide which entry should win." – Martin Büttner, marinus, chilemagic, NinjaBearMonkey, Doorknob
If this question can be reworded to fit the rules in the help center, please edit the question.

4  
something like .* ? –  tigrou Nov 21 '14 at 20:22
2  
What's the winning criterion? Shortest code? Shortest regex? Something else? Does the regex need to match only the words? –  marinus Nov 21 '14 at 20:22
2  
I'm afraid your spec still isn't very solid. I don't know what you mean by your third bullet point. How long is a "huge list" (which can btw easily satisfied for any finite definition of "huge", with the expression ^.{n}$, where n is the longest input string). And how should the grouping work? If my input is foo, bar, abc I can always form a pattern [fba][oab][orc], but that doesn't make a lot of sense. Or [abcfor]{3}. It's very unclear what a valid solution has to do. –  Martin Büttner Nov 21 '14 at 21:01
    
Your problem is "regex golf". Search this site for it, or google it. –  Sparr Nov 21 '14 at 21:16
1  

1 Answer 1

In Java 8:

public class Reg{

    public static void main(String... args){
        System.out.println(args[0].replaceAll(",", "|"));
    }

}

Matches only the strings inputed. Expects input to be of form:

goodWord1,goodWord2,goodWord3...(space)badWord1,badWord2,badWord3...

Edit: Edited for new requirements. Now accepts a list of bad words also. It also now works in Java 7.

share|improve this answer
    
Good quick answer. But not what I looked for. Sorry. I gave some more details. –  3244611user Nov 21 '14 at 20:43
    
@3244611user It matches only the words in the input sequence. It matches no other words. You don't even need a list of not-to-match examples. See next program for one that finds the shortest answer. –  TheNumberOne Nov 21 '14 at 21:06

Not the answer you're looking for? Browse other questions tagged or ask your own question.