For example, I have a string such as C3H20IO.

What I want to do is split this string so that I get the following:

Array1 = {C,H,I,O}
Array2 = {3,20,1,1}

The 1 in the third position of Array2 indicates that the I element is monoatomic, i.e. it appears in the string without an explicit number. The same goes for O. That is the part I am struggling with.

This is a chemical formula, so I need to separate the elements by name and by the number of atoms of each.

Thanks,

How did you get the last two entries 1,1 in Array2? Your input string C3H20IO doesn't have them. – Clement Amarnath 22 hours ago

The tricky part is that monatomic components are not followed by a number. I would love to see a slick Java 8 streams solution to this. – Tim Biegeleisen 22 hours ago

@ClementAmarnath It seems 1,1 indicates monoatomic I and O – rock321987 22 hours ago

Exactly. The monoatomic elements are supposed to get 1 even though they never actually have a number. I was thinking about using a for loop after toCharArray, but again I am not sure about the monoatomic components. – Azazel 22 hours ago

@Azazel See my answer using Map. Hope it will help you! – mmuzahid 21 hours ago

10 Answers

Accepted answer (6 votes)

You could try this approach:

String formula = "C3H20IO";

// insert "1" at each atom-atom boundary and after a trailing atom
formula = formula.replaceAll("(?<=[A-Z])(?=[A-Z])|(?<=[a-z])(?=[A-Z])|(?<=\\D)$", "1");

// split at each letter-digit or digit-letter boundary
String regex = "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)";
String[] atoms = formula.split(regex);

Output:

atoms: [C, 3, H, 20, I, 1, O, 1]

Now all even indices (0, 2, 4, ...) hold the atoms and the odd ones hold the associated numbers:

String[] a = new String[ atoms.length/2 ];
int[] n = new int[ atoms.length/2 ];

for(int i = 0 ; i < a.length ; i++) {
    a[i] = atoms[i*2];
    n[i] = Integer.parseInt(atoms[i*2+1]);
}

Output:

a: [C, H, I, O]
n: [3, 20, 1, 1]

This seems really good. I am gonna give it a try and see what happens. Thanks. Will update soon. – Azazel 21 hours ago
    
Works beautifully, and for a beginner like me it makes at least a little bit of sense. :) – Azazel 21 hours ago

Keep in mind that atoms may be abbreviated with multiple letters: Ag, Au, Mg etc. – AmazingDreams 19 hours ago
    
@AmazingDreams that's a good point, I fixed the code (assuming that the usual notation of having the second letter lowercase is respected), by inserting "1" only between letters if they are both uppercase – Maljam 6 hours ago
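
To check how the fixed regex behaves with multi-letter symbols, here is a minimal sketch that wraps the two regexes from this answer in a helper. The class name FormulaSplitDemo and the method tokenize are made up for illustration, and the sketch assumes the usual notation where the second letter of a symbol is lowercase.

```java
import java.util.Arrays;

public class FormulaSplitDemo {
    // Hypothetical helper: applies the two regexes from the answer above.
    static String[] tokenize(String formula) {
        // Insert "1" after every symbol that is not followed by a digit.
        String withOnes = formula.replaceAll(
                "(?<=[A-Z])(?=[A-Z])|(?<=[a-z])(?=[A-Z])|(?<=\\D)$", "1");
        // Split at every letter-digit or digit-letter boundary.
        return withOnes.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("C3H20IO"))); // [C, 3, H, 20, I, 1, O, 1]
        System.out.println(Arrays.toString(tokenize("MgCl2")));   // [Mg, 1, Cl, 2]
        System.out.println(Arrays.toString(tokenize("AgNO3")));   // [Ag, 1, N, 1, O, 3]
    }
}
```

An all-uppercase spelling such as "FE" would be read as two elements (F and E), so the lowercase convention matters here.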

An approach without regex, storing the data in ArrayLists:

String s = "C3H20IO";

// Needs java.util.ArrayList and java.util.List.
// Note: one char is stored per element, so only single-letter symbols are handled.
char chem = '-';
String val = "";
boolean isFirst = true;
List<Character> chemList = new ArrayList<>();
List<Integer> weightList = new ArrayList<>();
for (char c : s.toCharArray()) {
    if (Character.isLetter(c)) {
        // A new letter closes the previous element; an empty count means 1.
        if (!isFirst) {
            chemList.add(chem);
            weightList.add(val.isEmpty() ? 1 : Integer.parseInt(val));
            val = "";
        }
        chem = c;
    } else if (Character.isDigit(c)) {
        val += c;
    }
    isFirst = false;
}
// Flush the last element.
chemList.add(chem);
weightList.add(val.isEmpty() ? 1 : Integer.parseInt(val));

System.out.println(chemList);
System.out.println(weightList);

OUTPUT:

[C, H, I, O]
[3, 20, 1, 1]
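
The loop above keeps one char per element, so a symbol like Na or Cl would be split into two entries. A possible variation, still without regex, is sketched below. It assumes every symbol is one uppercase letter followed by optional lowercase letters; the class FormulaLoopParser and the method parse are hypothetical names, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;

public class FormulaLoopParser {
    // Hypothetical helper: fills the two lists from a formula such as "C3H20IO".
    static void parse(String s, List<String> symbols, List<Integer> counts) {
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (!Character.isUpperCase(c)) {
                throw new IllegalArgumentException(
                        "Unexpected character at index " + i + ": " + c);
            }
            // Symbol: one uppercase letter plus any following lowercase letters.
            int start = i++;
            while (i < s.length() && Character.isLowerCase(s.charAt(i))) {
                i++;
            }
            symbols.add(s.substring(start, i));
            // Count: the run of digits that follows, or 1 if there is none.
            int digitsStart = i;
            while (i < s.length() && Character.isDigit(s.charAt(i))) {
                i++;
            }
            counts.add(i == digitsStart ? 1 : Integer.parseInt(s.substring(digitsStart, i)));
        }
    }

    public static void main(String[] args) {
        List<String> symbols = new ArrayList<>();
        List<Integer> counts = new ArrayList<>();
        parse("C3H20IO", symbols, counts);
        System.out.println(symbols); // [C, H, I, O]
        System.out.println(counts);  // [3, 20, 1, 1]
    }
}
```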

You can use a regular expression to slide over your input using the Matcher.find() method.

Here is a rough example of what it may look like:

    String input = "C3H20IO";

    List<String> array1 = new ArrayList<String>();
    List<Integer> array2 = new ArrayList<Integer>();

    Pattern pattern = Pattern.compile("([A-Z][a-z]*)([0-9]*)");
    Matcher matcher = pattern.matcher(input);               
    while(matcher.find()){
        array1.add(matcher.group(1));

        String atomAmount = matcher.group(2);
        int atomAmountInt = 1;
        if((atomAmount != null) && (!atomAmount.isEmpty())){
            atomAmountInt = Integer.valueOf(atomAmount);
        }
        array2.add(atomAmountInt);
    }

I know, the conversion from List to Array is missing, but it should give you an idea of how to approach your problem.
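
For the missing conversion, one way it could be done is sketched below. The list names array1 and array2 follow the snippet above; the class ListToArrayDemo and its helper methods are invented for this example, and the int[] conversion assumes Java 8 streams are available.

```java
import java.util.Arrays;
import java.util.List;

public class ListToArrayDemo {
    // List<String> -> String[]
    static String[] toStringArray(List<String> list) {
        return list.toArray(new String[0]);
    }

    // List<Integer> -> int[] (unboxes each element)
    static int[] toIntArray(List<Integer> list) {
        return list.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        List<String> array1 = Arrays.asList("C", "H", "I", "O");
        List<Integer> array2 = Arrays.asList(3, 20, 1, 1);
        System.out.println(Arrays.toString(toStringArray(array1))); // [C, H, I, O]
        System.out.println(Arrays.toString(toIntArray(array2)));    // [3, 20, 1, 1]
    }
}
```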


Loop over the characters of the input and apply the following checks:

for (char c : input.toCharArray()) {
    if (Character.isDigit(c)) {
        // add it to the number array
    } else if (Character.isLetter(c)) {
        // add it to the character array
    }
}
Ok. Understood. I actually have already done that. The main issue is with the monoatomic components. – Azazel 22 hours ago

I suggest splitting on uppercase letters using a zero-width lookahead regex (to extract items like C12, O2, Si), then splitting each item into the element and its numeric weight (the atom count):

List<String> elements = new ArrayList<>();
List<Integer> weights = new ArrayList<>();

String[] items = "C6H12Si6OH".split("(?=[A-Z])");  // [C6, H12, Si6, O, H]
for (String item : items) {
    String[] pair = item.split("(?=[0-9])", 2);    // e.g. H12 => [H, 12], O => [O]
    elements.add(pair[0]);
    weights.add(pair.length > 1 ? Integer.parseInt(pair[1]) : 1);
}
System.out.println(elements);  // [C, H, Si, O, H]
System.out.println(weights);   // [6, 12, 6, 1, 1]
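
In the example above H shows up twice (H12 and a trailing H). If a total per element is wanted instead of two parallel lists, the same two splits could feed a Map, as in this sketch. The class FormulaCounts and the method countAtoms are hypothetical names, and the sketch assumes Java 8 or later, where a zero-width match at the start of the input does not produce a leading empty string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FormulaCounts {
    // Hypothetical helper: sums the counts per element, keeping first-seen order.
    static Map<String, Integer> countAtoms(String formula) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String item : formula.split("(?=[A-Z])")) {   // e.g. [C6, H12, Si6, O, H]
            String[] pair = item.split("(?=[0-9])", 2);    // e.g. H12 => [H, 12], O => [O]
            int count = pair.length > 1 ? Integer.parseInt(pair[1]) : 1;
            totals.merge(pair[0], count, Integer::sum);    // add to any existing total
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(countAtoms("C6H12Si6OH")); // {C=6, H=13, Si=6, O=1}
    }
}
```

A LinkedHashMap is used so the elements print in the order they first appear; a plain HashMap would work too if order does not matter.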

Is this good? (Not using split)

String line = "C3H20ZnO2ABCD";
String pattern = "([A-Z][a-z]*)(((?=[A-Z][a-z]*|$))|\\d+)";

Pattern r = Pattern.compile(pattern);

Matcher m = r.matcher(line);

while (m.find()) {
    System.out.print(m.group(1));
    if (m.group(2).length() == 0) {
        System.out.println(" 1");
    } else {
        System.out.println(" " + m.group(2));
    }
}


This works assuming each element starts with a capital letter, i.e. if you have "Fe" you don't represent it in String as "FE". Basically, you split the string on each capital letter then split each new string by letters and numbers, adding "1" if the new split contains no numbers.

String s = "C3H20IO";
List<String> letters = new ArrayList<>();
List<String> numbers = new ArrayList<>();

String[] arr = s.split("(?=\\p{Upper})");  // [C3, H20, I, O]
for (String str : arr) {  // [C, 3] : [H, 20] : [I] : [O]
    String[] temp = str.split("(?=\\d)", 2);
    letters.add(temp[0]);
    if (temp.length == 1) {
        numbers.add("1");
    } else {
        numbers.add(temp[1]);
    }
}
System.out.println(letters); // [C, H, I, O]
System.out.println(numbers); // [3, 20, 1, 1]
in .split(), you may use the second argument to limit number of results, so your second split can be simplified to temp = str.split("(?=\\d)", 2) – Sasha Salauyou 21 hours ago
    
Good point. Will edit accordingly. – anaxin 21 hours ago

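You can use two patterns:

  • [0-9]+
  • [a-zA-Z]+

Split the string once by each of them.

String test = "C3H20IO";

// The + quantifier treats a run of digits (or letters) as a single separator.
List<String> letters = Arrays.asList(test.split("[0-9]+"));
List<String> numbers = Arrays.stream(test.split("[a-zA-Z]+"))
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());

// Only covers a single trailing atom without a number;
// adjacent atoms such as "IO" still end up in one entry.
if (letters.size() != numbers.size()) {
    numbers.add("1");
}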

What about the case of single atoms, e.g. in H2O there is no number after the oxygen atom. – Tim Biegeleisen 22 hours ago
    
That is actually the main issue for me. I figured out that using Split with those Regex patterns would work but that is the pain. – Azazel 22 hours ago
    
Only last atom can be without a number? – abyversin 22 hours ago

You can split the string using a regular expression such as (?<=\D)(?=\d). Try this:

String alphanum= "abcd1234";
String[] part = alphanum.split("(?<=\\D)(?=\\d)");
System.out.println(part[0]);
System.out.println(part[1]);

This will output:

abcd
1234

What about the monoatomic components? – Azazel 22 hours ago

I did it as follows:

ArrayList<Integer> integerCharacters = new ArrayList<>();
ArrayList<String> stringCharacters = new ArrayList<>();

String value = "C3H20IO"; //Your value 
String[] strSplitted = value.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)"); //Split numeric and strings

for(int i=0; i<strSplitted.length; i++){

    if (Character.isLetter(strSplitted[i].charAt(0))){
        stringCharacters.add(strSplitted[i]); //If string then add to strings array
    }
    else{
        integerCharacters.add(Integer.parseInt(strSplitted[i])); //else add to integer array
    }
}