Description
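This regex parses the text into a roman numeral and a body. The body can then be split on the newline character \n. It needs the multiline option (so ^ and $ match at line breaks) and the dot-matches-newline option (so .*? can span several lines).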
^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|\Z)

Capture Groups
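Group 0 gets the entire matching section
Group 1 gets the roman numeral
Group 2 gets the body of the section, not including the roman numeral
Group 3 gets the roman numeral of the following section, captured inside the lookahead (empty for the last section)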
JavaScript Code Example:
Sample text pulled from your link
VII
Lo! in the orient when the gracious light
Lifts up his burning head, each under eye
Doth homage to his new-appearing sight,
VIII
Music to hear, why hear'st thou music sadly?
Sweets with sweets war not, joy delights in joy:
Why lov'st thou that which thou receiv'st not gladly,
Or else receiv'st with pleasure thine annoy?
IX
Is it for fear to wet a widow's eye,
That thou consum'st thy self in single life?
Ah! if thou issueless shalt hap to die,
The world will wail thee like a makeless wife;
Example code
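<script type="text/javascript">
// Flags: g = find every section, m = ^ and $ match at line breaks, s = . matches newlines.
// JavaScript has no \Z, so (?![\s\S]) stands in for "end of input".
var re = /^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|(?![\s\S]))/gms;
var sourcestring = "source string to match with pattern";
var results = [];
var i = 0;
// Without the g flag, exec() would return the first match forever and this loop would never end.
for (var matches = re.exec(sourcestring); matches != null; matches = re.exec(sourcestring)) {
  results[i] = matches;
  for (var j = 0; j < matches.length; j++) {
    alert("results[" + i + "][" + j + "] = " + results[i][j]);
  }
  i++;
}
</script>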
Sample output
$matches Array:
(
[0] => Array
(
[0] => VII
Lo! in the orient when the gracious light
Lifts up his burning head, each under eye
Doth homage to his new-appearing sight,
[1] =>
VIII
Music to hear, why hear'st thou music sadly?
Sweets with sweets war not, joy delights in joy:
Why lov'st thou that which thou receiv'st not gladly,
Or else receiv'st with pleasure thine annoy?
[2] =>
IX
Is it for fear to wet a widow's eye,
That thou consum'st thy self in single life?
Ah! if thou issueless shalt hap to die,
The world will wail thee like a makeless wife;
)
[1] => Array
(
[0] => VII
[1] => VIII
[2] => IX
)
[2] => Array
(
[0] =>
Lo! in the orient when the gracious light
Lifts up his burning head, each under eye
Doth homage to his new-appearing sight,
[1] =>
Music to hear, why hear'st thou music sadly?
Sweets with sweets war not, joy delights in joy:
Why lov'st thou that which thou receiv'st not gladly,
Or else receiv'st with pleasure thine annoy?
[2] =>
Is it for fear to wet a widow's eye,
That thou consum'st thy self in single life?
Ah! if thou issueless shalt hap to die,
The world will wail thee like a makeless wife;
)
[3] => Array
(
[0] => VIII
[1] => IX
[2] =>
)
)
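To go from the raw groups to something easier to work with, here is a minimal sketch that collects each section as a numeral plus an array of body lines. The function name parseSections and the shape of the returned objects are my own choices for illustration, and the sample assumes the numeral lines are indented, since the expression starts with ^\s+. It also swaps \Z for (?![\s\S]) because JavaScript does not support \Z.

```javascript
// Hypothetical helper: parse indented roman-numeral headings and their bodies.
function parseSections(text) {
  // g: find all sections, m: ^ and $ match per line, s: dot matches newlines.
  // (?![\s\S]) means "end of input" and replaces \Z, which JavaScript lacks.
  const re = /^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|(?![\s\S]))/gms;
  const sections = [];
  for (const m of text.matchAll(re)) {
    sections.push({
      numeral: m[1],
      // Split the body (group 2) on newlines and drop blank lines.
      lines: m[2]
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => line.length > 0),
    });
  }
  return sections;
}

// Shortened sample; the leading spaces before each numeral are an assumption.
const sample = [
  "  VII",
  "Lo! in the orient when the gracious light",
  "Lifts up his burning head, each under eye",
  "",
  "  VIII",
  "Music to hear, why hear'st thou music sadly?",
].join("\n");

const sections = parseSections(sample);
console.log(sections);
```

If your text has the numerals flush against the left margin, the ^\s+ at the start of the expression would need to be relaxed to ^\s* before this matches anything.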
Roman numeral validation
The above expression only tests that the roman numeral string is composed of roman numeral characters; it doesn't check that the number itself is valid. If you need to validate that the roman numeral is correctly formed too, you could use this expression.
^\s+\b(M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}))\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|\Z)
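If you would rather keep the section-splitting expression simple and validate each captured numeral afterwards, the validating part can be used on its own. This is only a sketch: isValidRoman is a name I made up, the inner groups are made non-capturing, and the (?=[CDMLXVI]) lookahead is my addition so that an empty string is rejected (every part of the numeral pattern is optional).

```javascript
// Hypothetical helper: true only for well-formed roman numerals (I to MMMMCMXCIX).
function isValidRoman(numeral) {
  // (?=[CDMLXVI]) requires at least one numeral character, so "" is rejected.
  const re = /^(?=[CDMLXVI])M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})$/;
  return re.test(numeral);
}

console.log(isValidRoman("VIII"));    // true
console.log(isValidRoman("MCMXCIV")); // true
console.log(isValidRoman("IIII"));    // false
console.log(isValidRoman("VX"));      // false
```

Note that in the combined expression above, the numeral group contains three nested capturing groups, so the numbering shifts: group 1 is still the whole numeral, groups 2 to 4 are its hundreds, tens and units parts, group 5 is the body, and group 6 is the next section's numeral.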
