Can anyone help me with a regex? I have a Java program that reads .csv files and loads them into a database.
Currently it uses Pattern csvPattern = Pattern.compile("\\s*(\"[^\"]*\"|[^|]*)\\s*,?");
but when I do matcher = csvPattern.matcher(line);
to read the files line by line, I only get null values.
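To see what the pattern actually captures, it can help to print every group(1) it produces for a single line. The following is only a diagnostic sketch: the class name PatternProbe and the find()/group(1) loop are assumptions, since how the loader consumes the matcher is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspection only, not part of the loader
public class PatternProbe {
    // The pattern exactly as quoted above
    static final Pattern CSV_PATTERN =
            Pattern.compile("\\s*(\"[^\"]*\"|[^|]*)\\s*,?");

    // Collects group(1) of every successive match on one line
    static List<String> captures(String line) {
        List<String> out = new ArrayList<>();
        Matcher matcher = CSV_PATTERN.matcher(line);
        while (matcher.find()) {
            out.add(matcher.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Brackets make empty captures visible in the output
        for (String value : captures("0|ALGERIA|0|")) {
            System.out.println("[" + value + "]");
        }
    }
}
```

Nothing in this pattern can consume a '|', so each time the matcher reaches one it produces an empty match before moving on, and empty strings end up mixed in with the real values. Whether that is what surfaces as null values depends on how the loader reads the groups.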
The files have the following format (there are many more lines; some fields contain commas, and '|' is used both as the separator and at the end of each line):
Excerpt from the first file:
0|ALGERIA|0| haggle. carefully final deposits detect slyly agai|
1|ARGENTINA|1|al foxes promise slyly according to the regular accounts. bold requests alon|
2|BRAZIL|1|y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special |
second:
|Customer#000000001|IVhzIApeRb ot,c,E|15|25-989-741-2988|711.56|BUILDING|to the even, regular platelets. regular, ironic epitaphs nag e|
2|Customer#000000002|XSTf4,NCwDVaWNe6tEgvwfmRchLXak|13|23-768-687-3665|121.65|AUTOMOBILE|l accounts. blithely ironic theodolites integrate boldly: caref|
3|Customer#000000003|MG9kdTD2WBHm|1|11-719-748-3364|7498.12|AUTOMOBILE| deposits eat slyly ironic, even instructions. express foxes detect slyly. blithely even accounts abov|
4|Customer#000000004|XxVSJsLAGtn|4|14-128-190-5944|2866.83|MACHINERY| requests. final, regular ideas sleep final accou|
third:
5|Supplier#000000005|Gcdm2rJRzl5qlTVzc|11|21-151-690-3663|-283.84|. slyly regular pinto bea|
6|Supplier#000000006|tQxuVm7s7CnK|14|24-696-997-4969|1365.79|final accounts. regular dolphins use against the furiously ironic decoys. |
7|Supplier#000000007|s,4TicNGB4uO6PaSqNBUq|23|33-990-965-2201|6820.35|s unwind silently furiously regular courts. final requests are deposits. requests wake quietly blit|
8|Supplier#000000008|9Sq4bBH2FQEmaFOocY45sRTxo6yuoG|17|27-498-742-3860|7627.85|al pinto beans. asymptotes haggl|
9|Supplier#000000009|1KhUgZegwM3ua7dsYmekYBsK|10|20-403-398-8662|5302.37|s. unusual, even requests along the furiously regular pac|
fourth:
1|2|3325|771.64|, even theodolites. regular, final theodolites eat after the carefully pending foxes. furiously regular deposits sleep slyly. carefully bold realms above the ironic dependencies haggle careful|
1|2502|8076|993.49|ven ideas. quickly even packages print. pending multipliers must have to are fluff|
1|5002|3956|337.09|after the fluffily ironic deposits? blithely special dependencies integrate furiously even excuses. blithely silent theodolites could have to haggle pending, express requests; fu|
1|7502|4069|357.84|al, regular dependencies serve carefully after the quickly final pinto beans. furiously even deposits sleep quickly final, silent pinto beans. fluffily reg|
fifth:
1|155190|7706|1|17|21168.23|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|egular courts above the|
1|67310|7311|2|36|45983.16|0.09|0.06|N|O|1996-04-12|1996-02-28|1996-04-20|TAKE BACK RETURN|MAIL|ly final dependencies: slyly bold |
sixth:
134823|saddle midnight thistle honeydew lime|Manufacturer#4|Brand#43|STANDARD BURNISHED BRASS|44|WRAP CAN|1857.82|ges. furiously ir|
134824|coral red indian thistle sandy|Manufacturer#5|Brand#55|PROMO BURNISHED COPPER|29|LG JAR|1858.82|final p|
134825|saddle purple orchid cornsilk medium|Manufacturer#4|Brand#44|PROMO POLISHED NICKEL|21|LG CASE|1859.82|nal accounts us|
134826|turquoise sky lime cornsilk peach|Manufacturer#1|Brand#11|SMALL BURNISHED TIN|25|SM CAN|1860.82| haggle|
seventh:
0|AFRICA|lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to |
1|AMERICA|hs use ironic, even requests. s|
eighth:
4|136777|O|32151.78|1995-10-11|5-LOW|Clerk#000000124|0|sits. slyly regular warthogs cajole. regular, regular theodolites acro|
5|44485|F|144659.20|1994-07-30|5-LOW|Clerk#000000925|0|quickly. bold deposits sleep slyly. packages use slyly|
(The CSV files were created with the DBGen tool from the TPC for TPC-H, in case you wonder.)
I hope you understand what I need and can help me out on this. Thank you very much!
EDIT: Using String.split("|") sure seems obvious, but the program I'm working with is quite complex and relies on regex.Pattern and regex.Matcher in various places. Since I'm not very familiar with the program or with Java itself, the only solution for me is to keep the given code and just replace the regular expression with one that works for my files.
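Keeping Pattern and Matcher as they are, one option might be to change only the optional terminator at the end of the expression from ',' to an escaped '|'. This is a minimal sketch, assuming the fields are read with find() and group(1); the class PipeFields and its method are made up for illustration and are not taken from the OLTP-Bench loader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, shown only to try the expression in isolation
public class PipeFields {
    // Same shape as the original expression, but the optional
    // terminator is an escaped '|' instead of ','
    static final Pattern PIPE_PATTERN =
            Pattern.compile("\\s*(\"[^\"]*\"|[^|]*)\\s*\\|?");

    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher matcher = PIPE_PATTERN.matcher(line);
        // Stop at the end of the line: the pattern still matches
        // there, but only as an empty string
        while (matcher.find() && matcher.start() < line.length()) {
            out.add(matcher.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "0|ALGERIA|0| haggle. carefully final deposits detect slyly agai|";
        for (String value : fields(line)) {
            System.out.println("[" + value + "]");
        }
    }
}
```

With this expression, leading whitespace of a field is dropped by the \s* in front of the group, trailing whitespace stays in the captured value, and commas inside a field are left alone.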
EDIT2: I'm trying to use this TPC-H implementation from OLTP-Bench: https://github.com/ben-reilly/oltpbench/blob/master/src/com/oltpbenchmark/benchmarks/tpch/TPCHLoader.java#L347
where the problematic line is 347. It's a full implementation of the TPC-H database benchmark, but without a data generator, so I use the dbgen tool provided by the TPC to generate the CSV files. Sadly, I can't get in contact with the developer.
String.split()? – Mike Elofson Jun 24 at 13:57
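A note on the String.split() route: in Java the argument to split is itself a regular expression, so the '|' has to be escaped. A minimal sketch for comparison, with a made-up class name SplitDemo:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, shown only for comparison with the regex approach
public class SplitDemo {
    static List<String> fields(String line) {
        // "\\|" matches a literal pipe; with the default limit,
        // split drops trailing empty strings, so the '|' at the
        // end of each line does not add an extra field
        return Arrays.asList(line.split("\\|"));
    }

    public static void main(String[] args) {
        for (String value : fields("1|AMERICA|hs use ironic, even requests. s|")) {
            System.out.println("[" + value + "]");
        }
    }
}
```

Unlike a pattern with a leading \s*, this keeps any whitespace at the start of a field.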