
I need help filtering a big .CSV file in which a certain row must only contain strings from a predetermined set, held in an array returned by another PowerShell function.

For example, suppose I have the following to filter:

datastore3
datastore1 vl_datastore2 datastore3
datastore1 vl_datastore2 datastore3
datastore1 datastore3

with the following array of strings, which I must use to discard any bad row:

datastore1 datastore3 (datastore1 in index 0, datastore3 in index 1)

In other words, my function should automatically get rid of any row that has the "vl_datastore2" substring in it, so only the first and last rows would remain.

How can I go about this? For now I am able to split the rows to filter into an array of strings ("datastore1 vl_datastore2 datastore3" would thus be an array with 3 strings in it), but I'm having trouble finding the correct way to use any PowerShell operator to filter my list correctly.

Thanks in advance!


2 Answers

Don't know if this helps or not, but:

$TestArray = @(
'datastore3',
'datastore1 vl_datastore2 datastore3',
'datastore1 vl_datastore2 datastore3',
'datastore1 datastore3'
)

$Filters = @(
'datastore1',
'datastore3'
)

[regex]$regex = '(?i)(' + (($Filters | foreach {[regex]::escape($_)}) -join '|') + ')'

$TestArray | Where {-not ($_.split() -notmatch $regex)}

datastore3
datastore1 datastore3

That builds an alternation regex from the strings in the $Filters array, so that you can essentially test every piece of every line against all of the allowed strings in one operation.

The bit that's building the regex is explained here: http://blogs.technet.com/b/heyscriptingguy/archive/2011/02/18/speed-up-array-comparisons-in-powershell-with-a-runtime-regex.aspx
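One thing that may be worth checking: the regex as built above is not anchored, so a piece that merely contains an approved name would also pass. The sketch below shows the pattern that gets built and compares it with an anchored variant. The $alternation, $loose and $strict names are mine, and the anchoring is my addition rather than part of the answer, so treat it as an assumption about what you want.

```powershell
# Approved names, as in the answer above.
$Filters = @('datastore1', 'datastore3')

# Escape each name so regex metacharacters are taken literally,
# then join the names into a single alternation.
$alternation = ($Filters | ForEach-Object { [regex]::Escape($_) }) -join '|'

# Unanchored, as in the answer: a piece only has to contain a name.
$loose = [regex]('(?i)(' + $alternation + ')')

# Anchored variant (an assumption, not from the answer): the whole piece must be a name.
$strict = [regex]('(?i)^(' + $alternation + ')$')

$loose.ToString()                 # (?i)(datastore1|datastore3)
'vl_datastore3' -match $loose     # True
'vl_datastore3' -match $strict    # False
'DataStore1' -match $strict       # True
```

With the sample data in the question both patterns give the same two lines, because "vl_datastore2" contains neither approved name. The difference only shows up for something like "vl_datastore3".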


I think I'd go another route and use a flag variable and -notcontains. Walk the test array line by line, split each line, and check each piece of the split against the list of approved terms. If a piece is not in the list, set a flag so that the line will not be passed down the pipe.

$TestArray = @("datastore3",
"datastore1 vl_datastore2 datastore3",
"datastore1 vl_datastore2 datastore3",
"datastore1 datastore3")

$Filter = @("datastore1","datastore3")

$TestArray|%{
    $SetValid = $True
    $_ -split " "|?{$Filter -notcontains $_}|%{$SetValid=$false}
    if($SetValid){$_}
}

When run, that results in:

datastore3
datastore1 datastore3
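
Since the question is about a .CSV file, here is a rough sketch of the same -notcontains idea applied to parsed rows, without the flag variable. The sample data and the column names Name and Datastores are made up for illustration; the question does not say what the real file looks like.

```powershell
# Hypothetical sample data; the column names 'Name' and 'Datastores' are assumptions.
$csvText = @'
Name,Datastores
vm1,datastore3
vm2,datastore1 vl_datastore2 datastore3
vm3,datastore1 vl_datastore2 datastore3
vm4,datastore1 datastore3
'@

$Filter = @('datastore1', 'datastore3')

# For a real file, Import-Csv would take the place of ConvertFrom-Csv here.
$kept = $csvText | ConvertFrom-Csv | Where-Object {
    # Collect the pieces of this row's field that are not in the approved list.
    $bad = @(($_.Datastores -split '\s+') | Where-Object { $Filter -notcontains $_ })
    # Keep the row only when no such piece was found.
    $bad.Count -eq 0
}

$kept | ForEach-Object { $_.Name }   # vm1, vm4
```

The rows in $kept are still objects, so they could be piped on to Export-Csv if a filtered file is needed.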