(Originally posted on Stack Overflow)
I need to run through a number of bespoke files (I'm unsure if they are unicode / ascii or binary so have attached a snapshot through notepad++). I'd like to count the number of instances of a simple string.
I've managed to get this working using the CreateObject("ADODB.Stream")
method but I'd like to speed this up.
Below is the working code in excel, setup as a function.
Function CountPRODSinNielsenDBs(StrFilename As String, RngThisCell)
'StrFilename = "C:\Nielsen Databases\UBINCCCS.hed"
'used to disable the function whilst editing other parts of the sheet
'prefer this approach over setting calculation to manual
If Range("Active") = 0 Then
CountPRODSinNielsenDBs = RngThisCell.Value
Exit Function
End If
Const adTypeText = 2
Set ADStream = CreateObject("ADODB.Stream")
With ADStream
.Type = adTypeText
.Open
.LoadFromFile StrFilename
.Charset = "us-ascii"
strText = .ReadText()
End With
' Now do our regex search...
Set rgxp = CreateObject("VBScript.RegExp")
With rgxp
.Pattern = "\x00P[0-9]{39}"
.Global = True
.IgnoreCase = True
Set matches = .Execute(strText)
End With
CountPRODSinNielsenDBs = matches.Count
'Debug.Print matches.Count
End Function
This function works ok as is and so I just left the default setting for .charset to ascii and adTypeText to 2 as it seemed to work with the regex matching process.
However, it's a little slow and I believe that the OpenAsTextStream
method should be quicker:
Const ForReading = 1
Const TristateFalse = 0
Dim strSearchThis
Dim objFS
Dim objFile
Dim objTS
Dim strText
Set objFS = CreateObject("Scripting.FileSystemObject")
Set objFile = objFS.GetFile(StrFilename)
strText = objFile.OpenAsTextStream(ForReading, TristateFalse).readall
However the regex returns zero matches so I began wondering if in fact this approach can't handle the data as it's not actually ascii or some other format.
Please can someone take a look at this pic to help me understand what type of datatype this might be and whether I can speedup the searching and counting of the instances?
There are many files and each one runs into the MB's you see.