5
\$\begingroup\$

(Originally posted on Stack Overflow)

I need to run through a number of bespoke files (I'm unsure if they are unicode / ascii or binary so have attached a snapshot through notepad++). I'd like to count the number of instances of a simple string.

I've managed to get this working using the CreateObject("ADODB.Stream") method but I'd like to speed this up.

Below is the working code in excel, setup as a function.

Function CountPRODSinNielsenDBs(StrFilename As String, RngThisCell)

'StrFilename = "C:\Nielsen Databases\UBINCCCS.hed"

'used to disable the function whilst editing other parts of the sheet
'prefer this approach over setting calculation to manual
If Range("Active") = 0 Then
CountPRODSinNielsenDBs = RngThisCell.Value
Exit Function
End If


Const adTypeText = 2
Set ADStream = CreateObject("ADODB.Stream")
With ADStream
    .Type = adTypeText
    .Open
    .LoadFromFile StrFilename
    .Charset = "us-ascii"
    strText = .ReadText()
End With

' Now do our regex search...
Set rgxp = CreateObject("VBScript.RegExp")
With rgxp
    .Pattern = "\x00P[0-9]{39}"
    .Global = True
    .IgnoreCase = True
    Set matches = .Execute(strText)
End With

CountPRODSinNielsenDBs = matches.Count
'Debug.Print matches.Count

End Function

This function works ok as is and so I just left the default setting for .charset to ascii and adTypeText to 2 as it seemed to work with the regex matching process.

However, it's a little slow and I believe that the OpenAsTextStream method should be quicker:

Const ForReading = 1
Const TristateFalse = 0
Dim strSearchThis
Dim objFS
Dim objFile
Dim objTS
Dim strText
Set objFS = CreateObject("Scripting.FileSystemObject")
Set objFile = objFS.GetFile(StrFilename)
strText = objFile.OpenAsTextStream(ForReading, TristateFalse).readall

However the regex returns zero matches so I began wondering if in fact this approach can't handle the data as it's not actually ascii or some other format.

Please can someone take a look at this pic to help me understand what type of datatype this might be and whether I can speedup the searching and counting of the instances?

There are many files and each one runs into the MB's you see.

Notepad++ snapshot of the .hed file I'm trying to count the instances of the strings marked in red using a regex pattern of \0P[0-9]{39}

Alternate link to the Notepad++ screenshot

\$\endgroup\$
3
  • \$\begingroup\$ HI All, please can someone help on this. I'm still struggling on trying to get the OpenAsTextStream method to work. Thanks. \$\endgroup\$ Commented Oct 21, 2016 at 7:34
  • \$\begingroup\$ Hi All, anyone, anyone? I've been reading into this more but still not able to understand why it's not working. Hope someone can help soon. Thanks. \$\endgroup\$ Commented Oct 24, 2016 at 21:55
  • \$\begingroup\$ just wondering if anyone is able to help me on this - still struggling - please help!! \$\endgroup\$ Commented Nov 24, 2016 at 10:50

0

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.