6
\$\begingroup\$

Here is what I came up with for a binary search of an array. It allows for searching one or many columns to match. It assumes that array being searched is sorted with the same sort priorities as the order in which the parameters are submitted.

It seems to work well but any advice on performance or structure would be appreciated.

Option Explicit
Option Compare Text

Private Function BinFind(inArray As Variant, hasHeader As Boolean, ParamArray WhatValueWhatColumn() As Variant) As Long
    '== =============================================================================================================
    '== conducts a binary tree search in a 1 or 2 dimensional array
    '==
    '== looks for matches defined in 'WhatValueWhatColumn'
    '== supports matching of multiple columns.
    '==
    '== each ParamArray pair submitted must be an array stating what is being seeked and in which column it is found
    '== i.e. BinFind(arr1, True, Array(firstName, 2), Array(lastName, 1), Array(birthdate, 6))
    '==
    '== the array being searched, 'inArray' must be sorted in ascending order
    '== where the sort priority matches the order of the 'WhatValueWhatColumn' submissions
    '==
    '== {TODO} - test that array is sorted correctly prior to execution, raise error if not.
    '== =============================================================================================================

    Dim High As Long    'highest boundary of range being searched
    Dim Low As Long     'lowest boundary of range being searched
    Dim Mid As Long     'current split point
    Dim v As Variant    'WhatValueWhatColumn variant
    Dim found As Boolean

    '== initialize lower and upper bounds
    High = UBound(inArray)
    Low = LBound(inArray) - hasHeader

    '== test to see if the primary column is outside the lower or upper bounds of the entire test array
    v = WhatValueWhatColumn(0)
    If v(0) < inArray(Low, v(1)) Or v(0) > inArray(High, v(1)) Then
        BinFind = -1
        Exit Function
    End If


    '== loop through all records, walking low and high in until the correct record is found
    '== or it is determined that the record is not in the array
    Do
        '== determine the middle record of the dataset
        Mid = (High + Low) / 2

        '== test each column by priority
        '== if it is a match check the next column in the sequence
        '== if it is not a match then move the appropriate
        '== boundary to the mid position
        '== if all columns match then the record is found!
        For Each v In WhatValueWhatColumn

            If inArray(Mid, v(1)) = v(0) Then
                found = True

            Else
                found = False

                If Low = High Then
                    Exit Do

                ElseIf inArray(Mid, v(1)) > v(0) Then
                    High = Application.Max(Mid - 1, Low)

                Else
                    Low = Application.Min(Mid + 1, High)

                End If
                Exit For

            End If

        Next v

    Loop Until found

    If found Then
        BinFind = Mid
    Else
        BinFind = -1
    End If

End Function

Here is the test I used in Excel. adjust as necessary

'#################################
'## TEST
'#################################
Sub testBinFind()
    Dim arr1 As Variant
    Dim arr2 As Variant
    Dim i As Long
    Dim index As Long

    With ThisWorkbook
        '== the array we are going to look for stuff in
        arr1 = .Sheets("Array1").UsedRange
        '== the array we are defining what we are looking for from and pasting data back to
        arr2 = .Sheets("Array2").UsedRange
    End With

    For i = LBound(arr2) + 1 To UBound(arr2) 'LBound(arr2) + 1 skips the headers

        '== Identify some columns that need to be matched
        '== makes it easier to see what's going on when submitting Params to BinFind
        Dim lastName As String:     firstName = arr2(i, 1)
        Dim firstName As String:    lastName = arr2(i, 2)
        Dim dedCd As String:        dedCd = arr2(i, 3)

        '== Here we go!
        index = BinFind(arr1, True, _
            Array(firstName, 1), _
            Array(lastName, 2), _
            Array(dedCd, 3))

        '== if a match was found, post it back to the final array
        If index > 0 Then
            arr2(i, 4) = arr1(index, 4)
        End If

    Next i

    '== post the array back to the sheet
    With ThisWorkbook.Sheets("Array2")
        .UsedRange.Value = arr2
        .ListObjects.Add xlSrcRange, .UsedRange, xlYes
    End With

End Sub
\$\endgroup\$
1
  • 1
    \$\begingroup\$ Welcome to Code Review. Nice first question. I hope you get some good answers. \$\endgroup\$
    – pacmaninbw
    Commented Aug 27, 2016 at 16:46

2 Answers 2

4
\$\begingroup\$

Option Compare Text - I don't like it. I also don't like Option Base 1 for the same reason. It causes excel to behave in a way other than the default way, which makes maintenance more difficult. I don't really see the need for the Text when the default Binary would be fine if you pass your arguments through UCase for instance. It's much easier to follow and explicitly shows the reader that case is irrelevant.

Naming

Your naming leaves something to be desired. It's hard to follow when you could use a more descriptive name that would allow easier identification.

  • If High is "highest boundary of range being searched" why not call it highSearchBoundary or something similar? It doesn't cost anything and it removes the need for explanatory comments. Same goes for Low
  • v does nothing to describe itself. Try to avoid single letter variables other than (maybe) i for iteration. What is v - WhatValueColumnVariant? I'm not sure what that would be.
  • found is boolean, just call it isFound similar to your hasHeader boolean
  • On the same token, inArray looks like it should be a boolean, not a variant.
  • Your function can be called BinaryTreeSearch if that's what it is
  • Your inArray and hasHeader arguments are being passed ByRef when they could be passed ByVal
  • I think ParamArray is always passed ByVal - but I'm not too familiar with it

Speaking of comments

Comments - "code tell you how, comments tell you why". The code should speak for itself, if it needs a comment, it might need to be made more clear. If not, the comment should describe why you're doing something rather than how you're doing it. Here are a few reasons to avoid comments all together. You have a lot of commentary which should indicate that your code isn't as clean as it could be. I understand some UDF documentation up top, but other than that it seems superfluous.

But, with comments, I see twice that you return -1 - but I don't know why. Maybe that needs an explanation.

The comment "the array being searched, 'inArray' must be sorted in ascending order" tells me that your function will break with unexpected input. If you need it in ascending order, write a sub to do so. That may be difficult though, considering how ParamArray would define the sort order. All of that leaves a lot of room for unhandled errors, I think. That may not be the best way to put it.

You may need to handle any errors where a passed ParamArray has a higher column that the count of columns in the searchArray though.

ParamArray

I'm no expert in using this, but it seems to work pretty well in your application of it. But the naming could be improved findValueInWhichColumn because that's what it does, right? It finds each value in whichever column of the inArray is specified?

Loop

this information might not be 100% accurate because of a misinterpretation

Your loop is hard to follow.

Using a For Each on an object is less than ideal - try to use a For Next instead.

Then you take each ParamArray and compare it to the mid of the searchArray and iterate down or up. The ParamArray already specifies a column in the searchArray, so why not pull the column into an array and find it there in a refactored function? Maybe even use a Scripting.Dictionary's exists's for eliminating the loop within the refactored function?

Private Function SearchArrayColumn(ByVal searchColumn As Variant, ByVal searchValue As String) As Boolean
'Do your thing
End Function

Then for each ParamArray you can use the function, something similar to

For Each v in WhatValueWhatColumn
isfound = SearchArrayColumn(Application.WorksheetFunction.Index(searcharray, , v(0)), v(1))
if isfound then 'your result
Next

Now, maybe you could find a way to get around the sort condition by passing sorted columns or sorting the column as passed. You could keep iterating through the refactored function until not isFound and then exit your binary search as not found, otherwise go to the next ParamArray and if isFound never goes False you have your True condition.

Just spitballing here, but perhaps if you index all the rows of your array that match the first condition, you can pass entire rows to the refactored function and check all of the conditions at the same time, either passing or failing and stop when it passes or exhausts the matches. You could utilize Find and FindNext to pass each possible match.

Overall, I think this is a clever function and a great question. I'm sure you'll get some better answers than mine.

\$\endgroup\$
7
  • \$\begingroup\$ Thanks @Raystafarian. Great breakdown and detail. Good points on variable naming. I use single letters only for iteration as suggested. i for integer/long, v for variant which is required in a 'for each' loop. inArray suggesting boolean is good too. ParamArrays are indeed always byVal. link. The excessive commenting is not a standard of mine typically. I'm new to the public code review deal and meant it to be helpful. Will adjust. \$\endgroup\$ Commented Aug 28, 2016 at 6:07
  • \$\begingroup\$ Correct me if I'm wrong but I think you may have misinterpreted what this does. This function seeks a single "row" or "Record" in a 2D array where ALL of the parameters are true. i.e. if we are looking for (kevin,conner,34) and a record set contains 100 kevins, 100 conners, and 700 34s, it will only return a single index if one of the records has all (kevin,conner,34) on the same row. \$\endgroup\$ Commented Aug 28, 2016 at 6:17
  • \$\begingroup\$ I'm on board with your Option Compare Text belief! As written it takes about 5 seconds to compare 2 arrays of 10,000 records each. I tried refactoring it to use uCase() and it took 5 minutes to run and locked up Excel in the meantime. I think I'll have to keep it as is unfortunately.. \$\endgroup\$ Commented Aug 28, 2016 at 6:25
  • \$\begingroup\$ Yes to your second comment, the way I described it won't find a single record just using the boolean, you'd have to pass the ParamArrays to the refactored function and return the record # and compare those to each other. I still think refactoring it and using a separate function would work better than the loop here. I'll have to think about it again and maybe edit my answer. \$\endgroup\$ Commented Aug 28, 2016 at 11:01
  • \$\begingroup\$ I'd be curious to see how an explicit StrComp(string1,string2,vbTextCompare) compares to other string comparison strategies. Surely it's less expensive than the variant UCase, or even the stringly-typed UCase$ comparisons. \$\endgroup\$ Commented Aug 28, 2016 at 15:14
2
\$\begingroup\$

This is problematic:

Dim Mid As Long

You're shadowing the built-in Mid function, which is useful for extracting a substring from a string value. Regardless of whether or not the procedure uses it, hiding it with a local variable is a bad idea, be it only for the sake of the poor maintainer glancing at the code for the first time and seeing this:

If found Then
    BinFind = Mid

But anyway what I want to focus on in this review, is the test procedure.


It's not clear what the test is testing, what the inputs are, and what conditions make it pass or fail - a good test has all inputs and dependencies under control, doesn't have side-effects on global state and produces the same results everytime it runs; a good test tests one thing - so I would have expected to see more than just one of them, each with various edge-case inputs.

In other words, testing a piece of code involves much more than just running it and seeing it work: it's actually more about describing what the code is supposed to be doing, such that if/when the code needs to change, you can run the tests and make sure it still works as it should - without having to analyze a spreadsheet to confirm it does.

The procedure under test works off an array; why burden ourselves with a worksheet? We're passing True into the hasHeader parameter; if the test produces the expected results, are we sure it handles a False just as well? What if inArray is 0-based?

Low = LBound(inArray) - hasHeader

What happens if the ParamArray values are 1-based arrays? What happens if they're not arrays?

If inArray(Mid, v(1)) = v(0) Then

If given any non-array value in the ParamArray values, the procedure should raise an error, then there should be a test that documents it. If given a one-based array it should work fine, then a test should document it - and another with a zero-based array, and another with an array that has not two but three values; there should be a test for each input and edge case, documenting how the code is specified to behave.

Each test would arrange the inputs, act with them, and then assert a result that determines whether the test passed or failed; the tests wouldn't share any resources or global state (e.g. worksheets), could be executed in any order, independently or each other, and consistently produce the same results.

The Arrange part of an Arrange-Act-Assert test could look like this:

Dim hasHeader As Boolean
hasHeader = True
Dim inArray As Variant
inArray = GetOneBasedTestArray2D(hasHeader)
Dim criteria = Array("test", 1)
Const expected = 1

Where GetOneBasedTestArray2D could be a Private Function in the test module, a helper that other tests could reuse.

The Act part is simple:

Dim actual As Long
actual = BinFind(inArray, hasHeader, criteria)

The Assert part is where all the fun really is - in a Rubberduck unit test it could look like this:

Assert.AreEqual expected, actual

Then there would be another test that ensures the function returns -1 when a value isn't found; all in all there could very well be a dozen and a half tests that cover the code inside out and document how the code handles its inputs.

With well-named tests (e.g.GivenNonArrayInput_RaisesError), you could read the names of the tests and immediately know what to expect of the function in all corner cases, not just the obviously correct ones - in other words, with proper tooling (full disclosure: I'm heavily involved with Rubberduck), VBA code can be just as professionally written as code in any other language.

Unit testing is so much more valuable than just running the code and saying "yep, it works": with a proper test suite, you can safely start refactoring your code without fear of breaking functionality - if a change breaks something, a test should tell you.

\$\endgroup\$
1
  • \$\begingroup\$ 1, I can't believe I shadowed Mid() and completely overlooked it until you mentioned it! 2, Thank you for instruction on the test. What I have is indeed just a method that sees it in action. I noticed myself that it was not in fact a test after submitting it. \$\endgroup\$ Commented Aug 29, 2016 at 5:44

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.