
I want to check how up to date a feature class is by comparing it against our latest parcel update. Specifically, I want to compare the "owner" field with "Owner_1". I am using the Field Calculator to grab the last name in "owner" and find the same name in "Owner_1", regardless of capitalization, and then return 0 if they match and 1 if they differ in the field "NAME_CHK". I have attached a picture of what I am dealing with. The fields are not well kept, so there are many issues!

enter image description here

0 would be a match; 1 would show a non-match. –  Masonerman9 2 days ago
    
The question is a bit under-specified. For example, which of those do you think actually match, and why? If you've got truth data, why not just use that? Also, it might help to show what you've already done, rather than expecting someone to do it all from scratch. –  BradHards 2 days ago

2 Answers

Accepted answer (2 votes)

Using the Python parser in the Field Calculator, the calculation would look like:

enter image description here

This will return a 0 for your first two records, and a 1 for the last three.
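As a rough sketch of an exact, case-insensitive check (not necessarily the code in the screenshot), something like the following could go in the Field Calculator's code block. It assumes the surname is the last word of "owner" and that "Owner_1" is a space-separated name; the function name `exact_name_check` and the sample names are hypothetical.

```python
def exact_name_check(val1, val2):
    """Return 0 if the last word of val1 appears in val2 (ignoring case), else 1."""
    # Null or empty fields cannot be compared, so flag them as a non-match.
    if not val1 or not val2:
        return 1
    words = val1.split()
    if not words:
        return 1
    surname = words[-1].lower()
    # Compare against every whitespace-separated word in the second field.
    return 0 if surname in val2.lower().split() else 1

# In the Field Calculator the expression would be along the lines of:
#   exact_name_check(!owner!, !Owner_1!)
print(exact_name_check("John Smith", "SMITH JOHN"))
print(exact_name_check("Bob Dixson", "DIXON ROBERT"))
```

Because the comparison is exact, near-misses such as Dixson and Dixon are reported as a mismatch.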

Edit: for @Masonerman9's comment about good enough matches.

You could try difflib. I haven't used it much, but it does build in some tolerance for string comparisons. The code below will give you a 1 only for the fourth record (Randy Floyd).

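import difflib

def name_check(val1, val2):
    # Treat the last word of the first field as the surname.
    surname = val1.split(' ')[-1].lower()
    # A non-empty result means some word in val2 is a close match.
    if difflib.get_close_matches(surname, val2.lower().split(' ')):
        return 0  # match
    else:
        return 1  # no match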
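The tolerance can be tuned: `difflib.get_close_matches` accepts a `cutoff` argument between 0 and 1 (default 0.6), and only candidates scoring at least that similarity are returned. The sketch below exposes it as a parameter. The function name `fuzzy_name_check`, the 0.75 default, and the sample names are assumptions for illustration, and the score is difflib's similarity ratio rather than a geocoder-style match percentage.

```python
import difflib

def fuzzy_name_check(val1, val2, cutoff=0.75):
    """Return 0 if the last word of val1 closely matches a word in val2, else 1."""
    # Null or empty fields cannot be compared, so flag them as a non-match.
    if not val1 or not val2:
        return 1
    words = val1.split()
    if not words:
        return 1
    surname = words[-1].lower()
    candidates = val2.lower().split()
    # n=1: one sufficiently close candidate is enough to call it a match.
    matches = difflib.get_close_matches(surname, candidates, n=1, cutoff=cutoff)
    return 0 if matches else 1

# A one-letter difference passes at 0.75 but fails at a stricter 0.95.
print(fuzzy_name_check("Bob Dixson", "DIXON ROBERT"))
print(fuzzy_name_check("Bob Dixson", "DIXON ROBERT", cutoff=0.95))
```

Raising `cutoff` reduces false matches at the cost of missing more misspellings; lowering it does the opposite.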
    
Any chance there is a way to deal with the Dixson, Dixon issue? Obviously they are the same, just a little misinterpretation on the data entry.... –  Masonerman9 2 days ago
    
@Masonerman9. Sort of. You could use something like difflib to get 'good enough' matches. I'll add some code to the answer above. –  DWynne 2 days ago
    
OK, I am thinking of something like geocoding an address, where anything at around 75% is automatically considered a match and you can lower that threshold to whatever you want. –  Masonerman9 2 days ago
    
@Masonerman9 Ok, additional sample added. –  DWynne 2 days ago
+1 Nice integration of difflib.get_close_matches –  Aaron 2 days ago

Using an Update Cursor in a Python script is an efficient approach. The following example takes the last word in the Owner_1 string and the text before the first comma in the owner string, and compares the two values, ignoring case.

import arcpy

fc = r'C:\path\to\your\database.gdb\feature_class'

with arcpy.da.UpdateCursor(fc, ["Owner_1", "owner", "NAME_CHK"]) as cursor:
    for row in cursor:
        if row[1].split(",")[0].lower() == row[0].rsplit(" ", 1)[-1].lower():
            row[2] = 0
        else:
            row[2] = 1
        cursor.updateRow(row)
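If either field can be null or carry stray whitespace, the comparison in the loop above would raise an error or miss a match. A minimal sketch of a null-safe helper follows, assuming "owner" is stored as "LAST, FIRST" and the surname is the last word of "Owner_1"; the function name `surname_mismatch` and the sample values are hypothetical.

```python
def surname_mismatch(owner_1, owner):
    """Return 0 when the surnames match (ignoring case), else 1."""
    # Null or empty fields cannot be compared, so flag them as a mismatch.
    if not owner_1 or not owner:
        return 1
    # Surname from "LAST, FIRST": the text before the first comma.
    left = owner.split(",")[0].strip().lower()
    # Surname from "First Last": the last whitespace-separated word.
    words = owner_1.split()
    if not left or not words:
        return 1
    return 0 if left == words[-1].lower() else 1

# Inside the cursor loop this would replace the if/else:
#   row[2] = surname_mismatch(row[0], row[1])
print(surname_mismatch("John Smith", "SMITH, JOHN"))
print(surname_mismatch(None, "SMITH, JOHN"))
```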

enter image description here

    
Any chance there is a way to deal with the Dixson, Dixon issue? Obviously they are the same, just a little misinterpretation on the data entry.... –  Masonerman9 2 days ago
How would you like to handle that? Currently, it sees a mismatch and marks it as such... –  Aaron 2 days ago
    
Like when you geocode an address: anything at around 75% is automatically considered a match, and you can lower that threshold to whatever you want. –  Masonerman9 2 days ago
    
That is a completely separate question and should be asked as a new question. –  Aaron 2 days ago
    
@Masonerman9 I noticed the same thing earlier with McKenzie and MCKINZIE. I would just keep in mind that the broader/more tolerant your matches are, the more likely you are to end up with false positives or negatives (no idea how or if the difflib suggested in the other answer allows for a tolerance setting). Since you're dealing with parcel info, I also don't believe a single field check is going to cut it - parcels can be split as well as sold and recombined. The method may work for what you're doing with whatever features you're checking for 'currentness', but I see a lot of risk here. –  Chris W 2 days ago
