Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

I want to apply three levels of filtering on record.

  1. rec in cname. rec is string of >2 words, so I want to consider all ngram of rec to be checked in record

    i. if gram matches into record increment filter_company_level and write it to file

  2. 2nd Filter on record is for each value value in self.keyword_material_info list.

    i. if value matches into record increment filter_with_material_info and write it to file

  3. 3rd filter is for item in self.keyword_bse_list .

    i. if item matches into record increment filter_with_keyword_info and write it to file

    ii. Now move to next record if inner most filter is present.

I have written these code, does it satisfy above conditions, or is there any bug? None of them give error but want to make sure that logic is correct.

        for record in fetch_record:
            total += 1
            for rec in cname:
                try:
                    c_ngram = self.get_ngrams(rec['company_name'])
                    for gram in c_ngram:
                        if gram.lower()+' ' in u'{} {}'.format(record['title'], record['description']).lower():
                            filter_company_level += 1
                            # print "Matched based on company name : ", record['article_link']
                            company_write.write(record['article_link']+' - '+rec['company_name']+' - '+rec['company_code']+'\n')
                            for value in self.keyword_material_info:
                                if value.lower()+' ' in u'{} {}'.format(record['title'], record['description']).lower():
                                    filter_with_material_info += 1
                                    materialinfo_write.write(record['article_link']+' - '+rec['company_name']+' - '+rec['company_code']+' - '+value+'\n')
                                    for item in self.keyword_bse_list:
                                        if item.lower()+' ' in u'{} {}'.format(record['title'], record['description']).lower():
                                            filter_with_keyword_info += 1
                                            keyword_write.write(record['article_link']+' - '+rec['company_name']+' - '+rec['company_code']+" - "+value+' - '+item+'\n')
                                            print record['article_link']
                                            print value
                                            print item
                                            break
                                    break
                            # break
                            raise GetOutOfLoop
                except GetOutOfLoop:
                    break

Or this one is correct?

        for record in fetch_record:
            total += 1
            for rec in cname:
                try:
                    c_ngram = self.get_ngrams(rec['company_name'])
                    for gram in c_ngram:
                        if gram.lower()+' ' in u'{} {}'.format(record['title'], record['description']).lower():
                            filter_company_level += 1
                            # print "Matched based on company name : ", record['article_link']
                            company_write.write(record['article_link']+' - '+rec['company_name']+' - '+rec['company_code']+'\n')
                            for value in self.keyword_material_info:
                                if value.lower()+' ' in u'{} {}'.format(record['title'], record['description']).lower():
                                    filter_with_material_info += 1
                                    materialinfo_write.write(record['article_link']+' - '+rec['company_name']+' - '+rec['company_code']+' - '+value+'\n')
                                    flag_keyword = 0
                                    for item in self.keyword_bse_list:
                                        if item.lower()+' ' in u'{} {}'.format(record['title'], record['description']).lower():
                                            filter_with_keyword_info += 1
                                            keyword_write.write(record['article_link']+' - '+rec['company_name']+' - '+rec['company_code']+" - "+value+' - '+item+'\n')
                                            print record['article_link']
                                            print value
                                            print item
                                            flag_keyword = 1
                                            break
                                    if flag_keyword == 1:
                                        break
                            # break
                            if flag_keyword == 1:
                                raise GetOutOfLoop
                except GetOutOfLoop:
                    break
share|improve this question
up vote 3 down vote accepted

I’d go with the review of the second version of the code since it’s the same as the first one with the addition of the flag_keyword logic. If you want guidance on how to produce code matching requirements or how to debug your code, you’d be better of asking on Stack Overflow.

A little bit on logic, though: since you want to move to the next record when the innermost filter validates your flag_keyword value should be reset at each iteration of the outermost loop. Preferably as the first condition after for record in fetch_record. If your dataset makes it through the whole for loop without entering the last if, then your

if flag_keyword == 1:
    raise GetOutOfLoop

will raise NameError: name 'flag_keyword' is not defined.

You are using the flag to run some code only if you broke out of a loop. Note, however, that if you change the logic a bit and want to run some code only if you didn't broke the loop you can use a for ... else construct.

Now let’s go to the code:

  • Your GetOutOfLoop exception's only purpose is to let you break out of the outermost loop only if you broke from the innermost one (i.e. your innermost filter validates). The same thing you are doing with your flag_keyword at a deeper nesting level. Get consistent: don't mix implementations for similar tasks. There is an other approach if you read on but I’d raise the exception in replacement of the innermost break to avoid all your 'break if I broke' logic because:
  • You have too much nesting levels, it makes code harder to read and follow. Remember that:

    for value in data:
        if is_valid(value):
            do_stuff()
    

    can also be written:

    for value in data:
        if not is_valid(value):
            continue
        do_stuff()
    

    when do_stuff is composed of nested loops, it can save on indentation levels. However, you can get rid of these test by using generators:

    for value in (v for v in data if is_valid(v)):
        do_stuff()
    
  • Even if the purpose of the for ... else loop is to run some code only if you didn't break out of the loop, you can use it to run some code only if you did break. The general approach is:

    for value in data:
        # some code
    else:
        do_stuff_if_no_break()
    

    But for your use case, since you are within loops each time, you can change it to:

    for value in data:
        # some code
    else:
        continue
    do_stuff_if_break()
    
  • You check for various things in u'{} {}'.format(record['title'], record['description']).lower() but you are building this string a large number of times. It is best, performance-wise, to build it once as soon as you get the record dictionary. Speaking of dictionary, you can improve the syntax a bit:

    u'{title} {description}'.format(**record).lower()
    
  • You use lower on each items of your intermediate iterables (self.keyword_material_info, self.keyword_bse_list and the one returned by self.get_ngrams()). Can't you build those iterables with already formatted strings?

The code would become:

for record in fetch_record:
    search = u'{title} {description}'.format(**record).lower()
    total += 1
    for rec in cname:
        c_ngram = self.get_ngrams(rec['company_name'])
        # assuming c_ngrams contains items already formatted with .lower()+' '
        for gram in (g for g in c_ngram if g in search):
            filter_company_level += 1
            # print "Matched based on company name : ", record['article_link']
            company_write.write(record['article_link']+' - '+rec['company_name']+' - '+rec['company_code']+'\n')
            # assuming keyword_material_info contains items already formatted with .lower()+' '
            for value in (v for v in self.keyword_material_info if v in search):
                filter_with_material_info += 1
                materialinfo_write.write(record['article_link']+' - '+rec['company_name']+' - '+rec['company_code']+' - '+value+'\n')
                # assuming keyword_bse_list contains items already formatted with .lower()+' '
                for item in (i for i in self.keyword_bse_list if i in search):
                    filter_with_keyword_info += 1
                    keyword_write.write(record['article_link']+' - '+rec['company_name']+' - '+rec['company_code']+" - "+value+' - '+item+'\n')
                    print record['article_link']
                    print value
                    print item
                    break
                else:
                    # skip the next break if we didn't found an <item> matching <search>
                    continue
                break
            else:
                continue
            break
        else:
            continue
        break

However I’m a little bit worried on the nesting of your loops. Since neither self.keyword_material_info nor self.keyword_bse_list seems to depend on gram... They might be modified by the call of self.get_ngrams() but I can't verify.

One last note, that I didn’t try to enforce while re-writting your code: you may want to improve your variable naming. item or value does not carry much information, for instance.

share|improve this answer

Large try blocks are bad, avoid them at all costs. If you want a way to immediately break out of nested for loops, wrap the outermost loop in a function and then use return.

def my_nested_loops():
    for i in range(10):
        for j in range(10):
            for k in range(10):
                for l in range(10):
                    for m in range(10):
                        if done:
                            return val
                else:
                    if sorta_done:
                        return val
    # No value found
    return False or None or raise SomeError

As you can see, you could return from any level, at any point. If you come across errors or determine that there's no valid result, then you can return False or None or raise SomeError if you need to. Of course this is in addition to Mathias's suggestions about using continue. As much as possible, try avoiding nesting to begin with but there are better ways to handle it if you have no choice.

Don't use flag_keyword = 1 as a flag. Use booleans for flags, you can set them just as easily, and use if flag_keyword as your test. Though as far as I can see, you could instead of doing any of that have a return when you want to set your flag.

share|improve this answer
    
Damn, I knew there should be something way cleaner than my else: continue but couldn't think of it out of OP's code… – Mathias Ettinger Sep 24 '15 at 20:47

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.