Use more meaningful variable names, for example:
lines
instead of t
length
instead of l
- Since
y
is a character position, I would rename to pos
let
to letter
, because it's not much longer and a bit more readable (since let
is keyword in some other programming languages, I find it confusing to read)
Since lines[0]
is kind of special, and you make many references to it, it would be good to give it a name:
first = lines[0]
Your input lines end with a trailing newline character, but in the main logic of pattern matching this is just noise, not an important detail. I recommend this:
first = lines[0].strip()
This will get rid of the trailing newline character, and in the main logic you don't have to worry about checking letters until first[-1]
, you can consider the letters in first
, which is simpler.
If you make changes to the script, you have to repeat all the tests to see if it's still working. That's very troublesome. To make that easier, add some unit tests, and refactor the main
method to make it more testable, for example:
import unittest
from test.test_support import run_unittest
def main():
n = int(sys.stdin.readline())
lines = sys.stdin.readlines()[0:n]
print find_common_pattern(lines)
def find_common_pattern(lines):
# the rest of the code from the old version of `main`
class FindPatterns(unittest.TestCase):
def test_given_examples(self):
self.assertEqual('xab', find_common_pattern(['?ab\n', '??b\n']))
self.assertEqual('?', find_common_pattern(['a\n', 'b\n']))
self.assertEqual('xaxb', find_common_pattern(['?a?b\n']))
run_unittest(FindPatterns)
After this, it will be easy to follow my suggestions below and verify that the script is still working correctly, passing all test cases. (Feel free to add more!)
Instead of range(0, length-1)
, you can write simply range(length-1)
.
if let == '?' and t[x][y]!= '?':
let = t[x][y]
if t[x][y] != let and t[x][y] != '?':
result += '?'
If the first if
was true, the second will not be true, and it's a waste to evaluate it again. You could make the second if
an elif
instead.
The last elif
in your for
loop is a bit dirty: it's for checking if you're on the last line. The condition will be evaluated for every line where the current position is ?
, which is a bit wasteful. It would be better to refactor that loop to perform that check only once at the end, for example:
for pos in range(length):
letter = first[pos]
newletter = letter
for x in range(1, len(lines)):
if letter == '?' and lines[x][pos] != '?':
letter = lines[x][pos]
elif lines[x][pos] != letter and lines[x][pos] != '?':
newletter = '?'
break
if letter == '?':
result += 'x'
else:
result += newletter
After this refactoring, you don't need the value of x
inside the loop anymore, so you could rewrite more intuitively as:
for line in lines[1:]:
if letter == '?' and line[pos] != '?':
letter = line[pos]
elif line[pos] != letter and line[pos] != '?':
newletter = '?'
break
Since the characters ?
and x
appear several times in the code, I would treat them as constants and name them for example:
ANYCHAR = '?'
SAMPLECHAR = 'x'
Instead of running main
, the common way is like this:
if __name__ == '__main__':
main()
This way, your script will still run as before when you do python script.py
, and at the same time it will be possible to import it from another python script as a library without automatically running it. It's a good practice to get used to, even if you don't intend to make this a library.
Putting it all together, and some other minor improvements, this would be more Pythonic, and slightly more readable and efficient:
ANYCHAR = '?'
SAMPLECHAR = 'x'
def find_common_pattern(lines):
first = lines[0].strip()
lines = lines[1:]
if not lines:
return first.replace(ANYCHAR, SAMPLECHAR)
result = ''
for pos in range(len(first)):
letter = first[pos]
newletter = letter
for line in lines:
if line[pos] != ANYCHAR:
if letter == ANYCHAR:
letter = line[pos]
elif line[pos] != letter:
newletter = ANYCHAR
break
if letter == ANYCHAR:
result += SAMPLECHAR
else:
result += newletter
return result