Those comments are very difficult to read.
j = list(set(j)) #someone warned that the problematic test case gives the same edge many times, so this is to remove duplicates
Format this properly:
# The problematic test case gives the same edge many times,
# so this is to remove duplicates
j = list(set(j))
Then remove fluff:
# Remove duplicates. Helps some test cases.
j = list(set(j))
In this case just remove the first comment, because list(set(...))
is already known to mean deduplicate. And remove the second since it's implied by the fact you've written the code.
j = list(set(j))
Another pair is
a = [] # for input purposes
b = [] # for output purposes
Don't do this. Just call them
descriptive_name_1 = []
descriptive_name_2 = []
so you don't need the comment in the first place.
Then there's q = ...
. So I wrack my brain thinking about what word contains q
. But it turns out you just meant to write num_test_cases
. What does that have to do with the letter q
?
Worse than that is when you outright lie. What would you expect the variable node
to contain? A node, right? Nope - it contains the total number of nodes.
And you write
node = 0
so we expect the total number of nodes to be 0
at that point, right? Nope, you immediately change your mind and write
node, edge = list(map(int, input().strip().split()))
But OK, fine. At least we know what nodes
contains, then: multiple node
s, so a list (or other collection) of lengths of nodes.
Nope, it actually means exactly the same thing as node
.
A similar comment goes for edge
, which should be num_edges
.
Then there's adj
, which you label "adjacency matrix". Why not just call it adjacency_matrix
then? But it's not even a dense matrix like that would imply, it's actually a list of per-node adjacencies, so call it adjacencies
.
Then there's starting
and start
, which are the same thing named differently. Call them start_node
or at least be consistent.
Going back to b
, you have
b = []
but again this is a lie, because actually
b = shortest(start_node, num_nodes, adjacencies)
So just write that.
distances = shortest(start_node, num_nodes, adjacencies)
For a
, initialize it as close to usage as possible and call it edges
.
If you write
for j in range(num_edges):
you're misleading me into thinking you're going to use the index. When you're not going to, write
for _ in range(num_edges):
And what's up with that, followed by
for j in edges:
followed by
for j in adjacencies:
Is j
just your general "this is a thing" variable? Most people use x
, elem
and item
for that. Instead, write
for _ in range(num_edges):
for edge in edges:
for adjacent in adjacencies:
The middle can be
for start, end in edges:
The comment
#edges are 6
Isn't coherent. Try
# Edges are length 6
Instead, though, don't put this information in shortest
- use 1
and multiply out when using the results.
Don't call variables temp
. It's a poor name.
You've double-indented the part where you update the neighbours.
Note that
num_nodes, num_edges = list(map(int, input().strip().split()))
is just
num_nodes, num_edges = map(int, input().strip().split())
Why do you do this?
for i in adjacencies[current-1]:
neigh = i
Just write
for neighbour in adjacencies[current-1]:
And the content can be
for neighbour in adjacencies[current-1]:
distance[neighbour-1] = min(distance[neighbour-1], distance[current-1] + 1)
Do you not think this whole -1
thing is getting a bit silly? When reading, just decrement the edge values and starting edge correctly:
edges = []
for _ in range(num_edges):
start, end = map(int, input().strip().split())
edges.append((start - 1, end - 1))
start_node = int(input()) - 1
while
loops and if
statements don't need their condition parenthesized, and operators should be spaced properly:
while (len(unique)>0): # before
while len(unique) > 0: # after
Then this is better as just while unique
.
Don't put spaces before the parentheses in function calls or indexing.
minDist
should be min_dist
. You can also start with it set to num_nodes
, and the same for distance
's initialization.
unique
is a horrible name, and has nothing to do with the variable's purpose. Try unvisited
instead. It can be initialized as
unvisited = [i for i, adj in enumerate(adjacencies) if adj]
distance
, unsurprisingly, does not mean distance
but instead distances
. Fix that.
This code:
min_dist = num_nodes
for i in unvisited:
if distances[i] <= min_dist:
min_dist = distances[i]
distIndex = i
current = distIndex
is just
min_dist = num_nodes
for i in unvisited:
if distances[i] <= min_dist:
min_dist = distances[i]
current = i
and current
should be called closest
. It is simplifiable to
closest = min(unvisited, key=lambda x: distances[x])
This is faster if you write
closest = min(unvisited, key=distances.__getitem__)
but the difference isn't vital.
One last thing with shortest
- rename it to shortest_distances
.
Put the rest of the code in a main
function.
If you write
def read_pair(decrement):
x, _, y = input().partition(" ")
return int(x) - decrement, int(y) - decrement
then you can initialize num_nodes
, num_edges
and edges
with
num_nodes, num_edges = read_pair(0)
edges = [read_pair(1) for i in range(num_edges)]
Note my use of partition
instead of split
as IMO it's a better description of the operation here.
Then
for adjacent in adjacencies:
adjacent = list(set(adjacent))
doesn't actually work! adjacent =
only affects the local scope! Instead, you want
adjacencies = [list(set(adj)) for adj in adjacencies]
which is even better as
adjacencies = [set() for _ in range(num_nodes)]
for start, end in edges:
adjacencies[start].add(end)
adjacencies[end].add(start)
as you never have the large intermediates then, and there's no real point converting back from a set.
This stuff:
distances.remove(0)
for i in range(len(distances)):
if distances[i] == num_nodes:
distances[i] = -1
else:
distances[i] *= 6
print(" ".join(str(e) for e in distances))
is just
print(*(-1 if i == num_nodes else i * 6 for i in distances if i))
Nice, that seems to have made it fast enough, but more important the code is more readable.
We can improve speed further by avoiding input
, using sys.stdin
directly to get buffering.
def main(lines):
def read_pair(decrement):
x, _, y = next(lines).partition(" ")
return int(x) - decrement, int(y) - decrement
...
main(sys.stdin)
Note that this change is sufficient on its own to beat the time limit: you can apply it to your original code to pass the test. Note also that you shouldn't mix input
with next
; at least on Python 2 this throws an error saying
ValueError: Mixing iteration and read methods would lose data
Though this error is gone in Python 3, that doesn't leave me feeling good about it.
I'd finish off with
from __future__ import print_function
to make it Python 2 compatible.
from __future__ import print_function
import sys
def shortest_distances(start_node, num_nodes, adjacencies):
distances = [num_nodes] * num_nodes
unvisited = [i for i, adj in enumerate(adjacencies) if adj]
distances[start_node] = 0
while unvisited:
closest = min(unvisited, key=distances.__getitem__)
unvisited.remove(closest)
# Update the distances of each neighbour
for neighbour in adjacencies[closest]:
distances[neighbour] = min(distances[neighbour], distances[closest] + 1)
return distances
def main(lines):
def read_pair(decrement):
x, _, y = next(lines).partition(" ")
return int(x) - decrement, int(y) - decrement
num_test_cases = int(next(lines))
for i in range(num_test_cases):
num_nodes, num_edges = read_pair(0)
edges = [read_pair(1) for i in range(num_edges)]
start_node = int(next(lines)) - 1
adjacencies = [set() for _ in range(num_nodes)]
for start, end in edges:
adjacencies[start].add(end)
adjacencies[end].add(start)
distances = shortest_distances(start_node, num_nodes, adjacencies)
print(*(-1 if i == num_nodes else i * 6 for i in distances if i))
main(sys.stdin)
NameError: name 'node' is not defined
. – Veedrac Aug 17 at 21:10node, edge
line is not valid syntax elsewhere. When I change it to read to a list first and then assign node and edge, the next error is for mixing spaces and tabs (it didn't matter in the HackerRank terminal again, sorry). I have to go right now, but I will try to clean it as soon as I get back. 2. The first link to HackerRank has a Sample Input, Sample Output, and Explanation on the page. The second link is just the input of the problematic test case. – Yifan Aug 17 at 21:31node
andedge
first (not what I originally thought), and I used a text editor to change the indentation to be consistent with spaces. Please try again. – Yifan Aug 17 at 21:40adj
is length 0 on the first iteration. – Veedrac Aug 17 at 22:04adj =
and thenode, edge =
lines. – Veedrac Aug 17 at 22:06