I tried to use a nested list to hold scraped data from HTML,
but after 50,000 list appends I got a MemoryError,
so I decided to change the lists to a NumPy array:
SapList = []
ListAll = np.array([])

def eachshop():  # fill the list with each shop's data
    global ListAll
    SapList.append(RowNum)
    SapList.extend([sap])  # from one to 10 values in one list: ["sap1", "sap2", "sap3", ..., "sap10"]
    SapList.extend([[strLink, ProdName], ProdCode, ProdH, NewPrice, OldPrice,
                    [FileName + '#Komp!A1', KompPrice], [FileName + '#Sav!A1', 'Sav']])
    SapList.extend([ss])  # from zero to 80 sublists with 3 values each: [["id1", "link", "address"], ..., ["id80", "link", "address"]]
    ListAll = np.append(np.array(SapList))
Then, when I do print(ListAll), I get an exception at C:\Python36\scrap.py, line 307 ("ListAll = np.append(np.array(SapList))"): setting an array element with a sequence.
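For context, here is a minimal sketch (with made-up data standing in for SapList) of why that ValueError appears: a ragged, mixed-depth list cannot be turned into a regular NumPy array. An object-dtype array works, but it only stores Python references, so it saves no memory over the original list:

```python
import numpy as np

# Hypothetical data shaped like SapList: scalars mixed with nested lists
sap_list = [1, "sap1", ["link", "name"], ["id1", "link", "addr"]]

# A ragged, mixed-depth list cannot become a regular array
# (recent NumPy raises ValueError: setting an array element with a sequence):
try:
    np.array(sap_list)
except ValueError as e:
    print("conversion failed:", e)

# dtype=object accepts ragged data, but each slot is just a pointer
# to the original Python object, so memory usage is not reduced:
arr = np.array(sap_list, dtype=object)
print(arr.shape)  # (4,)
```

Note also that np.append() takes two arguments, (arr, values); calling it with one argument is itself a TypeError.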
Now, to speed things up, I am using pool.map:
def makePool(cP, func, iters):
    try:
        pool = ThreadPool(cP)
        # iterate over the URLs
        pool.map_async(func, enumerate(iters, start=2)).get(99999)
        pool.close()
        pool.join()
    except:
        print('Pool Error')
        raise
    finally:
        pool.terminate()
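For reference, here is a self-contained sketch of that pool pattern, with a hypothetical worker standing in for eachshop(). Note that a ThreadPool only helps with I/O-bound work such as downloading pages; because of the GIL it does not parallelize CPU-bound parsing:

```python
from multiprocessing.pool import ThreadPool

def make_pool(num_workers, func, iters):
    """Run func over enumerate(iters, start=2) on a thread pool."""
    pool = ThreadPool(num_workers)
    try:
        # .get() with a timeout keeps the main thread interruptible (Ctrl-C)
        results = pool.map_async(func, enumerate(iters, start=2)).get(99999)
        pool.close()
        pool.join()
        return results
    finally:
        pool.terminate()

# Hypothetical worker: receives (row_number, url) tuples
def worker(item):
    row_num, url = item
    return (row_num, len(url))

print(make_pool(4, worker, ["http://a", "http://bb"]))
# [(2, 8), (3, 9)]
```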
So how can I use a NumPy array in my example to reduce memory usage and speed up the I/O operations?
What is

ListAll = np.append(np.array(SapList))

supposed to be doing? It's obviously not going to append anything to ListAll; it's going to call append on nothing but the temporary array created from SapList, then store the result in ListAll, replacing whatever used to be there. I'm pretty sure that's not what you want, but I'm not sure what you do want, so I can't tell you how to fix it.

ListAll = np.append(np.array(SapList))

is not the same as ListAll.append([SapList]). The latter would call an append method on ListAll. The former calls an append function on the np module, doesn't even pass ListAll to it, and then just assigns the result to ListAll.
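One way to avoid the trap entirely (a sketch, with made-up per-shop rows standing in for the real scraped data): np.append(arr, values) copies the whole array on every call, so calling it in a loop is O(n²). Collect the rows in a plain Python list and convert once at the end:

```python
import numpy as np

rows = []
for i in range(3):  # stand-in for the per-shop loop
    sap_list = [i, f"sap{i}", [f"id{i}", "link", "addr"]]  # hypothetical row
    rows.append(sap_list)  # cheap: no copying of earlier rows

# One conversion at the end; dtype=object because the cells are mixed types
list_all = np.array(rows, dtype=object)
print(list_all.shape)  # (3, 3)
```

If the rows stay ragged and mixed like SapList, though, a plain list of lists is usually the better container; NumPy only pays off for homogeneous numeric data.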