Description

Usage of scrapy.utils.response.response_httprepr inside the DownloaderStats middleware causes the application to make an unnecessary memory allocation. response_httprepr is used only once, to calculate response sizes for the downloader/response_bytes stat in that middleware.

In the current implementation response_httprepr returns bytes (an immutable type), so to calculate downloader/response_bytes the application additionally allocates nearly the same amount of memory as the original response occupies, only to compute len inside the middleware.

The relevant code: scrapy/scrapy/utils/response.py, lines 45 to 60 in 26836c4.
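To make the allocation concrete, here is a minimal sketch of the pattern described above (hypothetical helper names and a simplified HTTP layout, not the actual Scrapy code): building the full raw representation only to take its length creates a second, body-sized bytes object, while summing the component lengths yields the same number without that allocation.

    # Minimal sketch of the allocation pattern (hypothetical helpers,
    # simplified HTTP layout; not the actual Scrapy implementation).

    def size_via_full_repr(status_line: bytes, headers: bytes, body: bytes) -> int:
        # b"".join() materializes a second buffer roughly as large as the
        # body, even though only its length is needed.
        raw = b"".join([status_line, b"\r\n", headers, b"\r\n\r\n", body])
        return len(raw)

    def size_via_component_lengths(status_line: bytes, headers: bytes, body: bytes) -> int:
        # Same count, computed from the component lengths, so no extra
        # body-sized buffer is ever allocated.
        return len(status_line) + 2 + len(headers) + 4 + len(body)

    body = b"x" * (100 * 1024 * 1024)  # stand-in for a 100 MB download
    assert (size_via_full_repr(b"HTTP/1.1 200 OK", b"Content-Length: 104857600", body)
            == size_via_component_lengths(b"HTTP/1.1 200 OK", b"Content-Length: 104857600", body))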
Steps to Reproduce

To demonstrate the impact of this, I made the following spider:
spider code
import sys
from importlib import import_module
import scrapy

class MemoryHttpreprSpider(scrapy.Spider):
    name = 'memory_httprepr'
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy.downloadermiddlewares.stats.DownloaderStats': None
        }
    }

    # the same as in the MemoryUsage extension:
    def get_virtual_size(self):
        size = self.resource.getrusage(self.resource.RUSAGE_SELF).ru_maxrss
        if sys.platform != 'darwin':
            # on macOS ru_maxrss is in bytes, on Linux it is in KB
            size *= 1024
        return size

    def start_requests(self):
        try:
            self.resource = import_module('resource')
        except ImportError:
            pass
        self.logger.info(f"used memory on start: {str(self.get_virtual_size())}")
        yield scrapy.Request(url='https://speed.hetzner.de/100MB.bin', callback=self.parse)
        # yield scrapy.Request(url='http://quotes.toscrape.com', callback=self.parse)

    def parse(self, response, **kwargs):
        self.logger.info(f"used memory after downloading response: {str(self.get_virtual_size())}")
It includes:
- usage of the get_virtual_size method, directly the same as in the MemoryUsage extension
- disabling of the DownloaderStats middleware that uses response_httprepr and request_httprepr (set to None in custom_settings)

Results:

[memory_httprepr] used memory on start: 61 587 456 / 61 521 920 / 61 558 784
[memory_httprepr] used memory after downloading response: 375 910 400 / 271 179 776 / 61 558 784
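For reference, the same effect can be reproduced outside Scrapy with a standalone sketch (my own illustration, not part of the original report): tracemalloc reports a body-sized peak allocation caused solely by joining the parts into one bytes object in order to measure it.

    import tracemalloc

    # Stand-in for a large downloaded body; allocated before tracing starts,
    # so it is not counted in the peak below.
    body = b"x" * (100 * 1024 * 1024)

    tracemalloc.start()
    # Build the full raw representation only to take its length, mirroring
    # the pattern this issue describes.
    size = len(b"".join([
        b"HTTP/1.1 200 OK\r\n",
        b"Content-Type: application/octet-stream\r\n",
        b"\r\n",
        body,
    ]))
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    print(f"reported response size: {size} bytes")
    print(f"peak allocation while measuring: {peak} bytes")  # roughly another 100 MB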
Versions

[scrapy.utils.log] Scrapy 2.4.0 started (bot: httprepr)
[scrapy.utils.log] Versions: lxml 4.5.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.8.2 (default, Apr 23 2020, 14:32:57) - [GCC 8.3.0], pyOpenSSL 19.1.0 (OpenSSL 1.1.1h 22 Sep 2020), cryptography 3.1.1, Platform Linux-4.15.0-76-generic-x86_64-with-glibc2.2.5