Changed text output code formatting
holroy
en Mahesh_Prasad_Varma
en Mahesh_Saheba
en maheshtala
en Maheshtala_College
en Mahesh_Thakur
en Maheshwara_Institute_Of_Technology
en Maheshwar_Hazari
....
en Just_to_Satisfy_You_(song) 1
en Just_to_See_Her 2
en Just_to_See_You_Smile 2
en Just_Tricking 1
en Just_Tricking! 1
en Just_Tryin%27_ta_Live 1
en Just_Until... 1
en Just_Us 1
en Justus 2
en Justus_(album) 2
....
en Zsófia_Polgár 1
en Mahesh_Prasad_Varma 1
en maheshtala 1
en Maheshtala_College 1
en Maheshwara_Institute_Of_Technology 2
en Maheshwar_Hazari 1
added 102 characters in body
JJack_

But due to a memory issue I'm not able to save the res dictionary. I'm sure that different processes have different address spaces, so each of them writes to its own local copy of the dictionary. I'm forced to use Manager to share data between processes, which gives worse performance. May I use a queue instead?
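
To illustrate the point about separate address spaces, here is a minimal sketch of one alternative to a Manager: each worker builds a private dictionary and returns it, and the parent merges the returned dictionaries. It assumes lines of the form "lang title count" and assumes that values for the same key should be summed; the names process_chunk and merge are hypothetical and are not taken from the original processing.py.

```python
from multiprocessing import Pool


def process_chunk(lines):
    """Worker: build and return a private dict for one chunk of lines.

    Assumption: each line is "lang title count" and counts are summed.
    """
    partial = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            key = " ".join(parts[:2])
            partial[key] = partial.get(key, 0) + int(parts[2])
    return partial


def merge(partials):
    """Parent: combine the per-worker dicts into a single result dict."""
    res = {}
    for partial in partials:
        for key, value in partial.items():
            res[key] = res.get(key, 0) + value
    return res


if __name__ == "__main__":
    # Hypothetical in-memory chunks standing in for slices of the dataset.
    chunks = [
        ["en Justus 2", "en Just_Us 1"],
        ["en Justus 2", "en Maheshtala_College 1"],
    ]
    with Pool(processes=2) as pool:
        # pool.map pickles each returned dict and sends it back to the parent.
        res = merge(pool.map(process_chunk, chunks))
    print(res)
```

With this layout no dictionary is shared while the workers run, so nothing is written through a Manager proxy; the trade-off is that every partial dictionary is pickled and copied back to the parent once.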

added 848 characters in body
JJack_

I'm trying to implement an algorithm that searches for multiple keys across ten huge files in Python (16 million rows each). I've got a sorted file with 62 million keys, and I'm trying to scan each of the ten files in the dataset to look for a set of keys and their respective values.
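
As a rough illustration of the scan described above, the following sketch looks up a set of keys in a single file-like object. It assumes each dataset line has three space-separated fields (language, title, integer value) and that the key is the first two fields; the function name scan_for_keys and the in-memory sample are hypothetical.

```python
import io


def scan_for_keys(lines, wanted):
    """Return {key: value} for each line whose first two fields form a wanted key.

    Assumption: a line looks like "en Just_Us 1"; shorter lines are skipped.
    """
    found = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # no value field on this line
        key = " ".join(parts[:2])
        if key in wanted:  # set membership is O(1) on average
            found[key] = int(parts[2])
    return found


# Hypothetical sample standing in for one of the ten dataset files.
sample = io.StringIO(
    "en Just_Us 1\n"
    "en Justus 2\n"
    "en Justus_(album) 2\n"
)
wanted = {"en Justus", "en Just_Us"}
result = scan_for_keys(sample, wanted)
print(result)
```

Reading the file line by line keeps memory use proportional to the size of the key set rather than to the size of the file.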

This is a follow-up to Scanning multiple huge files in Python, based on the feedback I received there. All files are UTF-8 encoded and can contain text in multiple languages.

Here is a small slice of my sorted key file:

en Mahesh_Prasad_Varma
en Mahesh_Saheba
en maheshtala
en Maheshtala_College
en Mahesh_Thakur
en Maheshwara_Institute_Of_Technology
en Maheshwar_Hazari
....
en Just_to_Satisfy_You_(song) 1
en Just_to_See_Her 2
en Just_to_See_You_Smile 2
en Just_Tricking 1
en Just_Tricking! 1
en Just_Tryin%27_ta_Live 1
en Just_Until... 1
en Just_Us 1
en Justus 2
en Justus_(album) 2
....
en Zsófia_Polgár 1
en Mahesh_Prasad_Varma 1
en maheshtala 1
en Maheshtala_College 1
en Maheshwara_Institute_Of_Technology 2
en Maheshwar_Hazari 1

Here is the bash script I use to create the sorted keys file:

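#!/bin/bash
clear
BASEPATH="/home/process"
# Create the output directories (no error if they already exist).
mkdir -p "$BASEPATH/processed/slice"
# Keep the first two fields of every dataset line, then sort and
# de-duplicate using the key that starts at field 2.
cat "$BASEPATH"/dataset/* | cut -d' ' -f1,2 | sort -u -k2 > "$BASEPATH/processed/sorted_keys"
# Split the sorted keys into slices of 3,000,000 lines each.
split -d -l 3000000 "$BASEPATH/processed/sorted_keys" "$BASEPATH/processed/slice/slice-"
# Run processing.py once per slice.
for filename in "$BASEPATH"/processed/slice/*; do
    python processing.py "$filename"
done
# Remove the intermediate files.
rm "$BASEPATH/processed/sorted_keys"
rm -rf "$BASEPATH/processed/slice"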

For each slice I launch processing.py. Here is my working code, with Manager:


Changed wording a little, hopefully to the better
holroy
deleted 167 characters in body; edited tags
200_success
added 3 characters in body
JJack_
added 335 characters in body
JJack_
added 335 characters in body
JJack_
JJack_