All Questions
586
questions
2
votes
0
answers
1k
views
Zigzag indicator for stock prices
I have read on SO and replicated an indicator for stock prices that works as intended. It's called ZigZag and projects peaks and valleys on historical prices. I pass a pandas dataframe with OHLC ...
1
vote
0
answers
41
views
Generic[type[Enum], Protocol[DataFrame]] Dataset with mapped to enum types
Below is my solution for managing multiple DataFrames, in an abstract enough way that it may apply to objects outside of a pandas.DataFrame hence the ...
6
votes
2
answers
129
views
Get exactly `n` unique randomly sampled rows per category in a Dataframe
I want to get exactly n unique randomly sampled rows per category in a Dataframe. This proved to involve more steps than the description would lead you to ...
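A minimal sketch of the idea in plain Python, not the asker's code. It assumes the rows are already distinct and that a category with fewer than n rows should raise an error; the names `sample_per_category`, `rows`, and `key` are hypothetical. In pandas itself, `DataFrame.groupby(...).sample(n=...)` covers the simple case.

```python
import random
from collections import defaultdict


def sample_per_category(rows, key, n, seed=None):
    """Return exactly n randomly chosen rows for every category (hypothetical helper)."""
    rng = random.Random(seed)

    # Bucket the rows by their category value.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    sampled = []
    for category, members in groups.items():
        if len(members) < n:
            # Assumption: too few rows is an error rather than a silent shortfall.
            raise ValueError(f"category {category!r} has only {len(members)} rows")
        # random.sample draws without replacement, so no row is picked twice.
        sampled.extend(rng.sample(members, n))
    return sampled


rows = [{"category": "a", "value": i} for i in range(5)]
rows += [{"category": "b", "value": i} for i in range(3)]
picked = sample_per_category(rows, "category", n=2, seed=0)
```

Sampling without replacement inside each bucket is what guarantees both the exact count and the uniqueness within a category.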
1
vote
1
answer
117
views
dataframe replace (numeric) categorical values by their frequency of label = 1
Here is my dataframe:
import pandas as pd
data = [['a1','b1',0], ['a2','b3',0], ['a1','b2',1], ['a1','b1',1], ['a2','b3',0]]
df = pd.DataFrame(data=data, columns=['A', 'B', 'label'])
...
4
votes
2
answers
2k
views
Pulling financial data via IEX Cloud API
I am pulling some financial data from the IEX Cloud API. Since it is a paid API w/ rate limits, I would like to do this as efficiently as possible (i.e., as few calls as possible). Given that the <...
1
vote
2
answers
126
views
Replace personal names and addresses with company ones
The problem:
I am given a data frame. Somewhere in that dataframe there are 3*N
columns that I need to modify based on a condition. The
columns of interest look like this:
names_1
address_1
...
4
votes
0
answers
151
views
constraint solving graduation using HTML Parsing, pandas, and z3
Not sure if this project fits on Code Review, but my code is getting extremely messy, and I would love some tips to clean it up!
Overview
The project is designed to take in an HTML file (a degree audit),...
0
votes
1
answer
84
views
Get a "train or test" slicing elegantly after splitting
I am learning numpy, pandas, and sklearn for machine learning now. My teacher gave me the code below, but I think it is ugly. This code splits the data into a training set and a testing set, ...
1
vote
1
answer
35
views
simulated samples for central limit theorem
I am trying to help students visualize the central limit theorem and wanted to do this with simulated data.
I created a population dataset with three variables:
...
0
votes
0
answers
36
views
Collect data from the Betfair API using the betfairlightweight module and create a DataFrame including existing values in each Series
My code uses a Series like the one below to create a final DataFrame, adding other values that will be collected after accessing the Betfair API.
Example for row_df:
...
1
vote
1
answer
39
views
Pandas Upsampling Time Series Splitting Equally the values through the weeks starting on monday
I built my code by studying this question: "Divide total sum equally to higher sampled time periods when upsampling with pandas".
I am wondering whether the code can be improved and whether it is correct.
...
7
votes
1
answer
2k
views
Calculating T-Test within Large Pandas Dataframes
The below code runs a t-statistic within a large dataframe (rnadf) based on masked values from another dataframe (cnvdf_maked). ...
7
votes
1
answer
278
views
How to make my groupby and transpose operations efficient?
I have a DataFrame of size 3,745,802 rows and 30 columns. I would like to perform certain groupby and ...
7
votes
1
answer
120
views
Code optimisation: Converting dataframe to numpy's ndarray
I am working with a dataframe of over 21M rows.
...
7
votes
1
answer
172
views
Multithreaded HD Image Processing + Logistic reg. Classifier + Visualization
[I'm awaiting suggestions for improvement/optimization/more speed/general feedback ...]
This code takes as input a label and the path to a folder whose subfolders have certain labels (e.g. trees, cats) with ...
1
vote
1
answer
56
views
Write a Python script to generate a random DataFrame based on specific inputs
I have often found myself trying to generate fake DataFrames in pandas. I decided, just for fun, to write a script for which I can specify some inputs and ...
8
votes
1
answer
117
views
Using get_dummies to create a Simple Recommender System - Cold Start
Question: was using get_dummies a good choice for converting categorical strings?
I used get_dummies to convert categorical ...
5
votes
2
answers
138
views
Unstructured to Structured TOC
The following code tries to convert an unstructured TOC with bounding box layout data given by the output of pdftotext -bbox-layout -f 11 -l 13 new_book.pdf toc.html...
4
votes
0
answers
139
views
Python BeautifulSoup - preparing HTML rows and td tags for Pandas
I'm using BeautifulSoup to parse a bunch of combined tables' rows, row by row, column by column to prepare it for import into Pandas. I can't use to_html() because ...
7
votes
1
answer
98
views
GUI that reads data and generates/ saves charts
I have a program that uses pandas to read CSV files and then generates and saves graphical charts. I have been trying to follow the SOLID principles, so I have tried to separate responsibilities.
So ...
2
votes
2
answers
157
views
Convert a mapping of arbitrary pairs into a one-to-many map
The accepted solution is much cleaner and outperforms the algorithm below for small maps, but it doesn't do as well with larger ones: my code takes twice as long as the accepted code for maps of 10 pairs (...
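One possible reading of the title, sketched in plain Python: each pair is treated as (key, value) and the values are grouped under their key. This is an assumption about the problem and not the asker's or the accepted algorithm; `to_one_to_many` is a hypothetical name.

```python
from collections import defaultdict


def to_one_to_many(pairs):
    """Group (key, value) pairs into {key: [values, ...]} in first-seen order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Return a plain dict so missing keys raise KeyError instead of being created.
    return dict(grouped)


pairs = [("a", 1), ("b", 2), ("a", 3)]
result = to_one_to_many(pairs)
```

This makes a single pass over the input, so its cost grows linearly with the number of pairs.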
0
votes
1
answer
114
views
Efficient way to read files python - 10 folders with 100k txt files in each one
I am looking for an efficient way to read and append texts of .txt files to a dataframe. I currently have 10 folders with 100k documents each.
What I specifically need to do is:
getting the names of ...
1
vote
1
answer
49
views
Make unique id based on text data column with similarity scoring
I have the following dataframe:
...
0
votes
0
answers
38
views
Find profitable bets from historic results
Each line in my CSV is a possible investment that I record in the history, but I would only make the investment if, in the existing history (previous lines), the sum of the results is above ...
1
vote
1
answer
37
views
Create new columns in a DataFrame using functions and reposition the new columns
I would like a review regarding the method I use to create the new columns and then reposition them in the correct place where they should be.
The new column called ...
-4
votes
1
answer
30
views
Find characters from same homeworld as Chewbacca [closed]
The problem is
Find the names of all characters which are from the same homeworld as Chewbacca
My code is
...
5
votes
1
answer
116
views
Web scraper for data sources from Statistics Canada
I've written a parser to scrape data from the Canadian Statistics Bureau.
...
2
votes
1
answer
91
views
Efficient List comprehension with multiple conditions using shift? [closed]
I am new to Python.
I am trying to get the total number of failures by first checking how the Failure Sensor column transitions, then creating the Start column from devicetimestamp if the ...
4
votes
1
answer
147
views
Parsing an IP routing report with half a million lines into a PANDAS dataframe
I have a file that has around 440K lines of data. I need to read these data and find the actual "table" in the text file. Part of the text file looks like this.
...
3
votes
1
answer
46
views
Cleaning Float Column of Longitude
I am cleaning a dataset where the lat and long columns contain some values multiplied by a power of ten: not only 10, but a varying 10^n. I wrote the code below. I am not sure if it is the best way, but is ...
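A rough sketch of one way to undo such scaling, not the asker's code. It assumes every bad value is the true value times some power of ten and that the true value is the largest magnitude that still fits the valid range (90 for latitude, 180 for longitude). That assumption is ambiguous for some inputs, so treat the result as a guess. `rescale` is a hypothetical helper.

```python
def rescale(value, limit):
    """Divide by 10 until abs(value) fits within limit (hypothetical helper)."""
    while abs(value) > limit:
        value /= 10
    return value


# Values already in range are returned unchanged.
longitude = rescale(1250.0, 180)   # scaled by 10
latitude = rescale(-4650.0, 90)    # scaled by 100
unchanged = rescale(45.0, 90)
```

On a pandas column the same function could be applied element by element, for example with `Series.map`.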
1
vote
0
answers
34
views
BoundingBox dataclass implementation with cupy, cudf, and nvector
The dataset I'm working with is rather large so I've been experimenting with cudf and cupy. Here you can find instructions for ...
2
votes
1
answer
113
views
python: requests large.zip -> unzip -> fix -> filter ->gunzip
I wrote a function to download a large zip file (5-7 GB) from the Iowa State MRMS data archive.
The zip files appear to be malformed and result in a BadZipFileError, hence ...
3
votes
2
answers
172
views
Intercolumn statistics between columns in a dataframe
I have a df and need to count how many adjacent columns have the same sign as other columns based on the sign of the first column, and multiply by the sign of the ...
1
vote
1
answer
55
views
Create charts after querying database
I'm at the end of the IBM Data Analyst course, and I wanted to ask for a rating of a piece of code I wrote as a solution to its exercises from the final chapter. I know I could write it on the forum ...
2
votes
2
answers
122
views
Pivoting and then Padding a Pandas DataFrame with NaN between specific columns - Case Study
This question is about pivoting and padding columns, two very frequent activities in Pandas.
I have a raw dataframe. I need to manipulate from long to ...
1
vote
1
answer
38
views
Plot time windows based on interested value
I use the following code to identify some values of interest in a dataframe and then plot a time window before and after each value appears. It works very well, but I would like to know if there is ...
3
votes
1
answer
114
views
Python : adding a columns with count of missing values by row
I have a big Python dataframe and I am trying to add a column to it with the average number of missing values by row. I have inherited some code that is working, but I'd like to reduce memory usage by ...
1
vote
1
answer
66
views
Grouping and summing only n variables of m with m>n using a column as key in pandas
I have the following df
...
2
votes
0
answers
72
views
Do Mann-Kendall and Pettitt tests on each CSV file
Here is a function that will take each text file in a directory, do the Mann-Kendall and Pettitt tests, and then write the output to a text file. Could you please suggest how to improve the code to make ...
1
vote
1
answer
136
views
Iterate and assign weights based on two columns (python)
FI_name  ISN    Sector  Industry
REC      INE02  PS      FS
HDB      INE03  PR      FS
ABC      INE04  PR      FS
RHC      INE05  PR      CO
ZHE      INE06  PR      FS
HSE      INE07  PR      FS
ZAK      INE08  PS      MT
HGB      INE09  PR      FS
YUJ      INE10  PR      MT
WSD      INE11  PS      FS
...
1
vote
1
answer
60
views
Testing string membership using (in) keyword in python is very slow
I have the following text dataset:
4 million paragraphs, each between 10 and 60 words long.
...
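A hedged sketch of a common speed-up, assuming the goal is to find whole-word keywords in each paragraph (the excerpt does not say so). Each paragraph is split once into a set of words, and the keywords are found by set intersection instead of repeated substring tests with `in`. The names `find_keywords` and `keywords` are hypothetical.

```python
def find_keywords(paragraph, keywords):
    """Return the keywords that occur as whole words in the paragraph."""
    # Splitting once and building a set makes each lookup O(1) on average.
    words = set(paragraph.lower().split())
    return words & keywords


keywords = {"pandas", "numpy"}
hits = find_keywords("I use pandas and sklearn daily", keywords)
```

This only matches whitespace-separated tokens, so punctuation attached to a word or multi-word phrases would need extra handling.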
2
votes
1
answer
38
views
Obtaining error code information that occurs before, during, and after a fix/repair using date data
I have completed a project I was working on using the methods I know, but it is very inefficient. I am a beginner trying to figure out how I can improve my work by using software solutions.
I have ...
2
votes
1
answer
75
views
Remove duplicates from a Pandas dataframe taking into account lowercase letters and accents
I have the following DataFrame in pandas:
code  town             district   suburb
02    Benalmádena      Málaga     Arroyo de la Miel
03    Alicante         Jacarilla  Jacarilla, Correntias Bajas (Jacarilla)
04    Cabrera d'Anoia
...
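A minimal sketch of accent- and case-insensitive de-duplication using only the standard library, not the asker's code. It assumes that two values count as duplicates when they match after removing diacritics and case-folding; `normalize_key` and `drop_duplicates` are hypothetical names.

```python
import unicodedata


def normalize_key(text):
    """Strip diacritics and case so 'Málaga' and 'malaga' compare equal."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def drop_duplicates(values):
    """Keep the first occurrence of each value, compared by normalize_key."""
    seen = set()
    unique = []
    for value in values:
        key = normalize_key(value)
        if key not in seen:
            seen.add(key)
            unique.append(value)
    return unique


towns = ["Málaga", "malaga", "MALAGA", "Alicante"]
unique_towns = drop_duplicates(towns)
```

In pandas, the same key function could be mapped over the relevant columns and the normalized columns passed to `DataFrame.duplicated` to build a mask.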
2
votes
1
answer
91
views
Slow processing of a python dataframe when aggregating across rows and columns
I would do this in SQL using string_agg but the server is SQL Server 2012 and beyond my control. So I'm trying a python approach.
I have a dataframe of shape [20225 rows x 7 columns], and there a bit ...
0
votes
1
answer
70
views
Performance issue on updating a Pandas DataFrame with Series based on DateRange
I have two Pandas data frames: one with Daily data and one with Weekly data.
I want to add the weekly data to each row of the daily data for each group of column A.
For example, for each row on the ...
1
vote
0
answers
34
views
Iterate tables from table id from href links until no table with specific table id is found
I am scraping the following web page (which is my root URL to start scraping tables): https://www.iso.org/standards-catalogue/browse-by-ics.html
What I am trying to achieve is to parse the ...
0
votes
1
answer
36
views
Treat a list by generating a dataframe and sending the data to function via multiprocessing
To collect the list with the data from an API, I need to do these steps:
...
5
votes
4
answers
12k
views
Preprocessing steps to follow while cleaning and extracting text data from tweets
I have a dataset of around 200,000 tweets. I am running a classification task on them. The dataset has two columns: the class label and the tweet text. In the preprocessing step I am passing the dataset ...
-1
votes
1
answer
61
views
Forecast experimental results based on temperature [closed]
I would really welcome any hints on making this code more concise, please.
In this example we have some experiment results.
For each planned experiment we have predicted results based on 4 temperatures ...
3
votes
1
answer
107
views
Crawler/scraper for soccer match results
I wrote this code some time ago as part of learning web scraping. Every now and then I find mistakes in it, and I also have doubts. Feedback, please: is this code compliant with the common best ...