All Questions

2 votes
0 answers
1k views

Zigzag indicator for stock prices

I have read on SO and replicated an indicator for stock prices that works as intended. It's called ZigZag and projects peaks and valleys on historical prices. I pass a pandas dataframe with OHLC ...
1 vote
0 answers
41 views

Generic[type[Enum], Protocol[DataFrame]] Dataset with mapped to enum types

Below is my solution for managing multiple DataFrames, in an abstract enough way that it may apply to objects outside of a pandas.DataFrame hence the ...
6 votes
2 answers
129 views

Get exactly `n` unique randomly sampled rows per category in a Dataframe

I want to get exactly n unique randomly sampled rows per category in a Dataframe. This proved to involve more steps than the description would lead you to ...
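A minimal sketch of one way to approach the task described in this excerpt, assuming "unique" means no duplicated rows and that every category has at least n distinct rows. The helper name `sample_n_per_category` and the column names are hypothetical, not taken from the question.

```python
import pandas as pd


def sample_n_per_category(frame, category_col, n, seed=None):
    """Return n distinct rows per category, sampled without replacement.

    Hypothetical helper: raises ValueError if a category has fewer
    than n distinct rows.
    """
    # Drop fully duplicated rows first so the same row cannot be picked twice.
    unique_rows = frame.drop_duplicates()
    # GroupBy.sample draws without replacement by default.
    return unique_rows.groupby(category_col).sample(n=n, random_state=seed)


# Illustrative data only; the question's real data is not shown in the excerpt.
df = pd.DataFrame(
    {"category": list("aaabbbb"), "value": [1, 2, 2, 3, 4, 5, 6]}
)
picked = sample_n_per_category(df, "category", 2, seed=0)
```

`GroupBy.sample` requires pandas 1.1 or later.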
1 vote
1 answer
117 views

dataframe replace (numeric) categorical values by their frequency of label = 1

Here is my dataframe: data = [['a1','b1',0], ['a2','b3',0], ['a1','b2',1], ['a1','b1',1], ['a2','b3',0]] df = pd.DataFrame(data=data, columns = ['A','B','label']) ...
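A hedged sketch of one reading of the title, assuming "frequency of label = 1" means the share of rows with label 1 within each category value. The function name `encode_by_positive_rate` is hypothetical; only `data` and `df` come from the excerpt.

```python
import pandas as pd

data = [['a1', 'b1', 0], ['a2', 'b3', 0], ['a1', 'b2', 1],
        ['a1', 'b1', 1], ['a2', 'b3', 0]]
df = pd.DataFrame(data=data, columns=['A', 'B', 'label'])


def encode_by_positive_rate(frame, columns, label="label"):
    """Replace each category value with the mean of `label` in its group.

    Assumption: the label is 0/1, so the group mean equals the
    fraction of rows with label == 1.
    """
    out = frame.copy()
    for col in columns:
        # transform("mean") broadcasts the group mean back to every row.
        out[col] = frame.groupby(col)[label].transform("mean")
    return out


encoded = encode_by_positive_rate(df, ["A", "B"])
```

If a count of label-1 rows is wanted instead of a share, `transform("sum")` would be the corresponding change.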
4 votes
2 answers
2k views

Pulling financial data via IEX Cloud API

I am pulling some financial data from the IEX Cloud API. Since it is a paid API w/ rate limits, I would like to do this as efficiently as possible (i.e., as few calls as possible). Given that the <...
1 vote
2 answers
126 views

Replace personal names and addresses with company ones

The problem: I am given a data frame. Somewhere in that dataframe there is 3*N number of columns that I need to modify based on a condition. The columns of interest look like this: names_1 address_1 ...
4 votes
0 answers
151 views

constraint solving graduation using HTML Parsing, pandas, and z3

I'm not sure if this project fits on Code Review, but my code is getting extremely messy, and I would love some tips to clean it up! Overview: The project is designed to take in an HTML file (a degree audit),...
0 votes
1 answer
84 views

Get a "train or test" slicing elegantly after splitting

I am learning numpy, pandas, and sklearn for machine learning. My teacher gave me the code below, but I think it is ugly. This code is for splitting the data into a training set and a testing set, ...
1 vote
1 answer
35 views

simulated samples for central limit theorem

I am trying to help students visualize the central limit theorem and wanted to do this with simulated data. I created a population dataset with three variables: ...
0 votes
0 answers
36 views

Collect data from the Betfair API using the betfairlightweight module and create a DataFrame including existing values in each Series

My code uses a Series like the one below to create a final DataFrame, adding other values that are collected after accessing the Betfair API. Example for row_df: ...
1 vote
1 answer
39 views

Pandas Upsampling Time Series Splitting Equally the values through the weeks starting on monday

I built my code by studying this question: "Divide total sum equally to higher sampled time periods when upsampling with pandas". I am wondering whether the code can be improved and whether it is correct. ...
7 votes
1 answer
2k views

Calculating T-Test within Large Pandas Dataframes

The below code runs a t-statistic within a large dataframe (rnadf) based on masked values from another dataframe (cnvdf_maked). ...
7 votes
1 answer
278 views

How to make my groupby and transpose operations efficient?

I have a DataFrame of size 3,745,802 rows and 30 columns. I would like to perform certain groupby and ...
7 votes
1 answer
120 views

Code optimisation: Converting dataframe to numpy's ndarray

I am working with a dataframe of over 21M rows. ...
7 votes
1 answer
172 views

Multithreaded HD Image Processing + Logistic reg. Classifier + Visualization

[I'm awaiting suggestions for improvement/optimization/more speed/general feedback ...] This code takes a label and a folder path of subfolders as input that have certain labels ex: trees, cats with ...
1 vote
1 answer
56 views

Write a Python script to generate a random DataFrame based on specific inputs

I found myself many times in the past trying to generate fake DataFrames in pandas. I decided just for fun, to write a script that I can specify some inputs and ...
8 votes
1 answer
117 views

Using get_dummies to create a Simple Recommender System - Cold Start

Question: was using get_dummies a good choice for converting categorical strings? I used get_dummies to convert categorical ...
5 votes
2 answers
138 views

Unstructured to Structured TOC

The following code tries to convert an unstructured TOC with bounding box layout data given by the output of pdftotext -bbox-layout -f 11 -l 13 new_book.pdf toc.html...
4 votes
0 answers
139 views

Python BeautifulSoup - preparing HTML rows and td tags for Pandas

I'm using BeautifulSoup to parse a bunch of combined tables' rows, row by row, column by column to prepare it for import into Pandas. I can't use to_html() because ...
7 votes
1 answer
98 views

GUI that reads data and generates/ saves charts

I have a program that uses pandas to read csv files and then generates and saves graphical charts. I have been trying to follow the SOLID principles so I have tried to separate responsibilities. So ...
2 votes
2 answers
157 views

Convert a mapping of arbitrary pairs into a one-to-many map

The accepted solution is much cleaner and outperforms the algorithm below for small maps, but it doesn't do as well with larger ones: my code takes twice as long as the accepted code for maps of 10 pairs (...
0 votes
1 answer
114 views

Efficient way to read files python - 10 folders with 100k txt files in each one

I am looking for an efficient way to read and append the texts of .txt files to a dataframe. I currently have 10 folders with 100k documents each. What I specifically need to do is: getting the names of ...
1 vote
1 answer
49 views

Make unique id based on text data column with similarity scoring

I have the following dataframe: ...
0 votes
0 answers
38 views

Find profitable bets from historic results

Each of the lines in my CSV is a possibility of investment that I register on historic, but I would only make the investment if in the existing history (previous lines) the sum of the results is above ...
1 vote
1 answer
37 views

Create new columns in a DataFrame using functions and reposition the new columns

I would like a review regarding the method I use to create the new columns and then reposition them in the correct place where they should be. The new column called ...
-4 votes
1 answer
30 views

Find characters from same homeworld as Chewbacca [closed]

The problem is: find the names of all characters who are from the same homeworld as Chewbacca. My code is ...
5 votes
1 answer
116 views

Web scraper for data sources from Statistics Canada

I've written a parser to scrape data from Canadian Statistics Bureau. ...
2 votes
1 answer
91 views

Efficient List comprehension with multiple conditions using shift? [closed]

I am new to Python. I am trying to get the total number of failures by first checking how the Failure Sensor column transitions, then creating the Start column from devicetimestamp if the ...
4 votes
1 answer
147 views

Parsing an IP routing report with half a million lines into a PANDAS dataframe

I have a file that has around 440K lines of data. I need to read these data and find the actual "table" in the text file. Part of the text file looks like this. ...
3 votes
1 answer
46 views

Cleaning Float Column of Longitude

I am cleaning a dataset where the lat and long columns contain some values multiplied by a power of 10 (not only 10, but a varying 10^n). I wrote the code below. I am not sure if it is the best way, but is ...
1 vote
0 answers
34 views

BoundingBox dataclass implementation with cupy, cudf, and nvector

The dataset I'm working with is rather large so I've been experimenting with cudf and cupy. Here you can find instructions for ...
2 votes
1 answer
113 views

python: requests large.zip -> unzip -> fix -> filter ->gunzip

I wrote a function to download a large zipfile 5-7gb from Iowa State MRMS data archive. The zip files appear to be malformed and results in a BadZipFileError hence ...
3 votes
2 answers
172 views

Intercolumn statistics between columns in a dataframe

I have a df and need to count how many adjacent columns have the same sign as other columns based on the sign of the first column, and multiply by the sign of the ...
1 vote
1 answer
55 views

Create charts after querying database

I'm at the end of the IBM Data Analyst course, and I wanted to ask for a rating of a piece of code I wrote as a solution to its exercises from the final chapter. I know I could write it on the forum ...
2 votes
2 answers
122 views

Pivoting and then Padding a Pandas DataFrame with NaN between specific columns - Case Study

This question is about pivoting and padding columns, two very frequent activities in Pandas. I have a raw dataframe. I need to manipulate from long to ...
1 vote
1 answer
38 views

Plot time windows based on interested value

I use the following code to identify some values of interest in a dataframe and then plot a time window before and after each value appeared. It works very well, but I would like to know if there is ...
3 votes
1 answer
114 views

Python: adding a column with count of missing values by row

I have a big python data-frame and I am trying to add a column to it with average number of missing values by row. I have inherited some code that is working but I'd like to reduce memory usage by ...
1 vote
1 answer
66 views

Grouping and summing only n variables of m with m>n using a column as key in pandas

I have the following df ...
2 votes
0 answers
72 views

Do Mann-Kendall and Pettitt tests on each CSV file

Here is a function that will take each text file in a directory, do the Mann-Kendall and Pettitt tests, and then write the output to a text file. Could you please suggest how to improve the code to make ...
1 vote
1 answer
136 views

Iterate and assign weights based on two columns (python)

FI_name  ISN    Sector  Industry
REC      INE02  PS      FS
HDB      INE03  PR      FS
ABC      INE04  PR      FS
RHC      INE05  PR      CO
ZHE      INE06  PR      FS
HSE      INE07  PR      FS
ZAK      INE08  PS      MT
HGB      INE09  PR      FS
YUJ      INE10  PR      MT
WSD      INE11  PS      FS
...
1 vote
1 answer
60 views

Testing string membership using (in) keyword in python is very slow

I have the following text dataset: 4 million paragraphs of length between (10-60 words each). ...
2 votes
1 answer
38 views

Obtaining error code information that occurs before, during, and after a fix/repair using date data

I have completed a project I was working on using the methods I know how, but it is very inefficient. I am a beginner trying to figure out how I can improve my work by using software solutions. I have ...
2 votes
1 answer
75 views

Remove duplicates from a Pandas dataframe taking into account lowercase letters and accents

I have the following DataFrame in pandas: code town district suburb 02 Benalmádena Málaga Arroyo de la Miel 03 Alicante Jacarilla Jacarilla, Correntias Bajas (Jacarilla) 04 Cabrera d'Anoia ...
2 votes
1 answer
91 views

Slow processing of a python dataframe when aggregating across rows and columns

I would do this in SQL using string_agg but the server is SQL Server 2012 and beyond my control. So I'm trying a python approach. I have a dataframe of shape [20225 rows x 7 columns], and there a bit ...
0 votes
1 answer
70 views

Performance issue on updating a Pandas DataFrame with Series based on DateRange

I have two Pandas data frames: one with Daily data and one with Weekly data. I want to add the weekly data to each row of the daily data for each group of column A. For example, for each row on the ...
1 vote
0 answers
34 views

Iterate tables from table id from href links until no table with specific table id is found

I am scraping the following web page (which is my root URL to start scraping tables): https://www.iso.org/standards-catalogue/browse-by-ics.html What I am trying to achieve is to parse the ...
0 votes
1 answer
36 views

Treat a list by generating a dataframe and sending the data to function via multiprocessing

To collect the list with the data from an API, I need to do these steps: ...
5 votes
4 answers
12k views

Preprocessing steps to follow while cleaning and extracting text data from tweets

I have a dataset of around 200,000 tweets. I am running a classification task on them. Dataset has two columns - class label and the tweet text. In the preprocessing step I am passing the dataset ...
-1 votes
1 answer
61 views

Forecast experimental results based on temperature [closed]

I would really welcome any hints on making this code more concise, please. In this example we have some experiment results. For each planned experiment we have predicted results based on 4 temperatures ...
3 votes
1 answer
107 views

Crawler/scraper for soccer match results

I wrote this code some time ago while learning web scraping. Every now and then I find mistakes in it, and I also have doubts about it. Feedback, please: is this code compliant with the common best ...
