Recently Active 'pandas+python' Questions

2 votes

0 answers

1k views

Zigzag indicator for stock prices

I have read on SO and replicated an indicator for stock prices that works as intended. It's called ZigZag and projects peaks and valleys on historical prices. I pass a pandas dataframe with OHLC ...

Denis Zhuravlev

1

modified Mar 15 at 20:24

1 vote

0 answers

41 views

Generic[type[Enum], Protocol[DataFrame]] Dataset mapped to enum types

Below is my solution for managing multiple DataFrames, in an abstract enough way that it may apply to objects outside of a pandas.DataFrame hence the ...

Jason Leaver

628

asked Mar 6 at 11:57

6 votes

2 answers

129 views

Get exactly `n` unique randomly sampled rows per category in a Dataframe

I want to get exactly n unique randomly sampled rows per category in a Dataframe. This proved to involve more steps than the description would lead you to ...

CommunityBot

1

modified Mar 2 at 1:10
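
For readers scanning this entry, one possible shape of the task is sketched below. It is a minimal sketch, not the asker's code: the helper name `sample_n_per_category`, the column names, and the toy data are all hypothetical, and it assumes "unique" means no duplicated rows and that a category with fewer than n unique rows should raise an error.

```python
import pandas as pd


def sample_n_per_category(df, category_col, n, seed=0):
    """Return exactly n distinct rows per category (hypothetical helper)."""
    # Drop fully duplicated rows first so that sampled rows are unique.
    unique_rows = df.drop_duplicates()
    # Refuse to continue if any category cannot supply n unique rows.
    counts = unique_rows[category_col].value_counts()
    too_small = counts[counts < n]
    if not too_small.empty:
        raise ValueError(
            f"categories with fewer than {n} unique rows: {list(too_small.index)}"
        )
    # GroupBy.sample draws n rows without replacement from each group.
    return unique_rows.groupby(category_col).sample(n=n, random_state=seed)


# Toy data (assumed): category "a" contains one duplicated row.
df = pd.DataFrame({
    "category": ["a", "a", "a", "a", "b", "b", "b"],
    "value": [1, 1, 2, 3, 4, 5, 6],
})
sampled = sample_n_per_category(df, "category", n=2)
```

`GroupBy.sample` requires pandas 1.1 or later; on older versions a `groupby(...).apply(lambda g: g.sample(n))` would be the usual substitute.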

1 vote

1 answer

117 views

dataframe replace (numeric) categorical values by their frequency of label = 1

Here is my dataframe: data = [['a1','b1',0], ['a2','b3',0], ['a1','b2',1], ['a1','b1',1], ['a2','b3',0]] df = pd.DataFrame(data=data, columns = ['A','B','label']) ...

CommunityBot

1

modified Mar 1 at 18:05
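
The excerpt above gives enough data for a small illustration. This is a minimal sketch under an assumption: "frequency of label = 1" is read here as the fraction of rows in each category whose label is 1. The question may instead intend a raw count, in which case `'sum'` would replace `'mean'`.

```python
import pandas as pd

# Data exactly as given in the question excerpt.
data = [['a1', 'b1', 0], ['a2', 'b3', 0], ['a1', 'b2', 1],
        ['a1', 'b1', 1], ['a2', 'b3', 0]]
df = pd.DataFrame(data=data, columns=['A', 'B', 'label'])

# Replace each categorical value by the share of label == 1 in its group
# (assumed interpretation of "frequency").
encoded = df.copy()
for col in ['A', 'B']:
    encoded[col] = df.groupby(col)['label'].transform('mean')
```

Because `label` holds only 0 and 1, the group mean equals the proportion of rows labelled 1.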

4 votes

2 answers

2k views

Pulling financial data via IEX Cloud API

I am pulling some financial data from the IEX Cloud API. Since it is a paid API w/ rate limits, I would like to do this as efficiently as possible (i.e., as few calls as possible). Given that the <...

pacmaninbw

21.8k

modified Feb 26 at 17:29

1 vote

2 answers

126 views

Replace personal names and addresses with company ones

The problem: I am given a data frame. Somewhere in that dataframe there are 3*N columns that I need to modify based on a condition. The columns of interest look like this: names_1 address_1 ...

Reinderien

55.5k

modified Feb 20 at 15:33

4 votes

0 answers

151 views

constraint solving graduation using HTML Parsing, pandas, and z3

Not sure if this project fits on Code Review, but my code is getting extremely messy, and I would love some tips to clean it up! Overview: The project is designed to take in an HTML file (a degree audit),...

retep

169

modified Feb 16 at 2:08

0 votes

1 answer

84 views

Get a "train or test" slicing elegantly after splitting

I am learning numpy, pandas, and sklearn for machine learning now. My teacher gives me code like below, but I guess it is ugly. This code is for splitting the data into training set and testing set, ...

CommunityBot

1

modified Feb 14 at 4:07

1 vote

1 answer

35 views

simulated samples for central limit theorem

I am trying to help students visualize the central limit theorem and wanted to do this with simulated data. I created a population dataset with three variables: ...

Sᴀᴍ Onᴇᴌᴀ♦

26.4k

modified Feb 10 at 8:21

0 votes

0 answers

36 views

Collect data from the Betfair API using the betfairlightweight module and create a DataFrame including existing values in each Series

My code uses a Series like the one below to create a final DataFrame, adding other values that will be collected after accessing the Betfair API. Example for row_df: ...

Digital Farmer

193

asked Jan 26 at 23:25

1 vote

1 answer

39 views

Pandas Upsampling Time Series Splitting Equally the values through the weeks starting on monday

I built my code by studying this question: "Divide total sum equally to higher sampled time periods when upsampling with pandas". I am wondering whether the code can be improved and whether it is correct. ...

Reinderien

55.5k

answered Jan 22 at 4:22

7 votes

1 answer

2k views

Calculating T-Test within Large Pandas Dataframes

The below code runs a t-statistic within a large dataframe (rnadf) based on masked values from another dataframe (cnvdf_maked). ...

Toby Speight

67.5k

modified Jan 8 at 10:32

7 votes

1 answer

278 views

How to make my groupby and transpose operations efficient?

I have a DataFrame of size 3,745,802 rows and 30 columns. I would like to perform certain groupby and ...

J_H

8,697

answered Jan 6 at 19:57

7 votes

1 answer

120 views

Code optimisation: Converting dataframe to numpy's ndarray

I am working with a dataframe of over 21M rows. ...

J_H

8,697

answered Jan 6 at 19:39

7 votes

1 answer

172 views

Multithreaded HD Image Processing + Logistic reg. Classifier + Visualization

[I'm awaiting suggestions for improvement/optimization/more speed/general feedback ...] This code takes a label and a folder path of subfolders as input that have certain labels ex: trees, cats with ...

J_H

8,697

modified Jan 6 at 0:51

1 vote

1 answer

56 views

Write a Python script to generate a random DataFrame based on specific inputs

Many times in the past I have found myself trying to generate fake DataFrames in pandas. I decided, just for fun, to write a script for which I can specify some inputs and ...

Reinderien

55.5k

modified Jan 5 at 13:23

8 votes

1 answer

117 views

Using get_dummies to create a Simple Recommender System - Cold Start

Question: was using get_dummies a good choice for converting categorical strings? I used get_dummies to convert categorical ...

Toby Speight

67.5k

modified Jan 5 at 9:14

5 votes

2 answers

138 views

Unstructured to Structured TOC

The following code tries to convert an unstructured TOC with bounding box layout data given by the output of pdftotext -bbox-layout -f 11 -l 13 new_book.pdf toc.html...

Sati

435

modified Jan 1 at 6:59

4 votes

0 answers

139 views

Python BeautifulSoup - preparing HTML rows and td tags for Pandas

I'm using BeautifulSoup to parse a bunch of combined tables' rows, row by row, column by column to prepare it for import into Pandas. I can't use to_html() because ...

Meghan M.

41

modified Dec 31, 2022 at 10:58

7 votes

1 answer

98 views

GUI that reads data and generates/ saves charts

I have a program that uses pandas to read csv files and then generates and saves graphical charts. I have been trying to follow the SOLID principles so I have tried to separate responsibilities. So ...

CommunityBot

1

modified Dec 27, 2022 at 6:06

2 votes

2 answers

157 views

Convert a mapping of arbitrary pairs into a one-to-many map

The accepted solution is much cleaner and outperforms the algorithm below for small maps, but it doesn't do as well with larger ones: my code takes twice as long as the accepted code for maps of 10 pairs (...

shortorian

133

modified Dec 22, 2022 at 1:58

0 votes

1 answer

114 views

Efficient way to read files python - 10 folders with 100k txt files in each one

I am looking for an efficient way to read and append texts of .txt files to a dataframe. I currently have 10 folders with 100k documents each. What I specifically need to do is: getting the names of ...

Juho

3,569

answered Dec 14, 2022 at 20:48

1 vote

1 answer

49 views

Make unique id based on text data column with similarity scoring

I have the following dataframe: ...

AlekseyHoffman

111

answered Dec 13, 2022 at 15:11

0 votes

0 answers

38 views

Find profitable bets from historic results

Each of the lines in my CSV is a possibility of investment that I register on historic, but I would only make the investment if in the existing history (previous lines) the sum of the results is above ...

Toby Speight

67.5k

modified Dec 9, 2022 at 21:35

1 vote

1 answer

37 views

Create new columns in a DataFrame using functions and reposition the new columns

I would like a review regarding the method I use to create the new columns and then reposition them in the correct place where they should be. The new column called ...

Juho

3,569

modified Dec 9, 2022 at 19:08

-4 votes

1 answer

30 views

Find characters from same homeworld as Chewbacca [closed]

The problem is: find the names of all characters who are from the same homeworld as Chewbacca. My code is ...

Toby Speight

67.5k

modified Nov 29, 2022 at 7:57

5 votes

1 answer

116 views

Web scraper for data sources from Statistics Canada

I've written a parser to scrape data from the Canadian Statistics Bureau. ...

scnerd

2,020

answered Oct 5, 2022 at 20:32

2 votes

1 answer

91 views

Efficient List comprehension with multiple conditions using shift? [closed]

I am new to Python. I am trying to get the total number of failures by first checking how the Failure Sensor column transitioned, then creating the Start column from devicetimestamp if the ...

Reinderien

55.5k

answered Sep 22, 2022 at 11:51

4 votes

1 answer

147 views

Parsing an IP routing report with half a million lines into a PANDAS dataframe

I have a file that has around 440K lines of data. I need to read these data and find the actual "table" in the text file. Part of the text file looks like this. ...

CommunityBot

1

modified Sep 10, 2022 at 4:03

3 votes

1 answer

46 views

Cleaning Float Column of Longitude

I am cleaning a dataset where the lat and long columns contain some values multiplied by a power of 10: not only 10, but a varying 10^n. I wrote the code below. I am not sure if it is the best way, but is ...

Toby Speight

67.5k

answered Sep 9, 2022 at 8:32

1 vote

0 answers

34 views

BoundingBox dataclass implementation with cupy, cudf, and nvector

The dataset I'm working with is rather large so I've been experimenting with cudf and cupy. Here you can find instructions for ...

Jason Leaver

628

modified Sep 2, 2022 at 17:10

2 votes

1 answer

113 views

python: requests large.zip -> unzip -> fix -> filter ->gunzip

I wrote a function to download a large zip file (5-7 GB) from the Iowa State MRMS data archive. The zip files appear to be malformed and result in a BadZipFileError hence ...

Reinderien

55.5k

answered Aug 27, 2022 at 2:40

3 votes

2 answers

172 views

Intercolumn statistics between columns in a dataframe

I have a df and need to count how many adjacent columns have the same sign as other columns based on the sign of the first column, and multiply by the sign of the ...

Toby Speight

67.5k

modified Aug 25, 2022 at 10:11

1 vote

1 answer

55 views

Create charts after querying database

I'm at the end of the IBM Data Analyst course, and I wanted to ask for a rating of a piece of code I wrote as a solution to its exercises from the final chapter. I know I could write it on the forum ...

Reinderien

55.5k

answered Aug 16, 2022 at 1:09

2 votes

2 answers

122 views

Pivoting and then Padding a Pandas DataFrame with NaN between specific columns - Case Study

This question is about pivoting and padding columns, two very frequent activities in Pandas. I have a raw dataframe. I need to manipulate from long to ...

Reinderien

55.5k

modified Aug 11, 2022 at 13:25

1 vote

1 answer

38 views

Plot time windows based on interested value

I use the following code to identify some values of interest in a dataframe and then plot a time window before and after each value appeared. It works very well, but I would like to know if there is ...

Reinderien

55.5k

answered Aug 7, 2022 at 0:35

3 votes

1 answer

114 views

Python: adding a column with count of missing values by row

I have a big python data-frame and I am trying to add a column to it with average number of missing values by row. I have inherited some code that is working but I'd like to reduce memory usage by ...

CommunityBot

1

modified Jul 30, 2022 at 20:03

1 vote

1 answer

66 views

Grouping and summing only n variables of m with m>n using a column as key in pandas

I have the following df ...

Reinderien

55.5k

modified Jul 30, 2022 at 15:02

2 votes

0 answers

72 views

Do Mann-Kendall and Pettitt tests on each CSV file

Here is a function that will take each text file in a directory, do the Mann-Kendall and Pettitt tests, and then write the output to a text file. Could you please suggest how to improve the code to make ...

200_success

143k

modified Jul 28, 2022 at 18:11

1 vote

1 answer

136 views

Iterate and assign weights based on two columns (python)

FI_name  ISN    Sector  Industry
REC      INE02  PS      FS
HDB      INE03  PR      FS
ABC      INE04  PR      FS
RHC      INE05  PR      CO
ZHE      INE06  PR      FS
HSE      INE07  PR      FS
ZAK      INE08  PS      MT
HGB      INE09  PR      FS
YUJ      INE10  PR      MT
WSD      INE11  PS      FS
...

Reinderien

55.5k

answered Jul 21, 2022 at 2:03

1 vote

1 answer

60 views

Testing string membership using (in) keyword in python is very slow

I have the following text dataset: 4 million paragraphs of length between (10-60 words each). ...

RootTwo

9,505

answered Jul 18, 2022 at 6:44

2 votes

1 answer

38 views

Obtaining error code information that occurs before, during, and after a fix/repair using date data

I have completed a project I was working on using the methods I know how, but it is very inefficient. I am a beginner trying to figure out how I can improve my work by using software solutions. I have ...

Reinderien

55.5k

answered Jul 17, 2022 at 0:57

2 votes

1 answer

75 views

Remove duplicates from a Pandas dataframe taking into account lowercase letters and accents

I have the following DataFrame in pandas: code town district suburb 02 Benalmádena Málaga Arroyo de la Miel 03 Alicante Jacarilla Jacarilla, Correntias Bajas (Jacarilla) 04 Cabrera d'Anoia ...

AJNeufeld

32.9k

modified Jul 15, 2022 at 16:27

2 votes

1 answer

91 views

Slow processing of a python dataframe when aggregating across rows and columns

I would do this in SQL using string_agg but the server is SQL Server 2012 and beyond my control. So I'm trying a python approach. I have a dataframe of shape [20225 rows x 7 columns], and there a bit ...

Reinderien

55.5k

answered Jul 15, 2022 at 0:24

0 votes

1 answer

70 views

Performance issue on updating a Pandas DataFrame with Series based on DateRange

I have two Pandas data frames: one with Daily data and one with Weekly data. I want to add the weekly data to each row of the daily data for each group of column A. For example, for each row on the ...

Reinderien

55.5k

answered Jul 10, 2022 at 2:00

1 vote

0 answers

34 views

Iterate tables from table id from href links until no table with specific table id is found

I am scraping the following web page (which is my root URL to start scraping tables): https://www.iso.org/standards-catalogue/browse-by-ics.html What I am trying to achieve is to parse the ...

Gescof

11

asked Jul 5, 2022 at 13:15

0 votes

1 answer

36 views

Treat a list by generating a dataframe and sending the data to function via multiprocessing

To collect the list with the data from an API, I need to do these steps: ...

Digital Farmer

193

modified Jul 3, 2022 at 16:08

5 votes

4 answers

12k views

Preprocessing steps to follow while cleaning and extracting text data from tweets

I have a dataset of around 200,000 tweets. I am running a classification task on them. The dataset has two columns: class label and the tweet text. In the preprocessing step I am passing the dataset ...

Ravindra S

161

modified Jun 26, 2022 at 18:45

-1 votes

1 answer

61 views

Forecast experimental results based on temperature [closed]

I would really welcome any hints on making this code more concise, please. In this example we have some experiment results. For each planned experiment we have predicted results based on 4 temperatures ...

Newbie

3

modified Jun 26, 2022 at 13:46

3 votes

1 answer

107 views

Crawler/scraper for soccer match results

I wrote this code some time ago as part of learning web scraping. Every now and then I find mistakes in it, and I also have doubts. Feedback, please: is this code compliant with the common best ...

Kate

5,772

answered Jun 18, 2022 at 23:08
