data-mining

Migrate all Python code from old-style str.format() calls, % formatting operators, and simple concatenation (+) to modern f-strings (brief guide). f-strings are generally the fastest of these approaches and also improve code readability.
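As a rough illustration of the migration described above, the sketch below builds the same message four ways. The variable names and the message text are made up for the example and are not taken from the gensim codebase.

```python
name = "gensim"
count = 3

# Older styles the issue proposes replacing.
percent_style = "Loaded %d models from %s" % (count, name)
format_style = "Loaded {} models from {}".format(count, name)
concat_style = "Loaded " + str(count) + " models from " + name

# Equivalent f-string: the expressions are embedded directly in the literal.
fstring_style = f"Loaded {count} models from {name}"
```

All four variables hold the same string; the f-string keeps each value next to the text it belongs to, which is where the readability gain comes from.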


@gojomo

Not a high priority at all, but it'd be more sensible for such a tutorial/testing utility corpus to be implemented elsewhere, maybe under /test/ or some other data- or doc-related module, rather than in gensim.models.word2vec.

Originally posted by @gojomo in RaRe-Technologies/gensim#2939 (comment)

Problem: the approximate method can still be slow for many trees
catboost version: master
Operating System: ubuntu 18.04
CPU: i9
GPU: RTX2080

It would be good to be able to specify how many trees to use when computing Shapley values. The model.predict and prediction_type versions already allow this, as do LightGBM and XGBoost.

It's been a while since I updated the e2e tests, and some of them are failing (most of the failures are related to examples).

Also, we need to add e2e tests that cover headers and cookies for both drivers.

The official instructions say to use joblib for pickling PyOD models.

This fails for AutoEncoders, or any other TensorFlow-backed model as far as I can tell. The error is:

>>> dump(model, 'model.joblib')
...
TypeError: can't pickle _thread.RLock objects
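The same class of error can be reproduced with the standard library alone, which may help show why it happens: any object that holds a `threading.RLock` cannot be pickled as-is. The sketch below uses hypothetical stand-in classes (`ModelWithLock`, `PicklableModel`), not PyOD or TensorFlow objects, and the `__getstate__`/`__setstate__` pattern it shows is a general Python technique rather than a confirmed fix for PyOD models.

```python
import pickle
import threading


class ModelWithLock:
    """Hypothetical stand-in for a model that holds an unpicklable lock."""

    def __init__(self, weights):
        self.weights = weights
        self._lock = threading.RLock()


class PicklableModel(ModelWithLock):
    """Same stand-in, but drops the lock when pickled and recreates it on load."""

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]  # locks cannot be serialized
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.RLock()  # fresh lock after loading


def can_pickle(obj):
    """Return True if pickle.dumps succeeds, False if it raises TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False
```

Here `can_pickle(ModelWithLock([1, 2]))` is false, while a `PicklableModel` round-trips through `pickle.dumps` and `pickle.loads` with its `weights` intact. Whether a comparable approach is workable for a TensorFlow-backed model depends on what else the model object holds.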

Note that it's not sufficient to save the underlying Keras S

@MatthewMiddlehurst

Is your feature request related to a problem? Please describe.
NA

Describe the solution you'd like
I thought I'd ask first, before submitting a PR—@MatthewMiddlehurst because it's your code, @kachayev and @RavenRudi because you are working on related PRs—would it be helpful to add [MiniRocket](https://github.com/alan-turing-institute/sktime/blob/main/sktime/transformations/panel/rocke

What's your use case?

I have been using the library for some time to parse my company's invoices. I found that the line items in my invoices can appear in either of two formats. One option is to create a separate template file for each format. The other would be support for multiple regexes for lines, with the parser picking whichever one produces a match.
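The second option in the request could look roughly like the sketch below: try each line pattern in order and keep the first one that matches. The two patterns, the field names (`qty`, `desc`, `price`), and the `parse_line` helper are hypothetical and are not invoice2data's template schema or API.

```python
import re

# Hypothetical patterns for two line-item layouts (illustration only).
LINE_PATTERNS = [
    # Layout A: "<qty> <description> <price>"
    re.compile(r"^(?P<qty>\d+)\s+(?P<desc>.+?)\s+(?P<price>\d+\.\d{2})$"),
    # Layout B: "<description> x<qty> @ <price>"
    re.compile(r"^(?P<desc>.+?)\s+x(?P<qty>\d+)\s+@\s*(?P<price>\d+\.\d{2})$"),
]


def parse_line(line, patterns=LINE_PATTERNS):
    """Return the named groups of the first matching pattern, or None."""
    stripped = line.strip()
    for pattern in patterns:
        match = pattern.match(stripped)
        if match:
            return match.groupdict()
    return None
```

With this ordering, a line such as `2 Widget 9.99` is handled by the first pattern and `Widget x2 @ 9.99` by the second, so a single template could cover both layouts.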


Here are 3,309 public repositories matching this topic...

eriklindernoren / ML-From-Scratch

academic / awesome-datascience

microsoft / LightGBM

RaRe-Technologies / gensim

JaidedAI / EasyOCR

rasbt / python-machine-learning-book

EthicalML / awesome-production-machine-learning

catboost / catboost

MontFerret / ferret

yzhao062 / pyod

jivoi / awesome-ml-for-cybersecurity

yzhao062 / anomaly-detection-resources

alan-turing-institute / sktime

rasbt / mlxtend

tangyudi / Ai-Learn

deanmalmgren / textract

biolab / orange3

r0f1 / datascience

jphall663 / awesome-machine-learning-interpretability

WZBSocialScienceCenter / pdftabextract

rob-med / awesome-TS-anomaly-detection

ankitrohatgi / WebPlotDigitizer

tirthajyoti / Papers-Literature-ML-DL-RL-AI

demidovakatya / vvedenie-mashinnoe-obuchenie

PatMartin / Dex

eBay / tsv-utils

CIRCL / AIL-framework

sepandhaghighi / pycm

404notf0und / AI-for-Security-Learning

invoice-x / invoice2data
