dask

I ran into an issue when using to_csv with distributed workers that don't share a file system. I shouldn't have been surprised that writing to a local file system from a distributed worker doesn't work, and it shouldn't work. But the only error I got was a File Not Found error. That led me to dask/dask#2656 (comment), which had the answer.

The stumpy.snippets feature is now completed in #283 which follows this work:

We have a rough notebook t

What happened:

When creating a LocalCluster object, the comm is started on a random high port, even if no other clusters are running.

What you expected to happen:

Should use port 8786.

Minimal Complete Verifiable Example:

$ conda create -n dask-lc-test -c conda-forge -y python=3.8 ipython dask distributed
$ conda activate dask-lc-test

The `d
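Before concluding that the cluster is ignoring port 8786, it can help to rule out that the port is already taken on the machine. The sketch below uses only the standard library. `port_is_free` is a hypothetical helper name, not part of distributed, and a successful bind only shows that the port was free at that instant.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
        return True


if __name__ == "__main__":
    print("8786 free:", port_is_free(8786))
```

If this reports the port as free and the cluster still picks a random high port, the choice is being made by the cluster's own configuration rather than forced by a conflict.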

Describe the bug
According to the multiscene documentation, the property all_same_area does:

Determine if all contained Scenes have the same ‘area’.

However, I have created a multiscene where all scenes have the same area (they just differ between datasets), yet the property returns Fa

Code Sample, a minimal, complete, and verifiable piece of code

from pyresample.boundary import Boundary
b = Boundary(my_lons, my_lats)
print(b.contour_poly.area())

Problem description

The above code doesn't fail if the provided lons/lats are 2D (not sure on 3D+), but the class and all functions/utilities underneath it assume 1D arrays. The end results are incor
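One way to surface this problem early is to validate the inputs before they reach the boundary code. The following is a minimal sketch of such a guard, assuming NumPy is available. `require_1d` is a hypothetical helper, not part of pyresample, and it only rejects non-1D input rather than attempting to extract a boundary from a 2D grid.

```python
import numpy as np


def require_1d(lons, lats):
    """Return lons/lats as 1D arrays, raising if they are not 1D or differ in length."""
    lons = np.asarray(lons)
    lats = np.asarray(lats)
    if lons.ndim != 1 or lats.ndim != 1:
        raise ValueError(
            f"expected 1D lons/lats, got {lons.ndim}D and {lats.ndim}D arrays"
        )
    if lons.shape != lats.shape:
        raise ValueError("lons and lats must have the same length")
    return lons, lats
```

Calling the guard first turns a silently wrong result into an explicit ValueError for 2D input.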

from dask_jobqueue import SLURMCluster
cluster = SLURMCluster(cores=1, memory='1GB')
print(cluster.job_script())

#!/usr/bin/env bash

#SBATCH -J dask-worker
#SBATCH -n 1
#SBATCH --cpus-per-task=1
#SBATCH --mem=954M
#SBATCH -t 00:30:00

/home/lesteve/miniconda3/bin/python -m distributed.cli.dask_worker tcp://192.168.0.11:44065 --nthreads 1 --memory-limit 1000.00MB -
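The two memory figures in this output look inconsistent but can describe the same amount. Read as 10^9 bytes, 1 GB is about 953.67 MiB, which rounds up to the 954M on the `#SBATCH --mem` line, while the worker limit is printed in decimal megabytes. The arithmetic below is a sketch of that reading, assuming decimal gigabytes on input and binary megabytes for SLURM. It does not reproduce dask-jobqueue's own code.

```python
import math

requested_bytes = 1 * 10**9  # memory='1GB', read as decimal gigabytes

mib = requested_bytes / 2**20  # the same amount in binary megabytes (MiB)
slurm_mem = f"{math.ceil(mib)}M"  # round up so the job is not under-provisioned
worker_limit = f"{requested_bytes / 10**6:.2f}MB"  # decimal megabytes

print(slurm_mem)
print(worker_limit)
```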

Problem description

Reading a dataset with eager's read functionality raises a ValueError when providing columns.

Example code (ideally copy-pastable)

import pandas as pd

from tempfile import TemporaryDirectory
from functools import partial
from storefact import get_store_from_url

from kartothek.io.eager import store_dataframes_as_dataset, read_dataset_as_data

The ML implementation is still a bit experimental - we can improve on this:

SHOW MODELS and DESCRIBE MODEL
Hyperparameter optimizations, AutoML-like behaviour
@romainr brought up the idea of exporting models
and some more showcases and examples

Example for numerical weather prediction

to be added to initialised datasets

Data sources (to) implement(ed):

relates to #600

Currently all of the metrics computed are independent of a target variable or column, but if lens.summarise took the name of a column as the target variable, the output of some metrics could be more interpretable even if the target variable is not used in any kind of predictive modelling.

A good example of this could be PCA (see #14), which could plot the different categories of the target va

dask

Here are 221 public repositories matching this topic...

dask / dask

pydata / xarray

TDAmeritrade / stumpy

jmcarpenter2 / swifter

dask / distributed

ironmussa / Optimus

itamarst / eliot

pytroll / satpy

ranaroussi / pystore

timkpaine / paperboy

JiaweiZhuang / xESMF

pytroll / pyresample

dask / dask-jobqueue

JDASoftwareGroup / kartothek

nils-braun / dask-sql

dask / dask-ec2

ironmussa / Bumblebee

pangeo-data / climpred

facultyai / lens

dymaxionlabs / dask-rasterio

LDO-CERT / orochi

fugue-project / fugue

chmp / framequery

dask / knit

NCAR / ncar-python-tutorial

JSybrandt / agatha

radix-ai / graphchain

backtick-se / cowait

MITgcm / xmitgcm

dgerlanc / dask-scaling-dataframe
