data-catalog
Here are 75 public repositories matching this topic...
I keep running into situations where I want to run Marquez alongside Airflow (or another system that also runs a Postgres database) and the two docker-compose environments both attempt to start a Postgres database. The "correct" workaround is to create a single docker-compose environment with a shared database for both systems, but this is not always ideal.
It would be nice if we could tell `do
Let's prepare a mixin for interacting with Roles and Policies in the Python client, in case users want to use the API directly.
It should not only expose list, get, etc., but also utility methods, such as updating a default role. It should wrap the following logic:
```python
import requests
import json

# Get the ID
data_consumer = requests.get("http://localhost:8585/api/v1/roles/name/DataCo-
```
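A mixin along these lines could wrap that logic. This is a sketch only: the endpoint paths follow the snippet above, but the method names and the PATCH payload shape are assumptions, not the confirmed API contract.

```python
import requests


class RolesMixin:
    """Sketch of a client mixin for the Roles and Policies API.

    Endpoint paths follow the snippet above; the JSON Patch payload
    used by set_default_role is an assumption, not the confirmed API.
    """

    base_url = "http://localhost:8585/api/v1"

    def list_roles(self):
        # Plain list call; the "data" envelope key is an assumption.
        resp = requests.get(f"{self.base_url}/roles")
        resp.raise_for_status()
        return resp.json().get("data", [])

    def get_role_by_name(self, name):
        resp = requests.get(f"{self.base_url}/roles/name/{name}")
        resp.raise_for_status()
        return resp.json()

    def set_default_role(self, name):
        # Utility method: look up the role ID, then PATCH it as default.
        role = self.get_role_by_name(name)
        resp = requests.patch(
            f"{self.base_url}/roles/{role['id']}",
            json=[{"op": "replace", "path": "/defaultRole", "value": True}],
            headers={"Content-Type": "application/json-patch+json"},
        )
        resp.raise_for_status()
        return resp.json()
```

Keeping the raw `requests` calls behind named methods means users who want the API directly still get the ID-lookup plumbing for free.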
@cantzakas created the SQL query necessary to pull metadata in (hyperqueryhq/whale#140) -- we just have to build the Greenplum extractor scaffolding. This should follow the exact same shape as the Postgres extractor.
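Since Greenplum speaks the Postgres wire protocol, "same shape" can mean subclassing. The class and attribute names below are stand-ins to illustrate the idea, not whale's actual interface:

```python
# Hypothetical sketch: whale's real Postgres extractor has its own
# interface; these names are stand-ins to illustrate the "same shape" idea.

class PostgresMetadataExtractor:
    """Stand-in for the existing Postgres extractor."""

    # The real extractor's metadata query lives here (elided).
    SQL_STATEMENT = "SELECT ... FROM information_schema.columns ..."

    def extract(self, connection):
        # Run the metadata query over an existing DB-API connection.
        return connection.execute(self.SQL_STATEMENT)


class GreenplumMetadataExtractor(PostgresMetadataExtractor):
    """Greenplum is Postgres-protocol compatible, so the scaffolding can
    inherit everything and only swap in the query from hyperqueryhq/whale#140."""

    # Greenplum-specific metadata query from #140 (elided here).
    SQL_STATEMENT = "SELECT ... FROM pg_catalog ..."
```

The only real work is wiring the #140 query into `SQL_STATEMENT`; connection handling and extraction flow come from the parent.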
It is not surprising that deep and shallow scans show different results. A shallow scan only looks at column names; a deep scan looks at a sample of the data. I've even noticed that two different runs of a deep scan show different results, since the sampled rows differ. This is the inherent challenge of not scanning all of the data: it's a trade-off between performance/cost and accuracy, and there is no single right answer.
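The run-to-run variance is easy to reproduce. This toy sketch (my own illustration, not the scanner's actual implementation) samples a column where only a few rows look like emails, so whether a "deep scan" flags the column depends on which rows are drawn:

```python
import random
import re

# Crude email detector for the sake of illustration.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# A column of 1,000 values where only ten look like email addresses.
column = ["row-%d" % i for i in range(990)] + [
    "user%d@example.com" % i for i in range(10)
]

def shallow_scan(column_name):
    # A shallow scan only sees the column name.
    return EMAIL.search(column_name) is not None

def deep_scan(rows, sample_size=50, seed=None):
    # A deep scan inspects a random sample of the data, not all of it,
    # so the verdict depends on which rows happen to be drawn.
    sample = random.Random(seed).sample(rows, sample_size)
    return any(EMAIL.search(value) for value in sample)

# Different seeds model different runs: some samples happen to include
# one of the ten email-like values, some do not.
runs = [deep_scan(column, seed=s) for s in range(20)]
```

With 10 hits in 1,000 rows and a 50-row sample, a single run misses the emails more often than not, which is exactly the accuracy side of the trade-off.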
Add more logging across all modules, emitting debug-level signals to make troubleshooting easier.
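The usual pattern for this in Python is one named logger per module, so debug records are attributable to the module that emitted them. A minimal sketch (function and message names are illustrative):

```python
import logging

# One logger per module, named after the module, so debug output is
# attributable to where it came from.
logger = logging.getLogger(__name__)

def configure_debug_logging():
    # Turn on DEBUG-level output with the logger name in each record.
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )

def load_catalog(path):
    # Lazy %-formatting avoids building the message unless it is emitted.
    logger.debug("loading catalog from %s", path)
```

Because each module calls `logging.getLogger(__name__)`, verbosity can later be tuned per module without touching the call sites.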
Intake-esm adds the attribute `intake_esm_varname` to datasets, and I have encountered cases where it ends up being None (I'm still looking for the exact model).
Zarr does not accept that kind of metadata:
```python
import xarray as xr

ds_test = xr.DataArray(5).to_dataset(name='test')
ds_test.attrs['test'] = None
ds_test.to_zarr('test.zarr')
```

which raises an error.
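One workaround (a sketch of my own, not intake-esm's fix) is to strip None-valued attrs before writing, since per the report above zarr rejects them:

```python
def drop_none_attrs(attrs):
    # Return a copy of an attrs mapping with None values removed,
    # since (per the report above) zarr rejects None attribute values.
    return {key: value for key, value in attrs.items() if value is not None}

# Sketch of use before writing (assumes an xarray Dataset `ds`):
# ds.attrs = drop_none_attrs(ds.attrs)
# ds.to_zarr('test.zarr')
```

For a real dataset the same cleanup would also need to run over each variable's attrs, not just the dataset-level ones.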
pattern = catalog : dataset name : url : comment
- ocean : World Ocean Atlas : https://www.nodc.noaa.gov/OC5/woa18/ : different versions and variables via parameter #15
- global carbon budget with https://github.com/edjdavid/intake-excel #22
- land : precipitation : https://psl.noaa.gov/data/gridded/tables/precipitation.html :
- Mauna Loa CO2 netcdf ftp://aftp.cmdl.noaa.go
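The proposed `catalog : dataset name : url : comment` pattern could be turned into structured entries with a small parser. An illustrative sketch (it assumes URLs contain no spaces, and the field names are mine):

```python
import re

# The proposed entry pattern is "catalog : dataset name : url : comment".
# URLs contain a colon themselves, so match them as a run of non-spaces
# rather than splitting naively on ':'.
ENTRY = re.compile(
    r"^\s*(?P<catalog>[^:]+):\s*(?P<name>[^:]+):\s*(?P<url>\S+)\s*:?\s*(?P<comment>.*)$"
)

def parse_entry(line):
    # Return the four fields as a dict, or None if the line doesn't match.
    m = ENTRY.match(line)
    if m is None:
        return None
    return {key: value.strip() for key, value in m.groupdict().items()}
```

For example, the World Ocean Atlas line above parses into catalog `ocean`, name `World Ocean Atlas`, the URL, and the trailing comment.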
Promoted from related-sciences/articat#5
There's a reasonable chunk of work to do to migrate the Types documentation. The tasks involve:
- Saving the individual diagrams on each page as separate SVG files (per the documentation guide's instructions)
- Updating the various pages to point to these SVG files instead of the PNG files
- Ensuring that every type (entity, relationship, or classification) being expla
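The link-updating step, at least, is scriptable. A sketch (the docs path and markdown image-link style are assumptions about this repo's layout):

```python
import re
from pathlib import Path

# Rewrite markdown image links from .png to .svg across the docs tree.
# Captures everything up to the ".png)" suffix of an image link.
IMG_LINK = re.compile(r"(!\[[^\]]*\]\([^)]+)\.png\)")

def rewrite_links(text):
    # "![flow](images/flow.png)" becomes "![flow](images/flow.svg)".
    return IMG_LINK.sub(r"\1.svg)", text)

def migrate(docs_dir="docs"):
    # Hypothetical docs root; adjust to wherever the Types pages live.
    for page in Path(docs_dir).rglob("*.md"):
        page.write_text(rewrite_links(page.read_text()))
```

Running it after the SVG files are saved keeps the two steps independent, and the diagram-saving and type-coverage checks remain manual.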
Currently we only support db-store publishers (e.g. Neo4j, MySQL, Neptune), but it would be fairly easy to support message-queue publishers (e.g. SQS, Kinesis, Event Hubs, Kafka) through the same interface, which would enable a push ETL model.
There is a PR (amundsen-io/amundsendatabuilder#431) which unfortunately wasn't merged; it could be used as an example of how to support t
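The shape of such a publisher might look like the sketch below. The `Publisher` base class here is a stand-in for databuilder's actual publisher interface, and the queue client is hypothetical; #431 remains the reference for the real wiring.

```python
import json

# Sketch of a message-queue publisher alongside the existing db publishers.
# `Publisher` is a stand-in for databuilder's publisher interface, and
# `queue_client` is any hypothetical SQS/Kinesis/Event Hubs/Kafka wrapper
# exposing a send_batch(list_of_strings) method.

class Publisher:
    """Stand-in for the databuilder publisher interface."""

    def publish(self, records):
        raise NotImplementedError


class QueuePublisher(Publisher):
    def __init__(self, queue_client, batch_size=10):
        self.queue_client = queue_client
        self.batch_size = batch_size

    def publish(self, records):
        # Serialize each record and push in fixed-size batches, so the
        # downstream consumer can apply them incrementally (push ETL).
        batch = []
        for record in records:
            batch.append(json.dumps(record))
            if len(batch) >= self.batch_size:
                self.queue_client.send_batch(batch)
                batch = []
        if batch:
            self.queue_client.send_batch(batch)
```

Because only `send_batch` is assumed of the client, the same publisher body could sit in front of any of the queues listed above.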