# data-catalog

Here are 75 public repositories matching this topic...

feng-tao commented May 14, 2021

Currently we only support database-store publishers (e.g. Neo4j, MySQL, Neptune). It would be fairly easy to add a message-queue publisher (e.g. SQS, Kinesis, Event Hubs, Kafka) using the existing interface, which would enable a push-based ETL model.

There is a PR (amundsen-io/amundsendatabuilder#431) that unfortunately never got merged. It could be used as an example of how to support t

help wanted good first issue keep fresh
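The following is a minimal sketch of the idea. It assumes a simplified `Publisher` interface with a single `publish` method. The class names are hypothetical and do not reproduce databuilder's actual `Publisher` API. The `send` callable stands in for whatever producer call a real SQS, Kinesis, Event Hubs or Kafka client provides.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Mapping


class Publisher(ABC):
    """Hypothetical minimal publisher interface (not databuilder's actual class)."""

    @abstractmethod
    def publish(self, records: Iterable[Mapping]) -> int:
        """Publish records and return how many were sent."""


class MessageQueuePublisher(Publisher):
    """Push each metadata record to a message queue as one JSON message."""

    def __init__(self, send: Callable[[str], None]):
        # `send` is a stand-in for a real producer call on a queue client.
        self._send = send

    def publish(self, records: Iterable[Mapping]) -> int:
        count = 0
        for record in records:
            # sort_keys makes the serialized message deterministic
            self._send(json.dumps(record, sort_keys=True))
            count += 1
        return count
```

In this push model, downstream consumers read the queue and apply the metadata themselves. The extractor never writes directly to a database. For a quick local test, passing `list.append` as `send` collects the messages in memory.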
rossturk commented Apr 19, 2022

I keep running into situations where I want to run Marquez alongside Airflow (or another system that also runs a Postgres database) and the two docker-compose environments both attempt to start a Postgres database. The "correct" workaround is to create a single docker-compose environment with a shared database for both systems, but this is not always ideal.

It would be nice if we could tell `do

good first issue feature
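The conflict comes from two stacks each trying to start their own Postgres. A wrapper script could first check whether something is already listening on Postgres's default port (5432). `is_port_in_use` is a hypothetical helper, not part of Marquez's tooling. It only checks that a TCP port accepts connections, not that the listener is actually Postgres.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

A script could call `is_port_in_use(5432)`. Depending on the result, it would either point the second stack at the existing database or map its Postgres container to a different host port.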
pmbrull commented Apr 4, 2022

Let's prepare a mixin for interacting with Roles and Policies with the Python client, in case users want to use the API directly.

Don't just provide list, get, etc.; also add utility methods, such as updating a default role. It should wrap the following logic:

import requests
import json

# Get the ID
data_consumer = requests.get("http://localhost:8585/api/v1/roles/name/DataCo
good first issue client
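Here is a rough sketch of what such a mixin could look like. It reuses the `/roles/name/...` endpoint shape from the issue's snippet. Everything else is an assumption: the `RolesMixin` and `RestClient` names, the `self.client` transport, and updating the default role by sending a JSON Patch on a boolean `defaultRole` field to `/roles/{id}`. None of it is confirmed by the issue.

```python
from typing import Any, Dict, List, Protocol


class RestClient(Protocol):
    """Hypothetical transport the mixin relies on; a real client would wrap HTTP calls."""

    def get(self, path: str) -> Dict[str, Any]: ...

    def patch(self, path: str, data: List[Dict[str, Any]]) -> Dict[str, Any]: ...


class RolesMixin:
    """Hypothetical mixin adding role helpers on top of `self.client`."""

    client: RestClient

    def get_role_by_name(self, name: str) -> Dict[str, Any]:
        # Same endpoint shape as in the issue: /roles/name/{name}
        return self.client.get(f"/roles/name/{name}")

    def set_default_role(self, name: str) -> Dict[str, Any]:
        # Assumption: roles expose a boolean `defaultRole` field that can be
        # updated with a JSON Patch sent to /roles/{id}.
        role = self.get_role_by_name(name)
        patch = [{"op": "replace", "path": "/defaultRole", "value": True}]
        return self.client.patch(f"/roles/{role['id']}", patch)
```

The mixin looks the role up by name, extracts the ID, then patches it, so callers never handle IDs themselves. Those are the "get the ID" and update steps the issue asks the helper to wrap.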
vrajat commented Feb 14, 2020

It is not surprising that deep and shallow scans show different results. A shallow scan only looks at column names; a deep scan looks at a sample of the data. I've even noticed that two different runs of a deep scan can show different results, because the sampled rows differ. This is the challenge of not scanning all of the data: it's a trade-off between performance/cost and accuracy, and there is no right answer.

good first issue
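The toy sketch below makes the trade-off concrete. `shallow_scan` matches only column names. `deep_scan` matches values in a random sample of rows, so unseeded runs on mixed data can disagree. The patterns, the 50% threshold and the function names are hypothetical and are not the tool's actual implementation.

```python
import random
import re

# Hypothetical PII patterns, for illustration only.
NAME_PATTERNS = {"email": re.compile(r"e-?mail", re.IGNORECASE)}
VALUE_PATTERNS = {"email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE)}


def shallow_scan(columns):
    """Flag columns whose *name* matches a PII pattern; no data is read."""
    found = {}
    for col in columns:
        for pii_type, pattern in NAME_PATTERNS.items():
            if pattern.search(col):
                found[col] = pii_type
    return found


def deep_scan(table, sample_size, seed=None, threshold=0.5):
    """Flag columns where at least `threshold` of a random row sample matches a value pattern.

    `table` is a list of dicts (one per row). Without a fixed `seed`, each run
    draws a different sample and may reach a different verdict.
    """
    rng = random.Random(seed)
    rows = rng.sample(table, min(sample_size, len(table)))
    found = {}
    for col in table[0]:
        values = [str(row[col]) for row in rows]
        for pii_type, pattern in VALUE_PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if values and hits / len(values) >= threshold:
                found[col] = pii_type
    return found
```

For a column named `contact` that holds email addresses, `shallow_scan` finds nothing, but `deep_scan` flags it. A larger `sample_size` reduces run-to-run variation at the cost of reading more data.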
jbusecke commented Feb 18, 2021

Intake-esm adds the attribute intake_esm_varname to datasets, and I have encountered cases where that ends up being None (still looking for the exact model).

Zarr does not like that type of metadata:

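import xarray as xr

# Minimal dataset with a None-valued attribute, mimicking what
# intake-esm can produce for intake_esm_varname
ds_test = xr.DataArray(5).to_dataset(name='test')
ds_test.attrs['test'] = None

# Writing to Zarr raises an error because of the None attribute
ds_test.to_zarr('test.zarr')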

gives

------------------------
good first issue bug
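One possible workaround, until the attribute is fixed at the source, is to sanitize attributes before writing. This sketch assumes it is acceptable either to drop attributes whose value is `None` or to replace them with a placeholder string. `sanitize_attrs` is a hypothetical helper, not part of intake-esm or xarray.

```python
from typing import Any, Dict, Mapping, Optional


def sanitize_attrs(attrs: Mapping[str, Any], replacement: Optional[str] = None) -> Dict[str, Any]:
    """Drop None-valued attributes, or replace them with `replacement` if one is given."""
    cleaned: Dict[str, Any] = {}
    for key, value in attrs.items():
        if value is not None:
            cleaned[key] = value
        elif replacement is not None:
            # Keep the key but store a serializable placeholder
            cleaned[key] = replacement
    return cleaned
```

With the reproducer, `ds_test.attrs = sanitize_attrs(ds_test.attrs)` before `to_zarr` removes the offending entry. Per-variable attributes would need the same treatment if they can also contain `None`.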

National Data Archive (NADA) is an open source data cataloging system that serves as a portal for researchers to browse, search, compare, apply for access, and download relevant census or survey information. It was originally developed to support the establishment of national survey data archives.

  • Updated Mar 30, 2022
  • PHP
aaronspring commented Jul 30, 2020
help wanted good first issue Dataset request
cmgrote commented Aug 24, 2021

There's a reasonable chunk of work to do regarding migrating the Types documentation. The tasks would involve:

  • Saving the individual diagrams in each page as a separate SVG file (per documentation guide's instructions)
  • Updating the various pages to point to these SVG files instead of the PNG files
  • Ensuring that every type (entity, relationship or classification) being expla
good first issue help wanted
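The second task, repointing pages from PNG to SVG, is easy to script once the SVGs exist. This sketch assumes the pages are Markdown and that each SVG keeps the same path as its PNG, apart from the extension. `repoint_to_svg` is a hypothetical helper, not part of the documentation tooling.

```python
import re
from pathlib import PurePosixPath

# Matches Markdown image links such as ![Alt text](path/to/diagram.png)
IMAGE_LINK = re.compile(r"(!\[[^\]]*\]\()([^)\s]+\.png)(\))")


def repoint_to_svg(page_text: str, available_svgs: set) -> str:
    """Replace .png image links with .svg when an SVG with the same path stem exists."""

    def swap(match: re.Match) -> str:
        png_path = match.group(2)
        svg_path = str(PurePosixPath(png_path).with_suffix(".svg"))
        if svg_path in available_svgs:
            return match.group(1) + svg_path + match.group(3)
        # Leave links without a matching SVG untouched so they can be spotted later
        return match.group(0)

    return IMAGE_LINK.sub(swap, page_text)
```

Links are switched only when the SVG is known to exist, so running the script early never produces broken images. Any remaining `.png` links show which diagrams still need exporting.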
