data-catalog
Here are 74 public repositories matching this topic...
We have recently made dataset versions traversable via the dataset tab on our lineage page, and we would like to do the same for job versions. We want to be able to start with a job, navigate across its versions, and then navigate across the runs for that job version. This intermediate page should also show detailed information about each job version. One prereq for this is
Follow the implementation example in ingestion/tests/integration/ometa/test_ometa_database_service_api.py to implement tests for the Python client's PipelineService support.
@cantzakas created the SQL query necessary to pull metadata in (hyperqueryhq/whale#140) -- we just have to build the Greenplum extractor scaffolding. This should follow the same shape as the Postgres extractor.
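Since Greenplum speaks the Postgres protocol, a standalone sketch of pulling table and column metadata could look like the lines below. The connection parameters and query are illustrative assumptions, not the actual query from hyperqueryhq/whale#140, and the real extractor should reuse the Postgres extractor's structure.

# Illustrative sketch: pull basic table/column metadata from Greenplum.
# Greenplum is Postgres-compatible, so psycopg2 and information_schema work;
# the query to use in the extractor is the one from hyperqueryhq/whale#140.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432, dbname="postgres", user="gpadmin", password=""
)  # connection parameters are placeholders

query = """
    SELECT c.table_schema, c.table_name, c.column_name, c.data_type
    FROM information_schema.columns c
    WHERE c.table_schema NOT IN ('information_schema', 'pg_catalog')
    ORDER BY c.table_schema, c.table_name, c.ordinal_position;
"""

with conn, conn.cursor() as cur:
    cur.execute(query)
    for schema, table, column, data_type in cur.fetchall():
        print(f"{schema}.{table}.{column}: {data_type}")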
Describe the bug
Currently, there are places in the project where paged responses are returned with hardcoded hasNext and total properties. We need to fix this and return valid values to the frontend, and the frontend needs to use these values instead of its current workaround.
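As an illustration of the intended behaviour (not the project's actual response classes, which are hypothetical here), a paged response can derive both values from the query instead of hardcoding them:

# Illustrative sketch of a paged response whose hasNext/total are computed,
# not hardcoded; field and class names are made up for the example.
from dataclasses import dataclass
from typing import List


@dataclass
class PagedResponse:
    items: List[dict]
    total: int      # total number of matching rows, e.g. from a COUNT query
    has_next: bool  # True if more rows exist beyond this page


def page(all_rows: List[dict], offset: int, limit: int) -> PagedResponse:
    total = len(all_rows)                    # in practice: SELECT COUNT(*) ...
    items = all_rows[offset:offset + limit]  # in practice: OFFSET/LIMIT query
    return PagedResponse(items=items, total=total, has_next=offset + limit < total)


# The frontend can then rely on total and has_next instead of its current workaround.
print(page([{"id": i} for i in range(10)], offset=8, limit=5))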
It is not surprising that deep and shallow scans show different results. A shallow scan only looks at column names, while a deep scan looks at a sample of the data. I've even noticed that two different runs of a deep scan show different results, since the sample rows differ. This is the challenge of not scanning all of the data: it's a trade-off between performance/cost and accuracy. There is no right answer.
Add more logging in all modules to emit debug signals and make troubleshooting easier.
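A minimal sketch of the usual per-module pattern, assuming nothing about the project's actual modules (the function and messages below are only examples):

# Sketch of the standard per-module logging pattern.
import logging

logger = logging.getLogger(__name__)  # one logger per module


def load_table_metadata(table_name: str) -> dict:
    logger.debug("loading metadata for table %s", table_name)
    metadata = {"name": table_name, "columns": []}  # placeholder for the real work
    logger.debug("loaded metadata: %s", metadata)
    return metadata


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    load_table_metadata("public.users")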
Describe the bug
There were build tags (integration) for each plugin extractor's tests, but we renamed some of the tags to plugins. In the workflow, we only run plugin_test on the main branch. Even so, not all of the tests are tagged with plugins, so some plugins are not covered by the plugin_test workflow.
Some plugins have broken tests:
- Superset: port he
Intake-esm adds the attribute intake_esm_varname to datasets, and I have encountered cases where that ends up being None (still looking for the exact model).
Zarr does not like that type of metadata:
import xarray as xr
ds_test = xr.DataArray(5).to_dataset(name='test')
ds_test.attrs['test'] = None
ds_test.to_zarr('test.zarr')
gives an error from zarr (traceback omitted).
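A minimal workaround sketch, assuming the goal is simply to drop None-valued attributes before writing; whether intake-esm should avoid setting intake_esm_varname to None in the first place is a separate question:

# Sketch of a workaround: strip None-valued attrs before calling to_zarr.
import xarray as xr

ds_test = xr.DataArray(5).to_dataset(name="test")
ds_test.attrs["test"] = None

# Drop attributes whose value is None before handing the dataset to zarr.
ds_test.attrs = {k: v for k, v in ds_test.attrs.items() if v is not None}
ds_test.to_zarr("clean_test.zarr")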
- pattern= catalog : dataset name : url : comment
- ocean: World Ocean Atlas: https://www.nodc.noaa.gov/OC5/woa18/ : different versions and variables via parameter #15
- global carbon budget with https://github.com/edjdavid/intake-excel #22
- land: precipitation: https://psl.noaa.gov/data/gridded/tables/precipitation.html:
- Mauna Loa CO2 netcdf ftp://aftp.cmdl.noaa.go
It would be nice to have a debug message (see the sketch after this list) for:
- lookup request with parameters
- lookup result(s)
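A minimal sketch of what those two debug messages might look like; the lookup function and its parameters are hypothetical, not the project's real API:

# Hypothetical lookup wrapper showing the two proposed debug messages.
import logging

logger = logging.getLogger(__name__)


def lookup(term, **params):
    logger.debug("lookup request: term=%r params=%r", term, params)
    results = [term.upper()]  # placeholder for the real lookup call
    logger.debug("lookup result(s): %r", results)
    return results


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    lookup("customer_table", limit=10)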
There's a reasonable chunk of work to do to migrate the Types documentation. The tasks would involve:
- Saving the individual diagrams in each page as separate SVG files (per the documentation guide's instructions)
- Updating the various pages to point to these SVG files instead of the PNG files
- Ensuring that every type (entity, relationship or classification) being expla
Currently we only support DB store publishers (e.g. Neo4j, MySQL, Neptune), but it would be pretty easy to support message queue publishers using the same interface (e.g. SQS, Kinesis, Event Hubs, Kafka), which would allow a push ETL model.
There is a PR (amundsen-io/amundsendatabuilder#431) which unfortunately wasn't merged; it could be used as an example of how to support this.
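A rough standalone sketch of an SQS-based publisher is below. It uses boto3 directly and does not subclass databuilder's actual Publisher base class, so the class and method names are assumptions; the unmerged PR above remains the better reference for the real interface.

# Standalone sketch of a message-queue publisher (SQS via boto3).
import json

import boto3


class SQSPublisher:
    """Illustrative message-queue publisher; not databuilder's real Publisher API."""

    def __init__(self, queue_url, region_name="us-east-1"):
        self._queue_url = queue_url
        self._sqs = boto3.client("sqs", region_name=region_name)

    def publish_record(self, record):
        # Push one extracted metadata record onto the queue (push ETL model).
        self._sqs.send_message(QueueUrl=self._queue_url, MessageBody=json.dumps(record))


if __name__ == "__main__":
    # Queue URL and record contents are placeholders.
    publisher = SQSPublisher("https://sqs.us-east-1.amazonaws.com/123456789012/metadata")
    publisher.publish_record({"table": "public.users", "columns": ["id", "email"]})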