data-engineering
Here are 1,208 public repositories matching this topic...
Updated Mar 26, 2022
Updated Jan 2, 2022
Updated Jan 25, 2022
Opened from the Prefect Public Slack Community
pat: This is a pretty minor problem as these things go, but it would be great if there were a way to disable the ASCII logo in Prefect Agent and Prefect Server, since it pollutes our server logs in Datadog. I could hack the code in Prefect, but it seems inelegant to have to reapply such a patch after every version upgrade.
Describe the bug
Data Docs columns shrink to a width of one character when the batch comes from a long query
To Reproduce
Steps to reproduce the behavior:
- make a batch from a long query string
- run validation
- render result to data docs
- See screenshot
[Screenshot: Data documentation compiled by Great Expectations]
Tell us about the problem you're trying to solve
When building a connector, one of the most important requirements is to define the output schema of each stream in the connector. There are multiple ways of doing this:
- Transcribing the JSON Schema by hand from the API docs
- Using the API's OpenAPI spec to discern the schema. [Docs](https://docs.airbyte.com/connector-development/c
Under the hood, the Benthos csv input uses the csv.Reader struct from Go's standard encoding/csv package.
The current implementation of the csv input doesn't allow setting the LazyQuotes field.
We have a use case where LazyQuotes must be set for things to work correctly.
Is your feature request related to a problem? Please describe.
Currently, feature_store.yaml only lets us specify a region for the DynamoDB provider. As a result, a real DynamoDB instance must be available whenever we want to do local development and testing, or integration testing in a sandbox environment.
Describe the solution you'd like
One way to solve this is to let users pass an endpoint
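A minimal sketch of the proposal, assuming a hypothetical `endpoint_url` option next to `region` in the DynamoDB settings. When set, it is forwarded to boto3, whose `boto3.resource()` accepts an `endpoint_url` argument (for example, one pointing at a DynamoDB Local instance). The `DynamoDBConfig` class and `dynamodb_resource_kwargs` helper are illustrative names, not Feast's API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DynamoDBConfig:
    """Hypothetical provider settings, as they might be read from feature_store.yaml."""

    region: str
    # Hypothetical option: None means "use the real AWS endpoint for this region"
    endpoint_url: Optional[str] = None


def dynamodb_resource_kwargs(config: DynamoDBConfig) -> Dict[str, Any]:
    """Build keyword arguments for boto3.resource("dynamodb", **kwargs)."""
    kwargs: Dict[str, Any] = {"region_name": config.region}
    if config.endpoint_url is not None:
        # Route every call to a local or sandbox endpoint instead of AWS
        kwargs["endpoint_url"] = config.endpoint_url
    return kwargs


# Usage (requires boto3 and, e.g., DynamoDB Local listening on port 8000):
# import boto3
# local = DynamoDBConfig(region="us-west-2", endpoint_url="http://localhost:8000")
# dynamodb = boto3.resource("dynamodb", **dynamodb_resource_kwargs(local))
```

Leaving `endpoint_url` unset keeps today's behavior, so existing configurations would not need to change.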
When we show data for a metric, we currently don't include the current day's data. Users who are just getting set up may only have events from today and want to test whether the query is working; because events from today are excluded, they see no results.
TODO:
- In packages/back-end/src/services/experiments.ts on line 329, instead of using the current date as the value
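The effect can be illustrated with a small Python sketch, assuming the metric query uses a half-open `[start, end)` date window. The real code in experiments.ts is TypeScript and is not reproduced here; `metric_window` and `in_window` are hypothetical names. A window that ends at midnight of the current day drops today's events, while one that ends at the next midnight keeps them.

```python
from datetime import datetime, timedelta
from typing import Tuple

Window = Tuple[datetime, datetime]


def metric_window(now: datetime, days: int, include_today: bool) -> Window:
    """Return a half-open [start, end) window covering the last `days` full days.

    When include_today is True, the window also covers the current day.
    """
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = start_of_today - timedelta(days=days)
    # Ending at today's midnight drops today's events; ending at the next midnight keeps them
    end = start_of_today + timedelta(days=1) if include_today else start_of_today
    return start, end


def in_window(event_time: datetime, window: Window) -> bool:
    start, end = window
    return start <= event_time < end
```

For an event at 09:00 today, `in_window(event, metric_window(now, 30, include_today=False))` is `False`, which is exactly the "no results" experience described above.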
Updated Apr 6, 2022 - Python
In later versions of lakeFS (probably >= v1.0.0), we would like to remove the logic that tries to fill the generation field in the DB when loading old dumps. This means we will no longer support loading dumps made with a version lower than v0.61.0.
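Dropping the back-fill logic implies rejecting older dumps explicitly. A minimal sketch of such a version gate in Python, assuming each dump records the lakeFS version that produced it as a string like "v0.60.0". The constant and function names are hypothetical, and lakeFS itself is written in Go.

```python
import re
from typing import Tuple

# Oldest dump version that no longer needs the generation field back-filled
MIN_SUPPORTED_DUMP_VERSION = (0, 61, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse "v0.61.0" or "0.61.0" into a comparable (major, minor, patch) tuple."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def check_dump_version(version: str) -> None:
    """Raise for pre-v0.61.0 dumps instead of silently filling missing fields."""
    if parse_version(version) < MIN_SUPPORTED_DUMP_VERSION:
        raise ValueError(
            f"dump created with lakeFS {version} is not supported; "
            "it must be created with v0.61.0 or later"
        )
```

Comparing integer tuples avoids the string-comparison trap where "v0.9.0" would sort after "v0.61.0".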
This is related to #647 - we already allow custom product paths, but it is not possible to customize product paths
Updated Feb 2, 2022
Updated Apr 6, 2022 - Java
Updated Mar 29, 2022 - JavaScript
Updated Apr 5, 2022 - Scala
Updated Apr 6, 2022 - Jupyter Notebook
Updated Dec 31, 2021
(1) Add docstrings to methods
(2) Convert .format() calls to f-strings for readability
(3) Make sure we are using Python 3.8 throughout
(4) Simplify the zip extract_all() in ingest_flights.py by giving it a Path parameter
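Items (1), (2), and (4) might look like this together. This is a sketch only: the original `extract_all()` isn't shown, so the signature below (archive path plus destination directory) is an assumption. `zipfile.ZipFile` and `ZipFile.extractall()` both accept path-like objects, and the `typing.List` annotation keeps the code valid on Python 3.8.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_all(zip_path: Path, dest_dir: Path) -> List[Path]:
    """Extract every member of ``zip_path`` into ``dest_dir``.

    Returns the paths of the extracted files (directory entries are skipped).
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
    # f-string instead of "...{}...".format(...)
    print(f"Extracted {len(names)} file(s) from {zip_path.name} to {dest_dir}")
    return [dest_dir / name for name in names]
```

Taking `Path` objects lets callers join paths with `/` instead of string concatenation.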
Updated Mar 9, 2020 - Python
If these are not class methods, the method is invoked for every test, and a separate session is created for each of those tests.
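```python
import logging
import unittest

from pyspark.sql import SparkSession


class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        # Quiet py4j's chatty gateway logging during test runs
        logging.getLogger('py4j').setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        # Assumed completion: the original snippet is cut off at "return Sp"
        return SparkSession.builder.master('local[2]').getOrCreate()
```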
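The point about class methods can be demonstrated without Spark. In this sketch, a hypothetical `ExpensiveResource` stands in for a SparkSession and counts how often it is built: `setUpClass` (a classmethod) builds it once per test class, while `setUp` builds it once per test method.

```python
import io
import unittest


class ExpensiveResource:
    """Hypothetical stand-in for a SparkSession; counts how often it is built."""

    created = 0

    def __init__(self):
        ExpensiveResource.created += 1


class SharedSessionTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class: every test reuses the same resource
        cls.resource = ExpensiveResource()

    def test_first(self):
        self.assertIsInstance(self.resource, ExpensiveResource)

    def test_second(self):
        self.assertIsInstance(self.resource, ExpensiveResource)


class PerTestSessionTest(unittest.TestCase):
    def setUp(self):
        # Runs before each test: one new resource per test method
        self.resource = ExpensiveResource()

    def test_first(self):
        self.assertIsInstance(self.resource, ExpensiveResource)

    def test_second(self):
        self.assertIsInstance(self.resource, ExpensiveResource)


def count_creations(case: type) -> int:
    """Run one TestCase class quietly and report how many resources it built."""
    ExpensiveResource.created = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return ExpensiveResource.created
```

With two tests per class, the shared version builds one resource and the per-test version builds two. In the Spark version, `create_testing_pyspark_session()` would be called from `setUpClass`, and the session would be stopped with `spark.stop()` in `tearDownClass`.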
Background
This thread was born out of the discussion in #968, as part of an effort to make the documentation more beginner-friendly and easier to understand.
One of the subtasks mentioned in that thread was to go through the function docstrings and add a minimal working example to each public function in pyjanitor.
Criteria reiterated here for the benefit of discussion:
It sh
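As a sketch of what such an example could look like: a doctest-style `Examples` section, which Python's standard `doctest` module can also check automatically. The function below is hypothetical, not part of pyjanitor, and works on a plain list so the example stays dependency-free.

```python
import doctest
from typing import List


def snake_case_names(names: List[str]) -> List[str]:
    """Convert column names to lowercase snake_case.

    Examples:
        >>> snake_case_names(["First Name", "Total Sales"])
        ['first_name', 'total_sales']
        >>> snake_case_names(["  ID "])
        ['id']
    """
    return ["_".join(name.strip().lower().split()) for name in names]


if __name__ == "__main__":
    # Run the docstring examples; only failures are reported
    doctest.testmod(verbose=False)
```

Because the examples are executable, a broken example fails the doctest run instead of silently going stale.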
Let's prepare a mixin for interacting with Roles and Policies through the Python client, in case users want to use the API directly.
It should offer not only list, get, etc., but also utility methods, such as updating a default role. It should wrap the following logic:
```python
import requests
import json

# Get the ID
data_consumer = requests.get("http://localhost:8585/api/v1/roles/name/DataCo…
```
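A minimal sketch of such a mixin, assuming the host client exposes generic `get`/`patch` helpers that prefix `/api/v1` and return parsed JSON. Only the `roles/name/...` route comes from the snippet above. The class and method names, the `/roles/{id}` route, and the JSON Patch body that marks a default role (`/defaultRole`) are hypothetical and should be checked against the OpenMetadata API reference. A small in-memory fake exercises the mixin without a server.

```python
from typing import Any, Dict, List

Json = Dict[str, Any]


class RolesMixin:
    """Hypothetical role helpers; the host client must implement get() and patch()."""

    def get(self, path: str) -> Json:
        raise NotImplementedError

    def patch(self, path: str, operations: List[Json]) -> Json:
        raise NotImplementedError

    def get_role_by_name(self, name: str) -> Json:
        return self.get(f"/roles/name/{name}")

    def set_default_role(self, name: str) -> Json:
        # Look up the role's ID by name, then patch the role by ID
        role_id = self.get_role_by_name(name)["id"]
        # Assumed JSON Patch payload; verify the field name against the API
        operations = [{"op": "replace", "path": "/defaultRole", "value": True}]
        return self.patch(f"/roles/{role_id}", operations)


class FakeClient(RolesMixin):
    """In-memory stand-in for the real client, used only for illustration."""

    def __init__(self) -> None:
        self.roles = {"r1": {"id": "r1", "name": "ExampleRole", "defaultRole": False}}

    def get(self, path: str) -> Json:
        # Only the /roles/name/{name} route is supported by this fake
        name = path.rsplit("/", 1)[-1]
        return next(role for role in self.roles.values() if role["name"] == name)

    def patch(self, path: str, operations: List[Json]) -> Json:
        role = self.roles[path.rsplit("/", 1)[-1]]
        for op in operations:
            # Only top-level "replace" operations are supported by this fake
            role[op["path"].lstrip("/")] = op["value"]
        return role
```

Keeping the HTTP transport out of the mixin lets the same helpers sit on top of whatever request layer the Python client already uses.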
Updated Mar 25, 2022 - Dockerfile
Updated Apr 6, 2022 - Python
Updated Mar 5, 2020 - Python
Updated Jun 2, 2021
Updated Mar 22, 2022
The Mixed Time-Series chart type allows configuring the titles of both the primary and the secondary y-axis.
However, only the title of the primary axis is shown next to its axis; the title of the secondary axis is placed at the upper end of the axis, where it is hidden by bar values and zoom controls.
How to reproduce the bug