add recipe dependencies #74

Open
semio opened this issue Aug 16, 2017 · 2 comments

Comments

@semio
Owner

semio commented Aug 16, 2017

Jasper Heeffer [7:22 AM]
there is definitely work to do though, to automate sous-chefs, recipes and recipe dependencies, so that we can have a pipeline from source to final dataset, like we saw in datapackage-pipelines at the conference

Jasper Heeffer [7:39 AM]
on a higher level, traceability is something that is important to us but not yet done

[7:41]
so that we can see the path of a datapoint from the source to, e.g., Systema Globalis

@semio
Owner Author

semio commented Aug 21, 2017

Because we have an ingredients section in the recipe, which is effectively a dependency list, we don't need to add a new section for dependencies. I suggest we add options to ingredient definitions:

- id: datapoint-ingredient
  dataset: source_dataset
  key: geo, time
  options:
      update_source_dataset: true

So when update_source_dataset is true, the source dataset will be updated before it is loaded into chef. This should work for source datasets built by either a recipe or an ETL script. If there are multiple ingredients from the same dataset, setting this option on each of them is redundant, but if we eventually adopt the dataset-as-ingredient approach, this problem will be solved.
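To make the proposal concrete, here is a minimal Python sketch of how chef could decide which source datasets to update before loading ingredients. It assumes ingredients are parsed from the recipe into dicts shaped like the YAML above, and that a hypothetical `deps` mapping (not part of ddf_utils) lists, for each dataset, the upstream datasets its own recipe or ETL script depends on. It also assumes every upstream dataset should be rebuilt too; a real implementation might decide that per ingredient instead. Ingredient ids other than `datapoint-ingredient` are made up for the example.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Hypothetical: maps a dataset to the upstream datasets its own recipe or
# ETL script depends on. ddf_utils does not provide this; here it is hard-coded.
Deps = Dict[str, Set[str]]


def datasets_to_update(ingredients: List[dict]) -> Set[str]:
    """Source datasets of ingredients with options.update_source_dataset: true.

    Returning a set means a dataset used by several ingredients is listed once.
    """
    return {
        ing["dataset"]
        for ing in ingredients
        if (ing.get("options") or {}).get("update_source_dataset", False)
    }


def update_order(targets: Set[str], deps: Deps) -> List[str]:
    """Targets plus all their upstream datasets, ordered so dependencies come first."""
    graph: Dict[str, Set[str]] = {}
    stack = list(targets)
    while stack:  # walk upstream to collect transitive dependencies
        ds = stack.pop()
        if ds in graph:
            continue
        graph[ds] = set(deps.get(ds, set()))
        stack.extend(graph[ds])
    # TopologicalSorter takes {node: predecessors}; static_order() yields each
    # node after its predecessors and raises CycleError on circular recipes.
    return list(TopologicalSorter(graph).static_order())


# Ingredient definitions as they might look after parsing the YAML above.
ingredients = [
    {"id": "datapoint-ingredient", "dataset": "source_dataset", "key": "geo, time",
     "options": {"update_source_dataset": True}},
    {"id": "entities-ingredient", "dataset": "source_dataset", "key": "geo",
     "options": {"update_source_dataset": True}},
    {"id": "concepts-ingredient", "dataset": "other_dataset", "key": "concept"},
]
deps = {"source_dataset": {"raw_dataset"}}  # hypothetical upstream dependency

print(update_order(datasets_to_update(ingredients), deps))
# ['raw_dataset', 'source_dataset']
```

Collecting datasets into a set is one way to handle the redundancy mentioned above: even if every ingredient from `source_dataset` sets the option, it is updated only once.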

On traceability, we can partly achieve it with the to_graph() function demonstrated here: https://github.com/semio/ddf_utils/blob/dev/notebook/Chef%20API.ipynb. But if we want to see the path of a datapoint from source to output, I think we need to build an inspector that can pause the chef at each step and show the data inside.
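The inspector idea can be sketched in plain Python. Everything here is hypothetical rather than ddf_utils' actual procedures: a dataset is modeled as a dict keyed by (geo, time), and a recipe as a list of named step functions. `run_with_inspector` hands a copy of the intermediate data to a callback after each step and stops if the callback returns False; `trace_datapoint` reuses the same hook to record one datapoint's value after every step, i.e. its path from source to output.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical data model, not ddf_utils': a dataset is a dict of
# (geo, time) -> value, and a recipe is a list of named step functions.
Key = Tuple[str, int]
Data = Dict[Key, float]
Step = Tuple[str, Callable[[Data], Data]]
Inspector = Callable[[str, Data], bool]


def run_with_inspector(steps: List[Step], data: Data,
                       inspect: Optional[Inspector] = None) -> Tuple[Data, List[str]]:
    """Run steps in order, passing a copy of each intermediate result to `inspect`.

    If `inspect` returns False, execution stops after that step, like a breakpoint.
    Returns the last computed data and the names of the steps that ran.
    """
    ran: List[str] = []
    for name, fn in steps:
        data = fn(data)
        ran.append(name)
        if inspect is not None and not inspect(name, dict(data)):
            break
    return data, ran


def trace_datapoint(steps: List[Step], data: Data, key: Key) -> List[Tuple[str, Optional[float]]]:
    """Value of one datapoint after every step (None once it has been dropped)."""
    road: List[Tuple[str, Optional[float]]] = [("source", data.get(key))]

    def record(name: str, snapshot: Data) -> bool:
        road.append((name, snapshot.get(key)))
        return True  # only observe, never stop

    run_with_inspector(steps, data, record)
    return road


# Toy recipe and made-up numbers, for illustration only.
steps: List[Step] = [
    ("to_thousands", lambda d: {k: v / 1000 for k, v in d.items()}),
    ("filter_2000s", lambda d: {k: v for k, v in d.items() if k[1] >= 2000}),
]
source: Data = {("swe", 1990): 1000.0, ("swe", 2010): 2000.0}

print(trace_datapoint(steps, source, ("swe", 1990)))
# [('source', 1000.0), ('to_thousands', 1.0), ('filter_2000s', None)]
```

Passing a copy to the callback keeps an inspector from accidentally mutating the data the next step receives.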

Another thing we might do is add a web page for each recipe, showing the recipe and the recipes it depends on, with a button to run each of them.

@semio
Owner Author

semio commented Sep 4, 2017

Also, we can add ETL scripts as dependencies, so that chef runs them before running the recipe.
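A minimal sketch of ETL scripts as dependencies, assuming the recipe lists script paths that chef runs, in order, with the current Python interpreter before cooking. The names `run_recipe_with_etl` and `run_command` and the list-of-paths format are hypothetical, not existing ddf_utils API. The runner is a parameter so it can be swapped out, e.g. for testing or for scripts that need a different interpreter.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Hypothetical API: these names and the list-of-paths format are illustrative,
# not part of ddf_utils.
Runner = Callable[[Sequence[str]], None]


def run_command(cmd: Sequence[str]) -> None:
    """Default runner: execute cmd, raising CalledProcessError on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


def run_recipe_with_etl(etl_scripts: List[str], run_recipe: Callable[[], object],
                        runner: Runner = run_command) -> object:
    """Run each ETL script in listed order with the current Python, then the recipe.

    If a script fails, the runner raises and the recipe is never run.
    """
    for script in etl_scripts:
        runner([sys.executable, script])
    return run_recipe()
```

With `subprocess.run(..., check=True)`, a failing script raises `subprocess.CalledProcessError`, so the recipe does not run on stale inputs.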
