Pachyderm
Assuming you have a Pachyderm installation and pachctl list repo is working for you, deploying a notebook as a pipeline to Pachyderm is as simple as:
Install SAME:
pip3 install --upgrade sameproject
Set up a same.yaml and requirements.txt in a folder alongside your .ipynb file:
same init
Test the suggested container image against the requirements.txt and your notebook's imports (optional, requires Docker):
same verify
Deploy the notebook as a pipeline to Pachyderm:
same run --target pachyderm --input-repo test
Replace test with the name of a repo that exists on your Pachyderm installation.
You can also pass --input-glob to set a glob pattern, or --input to supply a raw input specification in JSON format for more advanced inputs.
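As a rough illustration of what a raw JSON input might look like, the sketch below builds one in Python and prints a matching command line. It assumes that --input accepts an object shaped like the input section of a Pachyderm pipeline specification (a pfs entry with repo and glob keys); that shape, the helper name build_input_spec, and the "/*" glob are assumptions for illustration, not something this page confirms. Whether --input replaces or combines with --input-repo is also not stated here.

```python
import json
import shlex


def build_input_spec(repo, glob="/*"):
    """Return a PFS-style input object (assumed shape, hypothetical helper)."""
    return {"pfs": {"repo": repo, "glob": glob}}


# "test" is the example repo name used in the command above.
spec_json = json.dumps(build_input_spec("test"))

# Quote the JSON so the shell passes it through as a single argument.
command = "same run --target pachyderm --input " + shlex.quote(spec_json)
print(command)
```

Generating the JSON with json.dumps and quoting it with shlex.quote avoids hand-escaping quotes on the command line.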
Your notebook should read data from /pfs and write any output data to /pfs/out. You might want to use the Pachyderm JupyterLab Mount Extension to develop your notebook; it will then run the same way with the mount extension as it does when deployed to Pachyderm with SAME.
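To make the /pfs convention concrete, here is a minimal sketch of a notebook cell that reads every file under an input directory and writes a small report to the output directory. It assumes the input repo is mounted at /pfs/test (matching the example repo name above); the function name summarize_inputs and the summary.tsv file name are hypothetical.

```python
from pathlib import Path


def summarize_inputs(input_dir="/pfs/test", output_dir="/pfs/out"):
    """Write one 'relative path <TAB> size in bytes' line per input file."""
    in_root = Path(input_dir)
    out_root = Path(output_dir)
    out_root.mkdir(parents=True, exist_ok=True)

    lines = []
    # Walk the input tree in a stable order, skipping directories.
    for path in sorted(p for p in in_root.rglob("*") if p.is_file()):
        rel = path.relative_to(in_root).as_posix()
        lines.append(f"{rel}\t{path.stat().st_size}")

    report = out_root / "summary.tsv"
    report.write_text("\n".join(lines) + ("\n" if lines else ""))
    return report


# Only run against the real mount points when they are present,
# so the cell can also be imported or tested elsewhere.
if Path("/pfs/test").is_dir():
    summarize_inputs()
```

Keeping the paths as parameters means the same cell can be pointed at a local folder during development.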