Add new versions of pretraining and fine-tuning #1009

Open: wants to merge 11 commits into main
6 changes: 3 additions & 3 deletions tutorials-and-examples/nvidia-bionemo/README.md
@@ -2,8 +2,8 @@

#### Pretraining ESM-2

-- [Pretraining ESM-2 LLM on GKE using BioNeMo Framework 2.0](./esm2/README.md#pretraining)
+- [Pretraining ESM-2 LLM on GKE using BioNeMo Framework 2.0](./pretraining/README.md)

-#### Fine-turning ESM-2
+#### Fine-tuning ESM-2

-- [Fine-tuning ESM-2 LLM on GKE using BioNeMo Framework 2.0](./esm2/README.md#fine-tuning)
+- [Fine-tuning ESM-2 LLM on GKE using BioNeMo Framework 2.0](./fine-tuning/README.md)
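The README change repoints both links away from anchors in esm2/README.md, which this PR deletes, to separate pretraining/README.md and fine-tuning/README.md pages. A minimal Python sketch of a relative-link check that would flag the stale anchors; `broken_relative_links` and the file set `after_pr` are hypothetical, and the file set assumes the two new READMEs exist in the PR's tree.

```python
import posixpath
import re

# Matches inline markdown links of the form [text](target). This is a
# simplification: reference-style links and link titles are not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_relative_links(markdown: str, doc_dir: str, existing: set[str]) -> list[str]:
    """Return relative link targets whose file part is not in `existing`.

    `doc_dir` is the directory containing the markdown file; `existing` is a
    set of repository paths (hypothetical input, e.g. from `git ls-files`).
    """
    broken = []
    for target in LINK_RE.findall(markdown):
        if "://" in target or target.startswith("#"):
            continue  # skip absolute URLs and same-page anchors
        file_part = target.split("#", 1)[0]
        resolved = posixpath.normpath(posixpath.join(doc_dir, file_part))
        if resolved not in existing:
            broken.append(target)
    return broken


base = "tutorials-and-examples/nvidia-bionemo"
# Hypothetical tree after this PR: esm2/README.md is gone, the new pages exist.
after_pr = {f"{base}/README.md", f"{base}/pretraining/README.md", f"{base}/fine-tuning/README.md"}

old_links = "- [Pretraining](./esm2/README.md#pretraining)\n- [Fine-tuning](./esm2/README.md#fine-tuning)"
new_links = "- [Pretraining](./pretraining/README.md)\n- [Fine-tuning](./fine-tuning/README.md)"

print(broken_relative_links(old_links, base, after_pr))  # both esm2 anchors reported
print(broken_relative_links(new_links, base, after_pr))  # → []
```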
9 changes: 9 additions & 0 deletions tutorials-and-examples/nvidia-bionemo/base/kustomization.yaml
@@ -0,0 +1,9 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- storage/storage-class.yaml
- storage/pvcs.yaml
- monitoring/tensorboard-deployment.yaml
- monitoring/tensorboard-service.yaml
- monitoring/rbac.yaml
29 changes: 29 additions & 0 deletions tutorials-and-examples/nvidia-bionemo/base/monitoring/rbac.yaml
@@ -0,0 +1,29 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tensorboard-sa
  namespace: bionemo-training
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: job-reader
  namespace: bionemo-training
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tensorboard-job-reader
  namespace: bionemo-training
subjects:
- kind: ServiceAccount
  name: tensorboard-sa
  namespace: bionemo-training
roleRef:
  kind: Role
  name: job-reader
  apiGroup: rbac.authorization.k8s.io
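The job-reader Role grants only read verbs (get, list, watch) on Jobs in the batch API group, and because it is a namespaced Role bound through a RoleBinding, the grant applies only inside bionemo-training. The sketch below models that additive rule matching in plain Python; `PolicyRule` and `role_allows` are hypothetical names, and the model deliberately ignores resourceNames, subresources, and non-resource URLs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    api_groups: tuple[str, ...]
    resources: tuple[str, ...]
    verbs: tuple[str, ...]


def _matches(allowed: tuple[str, ...], value: str) -> bool:
    # "*" acts as a wildcard; otherwise the value must be listed exactly.
    return "*" in allowed or value in allowed


def role_allows(rules: list[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    """RBAC is additive: a request is allowed if ANY rule matches it."""
    return any(
        _matches(r.api_groups, api_group)
        and _matches(r.resources, resource)
        and _matches(r.verbs, verb)
        for r in rules
    )


# The job-reader Role from rbac.yaml.
job_reader = [PolicyRule(("batch",), ("jobs",), ("get", "list", "watch"))]

print(role_allows(job_reader, "batch", "jobs", "watch"))   # → True
print(role_allows(job_reader, "batch", "jobs", "delete"))  # → False
print(role_allows(job_reader, "", "pods", "list"))         # → False (core group not granted)
```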
@@ -0,0 +1,33 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorboard
  namespace: bionemo-training
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tensorboard
  template:
    metadata:
      labels:
        app: tensorboard
    spec:
      containers:
      - name: tensorboard
        image: tensorflow/tensorflow:latest
        command:
        - tensorboard
        args:
        - --logdir=/workspace/bionemo2/results/lightning_logs
        - --port=6006
        volumeMounts:
        - name: bionemo-storage
          mountPath: /workspace/bionemo2/results
          subPath: tensorboard-logs
          readOnly: true
      volumes:
      - name: bionemo-storage
        persistentVolumeClaim:
          claimName: bionemo-filestore
      serviceAccountName: tensorboard-sa
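The TensorBoard container mounts the bionemo-filestore claim read-only at /workspace/bionemo2/results with subPath tensorboard-logs, so --logdir reads from tensorboard-logs/lightning_logs relative to the root of the Filestore share. The hypothetical helper `volume_path_for` sketches that path mapping, assuming the training Jobs (not shown in this part of the diff) mount the same claim and write their Lightning logs under that directory.

```python
import posixpath
from typing import Optional


def volume_path_for(container_path: str, mount_path: str, sub_path: str = "") -> Optional[str]:
    """Map a path seen inside the container to a path relative to the volume root.

    Returns None when `container_path` is not under `mount_path`.
    """
    container_path = posixpath.normpath(container_path)
    mount_path = posixpath.normpath(mount_path)
    if container_path == mount_path:
        rel = ""
    elif container_path.startswith(mount_path + "/"):
        rel = container_path[len(mount_path) + 1:]
    else:
        return None
    # With subPath set, the mount exposes only that directory of the volume,
    # so the subPath is prepended to the path relative to mountPath.
    parts = [p for p in (sub_path, rel) if p]
    return "/".join(parts) if parts else "."


logdir = "/workspace/bionemo2/results/lightning_logs"
print(volume_path_for(logdir, "/workspace/bionemo2/results", "tensorboard-logs"))
# → tensorboard-logs/lightning_logs
```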
@@ -0,0 +1,12 @@
apiVersion: v1
kind: Service
metadata:
  name: tensorboard-service
  namespace: bionemo-training
spec:
  selector:
    app: tensorboard
  ports:
  - port: 6006
    targetPort: 6006
  type: ClusterIP
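The Service selects Pods labeled app: tensorboard, which the Deployment's Pod template sets, and forwards port 6006 to targetPort 6006, the port passed to tensorboard via --port. Because the type is ClusterIP, it is reachable only from inside the cluster (for example through kubectl port-forward). A minimal sketch of equality-based selector matching, with `selector_matches` as a hypothetical helper:

```python
def selector_matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based matching: every selector key/value must appear in the labels.

    Extra Pod labels are ignored. An empty selector returns False here, because a
    Service without a selector gets no automatically managed endpoints.
    """
    if not selector:
        return False
    return all(labels.get(key) == value for key, value in selector.items())


service_selector = {"app": "tensorboard"}  # from tensorboard-service
# Labels from the Deployment's Pod template plus the Deployment-managed
# pod-template-hash label; the hash value here is made up.
pod_labels = {"app": "tensorboard", "pod-template-hash": "abc123"}

print(selector_matches(service_selector, pod_labels))          # → True
print(selector_matches(service_selector, {"app": "bionemo"}))  # → False
```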
4 changes: 4 additions & 0 deletions tutorials-and-examples/nvidia-bionemo/base/namespace.yaml
@@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
  name: bionemo-training
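Every namespaced object in this base hardcodes metadata.namespace: bionemo-training, while the Namespace itself and the filestore-storage StorageClass are cluster-scoped. A small consistency check, sketched in Python over (kind, name, namespace) tuples copied from the manifests in this diff; `namespace_mismatches` and `CLUSTER_SCOPED` are hypothetical, and the cluster-scoped set covers only the kinds used here.

```python
from typing import List, Optional, Tuple

# Kinds used in this base that are cluster-scoped (no metadata.namespace).
CLUSTER_SCOPED = {"Namespace", "StorageClass"}


def namespace_mismatches(
    objects: List[Tuple[str, str, Optional[str]]], expected: str
) -> List[Tuple[str, str]]:
    """Return (kind, name) for namespaced objects whose namespace is not `expected`."""
    return [
        (kind, name)
        for kind, name, ns in objects
        if kind not in CLUSTER_SCOPED and ns != expected
    ]


# (kind, metadata.name, metadata.namespace) copied from the manifests in this diff.
base_objects = [
    ("Namespace", "bionemo-training", None),
    ("ServiceAccount", "tensorboard-sa", "bionemo-training"),
    ("Role", "job-reader", "bionemo-training"),
    ("RoleBinding", "tensorboard-job-reader", "bionemo-training"),
    ("Deployment", "tensorboard", "bionemo-training"),
    ("Service", "tensorboard-service", "bionemo-training"),
    ("PersistentVolumeClaim", "bionemo-filestore", "bionemo-training"),
    ("StorageClass", "filestore-storage", None),
]

print(namespace_mismatches(base_objects, "bionemo-training"))  # → []
```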
@@ -0,0 +1,5 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- storage-class.yaml
- pvcs.yaml
12 changes: 12 additions & 0 deletions tutorials-and-examples/nvidia-bionemo/base/storage/pvcs.yaml
@@ -0,0 +1,12 @@
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: bionemo-filestore
  namespace: bionemo-training
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: filestore-storage
  resources:
    requests:
      storage: 1Ti
@@ -0,0 +1,10 @@
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: filestore-storage
provisioner: filestore.csi.storage.gke.io
volumeBindingMode: Immediate
allowVolumeExpansion: true
parameters:
  tier: BASIC_HDD
  network: default
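The bionemo-filestore claim requests 1Ti, a binary quantity of 2^40 bytes, from the filestore-storage class backed by the BASIC_HDD tier. The sketch parses Kubernetes storage quantities and compares the request with an assumed 1 TiB minimum for BASIC_HDD shares; `quantity_to_bytes` and `ASSUMED_MIN_BYTES` are hypothetical, the parser skips milli-unit forms, and the minimum should be confirmed in the Filestore documentation.

```python
from decimal import Decimal

# Kubernetes quantity suffixes: binary (powers of 2) and decimal SI (powers of 10).
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}

# Assumption: minimum Filestore share size for the BASIC_HDD tier is 1 TiB.
ASSUMED_MIN_BYTES = {"BASIC_HDD": 2**40}


def quantity_to_bytes(q: str) -> int:
    """Parse a storage quantity such as "1Ti" or "500G" into bytes (simplified)."""
    for table in (_BINARY, _DECIMAL):
        for suffix, factor in table.items():
            if q.endswith(suffix):
                return int(Decimal(q[: -len(suffix)]) * factor)
    return int(Decimal(q))  # plain byte count


request = quantity_to_bytes("1Ti")  # value from pvcs.yaml
print(request)                                    # → 1099511627776
print(request >= ASSUMED_MIN_BYTES["BASIC_HDD"])  # → True
# The decimal suffix is smaller: "1T" is 10**12 bytes, below 1 TiB.
print(quantity_to_bytes("1T") < ASSUMED_MIN_BYTES["BASIC_HDD"])  # → True
```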
221 changes: 0 additions & 221 deletions tutorials-and-examples/nvidia-bionemo/esm2/README.md

This file was deleted.

23 changes: 0 additions & 23 deletions tutorials-and-examples/nvidia-bionemo/esm2/create-mount-fs.yaml

This file was deleted.
