feat: Add profiling initialization code to training_utils#732

Closed
mkovalski wants to merge 41 commits into googleapis:main from mkovalski:profiling_sdk

@mkovalski
Contributor

Add ability to profile Vertex Training jobs on demand.

  • Merge in training_utils and its tests from the dev branch; training_utils provides the environment variables to use during training
  • Add a base web server that runs alongside the user's job
  • Add a TensorFlow profiler plugin that registers with the web server, allowing remote profiling through Vertex TensorBoard
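
The web-server-plus-plugin arrangement described in the list above can be illustrated with a minimal sketch. This is not the code from this PR: `BasePlugin`, `FakeProfilerPlugin`, `build_routes`, `make_server`, and the `/capture_profile` path are all hypothetical names chosen for illustration, and the fake plugin only records requests instead of invoking the TensorFlow profiler. It uses only the Python standard library.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class BasePlugin:
    """Hypothetical plugin interface: maps URL paths to handler callables."""

    def get_routes(self):
        raise NotImplementedError


class FakeProfilerPlugin(BasePlugin):
    """Stand-in for a profiler plugin; it records requests, it does not profile."""

    def __init__(self):
        self.requests = []

    def get_routes(self):
        # Hypothetical route name, chosen only for this sketch.
        return {"/capture_profile": self.capture_profile}

    def capture_profile(self, query):
        self.requests.append(query)
        return {"status": "accepted", "query": query}


def build_routes(plugins):
    """Collect the routes of every registered plugin into one table."""
    routes = {}
    for plugin in plugins:
        routes.update(plugin.get_routes())
    return routes


def make_server(plugins, host="127.0.0.1", port=0):
    """Create an HTTP server that dispatches GET requests to plugin handlers."""
    routes = build_routes(plugins)

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            path, _, query = self.path.partition("?")
            handler = routes.get(path)
            if handler is None:
                self.send_error(404)
                return
            body = json.dumps(handler(query)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            # Keep the training job's stderr free of per-request access logs.
            pass

    return ThreadingHTTPServer((host, port), Handler)


def start_in_background(server):
    """Serve on a daemon thread so the training loop is not blocked."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

In this sketch, a training script would call `start_in_background(make_server([FakeProfilerPlugin()]))` once at startup and then continue with training. Running the server on a daemon thread means it exits with the job rather than keeping the process alive.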

This should be merged after #704, as it contains that PR; adding it here for clarity.

Fixes #519

mkovalski and others added 30 commits August 23, 2021 15:10
@mkovalski mkovalski requested a review from a team September 29, 2021 19:46
@product-auto-label (bot) added the "api: aiplatform" label (Issues related to the AI Platform API.) Sep 29, 2021
@google-cla (bot) added the "cla: yes" label (This human has signed the Contributor License Agreement.) Sep 29, 2021
@mkovalski mkovalski closed this Oct 12, 2021

Development

Successfully merging this pull request may close these issues.

Add remote tensorflow profiling to training jobs.

2 participants