
Upstream changes for v0.7.0 release #134

Merged

shubhadeepd merged 1 commit into main from v0.7.0-draft on Jun 18, 2024

Conversation

@shubhadeepd
Collaborator

This release switches all examples to use cloud-hosted, GPU-accelerated LLM and embedding models from the Nvidia API Catalog by default. It also deprecates support for deploying on-prem models using the NeMo Inference Framework Container, and adds support for deploying accelerated generative AI models across cloud, data center, and workstation using the latest Nvidia NIM-LLM.

For detailed changes, please refer to the CHANGELOG.md file.
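The default-endpoint switch described above can be sketched as a small resolver: the examples now target the cloud-hosted API Catalog unless the user points them at a self-hosted NIM. All names below are hypothetical for illustration (the environment variable `NIM_BASE_URL` is made up); the real configuration lives in the per-example configs and CHANGELOG.md.

```python
import os

# Cloud-hosted API Catalog endpoint used as the illustrative default.
CLOUD_BASE_URL = "https://integrate.api.nvidia.com/v1"


def resolve_llm_endpoint() -> str:
    """Return the base URL an example should talk to.

    By default this points at the cloud-hosted API Catalog; setting
    NIM_BASE_URL (a made-up variable name for this sketch) redirects
    the example to a self-hosted NIM deployment instead.
    """
    return os.environ.get("NIM_BASE_URL", CLOUD_BASE_URL)


if __name__ == "__main__":
    print(resolve_llm_endpoint())
```

Keeping cloud-by-default with a single override variable is one way to express the release's direction: the same example code can run against the API Catalog or a local NIM without edits.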

@shubhadeepd shubhadeepd self-assigned this Jun 14, 2024
@shubhadeepd shubhadeepd added labels: bug (Something isn't working), documentation (Improvements or additions to documentation), enhancement (New feature or request), dependencies (Pull requests that update a dependency file) on Jun 14, 2024
nv-pranjald
nv-pranjald previously approved these changes Jun 14, 2024
Collaborator, Author

@shubhadeepd shubhadeepd left a comment


Waiting for approval from code owners.

Signed-off-by: Shubhadeep Das <shubhadeepd@nvidia.com>
Contributor

@jliberma jliberma left a comment


Looks good, thanks Shubhadeep

@shubhadeepd shubhadeepd merged commit b43e8b0 into main Jun 18, 2024
@shubhadeepd shubhadeepd deleted the v0.7.0-draft branch June 18, 2024 15:48
anniesurla pushed a commit to anniesurla/GenerativeAIExamples that referenced this pull request Jun 5, 2025
Signed-off-by: Shubhadeep Das <shubhadeepd@nvidia.com>
buildvoc-agent pushed a commit to buildvoc/GenerativeAIExamples that referenced this pull request Mar 21, 2026
Previously, we tokenized the output and counted tokens ourselves to stop generation when max tokens was reached. Now we let the mistral.rs engine handle it, which saves the extra tokenization step.

Also, dynamo-run now prints which engines are compiled in as part of its help message, along with some minor lint fixes.
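The before/after of the token-counting change in that commit can be sketched with a toy example. This is purely illustrative: a whitespace split stands in for a real tokenizer, a plain list of text chunks stands in for the mistral.rs stream, and all function names are made up.

```python
def toy_tokenize(text: str) -> list[str]:
    """Stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()


def old_style_stop(chunks, max_tokens: int) -> str:
    """Old approach: re-tokenize the accumulated output after every
    chunk and stop once the token count reaches max_tokens. Note the
    extra tokenization pass on each iteration."""
    out = ""
    for chunk in chunks:
        out += chunk
        if len(toy_tokenize(out)) >= max_tokens:
            break
    return out


def engine_side_stop(chunks, max_tokens: int) -> str:
    """New approach: the engine is told max_tokens up front and the
    caller simply consumes what it emits. Here the 'engine' is
    simulated by slicing the chunk list (one token per chunk)."""
    return "".join(list(chunks)[:max_tokens])
```

Both functions produce the same truncated output; the difference is that the second never re-tokenizes on the consumer side, which is the saving the commit message describes.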

Labels

bug (Something isn't working), dependencies (Pull requests that update a dependency file), documentation (Improvements or additions to documentation), enhancement (New feature or request)

Projects

None yet


3 participants