Conversation


@tpaulshippy tpaulshippy commented Aug 6, 2025

What this does

As explained here there are numerous reasons to use the newer Responses API instead of the Chat Completions API. Features we get by switching include:

  1. Mixing web search with custom tools
  2. Conversations with images in and out
  3. Certain models like o4-mini-deep-research
  4. MCP support through OpenAI

One feature is not yet available: the Responses API does not support audio inputs. The library therefore detects any audio inputs and falls back to the Chat Completions API when they are present.
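The fallback logic described above can be sketched roughly as follows. This is a minimal illustration, not RubyLLM's actual implementation; the method names and message shape are assumptions for the example:

```ruby
# Hedged sketch of the audio-detection fallback described in this PR.
# Method names and message structure are illustrative, not RubyLLM's API.

# Returns true if any message carries an audio content part, which the
# Responses API does not yet accept.
def audio_input?(messages)
  messages.any? do |msg|
    Array(msg[:content]).any? { |part| part.is_a?(Hash) && part[:type] == "input_audio" }
  end
end

# Choose the endpoint: use the Responses API unless audio forces a
# fallback to Chat Completions.
def endpoint_for(messages)
  audio_input?(messages) ? "/v1/chat/completions" : "/v1/responses"
end

text_only  = [{ role: "user", content: "Hello" }]
with_audio = [{ role: "user",
                content: [{ type: "input_audio",
                            input_audio: { data: "…", format: "wav" } }] }]

puts endpoint_for(text_only)  # "/v1/responses"
puts endpoint_for(with_audio) # "/v1/chat/completions"
```

This keeps the switch transparent to callers: existing audio-based chats keep working against Chat Completions while everything else gains the Responses API features listed above.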

Type of change

  • New feature

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

  • No API changes

Related issues

Replaces #290
Should enable resolution of #213

…e had

Getting this:
ruby(19317,0x206ace0c0) malloc: Double free of object 0x10afc39e0
ruby(19317,0x206ace0c0) malloc: *** set a breakpoint in malloc_error_break to debug

crmne commented Aug 26, 2025

Hey Paul,

thanks for your work so far! Just to be totally transparent:

This is a big change. I feel like this needs a bit more time in the oven and I want to think thoroughly how to have one provider that has two different APIs as RubyLLM was not designed with that in mind.

The Responses API not supporting audio is a big no-no. Backwards compatibility is king. Not being able to use a new shiny model that's only supported on the Responses API is a lot less inconvenient than suddenly pulling the rug out from under audio support.

I'll take a look at this when it hurts.

@tpaulshippy (Contributor, Author)

This change would be backward compatible, as it falls back to the Chat Completions API when you provide audio.


jaryl commented Sep 24, 2025

Just a heads up: support for audio in the Responses API might be coming:

Multimodal from the ground up. Text, images, audio, function calls—all first-class citizens. We didn’t bolt modalities onto a text API; we designed the house with enough bedrooms from day one.
Source: https://developers.openai.com/blog/responses-api/

Don't see it in the docs yet.

@rikkiprince commented

I'm interested in Responses API support so we can more easily connect to remote MCP servers.
