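The Gemini Live API enables low-latency, bidirectional text and voice interactions with Gemini. Using the Live API, you can give end users natural, human-like voice conversations, with the ability to interrupt the model's responses using text or voice commands. The model can process text and audio input (video coming soon!), and it can provide text and audio output.
You can prototype with prompts and the Live API in Google AI Studio or Vertex AI Studio.
The Live API is a stateful API that creates a WebSocket connection to establish a session between the client and the Gemini server. For details, see the Live API reference documentation (Gemini Developer API | Vertex AI Gemini API).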
Before you begin
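If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a LiveModel instance.
Models that support this capability
The models that support the Live API depend on your chosen Gemini API provider.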
- Gemini Developer API
  - gemini-live-2.5-flash (private GA*)
  - gemini-live-2.5-flash-preview
  - gemini-2.0-flash-live-001
  - gemini-2.0-flash-live-preview-04-09
- Vertex AI Gemini API
  - gemini-live-2.5-flash (private GA*)
  - gemini-2.0-flash-live-preview-04-09 (only available in us-central1)
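Note that in the 2.5 model names for the Live API, the live segment immediately follows the gemini segment.
* Contact your Google Cloud account team representative to request access.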
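Use the standard features of the Live API
This section describes how to use the standard features of the Live API, specifically how to stream various types of input and output.
Generate streamed text from streamed text input
Before trying this sample, complete the "Before you begin" section of this guide to set up your project and app. In that section, you also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.
You can send streamed text input and receive streamed text output. Make sure to create a LiveModel instance and set the response modality to Text.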
Swift
import FirebaseAI
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: LiveGenerationConfig(
    responseModalities: [.text]
  )
)
do {
  let session = try await model.connect()
  // Provide a text prompt
  let text = "tell a short story"
  await session.sendTextRealtime(text)
  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? TextPart {
          outputText += part.text
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }
  // Output received from the server.
  print(outputText)
} catch {
  fatalError(error.localizedDescription)
}
Kotlin
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT
    }
)
val session = model.connect()
// Provide a text prompt
val text = "tell a short story"
session.send(text)
var outputText = ""
session.receive().collect {
    if (it.turnComplete) {
        // Optional: stop receiving if you don't need to send more requests.
        session.stopReceiving()
    }
    // Skip chunks without text so "null" is never appended to the output
    outputText += it.text.orEmpty()
}
// Output received from the server.
println(outputText)
Java
ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();
class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }
    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
        // Handle the response from the server.
        System.out.println(liveContentResponse.getText());
    }
    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }
    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        // Provide a text prompt
        String text = "tell me a short story?";
        session.send(text);
        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);
Web
// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: {
    responseModalities: [ResponseModality.TEXT],
  },
});
const session = await model.connect();
// Provide a text prompt
const prompt = "tell a short story";
session.send(prompt);
// Collect text from model's turn
let text = "";
const messages = session.receive();
for await (const message of messages) {
  switch (message.type) {
    case "serverContent":
      if (message.turnComplete) {
        console.log(text);
      } else {
        const parts = message.modelTurn?.parts;
        if (parts) {
          text += parts.map((part) => part.text).join("");
        }
      }
      break;
    case "toolCall":
      // Ignore
    case "toolCallCancellation":
      // Ignore
  }
}
Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
late LiveModelSession _session;
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with text
  liveGenerationConfig: LiveGenerationConfig(responseModalities: [ResponseModalities.text]),
);
_session = await model.connect();
// Provide a text prompt
final prompt = Content.text('tell a short story');
await _session.send(input: prompt, turnComplete: true);
// In a separate thread, receive the response
await for (final message in _session.receive()) {
   // Process the received message
}
Unity
using Firebase;
using Firebase.AI;
async Task SendTextReceiveText() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Text })
  );
  LiveSession session = await model.ConnectAsync();
  // Provide a text prompt
  var prompt = ModelContent.Text("tell a short story");
  await session.SendAsync(content: prompt, turnComplete: true);
  // Receive the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    if (!string.IsNullOrEmpty(message.Text)) {
      UnityEngine.Debug.Log("Received message: " + message.Text);
    }
  }
}
Generate streamed audio from streamed audio input
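You can send streamed audio input and receive streamed audio output. Make sure to create a LiveModel instance and set the response modality to Audio.
Learn how to configure and customize the response voice (later on this page).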
Swift
import FirebaseAI
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio]
  )
)
do {
  let session = try await model.connect()
  // Load the audio file, or tap a microphone
  guard let audioFile = NSDataAsset(name: "audio.pcm") else {
    fatalError("Failed to load audio file")
  }
  // Provide the audio data
  await session.sendAudioRealtime(audioFile.data)
  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? InlineDataPart, part.mimeType.starts(with: "audio/pcm") {
          // Handle 16bit pcm audio data at 24khz
          playAudio(part.data)
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }
} catch {
  fatalError(error.localizedDescription)
}
Kotlin
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    }
)
val session = model.connect()
// This is the recommended way.
// However, you can create your own recorder and handle the stream.
session.startAudioConversation()
Java
ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with audio
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.AUDIO)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);
Web
// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
  },
});
const session = await model.connect();
// Start the audio conversation
const audioConversationController = await startAudioConversation(session);
// ... Later, to stop the audio conversation
// await audioConversationController.stop()
Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';
late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with audio
  liveGenerationConfig: LiveGenerationConfig(responseModalities: [ResponseModalities.audio]),
);
_session = await model.connect();
final audioRecordStream = _audioRecorder.startRecordingStream();
// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);
// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
   // Process the received message
}
Unity
using Firebase;
using Firebase.AI;
async Task SendTextReceiveAudio() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio })
  );
  LiveSession session = await model.ConnectAsync();
  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));
  // Start receiving the response
  await ReceiveAudio(session);
}
IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;
  var recordingClip = Microphone.Start(microphoneDeviceName, true,
                                       recordingBufferSeconds, recordingFrequency);
  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }
    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);
    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }
      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);
        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);
        lastSamplePosition = currentSamplePosition;
      }
    }
    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}
Queue<float> audioBuffer = new();
async Task ReceiveAudio(LiveSession liveSession) {
  int sampleRate = 24000;
  int channelCount = 1;
  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
                                    sampleRate, true, OnAudioRead);
  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent<AudioSource>();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();
  // Start receiving the response
  await foreach (var message in liveSession.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}
// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;
  lock(audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }
  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}
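Build more engaging and interactive experiences
This section describes how to create and manage more engaging or interactive features of the Live API.
Change the response voice
The Live API uses Chirp 3 to support synthesized speech responses. When using Firebase AI Logic, you can send audio in a variety of HD voices. For the full list and a demo of how each voice sounds, see "Chirp 3: HD voices".
To specify a voice, set the voice name in the speechConfig object as part of the model configuration. If you don't specify a voice, the default is Puck.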
Swift
import FirebaseAI
// ...
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to use a specific voice for its audio response
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    speech: SpeechConfig(voiceName: "VOICE_NAME")
  )
)
// ...
Kotlin
// ...
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    }
)
// ...
Java
// ...
LiveGenerativeModel model = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModalities(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
        .build()
);
// ...
Web
// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});
Dart
// ...
final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);
// ...
Unity
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  liveGenerationConfig: new LiveGenerationConfig(
    responseModalities: new[] { ResponseModality.Audio },
    speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME"))
);
To get the best results when you need the model to respond in a non-English language, include the following in your system instructions:
RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
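In the instruction above, LANGUAGE is a placeholder for the language you want. The following is a minimal sketch of filling it in before you pass the text as a system instruction. The class and method names are hypothetical helpers, not part of the Firebase AI Logic SDK, and upper-casing the language name is an assumption made only to match the style of the template.

```java
import java.util.Locale;

// Hypothetical helper (not an SDK type): builds the system-instruction
// text shown above for a given target language.
final class LanguageInstruction {
    private static final String TEMPLATE =
            "RESPOND IN %1$s. YOU MUST RESPOND UNMISTAKABLY IN %1$s.";

    static String forLanguage(String language) {
        if (language == null || language.trim().isEmpty()) {
            throw new IllegalArgumentException("language must not be empty");
        }
        // Assumption: upper-case the name so it matches the all-caps template.
        return String.format(TEMPLATE, language.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Prints: RESPOND IN JAPANESE. YOU MUST RESPOND UNMISTAKABLY IN JAPANESE.
        System.out.println(forLanguage("Japanese"));
    }
}
```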
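Maintain context across sessions and requests
You can use a chat structure to maintain context across sessions and requests. Note that this only works for text input and text output.
This approach is best for short contexts; you can send turn-by-turn interactions to represent the exact sequence of events. For longer contexts, we recommend providing a single message summary to free up the context window for subsequent interactions.
Handle interruptions
Firebase AI Logic does not yet support handling interruptions. Check back soon!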
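The choice between replaying turns and sending one summary can be sketched as follows. This is an illustration under stated assumptions, not SDK code: Turn and ContextCarryOver are hypothetical types, the character budget is an arbitrary stand-in for "short" versus "long" context (it is not a Live API limit), and the summarizer is supplied by the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: decide which prior context to send to a new session.
final class ContextCarryOver {
    /** One prior turn; role is "user" or "model". */
    static final class Turn {
        final String role;
        final String text;

        Turn(String role, String text) {
            this.role = role;
            this.text = text;
        }
    }

    /**
     * Short histories are replayed turn by turn in their original order.
     * Longer histories are collapsed into a single summary turn.
     * maxChars is an illustrative budget chosen by the caller.
     */
    static List<Turn> contextToSend(List<Turn> history, int maxChars,
                                    Function<List<Turn>, String> summarizer) {
        int total = 0;
        for (Turn turn : history) {
            total += turn.text.length();
        }
        if (total <= maxChars) {
            return new ArrayList<>(history);
        }
        return Collections.singletonList(new Turn("user", summarizer.apply(history)));
    }
}
```

The returned turns would then be sent as text at the start of the new session, since this technique applies only to text input and text output.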
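Use function calling (tools)
You can define tools, such as available functions, to use with the Live API just as you can with the standard content generation methods. This section describes some nuances of using the Live API with function calling. For a complete description and examples of function calling, see the function calling guide.
From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment and generates subsequent BidiGenerateContentToolCall messages. Execution pauses until the results of each function call are available, which ensures sequential processing.
Additionally, using the Live API with function calling is particularly powerful because the model can request follow-up or clarifying information from the user. For example, if the model doesn't have enough information to provide a parameter value for a function it wants to call, it can ask the user to provide more or clearer information.
The client should respond with BidiGenerateContentToolResponse.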
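On the client side, handling a tool-call message amounts to running every requested function and returning one result per call. The sketch below shows that dispatch loop with plain Java types. FunctionCall, FunctionResult, and ToolCallDispatcher are hypothetical stand-ins rather than SDK classes, and the sketch assumes each call carries an id that the matching response echoes back.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins for tool-call messages; these are not SDK types.
final class ToolCallDispatcher {
    static final class FunctionCall {
        final String id;
        final String name;
        final Map<String, Object> args;

        FunctionCall(String id, String name, Map<String, Object> args) {
            this.id = id;
            this.name = name;
            this.args = args;
        }
    }

    static final class FunctionResult {
        final String id;
        final String name;
        final Object output;

        FunctionResult(String id, String name, Object output) {
            this.id = id;
            this.name = name;
            this.output = output;
        }
    }

    private final Map<String, Function<Map<String, Object>, Object>> handlers = new HashMap<>();

    void register(String name, Function<Map<String, Object>, Object> handler) {
        handlers.put(name, handler);
    }

    /** Runs every call from one tool-call message; results keep the call order. */
    List<FunctionResult> handle(List<FunctionCall> calls) {
        List<FunctionResult> results = new ArrayList<>();
        for (FunctionCall call : calls) {
            Function<Map<String, Object>, Object> handler = handlers.get(call.name);
            // Unknown functions still get a result so the model is not left waiting.
            Object output = (handler == null)
                    ? "error: unknown function " + call.name
                    : handler.apply(call.args);
            results.add(new FunctionResult(call.id, call.name, output));
        }
        return results;
    }
}
```

Because execution on the model side pauses until every result is available, the list returned by handle would be sent back in a single tool response.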
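Limitations and requirements
Keep in mind the following limitations and requirements of the Live API.
Transcription
Firebase AI Logic does not yet support transcription. Check back soon!
Languages
- Input languages: See the full list of supported input languages for Gemini models
- Output languages: See the full list of available output languages in "Chirp 3: HD voices"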
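Audio formats
The Live API supports the following audio formats:
- Input audio format: Raw 16-bit PCM audio at 16 kHz, little-endian
- Output audio format: Raw 16-bit PCM audio at 24 kHz, little-endian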
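If your recorder produces floating-point samples, they need to be converted to the raw format listed above before sending. The following is a minimal sketch of that conversion using only the Java standard library. The Pcm16 class is a hypothetical helper, and the byte-count method assumes mono audio, which the list above does not state.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for the raw PCM formats listed above.
final class Pcm16 {
    static final int INPUT_SAMPLE_RATE_HZ = 16_000;
    static final int OUTPUT_SAMPLE_RATE_HZ = 24_000;

    /** Converts float samples in [-1.0, 1.0] to raw 16-bit little-endian PCM. */
    static byte[] fromFloats(float[] samples) {
        ByteBuffer buffer = ByteBuffer.allocate(samples.length * 2)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float sample : samples) {
            // Clamp first so out-of-range input cannot overflow the short.
            float clamped = Math.max(-1.0f, Math.min(1.0f, sample));
            buffer.putShort((short) Math.round(clamped * Short.MAX_VALUE));
        }
        return buffer.array();
    }

    /** Bytes needed for the given duration of 16-bit audio, assuming one channel. */
    static int bytesForMillis(int sampleRateHz, int millis) {
        return (int) ((long) sampleRateHz * 2 * millis / 1000);
    }
}
```

For example, Pcm16.bytesForMillis(Pcm16.INPUT_SAMPLE_RATE_HZ, 100) gives 3200, the size of a 100 ms input chunk under the mono assumption.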
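Rate limits
The Live API has rate limits for both the number of concurrent sessions per Firebase project and tokens per minute (TPM).
- Gemini Developer API:
  - Limits vary based on your project's Gemini Developer API "usage tier" (see the rate limits documentation)
- Vertex AI Gemini API:
  - 5,000 concurrent sessions per Firebase project
  - 4 million tokens per minute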
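Session length
The default length of a session is 10 minutes. When the session duration exceeds the limit, the connection is terminated.
The model is also limited by the context size. Sending large chunks of input may result in earlier session termination.
Voice activity detection (VAD)
The model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD is enabled by default.
Token counting
You can't use the CountTokens API with the Live API.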
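Because the connection is terminated when the time limit is reached, an app may want to wrap up or reconnect shortly before that point. The following is a minimal sketch of tracking the remaining time. SessionClock is a hypothetical helper, the safety margin is an arbitrary choice, and the sketch covers only the time limit, not early termination caused by context size.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: tracks how long a session can keep running.
final class SessionClock {
    // Default session length stated above.
    static final Duration SESSION_LIMIT = Duration.ofMinutes(10);

    private final Instant startedAt;
    private final Duration safetyMargin;

    SessionClock(Instant startedAt, Duration safetyMargin) {
        this.startedAt = startedAt;
        this.safetyMargin = safetyMargin;
    }

    /** Time left before the session should be wrapped up; never negative. */
    Duration remaining(Instant now) {
        Duration left = SESSION_LIMIT
                .minus(safetyMargin)
                .minus(Duration.between(startedAt, now));
        return left.isNegative() ? Duration.ZERO : left;
    }

    /** True once the safety margin before the limit has been reached. */
    boolean shouldReconnect(Instant now) {
        return remaining(now).isZero();
    }
}
```

A caller would create the clock right after connecting, for example new SessionClock(Instant.now(), Duration.ofSeconds(30)), and check shouldReconnect(Instant.now()) between turns.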