AI agent that can SEE 👁️, control, navigate, & do stuff for you on your browser.
Updated Mar 1, 2026 - JavaScript
This sample project integrates OpenAI's GPT-4 Vision (image recognition) and DALL·E 3 (image generation) with the Chat Completions API, so images can be created and analyzed in the same application.
A web-based tool that utilizes GPT-4's vision capabilities to analyze and describe system architecture diagrams, providing instant insights and detailed breakdowns in an interactive chat interface.
A customizable GPT in a single page, using OpenAI models text-embedding-ada-002, tts-1, whisper-1, dall-e-3, and gpt-4-vision-preview
AI apps using the OpenAI Vision model.
Browser extension that generates image alternate text, using GPT-4o or an LM Studio server.
🖼️⏯️ Image generation worker
Web app that roasts websites using gpt-4-vision
🖼️⏯️ Public endpoint worker
Meet Picursion 🎨: A fun, recursive AI art generator! Using DALL·E 3 and GPT-4 Vision, it creates an image, describes it, then uses the description as the next prompt, again and again. Watch as visuals evolve endlessly, blending creativity and AI magic. Loop on!
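The generate, describe, repeat loop in the Picursion description can be sketched roughly as follows. This is only an illustration of the control flow, not the project's actual code: `recursive_art_loop`, `fake_generate`, and `fake_describe` are hypothetical names, and the two stand-in functions take the place of real DALL·E 3 and GPT-4 Vision calls so the example runs without network access.

```python
from typing import Callable, List, Tuple


def recursive_art_loop(
    seed_prompt: str,
    generate_image: Callable[[str], bytes],
    describe_image: Callable[[bytes], str],
    rounds: int = 3,
) -> List[Tuple[str, bytes]]:
    """Run the generate -> describe -> re-prompt cycle for a fixed number of rounds.

    Returns the (prompt, image) pair produced in each round.
    """
    history: List[Tuple[str, bytes]] = []
    prompt = seed_prompt
    for _ in range(rounds):
        image = generate_image(prompt)      # image model turns the prompt into an image
        history.append((prompt, image))
        prompt = describe_image(image)      # vision model's description becomes the next prompt
    return history


# Hypothetical stand-ins for the image and vision models (assumptions, not real API calls).
def fake_generate(prompt: str) -> bytes:
    return f"image of {prompt}".encode("utf-8")


def fake_describe(image: bytes) -> str:
    return image.decode("utf-8") + ", reimagined"


if __name__ == "__main__":
    for step, (prompt, _image) in enumerate(
        recursive_art_loop("a red fox", fake_generate, fake_describe, rounds=3), start=1
    ):
        print(step, prompt)
```

Passing the two models in as callables keeps the loop independent of any particular client library; a real version would presumably also cap the number of rounds, as here, since each round costs one image generation and one vision request.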