Prisma integration for Plasmate - the browser engine for AI agents.
Store and query web content with 10-100x token compression using Plasmate's Semantic Object Model (SOM).
- Token Compression: Store web content as SOM JSON with 10-100x fewer tokens than raw HTML
- Type-Safe Queries: Full TypeScript support with Prisma's type safety
- Batch Processing: Efficiently fetch and store multiple URLs with concurrency control
- Full-Text Search: Query stored content with text search
- Crawl Sessions: Group related fetches for organized data management
- Link Extraction: Automatically extract and store page relationships
- Caching: Skip refetching recently stored content
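
The caching feature in the list above can be pictured as a simple freshness check: a stored record is reused when its age is below a caching window. The sketch below is only an illustration. `isFresh` is a hypothetical helper, not a prisma-plasmate export, and it assumes the window is given in milliseconds and compared against the record's fetch timestamp. The library's actual logic may differ.

```typescript
// Hypothetical helper for illustration; not part of prisma-plasmate.
// Returns true when a record fetched at `fetchedAt` is still inside the
// caching window of `cacheForMs` milliseconds, measured at `now`.
function isFresh(fetchedAt: Date, cacheForMs: number, now: Date = new Date()): boolean {
  const ageMs = now.getTime() - fetchedAt.getTime();
  return ageMs < cacheForMs;
}

const oneHourMs = 60 * 60 * 1000;
const referenceTime = new Date('2024-01-01T12:00:00Z');

// Fetched 30 minutes before the reference time: still fresh, skip the refetch.
console.log(isFresh(new Date('2024-01-01T11:30:00Z'), oneHourMs, referenceTime)); // true

// Fetched 2 hours before the reference time: stale, fetch again.
console.log(isFresh(new Date('2024-01-01T10:00:00Z'), oneHourMs, referenceTime)); // false
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to test.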
npm install prisma-plasmate @prisma/client
npm install -D prisma

You also need Plasmate installed:
cargo install plasmate
# or
brew install plasmate

Add the Plasmate models to your prisma/schema.prisma:
model WebContent {
  id String @id @default(cuid())
  url String
  canonicalUrl String?
  title String?
  description String?
  som Json
  textContent String?
  htmlTokens Int?
  somTokens Int?
  compressionRatio Float?
  statusCode Int?
  contentType String?
  headers Json?
  fetchedAt DateTime @default(now())
  updatedAt DateTime @updatedAt
  crawlSession CrawlSession? @relation(fields: [crawlSessionId], references: [id])
  crawlSessionId String?
  outboundLinks Link[] @relation("SourceLinks")
  inboundLinks Link[] @relation("TargetLinks")

  @@unique([url, crawlSessionId])
  @@index([url])
  @@index([fetchedAt])
  @@index([crawlSessionId])
}

model CrawlSession {
  id String @id @default(cuid())
  name String?
  startedAt DateTime @default(now())
  completedAt DateTime?
  status CrawlStatus @default(RUNNING)
  metadata Json?
  contents WebContent[]

  @@index([status])
  @@index([startedAt])
}

model Link {
  id String @id @default(cuid())
  href String
  text String?
  rel String?
  source WebContent @relation("SourceLinks", fields: [sourceId], references: [id], onDelete: Cascade)
  sourceId String
  target WebContent? @relation("TargetLinks", fields: [targetId], references: [id], onDelete: SetNull)
  targetId String?

  @@index([sourceId])
  @@index([targetId])
  @@index([href])
}

enum CrawlStatus {
  RUNNING
  COMPLETED
  FAILED
  CANCELLED
}

npx prisma migrate dev --name add-web-content

import { createPlasmaPrismaClient } from 'prisma-plasmate';
const client = createPlasmaPrismaClient();
// Fetch a URL and store it
const result = await client.fetchAndStore('https://example.com');
console.log(`Stored: ${result.title}`);
console.log(`SOM tokens: ${result.somTokens}`);
// Search stored content
const results = await client.search('typescript');
for (const item of results) {
  console.log(`${item.title}: ${item.url}`);
}
await client.disconnect();

The main client class provides high-level methods for web content operations:
import { createPlasmaPrismaClient } from 'prisma-plasmate';
const client = createPlasmaPrismaClient({
  plasmate: {
    binaryPath: 'plasmate', // Path to plasmate CLI
    timeout: 30000, // Request timeout
    defaultHeaders: { // Headers for all requests
      'User-Agent': 'MyBot/1.0',
    },
  },
});

// Fetch single URL
const result = await client.fetchAndStore('https://docs.example.com', {
  headers: { 'Authorization': 'Bearer token' },
  cacheFor: 60 * 60 * 1000, // Don't refetch within 1 hour
});

// Batch fetch with progress
const batchResult = await client.batchFetchAndStore(urls, {
  concurrency: 5,
  continueOnError: true,
  onProgress: (done, total, url) => {
    console.log(`[${done}/${total}] ${url}`);
  },
});

// Search content
const results = await client.search('react hooks', {
  limit: 20,
  urlPattern: 'reactjs.org',
});

// Get statistics
const stats = await client.getStats();
console.log(`Token savings: ${stats.tokensSaved}`);

For native Prisma integration, use the extension API:
import { PrismaClient } from '@prisma/client';
import { plasmateExtension } from 'prisma-plasmate';
const prisma = new PrismaClient().$extends(plasmateExtension());
// Fetch and store
const result = await prisma.$plasmate.fetch('https://example.com');
// Search
const results = await prisma.$plasmate.search('query');
// Get SOM directly
const som = await prisma.$plasmate.getSom('https://example.com');
// Statistics
const stats = await prisma.$plasmate.getStats();

Group related fetches together:
const client = createPlasmaPrismaClient();
// Create session
const session = await client.createSession('docs-crawl', {
  source: 'documentation',
  version: '2.0',
});

// Fetch with session
await client.batchFetchAndStore(urls, {
  crawlSessionId: session.id,
});

// Query session content
const results = await client.search('api', {
  crawlSessionId: session.id,
});

// Complete session
await client.completeSession(session.id, 'COMPLETED');

Access the underlying Prisma client for custom queries:
const client = createPlasmaPrismaClient();
// Get content with high compression
const efficient = await client.db.webContent.findMany({
  where: {
    compressionRatio: { gte: 20 },
  },
  orderBy: { compressionRatio: 'desc' },
  take: 10,
});

// Find pages with specific links
const pages = await client.db.webContent.findMany({
  where: {
    outboundLinks: {
      some: {
        href: { contains: 'github.com' },
      },
    },
  },
  include: {
    outboundLinks: true,
  },
});

Generate schema programmatically:
import { generateSchema, PostgresFullTextIndex } from 'prisma-plasmate';
// Generate complete schema
const schema = generateSchema({
  provider: 'postgresql',
  includeLinks: true,
  includeSessions: true,
});

// Get PostgreSQL full-text search SQL
console.log(PostgresFullTextIndex);

All operations are fully typed:
import type {
  SOMResponse,
  FetchResult,
  SearchResult,
  ContentStats,
} from 'prisma-plasmate';

async function processContent(result: FetchResult) {
  console.log(result.somTokens); // number
  console.log(result.title); // string | undefined
}

Plasmate converts HTML to a Semantic Object Model (SOM), reducing token usage by 10-100x:
const result = await client.fetchAndStore('https://docs.example.com/api');
console.log(`HTML tokens: ${result.htmlTokens}`); // ~50,000
console.log(`SOM tokens: ${result.somTokens}`); // ~2,500
console.log(`Compression: ${result.compressionRatio}x`); // 20x

This makes it practical to store and query web content for AI applications without exceeding context limits.
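
To make the arithmetic behind those figures concrete, here is a small worked sketch using the approximate numbers from the example (50,000 HTML tokens, 2,500 SOM tokens). The helper names and the 128,000-token context budget are assumptions chosen for illustration. They are not prisma-plasmate exports, and the stored `compressionRatio` may be computed differently by the library.

```typescript
// Illustrative helpers; these names are assumptions, not prisma-plasmate exports.

/** Ratio of raw HTML tokens to SOM tokens, or null when it cannot be computed. */
function compressionRatio(htmlTokens: number, somTokens: number): number | null {
  if (somTokens <= 0) return null; // avoid division by zero
  return htmlTokens / somTokens;
}

/** Number of whole pages of a given token size that fit in a context budget. */
function pagesThatFit(contextBudget: number, tokensPerPage: number): number {
  return Math.floor(contextBudget / tokensPerPage);
}

const htmlTokens = 50_000; // approximate figure from the example above
const somTokens = 2_500; // approximate figure from the example above
const contextBudget = 128_000; // assumed context window, in tokens

console.log(compressionRatio(htmlTokens, somTokens)); // 20
console.log(htmlTokens - somTokens); // 47500 tokens saved
console.log(pagesThatFit(contextBudget, htmlTokens)); // 2 pages as raw HTML
console.log(pagesThatFit(contextBudget, somTokens)); // 51 pages as SOM
```

Under these assumed numbers, the same budget holds roughly 25 times as many pages when they are stored as SOM.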
Enable PostgreSQL full-text search:
-- Add GIN index
CREATE INDEX web_content_text_search_idx
  ON "WebContent"
  USING GIN (to_tsvector('english', coalesce("textContent", '')));

For SQLite, use FTS5:
import { SqliteFullTextIndex } from 'prisma-plasmate';
// Run the SQL to set up FTS
await prisma.$executeRawUnsafe(SqliteFullTextIndex);

Client methods:

| Method | Description |
|---|---|
| fetchAndStore(url, options?) | Fetch URL and store SOM |
| batchFetchAndStore(urls, options?) | Batch fetch with concurrency |
| search(query, options?) | Search stored content |
| getByUrl(url, sessionId?) | Get content by URL |
| getSom(url, sessionId?) | Get raw SOM for URL |
| createSession(name?, metadata?) | Create crawl session |
| completeSession(id, status?) | Mark session complete |
| getStats(sessionId?) | Get token statistics |
| pruneOldContent(olderThan) | Delete old content |
| disconnect() | Close database connection |
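
For readers curious what "batch fetch with concurrency" involves, the sketch below shows one common way to cap the number of in-flight tasks: a fixed pool of workers that pull URLs from a shared index. It is an illustration under assumptions, not prisma-plasmate's implementation. `mapWithConcurrency` and `BatchOutcome` are hypothetical names, and the error handling mimics a continue-on-error mode by recording failures instead of aborting.

```typescript
// Illustrative only: not prisma-plasmate's actual implementation.
type BatchOutcome<T> =
  | { url: string; ok: true; value: T }
  | { url: string; ok: false; error: unknown };

async function mapWithConcurrency<T>(
  urls: string[],
  concurrency: number,
  task: (url: string) => Promise<T>,
  onProgress?: (done: number, total: number, url: string) => void,
): Promise<BatchOutcome<T>[]> {
  const outcomes: BatchOutcome<T>[] = new Array(urls.length);
  let next = 0; // index of the next URL to claim
  let done = 0; // number of finished tasks, successful or not

  async function worker(): Promise<void> {
    while (next < urls.length) {
      const i = next++; // no await between the check and the claim, so no URL is taken twice
      const url = urls[i];
      try {
        outcomes[i] = { url, ok: true, value: await task(url) };
      } catch (error) {
        outcomes[i] = { url, ok: false, error }; // record the failure and keep going
      }
      done++;
      onProgress?.(done, urls.length, url);
    }
  }

  // Start at most `concurrency` workers; each handles one URL at a time.
  const workerCount = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return outcomes; // same order as the input URLs
}
```

Results are written by index, so the output order matches the input even though tasks finish out of order.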

Extension methods (prisma.$plasmate):

| Method | Description |
|---|---|
| fetch(url, options?) | Fetch and store URL |
| search(query, options?) | Search stored content |
| getSom(url) | Get SOM for URL |
| getStats() | Get statistics |
| delete(url) | Delete content by URL |

MIT