prisma-plasmate

Prisma integration for Plasmate, the browser engine for AI agents.

Store and query web content with 10-100x token compression using Plasmate's Semantic Object Model (SOM).

Features

Token Compression: Store web content as SOM JSON with 10-100x fewer tokens than raw HTML
Type-Safe Queries: Full TypeScript support with Prisma's type safety
Batch Processing: Efficiently fetch and store multiple URLs with concurrency control
Full-Text Search: Query stored content by text, with optional PostgreSQL and SQLite full-text indexes
Crawl Sessions: Group related fetches for organized data management
Link Extraction: Automatically extract and store page relationships
Caching: Skip refetching recently stored content
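The Caching feature corresponds to the `cacheFor` option of `fetchAndStore`, shown under Usage. The sketch below illustrates one plausible freshness rule. It assumes `cacheFor` is a window in milliseconds compared against a row's `fetchedAt`, which is how the `60 * 60 * 1000` example reads, but the library's exact rule is not documented here. `isFresh` and `StoredRow` are hypothetical names.

```typescript
// Hypothetical sketch of a "cacheFor" freshness check, not the library's code.
// Assumption: cacheFor is a duration in milliseconds measured from fetchedAt.

interface StoredRow {
  url: string;
  fetchedAt: Date;
}

/** Returns true when the stored row is recent enough to skip a refetch. */
function isFresh(row: StoredRow | null, cacheForMs: number | undefined, now: Date = new Date()): boolean {
  // No stored row, or no cache window requested: always refetch.
  if (row === null || cacheForMs === undefined || cacheForMs <= 0) {
    return false;
  }
  const ageMs = now.getTime() - row.fetchedAt.getTime();
  return ageMs < cacheForMs;
}

const hour = 60 * 60 * 1000;
const now = new Date("2024-01-01T12:00:00Z");
const row: StoredRow = { url: "https://example.com", fetchedAt: new Date("2024-01-01T11:30:00Z") };

console.log(isFresh(row, hour, now));           // true: stored 30 minutes ago
console.log(isFresh(row, 10 * 60 * 1000, now)); // false: older than 10 minutes
```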

Installation

npm install prisma-plasmate @prisma/client
npm install -D prisma

You also need Plasmate installed:

cargo install plasmate
# or
brew install plasmate

Quick Start

1. Add Schema Models

Add the Plasmate models to your prisma/schema.prisma, alongside your existing datasource and generator blocks:

model WebContent {
  id             String   @id @default(cuid())
  url            String
  canonicalUrl   String?
  title          String?
  description    String?
  som            Json
  textContent    String?
  htmlTokens     Int?
  somTokens      Int?
  compressionRatio Float?
  statusCode     Int?
  contentType    String?
  headers        Json?
  fetchedAt      DateTime @default(now())
  updatedAt      DateTime @updatedAt

  crawlSession   CrawlSession? @relation(fields: [crawlSessionId], references: [id])
  crawlSessionId String?
  outboundLinks  Link[] @relation("SourceLinks")
  inboundLinks   Link[] @relation("TargetLinks")

  @@unique([url, crawlSessionId])
  @@index([url])
  @@index([fetchedAt])
  @@index([crawlSessionId])
}

model CrawlSession {
  id          String      @id @default(cuid())
  name        String?
  startedAt   DateTime    @default(now())
  completedAt DateTime?
  status      CrawlStatus @default(RUNNING)
  metadata    Json?
  contents    WebContent[]

  @@index([status])
  @@index([startedAt])
}

model Link {
  id       String      @id @default(cuid())
  href     String
  text     String?
  rel      String?
  source   WebContent  @relation("SourceLinks", fields: [sourceId], references: [id], onDelete: Cascade)
  sourceId String
  target   WebContent? @relation("TargetLinks", fields: [targetId], references: [id], onDelete: SetNull)
  targetId String?

  @@index([sourceId])
  @@index([targetId])
  @@index([href])
}

enum CrawlStatus {
  RUNNING
  COMPLETED
  FAILED
  CANCELLED
}
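The `@@unique([url, crawlSessionId])` constraint allows one stored copy of a URL per crawl session, so the same page can be captured again in a later session. One caveat: PostgreSQL and SQLite treat NULLs as distinct in unique constraints, so rows with no `crawlSessionId` are not deduplicated by this constraint. The hypothetical helper below mirrors those rules in memory to make them concrete. It is an illustration only, not part of prisma-plasmate.

```typescript
// Illustration only: mimics @@unique([url, crawlSessionId]) in memory,
// including SQL's rule that NULL values never collide in a unique constraint.

interface ContentKey {
  url: string;
  crawlSessionId: string | null;
}

/** Returns the rows a (url, crawlSessionId) unique constraint would reject. */
function findConstraintViolations(rows: ContentKey[]): ContentKey[] {
  const seen = new Set<string>();
  const violations: ContentKey[] = [];
  for (const row of rows) {
    // A NULL session id never conflicts, so the database accepts duplicates here.
    if (row.crawlSessionId === null) continue;
    const key = JSON.stringify([row.url, row.crawlSessionId]);
    if (seen.has(key)) {
      violations.push(row);
    } else {
      seen.add(key);
    }
  }
  return violations;
}

const rows: ContentKey[] = [
  { url: "https://example.com", crawlSessionId: "s1" },
  { url: "https://example.com", crawlSessionId: "s2" }, // allowed: different session
  { url: "https://example.com", crawlSessionId: "s1" }, // rejected: duplicate pair
  { url: "https://example.com", crawlSessionId: null },
  { url: "https://example.com", crawlSessionId: null }, // allowed: NULLs are distinct
];
console.log(findConstraintViolations(rows).length); // 1
```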

2. Run Migrations

npx prisma migrate dev --name add-web-content

3. Fetch and Store Content

import { createPlasmaPrismaClient } from 'prisma-plasmate';

const client = createPlasmaPrismaClient();

// Fetch a URL and store it
const result = await client.fetchAndStore('https://example.com');
console.log(`Stored: ${result.title}`);
console.log(`SOM tokens: ${result.somTokens}`);

// Search stored content
const results = await client.search('typescript');
for (const item of results) {
  console.log(`${item.title}: ${item.url}`);
}

await client.disconnect();

Usage

PlasmaPrismaClient

The main client class provides high-level methods for web content operations:

import { createPlasmaPrismaClient } from 'prisma-plasmate';

const client = createPlasmaPrismaClient({
  plasmate: {
    binaryPath: 'plasmate',  // Path to plasmate CLI
    timeout: 30000,          // Request timeout
    defaultHeaders: {        // Headers for all requests
      'User-Agent': 'MyBot/1.0',
    },
  },
});

// Fetch single URL
const result = await client.fetchAndStore('https://docs.example.com', {
  headers: { 'Authorization': 'Bearer token' },
  cacheFor: 60 * 60 * 1000, // Don't refetch within 1 hour
});

// Batch fetch with progress
const urls = ['https://docs.example.com/intro', 'https://docs.example.com/api'];
const batchResult = await client.batchFetchAndStore(urls, {
  concurrency: 5,
  continueOnError: true,
  onProgress: (done, total, url) => {
    console.log(`[${done}/${total}] ${url}`);
  },
});

// Search content
const results = await client.search('react hooks', {
  limit: 20,
  urlPattern: 'reactjs.org',
});

// Get statistics
const stats = await client.getStats();
console.log(`Token savings: ${stats.tokensSaved}`);
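`batchFetchAndStore` exposes `concurrency`, `continueOnError`, and `onProgress`. The following is a minimal sketch of how a worker-pool batch runner with those options can work, assuming a per-URL async task that stands in for a single `fetchAndStore` call. `runBatch`, `BatchOptions`, and `BatchOutcome` are hypothetical names; this is not the library's implementation.

```typescript
// Hypothetical worker-pool sketch; `task` stands in for one fetch-and-store call.

interface BatchOptions {
  concurrency: number;
  continueOnError: boolean;
  onProgress?: (done: number, total: number, item: string) => void;
}

interface BatchOutcome<R> {
  succeeded: R[];
  failed: { item: string; error: unknown }[];
}

async function runBatch<R>(
  items: string[],
  task: (item: string) => Promise<R>,
  opts: BatchOptions,
): Promise<BatchOutcome<R>> {
  const outcome: BatchOutcome<R> = { succeeded: [], failed: [] };
  let next = 0; // index of the next unclaimed item
  let done = 0; // items finished, successfully or not
  let aborted = false;

  // Each worker claims one item at a time until none remain (or we abort).
  async function worker(): Promise<void> {
    while (!aborted && next < items.length) {
      const item = items[next++];
      try {
        outcome.succeeded.push(await task(item));
      } catch (error) {
        outcome.failed.push({ item, error });
        if (!opts.continueOnError) {
          aborted = true; // stop claiming new items; in-flight ones still finish
        }
      }
      done++;
      opts.onProgress?.(done, items.length, item);
    }
  }

  // Never start more workers than there are items.
  const workerCount = Math.max(1, Math.min(opts.concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return outcome;
}
```

In this sketch a failure with `continueOnError: false` stops workers from claiming new URLs, but requests already in flight still complete, and `succeeded` is in completion order rather than input order.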

Prisma Extension

For native Prisma integration, use the extension API:

import { PrismaClient } from '@prisma/client';
import { plasmateExtension } from 'prisma-plasmate';

const prisma = new PrismaClient().$extends(plasmateExtension());

// Fetch and store
const result = await prisma.$plasmate.fetch('https://example.com');

// Search
const results = await prisma.$plasmate.search('query');

// Get SOM directly
const som = await prisma.$plasmate.getSom('https://example.com');

// Statistics
const stats = await prisma.$plasmate.getStats();

Crawl Sessions

Group related fetches together:

const client = createPlasmaPrismaClient();

// Create session
const session = await client.createSession('docs-crawl', {
  source: 'documentation',
  version: '2.0',
});

// Fetch with session
const urls = ['https://docs.example.com/intro', 'https://docs.example.com/api'];
await client.batchFetchAndStore(urls, {
  crawlSessionId: session.id,
});

// Query session content
const results = await client.search('api', {
  crawlSessionId: session.id,
});

// Complete session
await client.completeSession(session.id, 'COMPLETED');
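`getStats(sessionId?)` reports token statistics, optionally for a single session. As an illustration of the per-session figures that can be derived from the `WebContent` columns, the sketch below groups rows by `crawlSessionId` and sums `htmlTokens` and `somTokens`. Defining `tokensSaved` as HTML tokens minus SOM tokens is an assumption, as are the names `statsBySession` and `SessionStats`.

```typescript
// Hypothetical aggregation over WebContent-shaped rows (fields from the schema above).
// Assumption: tokensSaved = htmlTokens - somTokens, summed over rows with both counts.

interface ContentRow {
  crawlSessionId: string | null;
  htmlTokens: number | null;
  somTokens: number | null;
}

interface SessionStats {
  pages: number;
  htmlTokens: number;
  somTokens: number;
  tokensSaved: number;
}

function statsBySession(rows: ContentRow[]): Map<string | null, SessionStats> {
  const bySession = new Map<string | null, SessionStats>();
  for (const row of rows) {
    const stats = bySession.get(row.crawlSessionId) ?? { pages: 0, htmlTokens: 0, somTokens: 0, tokensSaved: 0 };
    stats.pages++;
    // Rows missing either count are tallied as pages but excluded from token totals.
    if (row.htmlTokens !== null && row.somTokens !== null) {
      stats.htmlTokens += row.htmlTokens;
      stats.somTokens += row.somTokens;
      stats.tokensSaved += row.htmlTokens - row.somTokens;
    }
    bySession.set(row.crawlSessionId, stats);
  }
  return bySession;
}
```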

Direct Prisma Queries

Access the underlying Prisma client for custom queries:

const client = createPlasmaPrismaClient();

// Get content with high compression
const efficient = await client.db.webContent.findMany({
  where: {
    compressionRatio: { gte: 20 },
  },
  orderBy: { compressionRatio: 'desc' },
  take: 10,
});

// Find pages with specific links
const pages = await client.db.webContent.findMany({
  where: {
    outboundLinks: {
      some: {
        href: { contains: 'github.com' },
      },
    },
  },
  include: {
    outboundLinks: true,
  },
});
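Each `Link` row stores the raw `href` plus an optional `targetId` pointing at another stored `WebContent` row. The sketch below shows one way hrefs could be resolved against the page URL and matched to stored pages. It uses only the standard WHATWG `URL` class; `resolveLinks`, `StoredPage`, and the fragment-stripping normalization are assumptions, not documented behavior of the package.

```typescript
// Hypothetical link resolution; not the package's actual extraction code.

interface StoredPage {
  id: string;
  url: string;
}

interface ResolvedLink {
  href: string;            // absolute URL with the #fragment removed
  targetId: string | null; // null when the target page is not stored
}

// Drop the fragment so "page#a" and "page#b" map to the same stored row.
function normalize(url: string): string {
  const u = new URL(url);
  u.hash = "";
  return u.href;
}

function resolveLinks(pageUrl: string, hrefs: string[], stored: StoredPage[]): ResolvedLink[] {
  const byUrl = new Map<string, string>();
  for (const page of stored) {
    byUrl.set(normalize(page.url), page.id);
  }

  const links: ResolvedLink[] = [];
  for (const raw of hrefs) {
    let absolute: URL;
    try {
      absolute = new URL(raw, pageUrl); // resolves relative hrefs such as "../api"
    } catch {
      continue; // skip hrefs that are not valid URLs
    }
    // Keep only web links; skip mailto:, javascript:, and similar schemes.
    if (absolute.protocol !== "http:" && absolute.protocol !== "https:") continue;
    const href = normalize(absolute.href);
    links.push({ href, targetId: byUrl.get(href) ?? null });
  }
  return links;
}
```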

Schema Helpers

Generate schema programmatically:

import { generateSchema, PostgresFullTextIndex } from 'prisma-plasmate';

// Generate complete schema
const schema = generateSchema({
  provider: 'postgresql',
  includeLinks: true,
  includeSessions: true,
});

// Get PostgreSQL full-text search SQL
console.log(PostgresFullTextIndex);

Type Safety

All operations are fully typed:

import type {
  SOMResponse,
  FetchResult,
  SearchResult,
  ContentStats,
} from 'prisma-plasmate';

async function processContent(result: FetchResult) {
  console.log(result.somTokens); // number
  console.log(result.title);     // string | undefined
}
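Because `title` is typed `string | undefined`, TypeScript requires the missing case to be handled before the value is used as a string. The sketch below narrows it with a fallback. `FetchResultLike` is a hypothetical stand-in for `FetchResult`, and its `url` field is an assumption, since this README only shows `title` and `somTokens` on `FetchResult`.

```typescript
// Hypothetical stand-in for FetchResult; only the fields used here are declared.
interface FetchResultLike {
  url: string;    // assumed field, not shown on FetchResult in this README
  title?: string; // string | undefined, as documented above
  somTokens: number;
}

// Returns a display label, falling back to the URL when the title is missing or blank.
function label(result: FetchResultLike): string {
  const title = result.title?.trim();
  const name = title ? title : result.url;
  return `${name} (${result.somTokens} SOM tokens)`;
}
```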

Token Compression

Plasmate converts HTML to a Semantic Object Model (SOM), reducing token usage by 10-100x:

const result = await client.fetchAndStore('https://docs.example.com/api');

console.log(`HTML tokens: ${result.htmlTokens}`);     // ~50,000
console.log(`SOM tokens: ${result.somTokens}`);       // ~2,500
console.log(`Compression: ${result.compressionRatio}x`); // 20x

Smaller payloads make it practical to store web content and pass query results to AI applications without exceeding model context limits.
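The example figures are consistent with `compressionRatio = htmlTokens / somTokens` (50,000 / 2,500 = 20). The sketch below spells out that arithmetic, along with how many pages of each size fit in a context budget. The formula is inferred from the example, and the 128,000-token budget is a hypothetical figure chosen for illustration.

```typescript
// Sketch only: the ratio formula is inferred from the README's example numbers.
function compressionRatio(htmlTokens: number, somTokens: number): number {
  if (somTokens <= 0) throw new RangeError("somTokens must be positive");
  return htmlTokens / somTokens;
}

function tokensSaved(htmlTokens: number, somTokens: number): number {
  return htmlTokens - somTokens;
}

// Number of whole pages of a given size that fit into a context window of `budget` tokens.
function pagesThatFit(budget: number, tokensPerPage: number): number {
  if (tokensPerPage <= 0) throw new RangeError("tokensPerPage must be positive");
  return Math.floor(budget / tokensPerPage);
}

const budget = 128000; // hypothetical context window, for illustration only
console.log(compressionRatio(50000, 2500)); // 20
console.log(tokensSaved(50000, 2500));      // 47500
console.log(pagesThatFit(budget, 50000));   // 2 (raw HTML)
console.log(pagesThatFit(budget, 2500));    // 51 (SOM)
```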

Full-Text Search

PostgreSQL

Enable PostgreSQL full-text search:

-- Add GIN index
CREATE INDEX web_content_text_search_idx
ON "WebContent"
USING GIN (to_tsvector('english', coalesce("textContent", '')));
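PostgreSQL only uses an expression index when the query repeats the indexed expression, here `to_tsvector('english', coalesce("textContent", ''))`. The sketch below builds a parameterized query that does so and ranks matches with `ts_rank`; the resulting `sql` and `params` could be passed to Prisma's `$queryRawUnsafe(sql, ...params)`. `buildFullTextQuery` is a hypothetical helper, and the selected columns are taken from the schema above.

```typescript
// The exact expression used by web_content_text_search_idx; queries must repeat it.
const TSVECTOR_EXPR = `to_tsvector('english', coalesce("textContent", ''))`;

interface RawQuery {
  sql: string;
  params: unknown[];
}

// Hypothetical helper: builds SQL with $1/$2 placeholders for $queryRawUnsafe.
function buildFullTextQuery(search: string, limit = 20): RawQuery {
  const tsquery = `plainto_tsquery('english', $1)`;
  const sql = [
    `SELECT "id", "url", "title", ts_rank(${TSVECTOR_EXPR}, ${tsquery}) AS rank`,
    `FROM "WebContent"`,
    `WHERE ${TSVECTOR_EXPR} @@ ${tsquery}`,
    `ORDER BY rank DESC`,
    `LIMIT $2`,
  ].join("\n");
  // The search text travels as a bound parameter, never spliced into the SQL string.
  return { sql, params: [search, limit] };
}
```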

SQLite

For SQLite, use FTS5:

import { PrismaClient } from '@prisma/client';
import { SqliteFullTextIndex } from 'prisma-plasmate';

const prisma = new PrismaClient();

// Run the SQL to set up FTS
await prisma.$executeRawUnsafe(SqliteFullTextIndex);
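FTS5 `MATCH` strings have their own query syntax, so passing raw user input can raise syntax errors, for example on unbalanced double quotes or punctuation outside the bareword character set. A common approach, sketched below, is to quote each word as an FTS5 string, doubling any embedded double quotes. The virtual table name `WebContentFts` is hypothetical; substitute the table that `SqliteFullTextIndex` actually creates. The `?` placeholders suit Prisma's `$queryRawUnsafe` on SQLite.

```typescript
// Hypothetical helpers; "WebContentFts" is a placeholder table name.

// Quote each whitespace-separated word as an FTS5 string, doubling embedded
// double quotes. FTS5 combines adjacent strings with an implicit AND.
// Returns "" for blank input; callers should skip the query in that case.
function toFts5Query(input: string): string {
  return input
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => `"${term.replace(/"/g, '""')}"`)
    .join(" ");
}

interface RawQuery {
  sql: string;
  params: unknown[];
}

function buildSqliteSearch(input: string, limit = 20): RawQuery {
  return {
    // rank is FTS5's built-in relevance column; lower values are better matches.
    sql: 'SELECT rowid, rank FROM "WebContentFts" WHERE "WebContentFts" MATCH ? ORDER BY rank LIMIT ?',
    params: [toFts5Query(input), limit],
  };
}
```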

API Reference

PlasmaPrismaClient

Method	Description
`fetchAndStore(url, options?)`	Fetch URL and store SOM
`batchFetchAndStore(urls, options?)`	Batch fetch with concurrency
`search(query, options?)`	Search stored content
`getByUrl(url, sessionId?)`	Get content by URL
`getSom(url, sessionId?)`	Get raw SOM for URL
`createSession(name?, metadata?)`	Create crawl session
`completeSession(id, status?)`	Mark session complete
`getStats(sessionId?)`	Get token statistics
`pruneOldContent(olderThan)`	Delete old content
`disconnect()`	Close database connection

Prisma Extension ($plasmate)

Method	Description
`fetch(url, options?)`	Fetch and store URL
`search(query, options?)`	Search stored content
`getSom(url)`	Get SOM for URL
`getStats()`	Get statistics
`delete(url)`	Delete content by URL

License

MIT
