
prisma-plasmate

Prisma integration for Plasmate - the browser engine for AI agents.

Store and query web content with 10-100x token compression using Plasmate's Semantic Object Model (SOM).

Features

  • Token Compression: Store web content as SOM JSON with 10-100x fewer tokens than raw HTML
  • Type-Safe Queries: Full TypeScript support with Prisma's type safety
  • Batch Processing: Efficiently fetch and store multiple URLs with concurrency control
  • Full-Text Search: Search stored page text, with optional PostgreSQL and SQLite full-text indexes
  • Crawl Sessions: Group related fetches for organized data management
  • Link Extraction: Automatically extract and store page relationships
  • Caching: Skip refetching recently stored content

Installation

npm install prisma-plasmate @prisma/client
npm install -D prisma

You also need Plasmate installed:

cargo install plasmate
# or
brew install plasmate

Quick Start

1. Add Schema Models

Add the Plasmate models to your prisma/schema.prisma:

model WebContent {
  id             String   @id @default(cuid())
  url            String
  canonicalUrl   String?
  title          String?
  description    String?
  som            Json
  textContent    String?
  htmlTokens     Int?
  somTokens      Int?
  compressionRatio Float?
  statusCode     Int?
  contentType    String?
  headers        Json?
  fetchedAt      DateTime @default(now())
  updatedAt      DateTime @updatedAt

  crawlSession   CrawlSession? @relation(fields: [crawlSessionId], references: [id])
  crawlSessionId String?
  outboundLinks  Link[] @relation("SourceLinks")
  inboundLinks   Link[] @relation("TargetLinks")

  @@unique([url, crawlSessionId])
  @@index([url])
  @@index([fetchedAt])
  @@index([crawlSessionId])
}

model CrawlSession {
  id          String      @id @default(cuid())
  name        String?
  startedAt   DateTime    @default(now())
  completedAt DateTime?
  status      CrawlStatus @default(RUNNING)
  metadata    Json?
  contents    WebContent[]

  @@index([status])
  @@index([startedAt])
}

model Link {
  id       String      @id @default(cuid())
  href     String
  text     String?
  rel      String?
  source   WebContent  @relation("SourceLinks", fields: [sourceId], references: [id], onDelete: Cascade)
  sourceId String
  target   WebContent? @relation("TargetLinks", fields: [targetId], references: [id], onDelete: SetNull)
  targetId String?

  @@index([sourceId])
  @@index([targetId])
  @@index([href])
}

enum CrawlStatus {
  RUNNING
  COMPLETED
  FAILED
  CANCELLED
}

2. Run Migrations

npx prisma migrate dev --name add-web-content

3. Fetch and Store Content

import { createPlasmaPrismaClient } from 'prisma-plasmate';

const client = createPlasmaPrismaClient();

// Fetch a URL and store it
const result = await client.fetchAndStore('https://example.com');
console.log(`Stored: ${result.title}`);
console.log(`SOM tokens: ${result.somTokens}`);

// Search stored content
const results = await client.search('typescript');
for (const item of results) {
  console.log(`${item.title}: ${item.url}`);
}

await client.disconnect();

Usage

PlasmaPrismaClient

The main client class provides high-level methods for web content operations:

import { createPlasmaPrismaClient } from 'prisma-plasmate';

const client = createPlasmaPrismaClient({
  plasmate: {
    binaryPath: 'plasmate',  // Path to plasmate CLI
    timeout: 30000,          // Request timeout
    defaultHeaders: {        // Headers for all requests
      'User-Agent': 'MyBot/1.0',
    },
  },
});

// Fetch single URL
const result = await client.fetchAndStore('https://docs.example.com', {
  headers: { 'Authorization': 'Bearer token' },
  cacheFor: 60 * 60 * 1000, // Don't refetch within 1 hour
});

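// Batch fetch with progress (example URLs; substitute your own)
const urls = ['https://example.com/', 'https://docs.example.com/'];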
const batchResult = await client.batchFetchAndStore(urls, {
  concurrency: 5,
  continueOnError: true,
  onProgress: (done, total, url) => {
    console.log(`[${done}/${total}] ${url}`);
  },
});

// Search content
const results = await client.search('react hooks', {
  limit: 20,
  urlPattern: 'reactjs.org',
});

// Get statistics
const stats = await client.getStats();
console.log(`Token savings: ${stats.tokensSaved}`);
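
The concurrency, continueOnError, and onProgress options above describe a bounded worker pool. The sketch below shows one way such a pool can be written. It is an illustration only, not prisma-plasmate's implementation: runBatch, BatchOptions, and BatchOutcome are hypothetical names, and the task callback stands in for whatever fetch-and-store call you use.

```typescript
// Hypothetical helper, not part of prisma-plasmate: runs `task` over `items`
// with at most `concurrency` calls in flight at any moment.
interface BatchOptions {
  concurrency: number;
  continueOnError: boolean;
  onProgress?: (done: number, total: number, item: string) => void;
}

interface BatchOutcome<T> {
  succeeded: { item: string; value: T }[];
  failed: { item: string; error: unknown }[];
}

async function runBatch<T>(
  items: string[],
  task: (item: string) => Promise<T>,
  options: BatchOptions,
): Promise<BatchOutcome<T>> {
  const outcome: BatchOutcome<T> = { succeeded: [], failed: [] };
  let next = 0; // index of the next unclaimed item
  let done = 0; // items finished so far, successfully or not
  let aborted = false;

  // Each worker keeps claiming the next index until the queue is empty.
  async function worker(): Promise<void> {
    while (!aborted && next < items.length) {
      const item = items[next++];
      try {
        outcome.succeeded.push({ item, value: await task(item) });
      } catch (error) {
        outcome.failed.push({ item, error });
        // Stop claiming new work; tasks already in flight still finish.
        if (!options.continueOnError) aborted = true;
      }
      done += 1;
      options.onProgress?.(done, items.length, item);
    }
  }

  const workerCount = Math.max(1, Math.min(options.concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return outcome;
}
```

Because JavaScript runs the workers on a single thread, the shared next and done counters need no locking; each worker only yields while awaiting its task.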

Prisma Extension

For native Prisma integration, use the extension API:

import { PrismaClient } from '@prisma/client';
import { plasmateExtension } from 'prisma-plasmate';

const prisma = new PrismaClient().$extends(plasmateExtension());

// Fetch and store
const result = await prisma.$plasmate.fetch('https://example.com');

// Search
const results = await prisma.$plasmate.search('query');

// Get SOM directly
const som = await prisma.$plasmate.getSom('https://example.com');

// Statistics
const stats = await prisma.$plasmate.getStats();

Crawl Sessions

Group related fetches together:

const client = createPlasmaPrismaClient();
const urls = ['https://docs.example.com/', 'https://docs.example.com/api'];

// Create session
const session = await client.createSession('docs-crawl', {
  source: 'documentation',
  version: '2.0',
});

// Fetch with session
await client.batchFetchAndStore(urls, {
  crawlSessionId: session.id,
});

// Query session content
const results = await client.search('api', {
  crawlSessionId: session.id,
});

// Complete session
await client.completeSession(session.id, 'COMPLETED');
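
If a fetch throws partway through, the session above would stay in RUNNING. A small wrapper can record the outcome either way. The sketch below is a hypothetical helper, not part of the package: withCrawlSession and SessionClient are illustrative names, and the parameter and return types of createSession and completeSession are assumptions based on the calls shown above.

```typescript
type CrawlStatus = 'RUNNING' | 'COMPLETED' | 'FAILED' | 'CANCELLED';

// Only the two session methods used here, typed structurally.
// Parameter and return types are assumptions, not the library's declarations.
interface SessionClient {
  createSession(name?: string, metadata?: Record<string, unknown>): Promise<{ id: string }>;
  completeSession(id: string, status?: CrawlStatus): Promise<unknown>;
}

// Hypothetical wrapper: opens a session, runs `work`, then marks the session
// COMPLETED on success or FAILED if `work` throws (the error is rethrown).
async function withCrawlSession<T>(
  client: SessionClient,
  name: string,
  work: (sessionId: string) => Promise<T>,
): Promise<T> {
  const session = await client.createSession(name);
  let value: T;
  try {
    value = await work(session.id);
  } catch (error) {
    await client.completeSession(session.id, 'FAILED');
    throw error;
  }
  await client.completeSession(session.id, 'COMPLETED');
  return value;
}
```

With the client from the example above, a call might look like withCrawlSession(client, 'docs-crawl', (id) => client.batchFetchAndStore(urls, { crawlSessionId: id })).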

Direct Prisma Queries

Access the underlying Prisma client for custom queries:

const client = createPlasmaPrismaClient();

// Get content with high compression
const efficient = await client.db.webContent.findMany({
  where: {
    compressionRatio: { gte: 20 },
  },
  orderBy: { compressionRatio: 'desc' },
  take: 10,
});

// Find pages with specific links
const pages = await client.db.webContent.findMany({
  where: {
    outboundLinks: {
      some: {
        href: { contains: 'github.com' },
      },
    },
  },
  include: {
    outboundLinks: true,
  },
});

Schema Helpers

Generate schema programmatically:

import { generateSchema, PostgresFullTextIndex } from 'prisma-plasmate';

// Generate complete schema
const schema = generateSchema({
  provider: 'postgresql',
  includeLinks: true,
  includeSessions: true,
});

// Get PostgreSQL full-text search SQL
console.log(PostgresFullTextIndex);

Type Safety

All operations are fully typed:

import type {
  SOMResponse,
  FetchResult,
  SearchResult,
  ContentStats,
} from 'prisma-plasmate';

async function processContent(result: FetchResult) {
  console.log(result.somTokens); // number
  console.log(result.title);     // string | undefined
}

Token Compression

Plasmate converts HTML to a Semantic Object Model (SOM), reducing token usage by 10-100x:

const result = await client.fetchAndStore('https://docs.example.com/api');

console.log(`HTML tokens: ${result.htmlTokens}`);     // ~50,000
console.log(`SOM tokens: ${result.somTokens}`);       // ~2,500
console.log(`Compression: ${result.compressionRatio}x`); // 20x

This makes it practical to store and query web content for AI applications without exceeding context limits.
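
The arithmetic behind these numbers is simple to reproduce over stored rows. The sketch below assumes the ratio is htmlTokens divided by somTokens, which matches the 50,000 / 2,500 = 20x example above, and that tokens saved is the difference between the two. TokenRow, TokenSummary, and summarizeTokens are hypothetical names, and this is not necessarily how getStats computes its values.

```typescript
// Mirrors the nullable token columns on the WebContent model.
interface TokenRow {
  htmlTokens: number | null;
  somTokens: number | null;
}

interface TokenSummary {
  htmlTokens: number;
  somTokens: number;
  tokensSaved: number;
  compressionRatio: number | null; // null when there is nothing to compare
}

// Hypothetical helper: aggregates token counts across rows.
function summarizeTokens(rows: TokenRow[]): TokenSummary {
  let html = 0;
  let som = 0;
  for (const row of rows) {
    // Skip rows where either count is missing so the totals stay comparable.
    if (row.htmlTokens === null || row.somTokens === null) continue;
    html += row.htmlTokens;
    som += row.somTokens;
  }
  return {
    htmlTokens: html,
    somTokens: som,
    tokensSaved: html - som,
    compressionRatio: som > 0 ? html / som : null,
  };
}

// The example page above: 50,000 HTML tokens stored as 2,500 SOM tokens.
const summary = summarizeTokens([{ htmlTokens: 50000, somTokens: 2500 }]);
console.log(summary.compressionRatio, summary.tokensSaved); // 20 47500
```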

Full-Text Search

PostgreSQL

Enable PostgreSQL full-text search:

-- Add GIN index
CREATE INDEX web_content_text_search_idx
ON "WebContent"
USING GIN (to_tsvector('english', coalesce("textContent", '')));
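
PostgreSQL only uses an expression index when the query repeats the indexed expression exactly. The sketch below builds a parameterized query that does so. It is an assumption about how you might query the index yourself, not the SQL that client.search runs; buildTextSearchQuery and SqlQuery are hypothetical names.

```typescript
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Hypothetical helper: builds a ranked full-text query against "WebContent".
function buildTextSearchQuery(term: string, limit = 20): SqlQuery {
  // Must match the expression in the CREATE INDEX statement character for
  // character, otherwise the planner cannot use the GIN index.
  const document = `to_tsvector('english', coalesce("textContent", ''))`;
  const text = [
    'SELECT "id", "url", "title",',
    `       ts_rank(${document}, plainto_tsquery('english', $1)) AS rank`,
    'FROM "WebContent"',
    `WHERE ${document} @@ plainto_tsquery('english', $1)`,
    'ORDER BY rank DESC',
    'LIMIT $2',
  ].join('\n');
  // The search term travels as a bound parameter, never spliced into the SQL.
  return { text, values: [term, limit] };
}
```

With a plain PrismaClient, the result could be executed as prisma.$queryRawUnsafe(query.text, ...query.values); the SQL text is fixed and only the bound values vary.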

SQLite

For SQLite, use FTS5:

import { SqliteFullTextIndex } from 'prisma-plasmate';

// Run the SQL to set up FTS
await prisma.$executeRawUnsafe(SqliteFullTextIndex);

API Reference

PlasmaPrismaClient

| Method | Description |
| --- | --- |
| fetchAndStore(url, options?) | Fetch URL and store SOM |
| batchFetchAndStore(urls, options?) | Batch fetch with concurrency |
| search(query, options?) | Search stored content |
| getByUrl(url, sessionId?) | Get content by URL |
| getSom(url, sessionId?) | Get raw SOM for URL |
| createSession(name?, metadata?) | Create crawl session |
| completeSession(id, status?) | Mark session complete |
| getStats(sessionId?) | Get token statistics |
| pruneOldContent(olderThan) | Delete old content |
| disconnect() | Close database connection |

Prisma Extension ($plasmate)

| Method | Description |
| --- | --- |
| fetch(url, options?) | Fetch and store URL |
| search(query, options?) | Search stored content |
| getSom(url) | Get SOM for URL |
| getStats() | Get statistics |
| delete(url) | Delete content by URL |

License

MIT
