Content Studio · Beta

Content Archaeologist (voice mining)

Mines transcripts of your existing reels, podcasts, and talks to surface your voice fingerprint, recurring themes, signature hooks, and repurposing ideas. Runs upstream of every other writing agent.

An analyst, not a writer. It reads transcripts of your existing content (reels, livestreams, podcasts, interviews) and surfaces the patterns that make your voice yours. It outputs a brand voice fingerprint with quoted examples, a theme map ranked by frequency, your top 10-15 signature hooks pulled verbatim from the corpus, an inference of who your audience actually is, and a repurposing pack that maps existing chunks to new formats. Pin the voice fingerprint into Ghost Writer / Content Creator to improve their voice match.

Built for

  • Personal-brand operator with a year+ of content
  • Founder with podcast or livestream archive
  • Coach repurposing across formats
  • Agency onboarding a new creator client

Under the hood

Primary model

anthropic/claude-sonnet-4.6

Auxiliary models

Vector store

none

Multimodal

text

What it ships with

  • Voice fingerprint with quoted patterns from the corpus
  • Theme map ranked by frequency with verbatim takes
  • Signature hooks pulled directly from prior content
  • Audience inference (what the corpus reveals vs stated ICP)
  • Repurposing pack — concrete reformat suggestions tied to source
  • Voice drift detection across time
  • Plays well upstream of Ghost Writer + Content Creator (improves voice match)
  • Phase 1: customer uploads .md/.txt transcripts (free transcription tools handle the audio→text step)
  • Phase 1.5 (planned): public URL paste mode + yt-dlp + Whisper transcription
  • Phase 2 (planned): Instagram Graph API OAuth for own posts (free, official, no TOS risk)

Primary responsibilities

  1. Voice fingerprinting from corpus
  2. Theme extraction with frequency ranking
  3. Signature hook surfacing
  4. Repurposing recommendations
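
To make "theme extraction with frequency ranking" concrete, here is a minimal sketch of the ranking step only. It is not how the agent works internally (the agent relies on its primary model); it assumes a hypothetical keyword lexicon named `THEMES` and counts how many transcripts touch each theme.

```python
import re
from collections import Counter

# Hypothetical theme lexicon (illustrative only): theme name -> trigger keywords.
THEMES = {
    "consistency": {"consistent", "consistency", "daily", "habit"},
    "pricing": {"price", "pricing", "charge", "rates"},
    "audience": {"audience", "followers", "community"},
}


def rank_themes(transcripts):
    """Return (theme, transcript_count) pairs, most frequent theme first."""
    counts = Counter()
    for text in transcripts:
        # Lowercase and split into a set of words so each transcript
        # counts at most once per theme.
        words = set(re.findall(r"[a-z']+", text.lower()))
        for theme, keywords in THEMES.items():
            if words & keywords:
                counts[theme] += 1
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        "Show up daily and stay consistent.",
        "Raise your price. Your audience will stay.",
        "A daily habit beats talent.",
    ]
    for theme, n in rank_themes(sample):
        print(f"{theme}: {n} transcript(s)")
```

Counting per transcript rather than per word keeps one long rant from dominating the ranking; a per-word count would be an equally reasonable choice.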

Secondary responsibilities

  • Voice drift alerts
  • Audience inference vs stated ICP
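
One simple way to think about a voice drift alert is as a distance between the vocabulary of earlier and recent transcripts. The sketch below is an assumption for illustration, not the agent's actual method: it compares word-frequency vectors with cosine similarity, and the function names and the `0.5` threshold are hypothetical.

```python
import math
import re
from collections import Counter


def word_freq(texts):
    """Count lowercase word occurrences across a list of transcripts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two word-count mappings (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drift_score(earlier, recent):
    """0.0 means identical vocabulary use; 1.0 means no shared words."""
    return 1.0 - cosine_similarity(word_freq(earlier), word_freq(recent))


def should_alert(earlier, recent, threshold=0.5):
    # The threshold is an arbitrary placeholder; tune it on real corpora.
    return drift_score(earlier, recent) > threshold
```

Raw word counts are a blunt instrument (they ignore tone and sentence rhythm), so treat this as a starting point for reasoning about drift rather than a replacement for the fingerprint review.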

Workflows

  1. Loop 1

    Upload 5-20 transcripts → run → review fingerprint

  2. Loop 2

    Pin voice fingerprint into Ghost Writer / Content Creator pinned notes

  3. Loop 3

    Re-run quarterly as new content accumulates
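
Before an upload in Loop 1, it can help to check a local folder of transcripts. This is a hedged sketch under stated assumptions: the helper names are hypothetical, the folder layout is yours to choose, and the 5-20 range simply mirrors the guidance above rather than a documented hard limit.

```python
from pathlib import Path

# Phase 1 accepts .md and .txt transcripts.
ALLOWED_SUFFIXES = {".md", ".txt"}


def load_transcripts(folder):
    """Return {filename: text} for every .md/.txt file directly inside folder."""
    paths = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in ALLOWED_SUFFIXES
    )
    return {p.name: p.read_text(encoding="utf-8") for p in paths}


def corpus_size_ok(corpus, minimum=5, maximum=20):
    """Check the corpus against the suggested 5-20 transcript range."""
    return minimum <= len(corpus) <= maximum
```

For example, `corpus_size_ok(load_transcripts("transcripts"))` would tell you whether a hypothetical `transcripts` folder falls inside the suggested range.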

How we measure it

  • Voice-match score on downstream Ghost Writer / Content Creator output
  • Repurposing-pack utilization rate (what % of suggestions get acted on)
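
The utilization rate is plain arithmetic: suggestions acted on divided by suggestions made, expressed as a percentage. A minimal sketch, with a hypothetical function name and made-up numbers in the example:

```python
def utilization_rate(suggested, acted_on):
    """Percentage of repurposing suggestions that were acted on."""
    if suggested < 0 or acted_on < 0 or acted_on > suggested:
        raise ValueError("expected 0 <= acted_on <= suggested")
    if suggested == 0:
        return 0.0  # nothing suggested yet, so nothing to utilize
    return 100.0 * acted_on / suggested


if __name__ == "__main__":
    # Illustrative numbers only: 5 of 20 suggestions acted on.
    print(f"{utilization_rate(20, 5):.1f}%")
```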

Integrations

Tools this agent connects to. OAuth scopes are minimum-necessary by default.

  • Phase 1: customer-uploaded transcripts (.md / .txt)
  • Phase 1.5 planned: yt-dlp on Vercel Sandbox + OpenAI Whisper transcription
  • Phase 2 planned: Instagram Graph API OAuth (own posts only, free, official)
  • Phase 3 (only if needed): Playwright on Vercel Sandbox for saved content (TOS risk)

Data sources

Information this agent reads at runtime. All scoped to your organization.

  • Customer-uploaded transcript corpus
  • Brand DNA (cross-referenced for drift detection)

Compliance

No PII processed beyond what the customer uploads

ROI

How the math works

Replaces a brand voice consultant ($3-10k for a one-time audit) and re-mines continuously as content accumulates. It also improves the output quality of every other writing agent on the dashboard, so it pays for itself by raising Ghost Writer + Content Creator effectiveness.

Human equivalent: Brand voice consultant ($3-10k for a one-time fingerprint engagement). The agent is continuous and re-mines as the corpus grows.

Risks & mitigations

What could go wrong

  • Phase 1 requires the customer to bring transcripts (free tools like Otter.ai / CapCut auto-captions handle this). Customers without prior content can't use this agent yet.
  • Phase 3 (saved-content auto-pull via Playwright) carries Meta TOS risk — opt-in only, with explicit account-flag warning.

Tags

#content-archaeologist #voice-mining #transcripts #repurposing #brand-voice
