Content Archaeologist (voice mining)
Mines transcripts of your existing reels, podcasts, and talks to surface your voice fingerprint, recurring themes, signature hooks, and repurposing ideas. Runs upstream of every other writing agent.
An analyst, not a writer. It reads transcripts of your existing content (reels, livestreams, podcasts, interviews) and surfaces the patterns that make your voice yours. It outputs a brand voice fingerprint with quoted examples, a theme map ranked by frequency, your top 10-15 signature hooks pulled verbatim from the corpus, an inference of who your audience actually is, and a repurposing pack that maps existing chunks to new formats. Pin the voice fingerprint into Ghost Writer / Content Creator to improve how closely their output matches your voice.
Under the hood
Primary model
anthropic/claude-sonnet-4.6
Auxiliary models
Vector store
none
Multimodal
text
What it ships with
- Voice fingerprint with quoted patterns from the corpus
- Theme map ranked by frequency with verbatim takes
- Signature hooks pulled directly from prior content
- Audience inference (what the corpus reveals vs stated ICP)
- Repurposing pack — concrete reformat suggestions tied to source
- Voice drift detection across time
- Plays well upstream of Ghost Writer + Content Creator (improves voice match)
- Phase 1: customer uploads .md/.txt transcripts (free transcription tools handle the audio→text step)
- Phase 1.5 (planned): public URL paste mode + yt-dlp + Whisper transcription
- Phase 2 (planned): Instagram Graph API OAuth for own posts (free, official, no TOS risk)
Primary responsibilities
- 01 Voice fingerprinting from corpus
- 02 Theme extraction with frequency ranking
- 03 Signature hook surfacing
- 04 Repurposing recommendations
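As a rough illustration of what theme extraction with frequency ranking could look like, the Python sketch below counts how many transcripts touch each theme and sorts the result. The theme names, keyword sets, and the `rank_themes` helper are hypothetical and exist only for this example; the agent relies on a language model, and its actual method is not described on this page.

```python
import re
from collections import Counter

# Hypothetical theme -> keyword mapping, hand-written for illustration only.
THEMES = {
    "pricing": {"price", "pricing", "charge"},
    "mindset": {"mindset", "belief", "confidence"},
}

def rank_themes(transcripts, themes=THEMES):
    """Return (theme, transcript_count) pairs, most frequent first."""
    counts = Counter()
    for text in transcripts:
        # Lowercase word set for this transcript.
        words = set(re.findall(r"[a-z']+", text.lower()))
        for theme, keywords in themes.items():
            if words & keywords:  # at least one keyword appears
                counts[theme] += 1
    return counts.most_common()
```

Counting transcripts rather than raw keyword hits keeps one long, repetitive episode from dominating the ranking. That is a design choice for this sketch, not a documented behavior of the agent.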
Secondary responsibilities
- Voice drift alerts
- Audience inference vs stated ICP
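One simple way to think about a voice drift alert is to compare word-frequency profiles from an older and a newer batch of transcripts. The sketch below does that with cosine similarity. The function names and the 0.5 threshold are assumptions for illustration, and nothing here is confirmed as the agent's own drift method.

```python
import math
import re
from collections import Counter

def word_profile(texts):
    """Relative word frequencies across a list of transcripts."""
    counts = Counter(w for t in texts for w in re.findall(r"[a-z']+", t.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def cosine_similarity(a, b):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def drifted(older, newer, threshold=0.5):
    """Flag drift when similarity falls below the threshold (an assumed value)."""
    return cosine_similarity(word_profile(older), word_profile(newer)) < threshold
```

A plain word-frequency comparison is sensitive to topic changes as well as style changes, so a real alert would likely need a richer signal than this.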
Workflows
- Loop 1: Upload 5-20 transcripts → run → review fingerprint
- Loop 2: Pin voice fingerprint into Ghost Writer / Content Creator pinned notes
- Loop 3: Re-run quarterly as new content accumulates
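Before starting Loop 1, it can help to check that a batch of files matches the Phase 1 upload formats (.md/.txt) and the suggested 5-20 transcript range. The sketch below is a hypothetical local pre-check; `select_transcripts` is not part of the product.

```python
from pathlib import PurePath

ALLOWED_SUFFIXES = {".md", ".txt"}   # Phase 1 upload formats
MIN_FILES, MAX_FILES = 5, 20         # corpus size suggested in Loop 1

def select_transcripts(filenames):
    """Keep .md/.txt files and report whether the count fits the 5-20 range."""
    kept = [f for f in filenames if PurePath(f).suffix.lower() in ALLOWED_SUFFIXES]
    return kept, MIN_FILES <= len(kept) <= MAX_FILES
```

The function only inspects file names, so it can run on a directory listing without opening any transcript.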
How we measure it
- Voice-match score on downstream Ghost Writer / Content Creator output
- Repurposing-pack utilization rate (what % of suggestions get acted on)
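The utilization rate above is a plain percentage: suggestions acted on divided by suggestions made. A minimal sketch follows. The function name and the choice to return 0.0 for an empty pack are assumptions made for this example.

```python
def utilization_rate(suggested, acted_on):
    """Percentage of repurposing suggestions that were acted on."""
    if suggested < 0 or acted_on < 0 or acted_on > suggested:
        raise ValueError("counts must satisfy 0 <= acted_on <= suggested")
    if suggested == 0:
        return 0.0  # assumed convention for an empty pack
    return 100.0 * acted_on / suggested
```

For example, acting on 5 of 20 suggestions gives a utilization rate of 25%.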
Integrations
Tools this agent connects to. OAuth scopes are minimum-necessary by default.
Data sources
Information this agent reads at runtime. All scoped to your organization.
ROI
How the math works
Replaces a brand voice consultant ($3-10k for a one-time audit) and keeps re-mining as content accumulates. It also improves the output quality of every other writing agent on the dashboard, so it pays for itself by raising Ghost Writer + Content Creator effectiveness.
Human equivalent: Brand voice consultant ($3-10k for a one-time fingerprint engagement). The agent is continuous and re-mines as the corpus grows.
Risks & mitigations
What could go wrong
- Phase 1 requires the customer to bring transcripts (free tools like Otter.ai / CapCut auto-captions handle this). Customers without prior content can't use this agent yet.
- Phase 3 (saved-content auto-pull via Playwright) carries Meta TOS risk — opt-in only, with explicit account-flag warning.
More from Content Studio
Ghost Writer (Long-form Author)
Content Studio
Professional-grade book and long-form ghostwriter that produces work indistinguishable from human authorship. AI-detection-aware, voice-faithful, structurally sophisticated.
From $997/mo retainer
Content Creator (Short-form + Direct Response)
Content Studio
Short-form copy that converts: social captions across platforms, hooks, ad variants, email subject lines, reel scripts, tweet drafts. Distinct from Ghost Writer (long-form essays and books).
From $297/mo
Translator / Localization Lead
Content Studio
Multi-language content production with cultural localization, not just translation. Maintains brand voice across languages.
From $197/mo