
LangChain pipelines that write developer brand copy from GitHub activity

BrandForce analyses your public commits, repos, and READMEs through a three-stage LangChain chain and produces positioning, bios, and talking points.


May 7, 2026 · 8 min read

Why Developer Branding Is Hard to Write

Most developers struggle to describe their own work compellingly. They know what they built, but translating that into positioning copy — the kind that resonates with hiring managers, clients, or conference committees — requires a different mode of thinking. BrandForce automates that translation by treating your GitHub activity as a structured dataset and running it through a three-stage LangChain pipeline.

Stage 1: Activity Ingestion

The pipeline begins by pulling structured data from the GitHub REST API:

  • Repositories: name, description, topics, language, star count, fork count, last updated
  • Recent commits: message, files changed, diff summary (truncated to 200 chars per file)
  • README content: first 500 words of each pinned repo's README

We use LangChain's GithubRepoLoader as a starting point, then layer custom extraction logic on top to produce a normalized DeveloperActivity object:

interface DeveloperActivity {
  topLanguages: string[]; // ranked by commit frequency
  domainKeywords: string[]; // extracted from README headers and repo topics
  recentCommitSummaries: string[]; // last 20 commit messages, cleaned
  projectDescriptions: string[]; // repo descriptions, non-empty only
  contributionPattern: 'solo' | 'collaborative' | 'maintainer';
}

The GitHub API returns paginated results. We use Promise.all with rate-limit backoff to fetch up to 100 repositories in parallel, then aggregate into the DeveloperActivity structure. This stage runs once per user session and caches results in Supabase for 24 hours.
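A minimal sketch of that fetch-and-cache step, assuming an activity_cache table, a GitHub personal access token, and a hypothetical aggregateActivity reducer (these names and the backoff thresholds are illustrative, not BrandForce's actual code):

import type { SupabaseClient } from '@supabase/supabase-js';

const GITHUB = 'https://api.github.com';

// Fetch with a simple exponential backoff when GitHub rate-limits the request
async function fetchJson(url: string, token: string, attempt = 0): Promise<any> {
  const res = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
  if (res.status === 403 && attempt < 3) {
    await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
    return fetchJson(url, token, attempt + 1);
  }
  if (!res.ok) throw new Error(`GitHub API error: ${res.status}`);
  return res.json();
}

async function ingestActivity(supabase: SupabaseClient, username: string, token: string) {
  // One page of up to 100 repos, then per-repo commit fetches in parallel
  const repos = await fetchJson(`${GITHUB}/users/${username}/repos?per_page=100`, token);
  const commits = await Promise.all(
    repos.map((repo: any) =>
      fetchJson(`${GITHUB}/repos/${repo.full_name}/commits?per_page=20`, token).catch(() => []),
    ),
  );

  // aggregateActivity is a hypothetical reducer that builds the DeveloperActivity object
  const activity = aggregateActivity(repos, commits);

  // Cache the result for 24 hours, keyed by username
  await supabase.from('activity_cache').upsert({
    username,
    activity,
    expires_at: new Date(Date.now() + 24 * 60 * 60 * 1000).toISOString(),
  });

  return activity;
}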

Stage 2: Pattern Analysis

Raw GitHub data is noisy. A developer might have 50 repos, 40 of which are tutorials and forks. Stage 2 uses an LLM (Claude Sonnet) with a structured output schema to separate signal from noise:

import { ChatAnthropic } from '@langchain/anthropic';
import { PromptTemplate } from '@langchain/core/prompts';
import { StructuredOutputParser } from 'langchain/output_parsers';
import { LLMChain } from 'langchain/chains';

// Parser built from the Zod schema that describes a DeveloperProfile
const profileParser = StructuredOutputParser.fromZodSchema(developerProfileSchema);

const analysisChain = new LLMChain({
  llm: new ChatAnthropic({ model: 'claude-sonnet-4-5' }),
  prompt: PromptTemplate.fromTemplate(`
    You are analyzing a developer's GitHub activity to identify their core identity.
    Activity data: {activity}

    Extract:
    1. Primary technical domain (e.g., "backend distributed systems", "frontend animation")
    2. Signature technologies (max 5, ranked by depth of use)
    3. Problem-solving style (e.g., "builds tools for other developers", "ships production SaaS")
    4. Differentiator (what makes this developer's portfolio unusual)

    Respond in JSON matching this schema: {schema}
  `),
  outputParser: profileParser,
});

The structured output parser retries up to 3 times if the LLM response doesn't match the schema. This stage takes 2–4 seconds and produces a DeveloperProfile object that feeds Stage 3.
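One way to implement that retry loop (a sketch; analyzeWithRetry is a hypothetical wrapper, not necessarily how BrandForce does it) is to re-invoke the chain whenever the parse fails:

// Re-invoke the analysis chain until the output parses, up to maxAttempts
async function analyzeWithRetry(
  activity: DeveloperActivity,
  maxAttempts = 3,
): Promise<DeveloperProfile> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await analysisChain.invoke({
        activity: JSON.stringify(activity),
        schema: profileParser.getFormatInstructions(),
      });
      // LLMChain places the parsed output under its outputKey ('text' by default)
      return result.text as DeveloperProfile;
    } catch (err) {
      lastError = err; // schema mismatch: try again
    }
  }
  throw lastError;
}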

Stage 3: Copy Generation

The final stage generates four types of brand copy from the DeveloperProfile:

  1. One-liner positioning: 15 words max, no jargon, lead with impact
  2. Short bio: 80 words, third person, suitable for conference speaker profiles
  3. LinkedIn summary: 200 words, first person, SEO-aware
  4. Three talking points: bullet format, for podcast or interview prep

Each output type uses a separate prompt template tuned for its format constraints. We chain them with RunnableSequence so the one-liner always generates first and is referenced in subsequent prompts — ensuring consistency in voice across all outputs.

import { RunnableSequence, RunnableParallel, RunnablePassthrough } from '@langchain/core/runnables';

const copyPipeline = RunnableSequence.from([
  analysisChain,
  // Generate the one-liner first so the later prompts can reference it
  RunnablePassthrough.assign({ oneLiner: oneLinerChain }),
  // The remaining copy types are independent of each other, so they run in parallel
  RunnableParallel.from({
    oneLiner: (input: { oneLiner: string }) => input.oneLiner,
    shortBio: shortBioChain,
    linkedInSummary: linkedInChain,
    talkingPoints: talkingPointsChain,
  }),
]);

Running the longer outputs in parallel cuts Stage 3's total latency from ~12 seconds (fully sequential) to ~4 seconds, since the bio, LinkedIn summary, and talking points are independent of one another once the profile and one-liner exist.
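Invoking the pipeline is then a single call. The input keys match the Stage 2 prompt variables, and the output keys match the RunnableParallel map above (the activity value is the Stage 1 result):

// Kick off the full Stage 2 + Stage 3 pipeline for one user
const copy = await copyPipeline.invoke({
  activity: JSON.stringify(activity),
  schema: profileParser.getFormatInstructions(),
});

console.log(copy.oneLiner);        // 15-word positioning line
console.log(copy.shortBio);        // 80-word third-person bio
console.log(copy.linkedInSummary); // 200-word first-person summary
console.log(copy.talkingPoints);   // three interview-prep bullets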

Supabase Edge Functions for the API Layer

BrandForce's backend runs entirely on Supabase Edge Functions — no persistent server, no cold-start penalty beyond the first invocation. The function handles GitHub OAuth, caches the DeveloperActivity result, and streams the Stage 3 output back to the browser using the Vercel AI SDK's StreamingTextResponse.
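A rough shape of that edge function, assuming Deno.serve and the StreamingTextResponse helper from the Vercel AI SDK; getOrIngestActivity and runCopyPipelineStream are hypothetical helpers standing in for the Stage 1 cache lookup and the Stage 2/3 pipeline:

// supabase/functions/generate-copy/index.ts (rough shape, not the full handler)
import { StreamingTextResponse } from 'npm:ai';

Deno.serve(async (req) => {
  const { username } = await req.json();

  // Reuse the cached DeveloperActivity if it is under 24 hours old (see Stage 1)
  const activity = await getOrIngestActivity(username); // hypothetical helper

  // Run Stages 2 and 3, emitting each finished section (one-liner, bio, ...) as a text chunk
  const encoder = new TextEncoder();
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const section of runCopyPipelineStream(activity)) { // hypothetical async generator
        controller.enqueue(encoder.encode(section));
      }
      controller.close();
    },
  });

  // Stream the copy back to the browser as it is generated
  return new StreamingTextResponse(stream);
});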

Streaming matters here because copy generation takes 3–6 seconds. Rather than showing a spinner and dumping all output at once, the browser renders each section as it arrives — one-liner first, then bio, then LinkedIn, then talking points. This progressive reveal makes the wait feel shorter and lets users start reviewing early outputs while later ones generate.
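On the client, the progressive reveal only needs the standard streams API. A sketch, where the endpoint path is illustrative and renderCopy stands in for whatever UI update the app actually performs:

// Browser side: append each streamed chunk to the page as it arrives
async function streamCopy(username: string) {
  const res = await fetch('/functions/v1/generate-copy', {
    method: 'POST',
    body: JSON.stringify({ username }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let copySoFar = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    copySoFar += decoder.decode(value, { stream: true });
    renderCopy(copySoFar); // hypothetical UI hook, e.g. a React state setter
  }
}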

The edge function enforces a rate limit of 5 analyses per user per day to control OpenAI and Anthropic API costs. Usage is tracked in a Supabase table with a row-level security policy ensuring users can only see their own quota.
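The quota check itself is a small query against that usage table. A sketch with assumed table and column names:

import type { SupabaseClient } from '@supabase/supabase-js';

// True if the user has run fewer than 5 analyses today (assumed table: analysis_usage)
async function hasQuota(supabase: SupabaseClient, userId: string): Promise<boolean> {
  const today = new Date().toISOString().slice(0, 10);

  const { count, error } = await supabase
    .from('analysis_usage')
    .select('*', { count: 'exact', head: true })
    .eq('user_id', userId)
    .eq('usage_date', today);

  if (error) throw error;
  return (count ?? 0) < 5;
}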
