SEO & Discoverability Guide
How the wiki handles SEO and LLM discoverability
The wiki automatically generates SEO metadata for all public pages. This guide explains what happens behind the scenes, what you can control as an editor, and how LLM crawlers discover wiki content.
SEO description
Each page has an optional SEO Description field on the edit form (between the Content textarea and the Visibility selector). This short summary (up to 300 characters) is used in:
- The HTML <meta name="description"> tag
- Open Graph (og:description) tags for social sharing
- The /llms.txt index for LLM crawlers
- The Article JSON-LD structured data
If you leave it blank, the wiki auto-generates a description from the first ~160 characters of your page content (with Markdown stripped). For most pages this works fine, but a hand-written summary is better for important public-facing pages.
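As an illustration, a page whose SEO description is "How our CI works" (the example entry used later in this guide) would emit tags along these lines; the exact markup is a sketch, not the wiki's literal template:

```html
<!-- Used by search engines for result snippets -->
<meta name="description" content="How our CI works">
<!-- Used by social platforms when the page is shared -->
<meta property="og:description" content="How our CI works">
```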
Canonical URLs
Every HTML page includes a <link rel="canonical"> tag pointing to
its own URL. This tells search engines that the page is the
authoritative version of its content.
The raw Markdown endpoint (.md) also sends a Link HTTP header
pointing back to the HTML page as canonical, plus an
X-Robots-Tag: noindex header. This prevents search engines from
indexing the Markdown version as a duplicate while keeping it
accessible to LLM crawlers.
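A response from the .md endpoint therefore carries headers roughly like the following (the URL is the example page used later in this guide; header order and any additional headers are illustrative):

```http
HTTP/1.1 200 OK
Link: <https://wiki.free.law/c/engineering/ci-pipeline>; rel="canonical"
X-Robots-Tag: noindex
```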
Structured data (JSON-LD)
Public pages include two types of JSON-LD structured data:
- BreadcrumbList — the directory path leading to the page, which can appear as breadcrumb trails in search results
- Article — includes the page title, description, publication and modification dates, and Free Law Project as the publisher
Private and internal pages do not include any structured data and
are marked with noindex to prevent search engine indexing.
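A sketch of the Article structured data, using standard schema.org property names (the dates shown are placeholders, and the wiki's actual output may include additional properties):

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "CI Pipeline",
  "description": "How our CI works",
  "datePublished": "2024-01-01",
  "dateModified": "2024-06-01",
  "publisher": {
    "@type": "Organization",
    "name": "Free Law Project"
  }
}
```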
What is sitemap.xml?
A sitemap is a file that lists all the pages on a website so that search engines like Google and Bing can find and index them. Without a sitemap, search engines have to discover pages by following links, which means some pages might be missed. The wiki automatically generates a sitemap at /sitemap.xml.
How it works:
- Only public pages and directories are included
- The entire directory ancestry must be public — a public page inside a private or internal directory is automatically excluded
- Pages and directories with Include in search engines? unchecked are excluded, along with all their children
- The sitemap itself is marked noindex by Django, so search engines follow its links rather than indexing the XML file
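A minimal example of the generated sitemap, using the standard sitemaps.org format (whether the wiki emits lastmod or other optional elements is an assumption):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://wiki.free.law/c/engineering/ci-pipeline</loc>
    <lastmod>2024-06-01</lastmod>
  </url>
</urlset>
```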
Controlling sitemap inclusion:
Both pages and directories have an Include in search engines? checkbox on their edit forms. This setting is hierarchical — if you exclude a directory from the sitemap, all pages and subdirectories inside it are also excluded, regardless of their own setting. This works like the visibility system: a child cannot be more visible than its parent.
The root directory always appears in the sitemap and cannot be excluded.
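The hierarchical inclusion rule can be sketched in Python. The field and function names here are illustrative, not the wiki's actual models:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A page or directory (hypothetical model, for illustration only)."""
    is_public: bool
    include_in_search: bool
    parent: Optional["Node"] = None

def in_sitemap(node: Node) -> bool:
    # A node is listed only if it and every ancestor are public
    # AND have "Include in search engines?" checked.
    current: Optional[Node] = node
    while current is not None:
        if not (current.is_public and current.include_in_search):
            return False
        current = current.parent
    return True
```

For example, a public page inside a private directory is excluded because the walk up the ancestry hits the private directory and returns False.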
What is llms.txt?
llms.txt is a standard file (like robots.txt) that helps AI assistants — such as ChatGPT, Claude, and other large language models — discover and understand your content. While a sitemap helps search engines, llms.txt is specifically designed for AI crawlers. The wiki serves this file at /llms.txt.
The file lists pages grouped by directory, with each entry linking
to the raw Markdown (.md) version of the page:
    # FLP Wiki
    > Free Law Project's wiki covering legal technology, open legal
    > data, and organizational knowledge.

    ## Engineering
    - [CI Pipeline](https://wiki.free.law/c/engineering/ci-pipeline.md): How our CI works

    ## Optional
    - [Getting Started](https://wiki.free.law/c/help/getting-started-guide.md): Intro guide
Each entry uses the page's SEO description if set, or an
auto-extracted summary from the content. The llms.txt file itself
has X-Robots-Tag: noindex so search engines don't index it.
Controlling llms.txt inclusion:
Both pages and directories have a Share with AI assistants? setting with three options:
- Yes — the page appears in the main section of llms.txt
- On request — the page appears in the "Optional" section, signaling to AI assistants that this content is supplementary and should be fetched only when relevant
- No — the page does not appear in llms.txt at all
This setting is hierarchical, like the sitemap control. If a directory is set to "No", none of its pages or subdirectories will appear in llms.txt regardless of their own setting. If a directory is set to "On request", its children can be "On request" or "No" but not "Yes" — the most restrictive value in the chain always wins.
The default for new pages and directories is No. Change it to "Yes" or "On request" on content you want AI assistants to find.
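The "most restrictive value wins" rule can be sketched like this (the setting values and helper name are illustrative, not the wiki's internals):

```python
# Higher number = more restrictive.
RESTRICTIVENESS = {"yes": 0, "on_request": 1, "no": 2}

def effective_llms_setting(chain: list[str]) -> str:
    """Resolve the effective "Share with AI assistants?" value for a page,
    given the settings from the root directory down to the page itself."""
    return max(chain, key=RESTRICTIVENESS.__getitem__)
```

So a "Yes" page under an "On request" directory is effectively "On request", and anything under a "No" directory is "No".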
robots.txt
The /robots.txt file tells search engine crawlers what they can and cannot access:
Allowed:
- /c/ — all wiki content pages and directories
- /llms.txt — the LLM content index
- Raw Markdown (.md) files — for LLM crawlers
Blocked:
- /admin/, /api/, /u/, /search/, /files/, /unsubscribe/, /activity/
- Page action URLs: edit, move, delete, history, diff, revert, permissions, subscribe, pin, backlinks
- Directory action URLs: new, new-dir, edit-dir, move-dir, delete-dir, history-dir, etc.
- Comments, proposals, and feedback URLs
The robots.txt also includes a Sitemap: directive pointing
crawlers to the sitemap.
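Put together, the file looks roughly like this. Only the paths named above are shown, and the exact rule ordering and the action-URL patterns are assumptions:

```text
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /u/
Disallow: /search/
Disallow: /files/
Disallow: /unsubscribe/
Disallow: /activity/
Allow: /c/
Allow: /llms.txt

Sitemap: https://wiki.free.law/sitemap.xml
```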
What editors should know
- Set SEO descriptions on important public pages — a concise, hand-written summary outperforms auto-generated ones
- Visibility matters — only public pages in fully-public directory chains appear in the sitemap and llms.txt. Private and internal pages are automatically excluded and marked noindex
- Discoverability settings are hierarchical — excluding a directory removes the entire subtree. You cannot include a child if a parent is excluded. Settings apply to subdirectories and sub-pages unless overridden
- You don't need to do anything for canonical URLs, JSON-LD, or robots.txt — these are all automatic
- Page titles are used as the og:title and Article headline, so write clear, descriptive titles