SEO & Discoverability Guide
How the wiki handles SEO and LLM discoverability
The wiki automatically generates SEO metadata for all public pages. This guide explains what happens behind the scenes, what you can control as an editor, and how LLM crawlers discover wiki content.
SEO description
Each page has an optional SEO Description field on the edit form (between the Content textarea and the Visibility selector). This short summary (up to 300 characters) is used in:
- The HTML <meta name="description"> tag
- Open Graph (og:description) tags for social sharing
- The /llms.txt index for LLM crawlers
- The Article JSON-LD structured data
If you leave it blank, the wiki auto-generates a description from the first ~160 characters of your page content (with Markdown stripped). For most pages this works fine, but a hand-written summary is better for important public-facing pages.
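As an illustration, a page whose SEO description is "How our CI works" (the example entry used later in this guide) would emit tags along these lines; the exact markup is a sketch, not the wiki's literal template:

```html
<!-- Used by search engines for result snippets -->
<meta name="description" content="How our CI works">
<!-- Used by social platforms when the page is shared -->
<meta property="og:description" content="How our CI works">
```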
Canonical URLs
Every HTML page includes a <link rel="canonical"> tag pointing to
its own URL. This tells search engines that the page is the
authoritative version of its content.
The raw Markdown endpoint (.md) also sends a Link HTTP header
pointing back to the HTML page as canonical, plus an
X-Robots-Tag: noindex header. This prevents search engines from
indexing the Markdown version as a duplicate while keeping it
accessible to LLM crawlers.
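A response from the .md endpoint therefore carries headers roughly like the following (the URL is the example page used later in this guide; header order and any additional headers are illustrative):

```http
HTTP/1.1 200 OK
Link: <https://wiki.free.law/c/engineering/ci-pipeline>; rel="canonical"
X-Robots-Tag: noindex
```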
Structured data (JSON-LD)
Public pages include two types of JSON-LD structured data:
- BreadcrumbList — the directory path leading to the page, which can appear as breadcrumb trails in search results
- Article — includes the page title, description, publication and modification dates, and Free Law Project as the publisher
Private and internal pages do not include any structured data and
are marked with noindex to prevent search engine indexing.
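A sketch of the Article structured data, using standard schema.org property names (the dates shown are placeholders, and the wiki's actual output may include additional properties):

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "CI Pipeline",
  "description": "How our CI works",
  "datePublished": "2024-01-01",
  "dateModified": "2024-06-01",
  "publisher": {
    "@type": "Organization",
    "name": "Free Law Project"
  }
}
```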
What is sitemap.xml?
A sitemap is a file that lists all the pages on a website so that search engines like Google and Bing can find and index them. Without a sitemap, search engines have to discover pages by following links, which means some pages might be missed. The wiki automatically generates a sitemap at /sitemap.xml.
How it works:
- Only public pages and directories are included
- The entire directory ancestry must be public — a public page inside a private or internal directory is automatically excluded
- Pages and directories with Include in search engines? unchecked are excluded, along with all their children
- The sitemap itself is marked noindex by Django, so search engines follow its links rather than indexing the XML file
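A minimal example of the generated sitemap, using the standard sitemaps.org format (whether the wiki emits lastmod or other optional elements is an assumption):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://wiki.free.law/c/engineering/ci-pipeline</loc>
    <lastmod>2024-06-01</lastmod>
  </url>
</urlset>
```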
Controlling sitemap inclusion:
Both pages and directories have an Include in search engines? checkbox on their edit forms. This setting is hierarchical — if you exclude a directory from the sitemap, all pages and subdirectories inside it are also excluded, regardless of their own setting. This works like the visibility system: a child cannot be more visible than its parent.
The root directory always appears in the sitemap and cannot be excluded.
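The hierarchical inclusion rule can be sketched in Python. The field and function names here are illustrative, not the wiki's actual models:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A page or directory (hypothetical model, for illustration only)."""
    is_public: bool
    include_in_search: bool
    parent: Optional["Node"] = None

def in_sitemap(node: Node) -> bool:
    # A node is listed only if it and every ancestor are public
    # AND have "Include in search engines?" checked.
    current: Optional[Node] = node
    while current is not None:
        if not (current.is_public and current.include_in_search):
            return False
        current = current.parent
    return True
```

For example, a public page inside a private directory is excluded because the walk up the ancestry hits the private directory and returns False.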
What is llms.txt?
llms.txt is a standard file (like robots.txt) that helps AI assistants — such as ChatGPT, Claude, and other large language models — discover and understand your content. While a sitemap helps search engines, llms.txt is specifically designed for AI crawlers. The wiki serves this file at /llms.txt.
The file lists pages grouped by directory, with each entry linking
to the raw Markdown (.md) version of the page:
    # FLP Wiki
    > Free Law Project's wiki covering legal technology, open legal
    > data, and organizational knowledge.

    ## Engineering
    - [CI Pipeline](https://wiki.free.law/c/engineering/ci-pipeline.md): How our CI works

    ## Optional
    - [Getting Started](https://wiki.free.law/c/help/getting-started-guide.md): Intro guide
Each entry uses the page's SEO description if set, or an
auto-extracted summary from the content. The llms.txt file itself
has X-Robots-Tag: noindex so search engines don't index it.
Controlling llms.txt inclusion:
Both pages and directories have a Share with AI assistants? setting with three options:
- Yes — the page appears in the main section of llms.txt
- On request — the page appears in the "Optional" section, signaling to AI assistants that this content is supplementary and should be fetched only when relevant
- No — the page does not appear in llms.txt at all
This setting is hierarchical, like the sitemap control. If a directory is set to "No", none of its pages or subdirectories will appear in llms.txt regardless of their own setting. If a directory is set to "On request", its children can be "On request" or "No" but not "Yes" — the most restrictive value in the chain always wins.
The default for new pages and directories is No. Change it to "Yes" or "On request" on content you want AI assistants to find.
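The "most restrictive value wins" rule can be sketched like this (the setting values and helper name are illustrative, not the wiki's internals):

```python
# Higher number = more restrictive.
RESTRICTIVENESS = {"yes": 0, "on_request": 1, "no": 2}

def effective_llms_setting(chain: list[str]) -> str:
    """Resolve the effective "Share with AI assistants?" value for a page,
    given the settings from the root directory down to the page itself."""
    return max(chain, key=RESTRICTIVENESS.__getitem__)
```

So a "Yes" page under an "On request" directory is effectively "On request", and anything under a "No" directory is "No".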
robots.txt
The /robots.txt file tells search engine crawlers what they can and cannot access:
Allowed:
- /c/ — all wiki content pages and directories
- /llms.txt — the LLM content index
- Raw Markdown (.md) files — for LLM crawlers
Blocked:
- /admin/, /api/, /u/, /search/, /files/, /unsubscribe/, /activity/
- Page action URLs: edit, move, delete, history, diff, revert, permissions, subscribe, pin, backlinks
- Directory action URLs: new, new-dir, edit-dir, move-dir, delete-dir, history-dir, etc.
- Comments, proposals, and feedback URLs
The robots.txt also includes a Sitemap: directive pointing
crawlers to the sitemap.
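Put together, the file looks roughly like this. Only the paths named above are shown, and the exact rule ordering and the action-URL patterns are assumptions:

```text
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /u/
Disallow: /search/
Disallow: /files/
Disallow: /unsubscribe/
Disallow: /activity/
Allow: /c/
Allow: /llms.txt

Sitemap: https://wiki.free.law/sitemap.xml
```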
What editors should know
- Set SEO descriptions on important public pages — a concise, hand-written summary outperforms auto-generated ones
- Visibility matters — only public pages in fully-public directory chains appear in the sitemap and llms.txt. Private and internal pages are automatically excluded and marked noindex
- Discoverability settings are hierarchical — excluding a directory removes the entire subtree. You cannot include a child if a parent is excluded. Settings apply to subdirectories and sub-pages unless overridden
- You don't need to do anything for canonical URLs, JSON-LD, or robots.txt — these are all automatic
- Page titles are used as the og:title and Article headline, so write clear, descriptive titles