Content Extraction
contentExtraction controls how OpenNav reads built HTML pages before creating
generated Markdown page artifacts and llms-full.txt.
OpenNav is conservative by default. If you omit contentExtraction, or set
stripLayout to false, OpenNav converts the whole HTML <body> to Markdown.
That preserves unusual page structures and avoids dropping content from sites
that use layout tags in custom ways.
Enable stripLayout only when your built HTML uses standard layout elements for
repeated page UI such as navigation, sidebars, search, headers, footers,
or table-of-contents panels.
```typescript
interface OpenNavContentExtractionOptions {
  readonly stripLayout?: boolean;
}
```
stripLayout
Optional. Defaults to false.
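The default can be illustrated with a minimal sketch. The helper name `resolveStripLayout` is hypothetical and is not part of OpenNav's API; it only shows how an omitted `contentExtraction` object or an omitted `stripLayout` field would both resolve to the conservative `false` described here.

```typescript
// Local copy of the documented option shape so the sketch is self-contained.
interface OpenNavContentExtractionOptions {
  readonly stripLayout?: boolean;
}

// Hypothetical helper (an assumption, not OpenNav's implementation):
// resolves the effective stripLayout value, falling back to false.
function resolveStripLayout(
  options?: OpenNavContentExtractionOptions,
): boolean {
  return options?.stripLayout ?? false;
}

const omitted = resolveStripLayout(); // contentExtraction omitted entirely
const empty = resolveStripLayout({}); // object present, stripLayout omitted
const enabled = resolveStripLayout({ stripLayout: true }); // explicit opt-in
```

Under these assumptions, only the explicit `stripLayout: true` case turns layout stripping on.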
When stripLayout is true, OpenNav still starts from the whole HTML <body>.
It does not choose a <main>, <article>, or custom content root. Before
converting that body to Markdown, it removes only the fixed layout elements
listed below.
OpenNav strips these elements when stripLayout is enabled:
| Element or selector | Why it is removed |
|---|---|
| <nav> | Site navigation, side navigation, and table-of-contents navigation. |
| <aside> | Sidebars, complementary panels, and generated table-of-contents blocks. |
| <header> | Repeated page or site headers. |
| <footer> | Repeated page or site footers, including previous/next page navigation. |
| <search> | HTML search widgets. |
| <site-search> | Starlight-style custom search widgets. |
| [role="navigation"] | ARIA navigation landmarks not expressed with <nav>. |
| [role="search"] | Search widgets not expressed with <search>. |
| [role="complementary"] | Sidebar-like complementary landmarks. |
| [data-pagefind-ignore] | Content already marked to be ignored by page indexing. |
| Skip links | Links whose href starts with # and whose visible text starts with skip to. |
OpenNav always excludes technical non-readable elements such as <head>,
<meta>, <script>, <style>, and <title> from generated Markdown. That
behavior does not require stripLayout.
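The rules above can be sketched as a small filter over a simplified element tree. This is an illustration under stated assumptions, not OpenNav's implementation: the `HtmlNode` type, the function names, and the case-insensitive match on "skip to" are all hypothetical. The sketch starts from the whole body, always drops the technical elements, and drops the layout elements only when `stripLayout` is true.

```typescript
// Simplified stand-in for a parsed HTML element (hypothetical type;
// OpenNav's internal representation is not described in this document).
interface HtmlNode {
  readonly tag: string; // lowercase tag name, or "#text" for text nodes
  readonly attrs?: Readonly<Record<string, string>>;
  readonly text?: string; // only used by "#text" nodes
  readonly children?: readonly HtmlNode[];
}

// Elements excluded regardless of stripLayout.
const ALWAYS_EXCLUDED = new Set(["head", "meta", "script", "style", "title"]);
// Elements and roles removed only when stripLayout is enabled.
const LAYOUT_TAGS = new Set(["nav", "aside", "header", "footer", "search", "site-search"]);
const LAYOUT_ROLES = new Set(["navigation", "search", "complementary"]);

// Concatenates the text content of a node and its descendants.
function textOf(node: HtmlNode): string {
  if (node.tag === "#text") return node.text ?? "";
  return (node.children ?? []).map(textOf).join("");
}

// Skip link: href starts with "#" and visible text starts with "skip to".
// Case-insensitive matching is an assumption of this sketch.
function isSkipLink(node: HtmlNode): boolean {
  if (node.tag !== "a") return false;
  const href = node.attrs?.["href"] ?? "";
  return href.startsWith("#") && textOf(node).trim().toLowerCase().startsWith("skip to");
}

function isLayout(node: HtmlNode): boolean {
  const attrs: Readonly<Record<string, string>> = node.attrs ?? {};
  return (
    LAYOUT_TAGS.has(node.tag) ||
    LAYOUT_ROLES.has(attrs["role"] ?? "") ||
    "data-pagefind-ignore" in attrs ||
    isSkipLink(node)
  );
}

// Returns a pruned copy of the tree, or null if the node itself is removed.
function prune(node: HtmlNode, stripLayout: boolean): HtmlNode | null {
  if (ALWAYS_EXCLUDED.has(node.tag)) return null;
  if (stripLayout && isLayout(node)) return null;
  const children = (node.children ?? [])
    .map((child) => prune(child, stripLayout))
    .filter((child): child is HtmlNode => child !== null);
  return { ...node, children };
}

// Text that would remain for conversion, starting from the whole body.
function extractText(root: HtmlNode, stripLayout: boolean): string {
  const kept = prune(root, stripLayout);
  return kept === null ? "" : textOf(kept);
}

const body: HtmlNode = {
  tag: "body",
  children: [
    { tag: "a", attrs: { href: "#main" }, children: [{ tag: "#text", text: "Skip to content" }] },
    { tag: "nav", children: [{ tag: "#text", text: "Docs menu" }] },
    { tag: "script", children: [{ tag: "#text", text: "init()" }] },
    { tag: "main", children: [{ tag: "#text", text: "Article text" }] },
  ],
};

const conservativeText = extractText(body, false); // script dropped, layout kept
const strippedText = extractText(body, true); // skip link and nav dropped too
```

In this sketch the `<script>` content disappears in both modes, while the skip link and the `<nav>` text survive unless `stripLayout` is true. Note that `<main>` is kept only because it is not on the strip list, not because it is selected as a content root.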
When To Leave It Off
Leave stripLayout unset or false when your pages use <header>, <footer>,
<aside>, or navigation landmarks for core article content, examples, legal
copy, API reference content, or other text agents should read.
The first version intentionally does not accept custom selector arrays. Future
releases are expected to add more granular controls, such as tag-level,
class-level, or selector-level strip and preserve rules, without changing the
top-level contentExtraction object.
Examples
SDK, Astro, and Next use the same option shape:
```typescript
contentExtraction: {
  stripLayout: true,
}
```
The CLI flag maps to the same setting:
```sh
opennav build --static \
  --output dist \
  --site-url https://example.com \
  --site-name "Example Docs" \
  --strip-layout
```