# Sitemap and robots.txt

Blogatto can generate a sitemap XML file and a `robots.txt` file for search engine optimization.

## Sitemap

The sitemap includes all static routes and blog post URLs.

### Basic setup

```gleam
import blogatto/config
import blogatto/config/sitemap

let sitemap_config = sitemap.new("/sitemap.xml")

let cfg =
  config.new("https://example.com")
  |> config.sitemap(sitemap_config)
```

This generates `dist/sitemap.xml` with entries for every static route and blog post.
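
For a site with an `/about` route and a single post, the generated file would look roughly like this (the URLs are illustrative, and the exact formatting may differ):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/about</loc>
  </url>
  <url>
    <loc>https://example.com/blog/hello-world</loc>
  </url>
</urlset>
```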

### SitemapConfig fields

| Field       | Type                                  | Description                          |
| ----------- | ------------------------------------- | ------------------------------------ |
| `path`      | `String`                              | Output path relative to `output_dir` |
| `filter`    | `Option(fn(String) -> Bool)`          | Include/exclude routes by URL        |
| `serialize` | `Option(fn(String) -> SitemapEntry)`  | Custom entry serialization           |

### Filtering routes

Exclude specific routes from the sitemap:

```gleam
import gleam/string

let sitemap_config =
  sitemap.new("/sitemap.xml")
  |> sitemap.filter(fn(url) {
    // Exclude draft pages
    !string.contains(url, "/draft")
  })
```

### Custom serialization

Control the priority, change frequency, and last-modified date of each entry:

```gleam
import blogatto/config/sitemap.{Monthly, Weekly}
import gleam/option.{None, Some}
import gleam/string

let sitemap_config =
  sitemap.new("/sitemap.xml")
  |> sitemap.serialize(fn(url) {
    // Blog posts change more often but rank slightly below
    // top-level pages in priority.
    let #(priority, freq) = case string.contains(url, "/blog/") {
      True -> #(0.7, Some(Weekly))
      False -> #(1.0, Some(Monthly))
    }
    sitemap.SitemapEntry(
      url: url,
      priority: Some(priority),
      last_modified: None,
      change_frequency: freq,
    )
  })
```

### SitemapEntry fields

| Field              | Type                      | Description                 |
| ------------------ | ------------------------- | --------------------------- |
| `url`              | `String`                  | The full URL for this entry |
| `priority`         | `Option(Float)`           | Priority hint (0.0 to 1.0)  |
| `last_modified`    | `Option(Timestamp)`       | Last modification date      |
| `change_frequency` | `Option(ChangeFrequency)` | How often the page changes  |
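
The examples above leave `last_modified` as `None`. Assuming the `Timestamp` here is the `gleam/time/timestamp` type from the `gleam_time` package, a fully populated entry might look like this sketch:

```gleam
import blogatto/config/sitemap
import gleam/option.{None, Some}
import gleam/time/timestamp

// A hypothetical entry for a post last edited on 2024-01-15.
let entry =
  sitemap.SitemapEntry(
    url: "https://example.com/blog/hello-world",
    priority: Some(0.7),
    // 2024-01-15T00:00:00Z expressed as seconds since the Unix epoch.
    last_modified: Some(timestamp.from_unix_seconds(1_705_276_800)),
    change_frequency: None,
  )
```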

### ChangeFrequency values

| Value     | Description                       |
| --------- | --------------------------------- |
| `Always`  | Changes on every access           |
| `Hourly`  | Changes approximately every hour  |
| `Daily`   | Changes approximately every day   |
| `Weekly`  | Changes approximately every week  |
| `Monthly` | Changes approximately every month |
| `Yearly`  | Changes approximately every year  |
| `Never`   | Archived; will not change         |

## Robots.txt

The `robots.txt` file tells search engine crawlers which paths they may crawl.

### Basic setup

```gleam
import blogatto/config
import blogatto/config/robots

let robots_config =
  robots.new("https://example.com/sitemap.xml")
  |> robots.robot(robots.Robot(
    user_agent: "*",
    allowed_routes: ["/"],
    disallowed_routes: [],
  ))

let cfg =
  config.new("https://example.com")
  |> config.robots(robots_config)
```

This generates `dist/robots.txt`:

```txt
Sitemap: https://example.com/sitemap.xml

User-agent: *
Allow: /
```

### Multiple user agents

Add different policies for different crawlers:

```gleam
let robots_config =
  robots.new("https://example.com/sitemap.xml")
  |> robots.robot(robots.Robot(
    user_agent: "*",
    allowed_routes: ["/"],
    disallowed_routes: ["/admin/"],
  ))
  |> robots.robot(robots.Robot(
    user_agent: "Googlebot",
    allowed_routes: ["/"],
    disallowed_routes: [],
  ))
```
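
Following the basic-setup output above, the generated `dist/robots.txt` should contain one block per robot, along these lines (the exact directive order may differ):

```txt
Sitemap: https://example.com/sitemap.xml

User-agent: *
Allow: /
Disallow: /admin/

User-agent: Googlebot
Allow: /
```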

### RobotsConfig fields

| Field         | Type          | Description                   |
| ------------- | ------------- | ----------------------------- |
| `sitemap_url` | `String`      | Full URL to the sitemap       |
| `robots`      | `List(Robot)` | Crawl policies per user agent |

### Robot fields

| Field               | Type           | Description                       |
| ------------------- | -------------- | --------------------------------- |
| `user_agent`        | `String`       | Crawler name (`"*"` for all)      |
| `allowed_routes`    | `List(String)` | Paths the crawler may access      |
| `disallowed_routes` | `List(String)` | Paths the crawler must not access |
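
One common pattern is shutting a single crawler out entirely by disallowing the root path for its user agent. This sketch uses `GPTBot` purely as an illustrative crawler name:

```gleam
let robots_config =
  robots.new("https://example.com/sitemap.xml")
  // Block this one crawler from the entire site...
  |> robots.robot(robots.Robot(
    user_agent: "GPTBot",
    allowed_routes: [],
    disallowed_routes: ["/"],
  ))
  // ...while every other crawler may access everything.
  |> robots.robot(robots.Robot(
    user_agent: "*",
    allowed_routes: ["/"],
    disallowed_routes: [],
  ))
```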

## Combining sitemap and robots.txt

A typical SEO setup uses both together, with the `robots.txt` pointing at the sitemap:

```gleam
import blogatto/config
import blogatto/config/robots
import blogatto/config/sitemap

let site_url = "https://example.com"

let sitemap_config = sitemap.new("/sitemap.xml")

let robots_config =
  robots.new(site_url <> "/sitemap.xml")
  |> robots.robot(robots.Robot(
    user_agent: "*",
    allowed_routes: ["/"],
    disallowed_routes: [],
  ))

let cfg =
  config.new(site_url)
  |> config.sitemap(sitemap_config)
  |> config.robots(robots_config)
```