Website Crawler Sitemap

Last updated 9 months ago

Overview

The Website Crawler Sitemap source connector lets you extract a website's structure and the visual relationships between its pages and load them into your desired vector database.

Configuration Options

  • Name: The name you assign to the actor instance that manages the Website Crawler Sitemap source. Choose a descriptive, unique name so that you can easily identify this instance within your data activation tool (dat).

  • Site URL: Enter the URL of the site, for example https://www.example.com.

  • Sitemap URL: If available, enter the path to the sitemap. If left blank, the connector attempts to read the sitemap from the default location, for example https://www.example.com/sitemap.

Supported streams

The following streams are supported for this source:

  • crawler_sitemap
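
The Sitemap URL fallback described under Configuration Options can be illustrated with a short Python sketch. It is not the connector's actual implementation: the function names `resolve_sitemap_url` and `extract_page_urls` are hypothetical, the `/sitemap` default is inferred from the example URL above, and the parsing step assumes the sitemap follows the standard sitemaps.org XML format.

```python
from typing import List, Optional
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def resolve_sitemap_url(site_url: str, sitemap_url: Optional[str] = None) -> str:
    """Return the configured sitemap URL, or fall back to <site_url>/sitemap.

    Hypothetical helper: the fallback path is an assumption based on the
    example https://www.example.com/sitemap.
    """
    if sitemap_url and sitemap_url.strip():
        return sitemap_url.strip()
    return site_url.rstrip("/") + "/sitemap"


def extract_page_urls(sitemap_xml: str) -> List[str]:
    """Collect the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

With only a Site URL of https://www.example.com, `resolve_sitemap_url` returns https://www.example.com/sitemap; when a Sitemap URL is supplied, that value is returned unchanged. The page URLs returned by `extract_page_urls` are the kind of records a stream such as crawler_sitemap might emit, although the connector's real output schema is not documented here.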