

Website Crawler


Last updated 9 months ago


Overview

The Website Crawler source connector extracts data from websites into your vector database of choice. It automatically fetches the content of web pages and converts it into embeddings for loading into the vector database.
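dat handles the crawling and text extraction for you. Purely as an illustration of the kind of extraction step involved (this is not dat's actual implementation), here is a minimal sketch using Python's standard-library `html.parser` to pull visible text out of a fetched page:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping script/style tags."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # >0 while inside a <script> or <style> tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

# Sample page standing in for a crawled response body
html = """
<html><head><title>Example</title><style>body{}</style></head>
<body><h1>Hello</h1><p>Crawled content.</p></body></html>
"""
parser = TextExtractor()
parser.feed(html)
text = " ".join(parser.chunks)
print(text)
```

The extracted text would then be chunked and passed to the configured Generator to produce embeddings.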

Configuration Options

  • Name: The name to assign to the actor instance responsible for managing the Website Crawler source. Choose a descriptive, unique name so you can easily identify this instance within your data activation tool (dat).

  • Site URL: Enter the URL of the website to crawl, e.g. https://www.example.com.
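Taken together, a configured Website Crawler source might look like the following. This is a hypothetical sketch whose field names simply mirror the UI labels above, not a documented dat config format:

```yaml
# Hypothetical example — field names mirror the UI labels above
name: my-website-crawler           # descriptive, unique actor-instance name
site_url: https://www.example.com  # root URL of the site to crawl
```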

Supported streams

The following streams are supported for this source:

  • url_crawler
