
Generator

A generator is a tool or model that produces vector embeddings. It converts data points, such as text or images, into numerical vectors in a high-dimensional space. These embeddings capture key features of the data and the relationships between data points, which makes them useful for machine learning applications such as similarity search.

Key Components of an Embedding Generator

  1. Data Input: The generator takes in raw data, such as text, images, or other types of data.

  2. Feature Extraction: It identifies and extracts relevant features from the input data.

  3. Model Training: The generator uses a machine learning model, such as a neural network, to learn the relationships and patterns in the data.

  4. Vector Representation: After training, the model can transform new data points into vector embeddings.
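The four components above can be sketched with a deliberately tiny example. This is only an illustration under stated assumptions: `ToyGenerator`, its `fit` and `embed` methods, and the sample corpus are hypothetical names invented here. It is not how dat or any hosted generator is implemented, and real generators use trained neural networks rather than word counts.

```python
import math
import re
from collections import Counter


class ToyGenerator:
    """Hypothetical bag-of-words embedding generator, for illustration only."""

    def __init__(self):
        self.vocabulary = {}  # token -> vector index, learned in fit()

    def _extract_features(self, text):
        # Feature extraction: split raw text into lowercase word tokens.
        return re.findall(r"[a-z0-9]+", text.lower())

    def fit(self, documents):
        # "Training" in this toy model only learns a vocabulary from the corpus.
        tokens = sorted({t for doc in documents for t in self._extract_features(doc)})
        self.vocabulary = {token: i for i, token in enumerate(tokens)}
        return self

    def embed(self, text):
        # Vector representation: map new text to a fixed-length, unit-length vector.
        vector = [0.0] * len(self.vocabulary)
        for token, count in Counter(self._extract_features(text)).items():
            index = self.vocabulary.get(token)
            if index is not None:  # tokens unseen during fit() are ignored
                vector[index] = float(count)
        norm = math.sqrt(sum(v * v for v in vector))
        return [v / norm for v in vector] if norm else vector


def cosine_similarity(a, b):
    # embed() returns unit-length (or all-zero) vectors, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


# Data input: a small made-up corpus.
corpus = ["the cat sat on the mat", "dogs chase cats", "stock prices fell today"]
generator = ToyGenerator().fit(corpus)

a = generator.embed("the cat sat")
b = generator.embed("the cat on the mat")
c = generator.embed("stock prices")

print(len(a))                   # one dimension per vocabulary token
print(cosine_similarity(a, b))  # related texts score higher
print(cosine_similarity(a, c))  # texts with no shared tokens score 0.0
```

The point of the sketch is the shape of the pipeline: raw input goes in, a fitted model maps it to a fixed-length vector, and texts that share features end up closer together in the vector space.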

Examples of generators include OpenAI and Cohere.

When selecting a generator, consider the nature and complexity of your data, the compute resources available, cost, the requirements of the task, and the risk of overfitting. Choosing an embedding dimensionality means balancing the level of detail captured against computational efficiency: higher-dimensional embeddings capture more detail but increase compute and storage costs and can raise the risk of overfitting.
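To make the cost side of this trade-off concrete, the following rough sketch estimates the raw storage needed for dense embeddings. It assumes each value is stored as a 32-bit float (4 bytes). The function name `index_size_bytes`, the vector count, and the dimension sizes are illustrative choices, not values taken from dat or from any particular generator.

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for dense vectors, assuming float32 (4 bytes per value)."""
    return num_vectors * dimensions * bytes_per_value


# Illustrative dimension sizes; check your generator's documentation for real ones.
for dims in (384, 768, 1536):
    size_gib = index_size_bytes(1_000_000, dims) / 2**30
    print(f"{dims:>5} dimensions: {size_gib:.2f} GiB for 1,000,000 vectors")
```

Storage grows linearly with dimensionality, so doubling the number of dimensions doubles the raw footprint. The estimate ignores index structures and metadata that a vector database adds on top.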


Last updated 9 months ago
