Data Sinks

A Data Sink in Canso is a reference to where processed data is stored. Data Sinks can be used by multiple Batch and Streaming features to save their outputs. Currently, Canso supports two types of data sinks: S3 for offline storage and Redis for online storage.

Introduction

Data Sinks provide a standardized way to store processed data, enabling:

  • Efficient storage and retrieval of processed data.

  • Reusability across different batch and streaming features.

  • Simplified handling of data storage configurations for Data Scientists.

  • Consistent methods for saving feature and pre-processing table outputs.

Data Sink Types

Offline Data Sink Attributes (S3)

These attributes define how processed data is stored and accessed in an offline (S3) Data Sink.

| Attribute | Description | Example |
| --- | --- | --- |
| name | Unique name of the data sink | processed_sales_orders |
| description | Description of what the data sink contains | Processed sales orders stored for analysis |
| owner | Team(s) that own the data sink | ['data_engg@yugen.ai', 'sales@yugen.ai'] |
| bucket | S3 bucket name | mycompany_processed_data |
| leading_key | Fixed key component (prefix) under which data is written | processed_txns/users |
| file_type | File type of the output data | CSV, PARQUET |
| metadata | Additional metadata about the offline sink | {"output_mode": "append", "processing_time": "120 seconds", "output_partitions": 20} |

Online Data Sink Attributes (Redis)

These attributes define the configuration of an online (Redis) Data Sink used for low-latency retrieval.

| Attribute | Description | Example |
| --- | --- | --- |
| name | Unique name of the data sink | user_session_data |
| description | Description of what the data sink contains | Real-time user session data for quick access |
| owner | Team(s) that own the data sink | ['data_engg@yugen.ai', 'session_mgmt@yugen.ai'] |
| host | Redis host URL | redis://192.168.1.123:6379 |
| metadata | Additional metadata about the online sink | {"output_mode": "append", "processing_time": "120 seconds", "output_partitions": 20} |

Example Sink Objects
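The exact sink classes exposed by the Canso Python client are not shown on this page, so the dictionaries below are a minimal sketch of one S3 (offline) and one Redis (online) sink, with fields mirroring the attribute tables above; substitute the client's actual sink constructors when defining them.

```python
# Minimal sketch of the two sink configurations, using plain dicts.
# The Canso Python client exposes its own sink classes; the fields here
# simply mirror the attribute tables above.

# Offline (S3) sink
s3_sink = {
    "name": "processed_sales_orders",
    "description": "Processed sales orders stored for analysis",
    "owner": ["data_engg@yugen.ai", "sales@yugen.ai"],
    "bucket": "mycompany_processed_data",
    "leading_key": "processed_txns/users",
    "file_type": "PARQUET",
    "metadata": {
        "output_mode": "append",
        "processing_time": "120 seconds",
        "output_partitions": 20,
    },
}

# Online (Redis) sink
redis_sink = {
    "name": "user_session_data",
    "description": "Real-time user session data for quick access",
    "owner": ["data_engg@yugen.ai", "session_mgmt@yugen.ai"],
    "host": "redis://192.168.1.123:6379",
    "metadata": {
        "output_mode": "append",
        "processing_time": "120 seconds",
        "output_partitions": 20,
    },
}
```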

Working with Data Sinks

Once a Data Sink is defined, it can be:

  • Registered for reusability across different teams.

  • Referenced by Batch and Streaming features to save their outputs.
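As a toy illustration of that register-then-reference pattern (the real Canso registry is a managed service reached through the Python client, not an in-memory dict; see the Registry guide):

```python
# Toy illustration of register-then-reference; not the Canso client API.
SINK_REGISTRY: dict[str, dict] = {}

def register_sink(sink: dict) -> None:
    """Register a sink definition under its unique name so teams can reuse it."""
    SINK_REGISTRY[sink["name"]] = sink

def resolve_sink(name: str) -> dict:
    """Look up a registered sink by name, e.g. when a feature declares its output sink."""
    return SINK_REGISTRY[name]

register_sink({"name": "processed_sales_orders", "bucket": "mycompany_processed_data"})
feature_output_sink = resolve_sink("processed_sales_orders")
```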
