Data Sinks
A Data Sink in Canso is a reference to where processed data is stored. Data Sinks can be used by multiple Batch and Streaming features to save their outputs. Currently, Canso supports two types of data sinks: S3 for offline storage and Redis for online storage.
Introduction
Data Sinks provide a standardized way to store processed data, enabling:
Efficient storage and retrieval of processed data.
Reusability across different batch and streaming features.
Simplified handling of data storage configurations for Data Scientists.
Consistent methods for saving feature and pre-processing table outputs.
Data Sink Types
Offline Data Sink Attributes (S3)
These attributes define how an offline Data Sink stores processed data in S3.
| Attribute | Description | Example |
| --- | --- | --- |
| name | Unique name of the data sink | processed_sales_orders |
| description | Description of what the data sink contains | Processed sales orders stored for analysis |
| owner | Team that owns the data sink | ['data_engg@yugen.ai', 'sales@yugen.ai'] |
| bucket | Bucket name | mycompany_processed_data |
| leading_key | Fixed key component | processed_txns/users |
| file_type | File type | CSV, PARQUET |
| metadata | Additional metadata about the offline sink | {"output_mode": "append", "processing_time": "120 seconds", "output_partitions": 20} |
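As a rough illustration of how bucket and leading_key might combine into a storage location, the sketch below builds an S3 prefix from the example values above. The helper `s3_prefix` and the `s3://<bucket>/<leading_key>/` layout are assumptions made for illustration; they are not taken from Canso's API.

```python
def s3_prefix(bucket: str, leading_key: str) -> str:
    """Join a bucket and a fixed key component into an s3:// prefix.

    Hypothetical helper: the resulting layout is an assumption, not Canso behavior.
    """
    # Strip stray slashes so exactly one separator sits between the parts.
    key = leading_key.strip("/")
    return f"s3://{bucket}/{key}/"


# Example values from the offline sink attributes.
prefix = s3_prefix("mycompany_processed_data", "processed_txns/users")
print(prefix)
```

Keeping the fixed key component separate from the bucket lets several sinks share one bucket while writing under distinct prefixes.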
Online Data Sink Attributes (Redis)
These attributes define how an online Data Sink is configured for low-latency retrieval from Redis.
| Attribute | Description | Example |
| --- | --- | --- |
| name | Unique name of the data sink | user_session_data |
| description | Description of what the data sink contains | Real-time user session data for quick access |
| owner | Team that owns the data sink | ['data_engg@yugen.ai', 'session_mgmt@yugen.ai'] |
| host | Redis host | redis://192.168.1.123:6379 |
| metadata | Additional metadata about the online sink | {"output_mode": "append", "processing_time": "120 seconds", "output_partitions": 20} |
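The host value above is a Redis URL rather than a bare hostname. A minimal sketch of splitting it into a hostname and port with Python's standard library follows; `parse_redis_host` is a hypothetical helper, and falling back to port 6379 (Redis's default) when the URL omits a port is an assumption.

```python
from urllib.parse import urlparse


def parse_redis_host(host: str) -> tuple[str, int]:
    """Split a redis:// URL into (hostname, port). Hypothetical helper."""
    parsed = urlparse(host)
    if parsed.scheme != "redis" or parsed.hostname is None:
        raise ValueError(f"expected a redis:// URL, got {host!r}")
    # Assumption: fall back to Redis's default port when none is given.
    return parsed.hostname, parsed.port or 6379


hostname, port = parse_redis_host("redis://192.168.1.123:6379")
print(hostname, port)
```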
Example object of Sinks
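A minimal sketch of what sink objects could look like, written as plain Python dataclasses and populated with the example attribute values listed earlier. The class names `S3DataSink` and `RedisDataSink` are hypothetical stand-ins and do not represent Canso's client API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class S3DataSink:
    """Offline sink attributes (hypothetical class, not Canso's API)."""
    name: str
    description: str
    owner: list[str]
    bucket: str
    leading_key: str
    file_type: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class RedisDataSink:
    """Online sink attributes (hypothetical class, not Canso's API)."""
    name: str
    description: str
    owner: list[str]
    host: str
    metadata: dict[str, Any] = field(default_factory=dict)


offline_sink = S3DataSink(
    name="processed_sales_orders",
    description="Processed sales orders stored for analysis",
    owner=["data_engg@yugen.ai", "sales@yugen.ai"],
    bucket="mycompany_processed_data",
    leading_key="processed_txns/users",
    file_type="PARQUET",
    metadata={"output_mode": "append", "processing_time": "120 seconds", "output_partitions": 20},
)

online_sink = RedisDataSink(
    name="user_session_data",
    description="Real-time user session data for quick access",
    owner=["data_engg@yugen.ai", "session_mgmt@yugen.ai"],
    host="redis://192.168.1.123:6379",
    metadata={"output_mode": "append", "processing_time": "120 seconds", "output_partitions": 20},
)
```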
Working with Data Sinks
Once a Data Sink is defined, it can be:
Registered for reusability across different teams.
Referenced by Batch and Streaming features to save their outputs.
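The register-once, reference-many pattern described above can be sketched with a small in-memory registry. Everything here is illustrative: `SinkRegistry`, the dictionary shapes, and the feature names `daily_order_totals` and `live_order_counts` are hypothetical and are not Canso's registration mechanism.

```python
class SinkRegistry:
    """In-memory registry keyed by sink name (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._sinks: dict[str, dict] = {}

    def register(self, sink: dict) -> None:
        name = sink["name"]
        # Sink names are unique, so reject a second registration under the same name.
        if name in self._sinks:
            raise ValueError(f"data sink {name!r} is already registered")
        self._sinks[name] = sink

    def get(self, name: str) -> dict:
        return self._sinks[name]


registry = SinkRegistry()
registry.register({"name": "processed_sales_orders", "bucket": "mycompany_processed_data"})

# A batch feature and a streaming feature both refer to the sink by name.
batch_feature = {"name": "daily_order_totals", "sink": "processed_sales_orders"}
streaming_feature = {"name": "live_order_counts", "sink": "processed_sales_orders"}

shared = registry.get(batch_feature["sink"])
print(shared is registry.get(streaming_feature["sink"]))
```

Referencing sinks by name keeps storage details in one place, so a change to the sink definition does not require touching each feature that writes to it.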