Raw ML Batch Feature

A Raw Feature in the Canso platform is a fundamental component that executes ML pipelines. It processes data from registered data sources, applies feature logic, and stores the results in data sinks.

Introduction

Raw Features are the essential building blocks for machine learning models. They:

  • Ensure seamless preparation of data for both training and inference stages.

  • Apply standardized predefined logic or user-defined functions (UDFs).

Raw Feature Types

Feature with Predefined Logic

Raw Features with predefined logic use built-in transformations and aggregations for ease of use and consistency. These features are created with common windowing strategies, such as fixed or sliding windows, combined with aggregations such as SUM, MIN, and MAX.
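To make the sliding-window aggregation concrete, here is a plain-Python sketch of what a sliding-window SUM computes. This is illustrative only and is not the Canso API; the `window` and `slide` parameters are hypothetical names.

```python
def sliding_window_sum(values, window, slide):
    """Sum `values` over windows of `window` elements, advancing by `slide`."""
    sums = []
    start = 0
    while start + window <= len(values):
        sums.append(sum(values[start:start + window]))
        start += slide
    return sums

# Daily click counts; a 7-day window sliding by 1 day yields rolling 7-day sums,
# i.e., the kind of value a feature like user_clicks_7d would hold.
clicks = [3, 1, 4, 1, 5, 9, 2, 6]
print(sliding_window_sum(clicks, window=7, slide=1))  # [25, 28]
```

In production, the platform's processing engine (e.g., Spark) performs this aggregation over the registered data sources rather than over in-memory lists.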

Feature with Custom UDF

Raw Features with custom user-defined functions (UDFs) offer greater flexibility and can handle complex, domain-specific transformations that are not covered by the predefined logic.
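As an illustration of the kind of transformation a UDF might encode, the sketch below shows a simple row-wise ratio that no predefined SUM/MIN/MAX aggregation covers. The function name and signature are hypothetical; the exact UDF contract is described in the Custom User Defined Function section.

```python
def clicks_per_session(total_clicks: float, session_count: float) -> float:
    """Hypothetical UDF: clicks per session, guarding against division by zero."""
    if session_count == 0:
        return 0.0
    return total_clicks / session_count

print(clicks_per_session(42.0, 6.0))  # 7.0
```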

Raw Feature Attributes

| Attribute | Description | Example |
| --- | --- | --- |
| name | Unique name of the raw feature | user_clicks_7d |
| description | Description of what the raw feature contains | Sum of user clicks in the last 7 days |
| owners | Team that owns the raw feature | ['data_team@company.com'] |
| entity | The entity the feature is based on | user_id |
| data_type | Data type of the raw feature | FLOAT |
| data_sources | List of data sources used to create the feature | [survey_telemetry_data] |
| staging_sink | S3 sink where the intermediate data is stored | [operational_telemetry_data] |
| online_sink | Redis sink for storing the feature online | ["online_telemetry_data"] |
| online_sink_write_option_configs | Configurations for writing to the online sink | {"online_telemetry_data": {"file_type_properties": {"type": "PARQUET", "mergeSchema": False}}} |
| feature_logic | Transformation logic to compute the feature | SlidingWindowAggregation |
| processing_engine | Processing engine used for feature computation | spark |
| processing_engine_configs | Configurations for the processing engine | {"memory": "4g", "cores": 2} |
| online | Flag to indicate if the feature should be available online | True |
| offline | Flag to indicate if the feature should be available offline | True |
| schedule | Schedule for feature computation | 1D |
| active | Flag to indicate if the feature is active | True |
| start_time | Start time for feature computation | datetime.now() |

Special notes on attributes

  • feature_logic: Raw Features support sliding-window and fixed-window aggregations.

  • online flag: If the online flag is enabled, the computed data is ingested into the online sink (i.e., the Redis cache).

  • offline flag: If the offline flag is enabled, the computed data is ingested into the offline sink (i.e., the S3 sink).

  • processing_engine & processing_engine_configs: These determine how the Raw Feature is run; if no configs are provided, a default set of PySpark configurations is used.

  • Read & write option configs: Users can provide these configurations at feature registration time. This option is not yet enabled, but support will be added soon.

Example Object of Raw Features
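The example below assembles the attribute values from the table above into a plain Python dict. It is a hypothetical reconstruction for illustration only; the actual Canso Python client may use a dedicated class rather than a dict, so treat the shape (not the field names, which come from the table) as an assumption.

```python
from datetime import datetime

# Hypothetical raw feature definition built from the attribute table above.
raw_feature = {
    "name": "user_clicks_7d",
    "description": "Sum of user clicks in the last 7 days",
    "owners": ["data_team@company.com"],
    "entity": "user_id",
    "data_type": "FLOAT",
    "data_sources": ["survey_telemetry_data"],
    "staging_sink": ["operational_telemetry_data"],
    "online_sink": ["online_telemetry_data"],
    "online_sink_write_option_configs": {
        "online_telemetry_data": {
            "file_type_properties": {"type": "PARQUET", "mergeSchema": False}
        }
    },
    "feature_logic": "SlidingWindowAggregation",
    "processing_engine": "spark",
    "processing_engine_configs": {"memory": "4g", "cores": 2},
    "online": True,      # ingest into the Redis online sink
    "offline": True,     # ingest into the S3 offline sink
    "schedule": "1D",    # recompute daily
    "active": True,
    "start_time": datetime.now(),
}
```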

Working with Raw Features

Once the raw feature is defined:

  • It can be registered and reused by Derived Features. The output of a Raw Feature becomes the input for a Derived Feature, enabling more complex feature engineering and reducing redundancy.

  • It can be deployed to execute the defined operation.

Last updated 11 months ago
