Chronon - Open-Source Feature Platform for Machine Learning

Point-in-Time Correctness

Eliminate data leakage with guaranteed point-in-time correctness. Generate accurate training datasets using state-of-the-art aggregation algorithms—no more expensive log-and-wait pipelines or costly retraining cycles.

Production-Grade Performance

Serve features with sub-10ms p99 latency using battle-tested Vert.x infrastructure. Deploy in embedded mode for minimal overhead or standalone mode for independent scaling—your choice, zero code changes.

Guaranteed Consistency

Write features once. The same declarative definitions automatically power both batch training datasets and real-time serving endpoints—eliminating training-serving skew and the bugs that come with maintaining duplicate implementations.

Developer-Friendly API

Express complex temporal aggregations in simple, declarative Python. One unified API works across batch (Spark), streaming (Flink or Spark streaming), and serving contexts—no need to learn multiple frameworks or translate logic between execution engines.

Enterprise-Ready Technology Stack

Apache Spark

Scalable batch processing for historical features

Apache Flink

Streaming aggregations for real-time feature updates

Flexible Storage

Bring your own KV store (Redis, DynamoDB, etc.)

Vert.x Serving Layer

High-throughput serving with flexible deployment

Features Defined in Code, Not Configuration

Connect any data source—event streams, database tables, or external APIs—with a simple, type-safe configuration.


source = Source(
    events=EventSource(
        table="data.purchases",
        topic="events/purchases",
        query=Query(
            selects=select(user="user_id", price="purchase_price"),
            time_column="ts"
        )
    )
)

Define time-windowed aggregations that automatically compute both training data and serve real-time results—schema inference included.


feature_group = GroupBy(
    sources=[source],
    keys=["user_id"],
    aggregations=[
        Aggregation(
            input_column="price",
            operation=Operation.SUM,
            windows=[Window(3, TimeUnit.DAYS), Window(14, TimeUnit.DAYS)]
        ),
        Aggregation(
            input_column="price",
            operation=Operation.LAST_K(10),
        )
    ],
    online=True
)

Join multiple feature groups into training sets with guaranteed point-in-time correctness— preventing data leakage automatically.


training_set = Join(
    left=EventSource(table="data.checkouts", ...),
    right_parts=[JoinPart(group_by=purchases_v1)],
    online=True
)

End-to-End Feature Platform for Machine Learning

Point-in-Time Correctness

Production-Grade Performance

Guaranteed Consistency

Developer-Friendly API

Enterprise-Ready Technology Stack

Apache Spark

Apache Flink

Flexible Storage

Vert.x Serving Layer

Features Defined in Code, Not Configuration

Community & Resources

Chronon — A Declarative Feature Engineering Framework

Shepherd: Stripe's next-generation machine learning feature engineering platform

Chronon: Airbnb's Open-Source Data Platform

Building Generative Recommenders with Chronon

Democratizing High-Performance AI/ML Feature Engineering Through Open Source

Proven in Production at Industry-Leading ML Teams

Point-in-Time Correctness

Production-Grade Performance

Guaranteed Consistency

Developer-Friendly API

Enterprise-Ready Technology Stack

Apache Spark

Apache Flink

Flexible Storage

Vert.x Serving Layer

Features Defined in Code, Not Configuration

Community & Resources

Chronon — A Declarative Feature Engineering Framework

Shepherd: Stripe's next-generation machine learning feature engineering platform

Chronon: Airbnb's Open-Source Data Platform

Building Generative Recommenders with Chronon

Democratizing High-Performance AI/ML Feature Engineering Through Open Source