Chronon is an open-source feature platform that enables ML teams to rapidly build consistent ML features across training and serving environments at scale.
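Eliminate data leakage with guaranteed point-in-time correctness: every training row sees feature values computed only from data that was available at that row's timestamp. Generate accurate training datasets from historical data using state-of-the-art aggregation algorithms, with no expensive log-and-wait pipelines or costly retraining cycles.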
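To make point-in-time correctness concrete, here is a minimal sketch in plain Python, not Chronon's API. The function names `sum_as_of` and `sum_all` and the sample events are hypothetical, and treating the cutoff as strictly-before is an assumption made for illustration.

```python
from datetime import datetime

# Hypothetical purchase events: (user_id, event_time, price).
purchases = [
    ("u1", datetime(2024, 1, 1), 10.0),
    ("u1", datetime(2024, 1, 5), 20.0),
    ("u1", datetime(2024, 1, 9), 40.0),
]


def sum_as_of(events, user_id, as_of):
    """Sum prices for one user using only events strictly before `as_of`."""
    return sum(p for uid, ts, p in events if uid == user_id and ts < as_of)


def sum_all(events, user_id):
    """Leaky variant: ignores time, so it also counts future events."""
    return sum(p for uid, _, p in events if uid == user_id)


# A training row observed on Jan 6 must not see the Jan 9 purchase.
label_time = datetime(2024, 1, 6)
correct = sum_as_of(purchases, "u1", label_time)
leaky = sum_all(purchases, "u1")
print(correct, leaky)
```

The leaky variant folds the Jan 9 purchase into a row dated Jan 6, which is exactly the kind of leakage a point-in-time join is meant to rule out.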
Serve features with sub-10ms p99 latency using battle-tested Vert.x infrastructure. Deploy in embedded mode for minimal overhead or standalone for independent scaling—your choice, zero code changes.
Write features once. The same declarative definitions automatically power both batch training datasets and real-time serving endpoints—eliminating training-serving skew and the bugs that come with maintaining duplicate implementations.
Express complex temporal aggregations in simple, declarative Python. One unified API works across batch (Spark), streaming (Flink), and serving contexts—no need to learn multiple frameworks or translate logic between execution engines.
Scalable batch processing for historical features
Streaming aggregations for real-time feature updates
Bring your own KV store (Redis, DynamoDB, etc.)
High-throughput serving with flexible deployment
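As a rough illustration of what "bring your own KV store" can mean, the sketch below defines a minimal read/write contract and an in-memory stand-in. The names `FeatureStore` and `InMemoryStore`, the method signatures, and the dataset name are all hypothetical; this is not Chronon's actual KV store interface, only the general shape such an adapter tends to take.

```python
from typing import Dict, Optional, Protocol, Tuple


class FeatureStore(Protocol):
    """Hypothetical minimal contract a KV backend would need to satisfy."""

    def put(self, dataset: str, key: bytes, value: bytes) -> None: ...

    def get(self, dataset: str, key: bytes) -> Optional[bytes]: ...


class InMemoryStore:
    """Dict-backed stand-in; a real adapter would call Redis, DynamoDB, etc."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, bytes], bytes] = {}

    def put(self, dataset: str, key: bytes, value: bytes) -> None:
        self._data[(dataset, key)] = value

    def get(self, dataset: str, key: bytes) -> Optional[bytes]:
        # Returns None when the key has never been written.
        return self._data.get((dataset, key))


store: FeatureStore = InMemoryStore()
store.put("purchases_features", b"u1", b'{"price_sum_3d": 30.0}')
hit = store.get("purchases_features", b"u1")
miss = store.get("purchases_features", b"u2")
```

Keeping the contract this small is the design point: batch uploads and streaming updates write through `put`, the serving layer reads through `get`, and the backend can be swapped without touching feature definitions.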
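# Import paths assume the upstream Chronon Python package (ai.chronon); adjust for your installation.
from ai.chronon.api.ttypes import Source, EventSource
from ai.chronon.query import Query, select

# Purchase events, available both as a historical table and as a live stream.
source = Source(
    events=EventSource(
        table="data.purchases",    # batch: table of historical purchase events
        topic="events/purchases",  # streaming: topic carrying the same events
        query=Query(
            # Keep user_id under its own name so it can be used as a GroupBy key;
            # expose purchase_price as price.
            selects=select("user_id", price="purchase_price"),
            time_column="ts",  # event-time column
        ),
    )
)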
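# Import path assumes the upstream Chronon Python package (ai.chronon).
from ai.chronon.group_by import GroupBy, Aggregation, Operation, Window, TimeUnit

# Per-user purchase features computed from `source`.
feature_group = GroupBy(
    sources=[source],
    keys=["user_id"],
    aggregations=[
        # Total spend over trailing 3-day and 14-day windows.
        Aggregation(
            input_column="price",
            operation=Operation.SUM,
            windows=[Window(3, TimeUnit.DAYS), Window(14, TimeUnit.DAYS)],
        ),
        # The 10 most recent purchase prices (no window specified).
        Aggregation(
            input_column="price",
            operation=Operation.LAST_K(10),
        ),
    ],
    online=True,  # make these features available for online serving
)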
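To show what these two aggregations compute, here is a plain-Python sketch of their semantics at a single query time. It is not how Chronon evaluates them internally; the helper names `windowed_sum` and `last_k`, the half-open window boundary, and the newest-first ordering are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def windowed_sum(events, now, window):
    """Sum values whose timestamp falls in [now - window, now)."""
    start = now - window
    return sum(v for ts, v in events if start <= ts < now)


def last_k(events, now, k):
    """Return the k most recent values before `now`, newest first."""
    past = sorted((e for e in events if e[0] < now), key=lambda e: e[0], reverse=True)
    return [v for _, v in past[:k]]


# Hypothetical (event_time, price) pairs for a single user.
events = [
    (datetime(2024, 1, 1), 5.0),
    (datetime(2024, 1, 10), 7.0),
    (datetime(2024, 1, 13), 3.0),
]
now = datetime(2024, 1, 14)

sum_3d = windowed_sum(events, now, timedelta(days=3))    # only the Jan 13 event
sum_14d = windowed_sum(events, now, timedelta(days=14))  # all three events
recent = last_k(events, now, 2)                          # Jan 13, then Jan 10
print(sum_3d, sum_14d, recent)
```

Declaring both windows on one `Aggregation` yields one output feature per window, while the unwindowed `LAST_K` yields a single list-valued feature.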
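# Import path assumes the upstream Chronon Python package (ai.chronon).
from ai.chronon.join import Join, JoinPart

# Training set: checkout events on the left, enriched with the purchase features
# as of each checkout's timestamp.
training_set = Join(
    left=Source(events=EventSource(table="data.checkouts", ...)),
    right_parts=[JoinPart(group_by=feature_group)],  # the GroupBy defined earlier
    online=True,
)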