Building Torvix: An Open-Source Cloud Cost Intelligence Platform

A developer who loves to code in Python on Linux, always curious about learning something new.

Cloud bills are strange.

They are extremely important, but most teams only look at them after the damage is already done. A cost spike happens today, the bill becomes obvious later, and by the time someone starts investigating, the question is no longer just “how much did we spend?” but:

Where did this cost come from? Which account, compartment, region, service, or resource caused it? Was this expected growth, a misconfiguration, or pure waste?

That gap is what pushed me to build Torvix.

Torvix is an open-source cloud cost intelligence platform focused on cloud cost visibility, anomaly detection, forecasting, and unused-resource detection. The goal is not to replace every enterprise FinOps product. The goal is to give DevOps, SRE, platform, and infrastructure teams a practical, self-hosted way to understand their cloud spend using the tools they already trust: PostgreSQL, TimescaleDB, APIs, Prometheus, and Grafana.

How the idea came about

The idea started from a very practical DevOps problem.

When you work with cloud infrastructure every day, cost is not a separate finance-only topic. It is connected to architecture, deployments, storage, networking, monitoring, idle resources, and sometimes even small mistakes that silently keep generating bills.

I wanted something that could answer operational questions like:

  • Which region is contributing the highest cost?

  • Which compartment or linked account is responsible?

  • Which service is driving the increase?

  • Is today’s spend normal compared to the last few days?

  • Are there unused resources still generating cost?

  • Can this be shown cleanly in Grafana without directly exposing the database?

Cloud providers already have billing dashboards, but they are usually provider-specific. For real infrastructure work, I wanted something more open, extensible, and operational: ingest billing exports, normalize them, precompute useful summaries, expose APIs, and let Grafana visualize the result.

That became Torvix.

What Torvix is today

Torvix is currently built as an operational FinOps platform, not a long-term billing warehouse. It focuses on recent, actionable cost data and tries to make that data easier to inspect, alert on, and explain.

The core idea is simple:

Cloud billing exports
        ↓
Torvix ingestion
        ↓
PostgreSQL + TimescaleDB
        ↓
Precomputed summaries, anomalies, forecasts
        ↓
Torvix API + Grafana dashboards

Instead of making Grafana query the database directly in production, Torvix exposes API endpoints that dashboards can consume safely. This keeps PostgreSQL private and makes the dashboard layer cleaner.

Current features

1. AWS and OCI cost ingestion

Torvix currently supports AWS and OCI.

For AWS, the preferred architecture is CUR 2.0 / Data Export through S3. Cost Explorer is still available as an optional mode for testing, debugging, or fallback, but the main direction is file-based billing export ingestion.

For OCI, Torvix reads OCI's native cost reports from Object Storage, handles both plain CSV and gzip-compressed reports, tolerates schema drift by mapping columns through each file's header row, and enforces the lookback window row by row, treating that row-level filter as the correctness boundary.

This means Torvix can support both cloud-native export pipelines and a normalized internal cost model.

2. Canonical multi-cloud normalization

Different clouds have different naming models.

OCI has compartments. AWS has linked accounts. Future providers may have projects, subscriptions, resource groups, folders, or other hierarchy models.

Torvix normalizes cost records into provider-neutral fields like billing scope, project, network scope, resource metadata, tags, record type, and source metadata. This makes it possible to build common dashboard APIs without pretending that all clouds behave exactly the same.
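A minimal sketch of what a provider-neutral record and two provider mappings might look like. The CostRecord fields, the AWS CUR 2.0 column names (line_item_usage_account_id, product_region_code, and so on), the OCI column names, and the "tags/" prefix convention are assumptions for illustration; Torvix's real schema may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// CostRecord is an illustrative provider-neutral record.
type CostRecord struct {
	Provider     string
	BillingScope string // AWS linked account ID or OCI compartment OCID
	ScopeName    string
	Region       string
	Service      string
	ResourceID   string
	Tags         map[string]string
	Cost         float64
	Source       string // export file the row came from
}

// fromAWS maps assumed CUR 2.0 column names onto the canonical record.
func fromAWS(row map[string]string, cost float64, source string) CostRecord {
	return CostRecord{
		Provider:     "aws",
		BillingScope: row["line_item_usage_account_id"],
		ScopeName:    row["line_item_usage_account_name"],
		Region:       row["product_region_code"],
		Service:      row["line_item_product_code"],
		ResourceID:   row["line_item_resource_id"],
		Tags:         map[string]string{},
		Cost:         cost,
		Source:       source,
	}
}

// fromOCI maps assumed OCI cost report columns; in this sketch tags arrive
// as "tags/<key>" columns and empty values are dropped.
func fromOCI(row map[string]string, cost float64, source string) CostRecord {
	tags := map[string]string{}
	for k, v := range row {
		if strings.HasPrefix(k, "tags/") && v != "" {
			tags[strings.TrimPrefix(k, "tags/")] = v
		}
	}
	return CostRecord{
		Provider:     "oci",
		BillingScope: row["product/compartmentId"],
		ScopeName:    row["product/compartmentName"],
		Region:       row["product/region"],
		Service:      row["product/service"],
		ResourceID:   row["product/resourceId"],
		Tags:         tags,
		Cost:         cost,
		Source:       source,
	}
}

// scopeLabel keeps each provider's own vocabulary available to dashboards.
func scopeLabel(provider string) string {
	switch provider {
	case "aws":
		return "Account"
	case "oci":
		return "Compartment"
	}
	return "Scope"
}

func main() {
	rec := fromOCI(map[string]string{
		"product/compartmentName": "prod-apps",
		"product/region":          "us-ashburn-1",
		"product/service":         "COMPUTE",
		"tags/team":               "platform",
	}, 12.3, "reports/cost/0001.csv.gz")
	fmt.Println(scopeLabel(rec.Provider), rec.ScopeName, rec.Region, rec.Tags["team"])
}
```

Keeping Provider on every record is what lets a dashboard say “Account” for AWS and “Compartment” for OCI while still querying the same BillingScope field.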

3. Precomputed daily, weekly, and monthly summaries

After ingestion, Torvix refreshes dashboard summary tables instead of calculating everything from raw records on every request.

It precomputes:

  • daily cost summaries

  • weekly cost summaries

  • monthly cost summaries

  • dashboard breakdowns

  • anomaly rows

  • forecast rows

This design keeps dashboard APIs fast and predictable.

4. Grafana dashboards

Torvix ships Grafana dashboards for:

  • AWS FinOps views

  • OCI FinOps views

  • waste detection views

The AWS dashboard follows:

Region → Account / Scope → Service

The OCI dashboard follows:

Region → Compartment → Service

This was an intentional design decision. Each cloud has its own naming and hierarchy model, so forcing everything into one generic drilldown would hide useful context.

5. Explainable anomaly detection

Torvix anomaly detection is intentionally deterministic and explainable.

It does not blindly call something “AI detected” just because a number changed. It compares daily spend against recent historical baselines and keeps the logic debuggable.

The current model is useful for questions like:

  • Did this service suddenly increase?

  • Is this region above its normal baseline?

  • Did a linked account or compartment behave unusually?

  • Is the change large enough to care about?

The important part is that operators can understand why a finding exists.
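One simple, explainable way to compare daily spend against a recent baseline is a trailing mean and standard deviation combined with an absolute threshold. The sketch below is an assumption about what such a check could look like, not Torvix's actual model: detectAnomaly, the z-score threshold, and the minDelta floor are illustrative, and it only looks for increases.

```go
package main

import (
	"fmt"
	"math"
)

// Anomaly carries the numbers behind a finding so the reason can be shown verbatim.
type Anomaly struct {
	Today, Baseline, StdDev, Delta float64
	Reason                         string
}

// detectAnomaly compares today's spend with the mean and population standard
// deviation of a trailing window. A day is flagged only when the increase is at
// least minDelta in absolute terms and at least zThreshold standard deviations;
// with a perfectly flat baseline (stddev 0) the absolute threshold alone decides.
func detectAnomaly(history []float64, today, zThreshold, minDelta float64) (Anomaly, bool) {
	if len(history) < 3 {
		return Anomaly{}, false // too little history for a baseline
	}
	var sum float64
	for _, v := range history {
		sum += v
	}
	mean := sum / float64(len(history))
	var sq float64
	for _, v := range history {
		sq += (v - mean) * (v - mean)
	}
	std := math.Sqrt(sq / float64(len(history)))
	delta := today - mean
	if delta < minDelta || (std > 0 && delta/std < zThreshold) {
		return Anomaly{}, false
	}
	return Anomaly{
		Today: today, Baseline: mean, StdDev: std, Delta: delta,
		Reason: fmt.Sprintf("spend %.2f is %.2f above the %d-day mean of %.2f (stddev %.2f)",
			today, delta, len(history), mean, std),
	}, true
}

func main() {
	history := []float64{10, 12, 8, 10, 11, 9, 10}
	if a, ok := detectAnomaly(history, 20, 3, 5); ok {
		fmt.Println(a.Reason)
	}
}
```

Requiring both conditions answers “is the change large enough to care about?” directly, and the Reason string carries the exact numbers that triggered the finding.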

6. Forecasting

Torvix also computes a simple forecast using recent spend trends. The purpose is not to pretend to be a full financial planning system. The purpose is to give operators a quick signal of where spend may go if the current pattern continues.

For many teams, even this simple forecast is useful because it brings cost awareness closer to daily operations.

7. OCI waste detection

Torvix includes Phase 1 waste detection for OCI.

Current OCI rules include:

  • detached block volumes

  • detached boot volumes

  • stopped compute instances with paid storage

  • unused reserved public IPs

This feature is recommendation-only. Torvix does not delete, stop, resize, retag, or modify any resource. It creates findings so humans can review and act.

That distinction is important. Cost optimization tooling should be safe by default.

8. Waste findings API

Waste detection is exposed through APIs for:

  • summary

  • findings

  • individual finding details

  • rules

  • status updates

Findings can be filtered by provider, region, scope, service, resource type, rule, severity, status, confidence, and estimated monthly waste.

This makes it easier to build dashboards, reports, or even custom workflows around waste review.

9. Optional AI explanations

Torvix has optional AI enrichment for anomaly and waste findings.

The important design principle is that AI does not create findings, change severity, resolve findings, or perform remediation. The deterministic engine creates the finding. AI can optionally explain the finding in a more readable way.

By default, sensitive identifiers such as account IDs, compartment IDs, resource IDs, and resource names can be excluded from model input. That keeps the feature practical while still being safer for real-world environments.
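The identifier exclusion can be sketched as a function that builds the model payload from a finding and adds account, compartment, and resource identifiers only when explicitly requested. FindingContext, modelInput, and the payload keys are assumptions for illustration; the model call itself is deliberately left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FindingContext is a hypothetical view of a finding before it is sent to a model.
type FindingContext struct {
	Rule, Service, Region, Severity string
	AccountID, CompartmentID        string
	ResourceID, ResourceName        string
	EstimatedMonthlyWaste           float64
	DeterministicReason             string
}

// modelInput builds the payload for an optional explanation call. Identifiers are
// left out unless includeIdentifiers is true; rule, severity, and estimate are
// passed through unchanged because the model only rephrases the finding.
func modelInput(f FindingContext, includeIdentifiers bool) map[string]any {
	in := map[string]any{
		"rule":                    f.Rule,
		"service":                 f.Service,
		"region":                  f.Region,
		"severity":                f.Severity,
		"estimated_monthly_waste": f.EstimatedMonthlyWaste,
		"reason":                  f.DeterministicReason,
	}
	if includeIdentifiers {
		in["account_id"] = f.AccountID
		in["compartment_id"] = f.CompartmentID
		in["resource_id"] = f.ResourceID
		in["resource_name"] = f.ResourceName
	}
	return in
}

func main() {
	f := FindingContext{Rule: "oci.detached_block_volume", Service: "BLOCK_STORAGE", Region: "us-ashburn-1",
		Severity: "high", CompartmentID: "ocid1.compartment.oc1..example", ResourceName: "db-backup-old",
		EstimatedMonthlyWaste: 50, DeterministicReason: "detached for 30 days"}
	b, _ := json.Marshal(modelInput(f, false))
	fmt.Println(string(b))
}
```

The deterministic fields go in as read-only context, which matches the principle that the model explains a finding but never creates or changes one.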

10. API authentication

Torvix supports bearer-token authentication for API endpoints.

When API auth is enabled, every endpoint except the health checks and the Swagger documentation requires a bearer token. This matters because cost data can reveal sensitive infrastructure details.

11. Scheduler, reporting, and alerting

Torvix includes an in-process scheduler for ingestion and report delivery.

It can run scheduled ingestion, generate daily/weekly/monthly report windows, and send notifications through configured alerting targets. This makes it useful as a continuously running service instead of a one-time CLI script.

12. Production hardening

Before the first major release, I focused heavily on hardening:

  • CI workflow for tests and builds

  • Compose validation

  • Swagger validation

  • vulnerability scanning with govulncheck

  • non-root container runtime

  • Docker image hardening

  • JSON subsystem logging

  • optional stdout log mirroring

  • health checks

  • version exposure through /healthz

  • Apache-2.0 licensing metadata

This was important because a FinOps platform should not just work locally. It should be safe enough to run near production billing data.

The first major release

The first major release of Torvix is v1.0.0.

For me, this release is not about saying “everything is finished.” It is about saying the foundation is now clear enough to be used, tested, improved, and extended.

The v1.0.0 release includes the major building blocks:

  • AWS CUR/Data Export ingestion

  • optional AWS Cost Explorer mode

  • OCI billing export ingestion

  • PostgreSQL + TimescaleDB storage

  • precomputed cost summaries

  • anomaly detection

  • forecasting

  • Grafana dashboards

  • OCI waste detection

  • API endpoints

  • bearer auth

  • scheduler and reporting

  • optional AI explanations

  • production-oriented hardening

The most important part of this release is the architecture. Torvix now has a provider-aware but provider-neutral foundation. That means future providers and deeper resource attribution can be added without rewriting the entire platform.

What I learned while building it

Building Torvix made one thing very clear: cloud cost intelligence is not just about parsing billing files.

The hard parts are in the details:

  • avoiding duplicate billing records

  • handling delayed reports

  • dealing with provider-specific schemas

  • keeping dashboards fast

  • avoiding double-counting

  • deciding what should be precomputed

  • making anomaly detection explainable

  • keeping credentials and APIs secure

  • designing drilldowns that match each cloud provider’s reality

  • making waste detection useful but non-destructive

A cost dashboard is easy to demo. A reliable cost intelligence pipeline is harder.

Future plans and enhancements

Torvix is still evolving. Some of the major roadmap items I want to work on next are:

1. AWS waste detection

OCI waste detection is already implemented in Phase 1. AWS waste detection is the next major step.

Planned AWS rules include:

  • stopped or idle EC2 instances

  • unattached EBS volumes

  • unused Elastic IPs

  • idle load balancers

  • idle RDS databases

  • CloudWatch utilization-based checks

  • tag-based exclusions

  • multi-account and multi-region scanning

2. Deeper AWS attribution

AWS cost attribution can get complicated because not every service maps neatly to a VPC or resource.

Future work can improve attribution using:

  • tags

  • cost categories

  • account mappings

  • project mappings

  • inventory enrichment

  • VPC-to-project mapping where applicable

The goal is to move beyond account-level visibility and get closer to resource-level accountability.

3. Parquet support for AWS exports

Current AWS export support focuses on CSV and gzip-compressed CSV. Parquet support is planned because many billing export workflows prefer it for efficiency at scale.

4. Azure and GCP support

Torvix currently exposes only the providers that are fully implemented: AWS and OCI.

Azure and GCP support are natural future additions, but I want to add them properly instead of leaving half-built provider scaffolding. Each provider has its own billing model, hierarchy, and terminology, so the implementation should respect that.

5. Better anomaly tuning

The current anomaly model is deterministic and explainable. Future improvements can include:

  • configurable thresholds

  • per-provider tuning

  • per-service sensitivity

  • better false-positive controls

  • richer anomaly explanations

  • seasonality-aware baselines

The priority will remain explainability.

6. Better reporting workflows

Daily, weekly, and monthly reporting can become more useful with:

  • richer report templates

  • provider-specific report sections

  • top movers

  • top waste findings

  • anomaly summary

  • forecast summary

  • Slack/Teams/Discord/email friendly formats

7. More dashboards and drilldowns

Grafana is already a strong visualization layer for Torvix. Future dashboards can add:

  • cost movement views

  • service-level trend analysis

  • resource-level cost views

  • waste trend over time

  • anomaly history

  • provider comparison

  • forecast vs actual views

Why open source?

I wanted Torvix to be open source because cloud cost visibility should not be available only through expensive tooling.

Many engineers, small teams, startups, and self-hosted infrastructure users need practical FinOps visibility, but they may not need, or be able to afford, a large enterprise platform. An open-source foundation allows people to inspect how the system works, self-host it, modify it, and contribute provider-specific improvements.

Also, for infrastructure tooling, transparency matters. If a tool is analyzing billing data and recommending cost actions, users should be able to understand the logic.

Final thoughts

Torvix started as a practical DevOps pain point: cloud cost data exists, but it is often not operational enough.

I wanted a system that could ingest billing exports, normalize cost records, detect anomalies, forecast spend, identify waste, and expose everything through APIs and Grafana dashboards.

The first major release is just the beginning.

There is still a lot to improve, especially around AWS waste detection, resource-level attribution, Azure/GCP support, richer reporting, and better anomaly tuning. But the foundation is now in place.

Torvix is my attempt to build a simple, open, self-hosted cloud cost intelligence platform for engineers who want to understand not just how much they spent, but why they spent it.