Company surface

Reproducible AI infrastructure for agents and models.

Tensor Cortex builds deterministic evaluation, sandboxed execution, training infrastructure, and private coding-agent release gates for teams that need evidence before they trust an AI system.

Public: infrastructure, EvalOps tooling, run reports
Private: model weights, scaling strategy, data mixture

Private release gate

Agent evaluation run

deterministic
01 Repo tasks: private suite
02 Sandbox: network off
03 Verifier: hidden tests
04 Report: content hash

45 seed coding tasks
1,100+ CPU tests green
0 cluster runs claimed

Infrastructure

A single-code-path stack for model definition, training, evals, decontamination, conversion, lineage, and reproducible run artifacts.

Private EvalOps

Private coding-agent evaluation on a customer's own repo: hidden tests, sandboxed execution, repeatable scoring, and regression gates.

Run Reports

Provider-friendly evidence packages with config, git SHA, eval hashes, goodput, MFU, checkpoint behavior, and failure notes.
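
As an illustration of how an evidence package can be made content-addressed, here is a minimal Python sketch. The field names and the helper `report_hash` are hypothetical, chosen only to mirror the items listed above; this is not Tensor Cortex's actual report schema, and every value is a placeholder.

```python
import hashlib
import json


def report_hash(report: dict) -> str:
    """Return a SHA-256 hex digest of a report, independent of key order."""
    # Canonical form: sorted keys, no insignificant whitespace, UTF-8 bytes.
    canonical = json.dumps(
        report, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical report; all values are placeholders, not real run data.
report = {
    "config": {"seed": 0, "suite": "example-suite"},
    "git_sha": "0000000",
    "eval_hashes": ["aaaa", "bbbb"],
    "failure_notes": [],
}

digest = report_hash(report)

# Reordering the keys does not change the digest, so two parties can check
# that they hold the same report by comparing a single string.
reordered = dict(reversed(list(report.items())))
assert report_hash(reordered) == digest
print(digest)
```

Hashing a canonical serialization rather than the raw file means formatting differences do not change the hash, while any change to a config value, commit, or eval hash does.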

Infrastructure program

One measurement system, from laptop smoke to cluster run.

The public surface is deliberately narrow: reproducible infrastructure, private EvalOps tooling, and evidence reports. The private research set, model details, and data-mixture decisions stay private.

Deterministic evals: content-addressed runs and repeatable task specs
Sandboxed execution: bounded code runs for evals and coding agents
Train/serve parity: conversion tests before model-code changes ship
Gate-driven compute: large runs only when they produce evidence

First product

Private SWE-bench for your repo.

Most teams choose coding agents using public benchmarks, demos, or vibes. Private EvalOps answers the sharper question: which agent setup actually works on your codebase?

Build private tasks

Bug fixes, multi-file changes, refactors, and feature tasks derived from real repo work.

Run agents fairly

The same tasks across Codex, Claude Code, Cursor-style agents, open-weight models, or internal loops.

Grade objectively

Hidden tests and verifiers decide pass/fail. The agent's self-report is never the score.

Catch regressions

Re-run the suite whenever prompts, tools, models, or guardrails change.
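
The grading and regression steps above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the product's implementation: `TaskResult`, `pass_rate`, and `regressions` are hypothetical names, and the sample results are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    task_id: str
    hidden_tests_passed: bool    # verdict from the hidden tests / verifier
    agent_claimed_success: bool  # recorded for analysis, never scored


def pass_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks whose hidden tests passed; self-report is ignored."""
    if not results:
        return 0.0
    return sum(r.hidden_tests_passed for r in results) / len(results)


def regressions(baseline: list[TaskResult], candidate: list[TaskResult]) -> list[str]:
    """Task IDs that passed in the baseline run but fail in the candidate run."""
    passed_before = {r.task_id for r in baseline if r.hidden_tests_passed}
    failing_now = {r.task_id for r in candidate if not r.hidden_tests_passed}
    return sorted(passed_before & failing_now)


# Made-up results for two runs of the same three tasks.
baseline = [
    TaskResult("t1", True, True),
    TaskResult("t2", True, True),
    TaskResult("t3", False, True),
]
candidate = [
    TaskResult("t1", True, True),
    TaskResult("t2", False, True),
    TaskResult("t3", True, False),
]

print(pass_rate(candidate))              # 2 of 3 tasks pass
print(regressions(baseline, candidate))  # ['t2']
```

Note that `t2` counts as a regression even though the agent reported success, and `t3` counts as a pass even though the agent did not: only the verifier's verdict feeds the score.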

Pilot package

Small enough to start, concrete enough to matter.

10-30 private repo tasks
$1k-$3k first pilot
$5k-$15k deeper custom suite
$2k-$10k/mo monthly regression gate

Compute partners

Bounded compute asks with public evidence in return.

Tensor Cortex is looking for practical compute partnerships: start with a GPU smoke, then a bounded bootstrap run, and publish the evidence package instead of making unsupported claims.

Ask 1: 50 H100 GPUh, stack smoke and environment report
Ask 2: 500 H100 GPUh, G0 bootstrap evidence package
Ask 3: 3k+ GPUh, scaling-law pilot after proof