I built an end-to-end vision-language-action (VLA) model from the ground up and deployed it on a real 7-DoF arm. No pretrained backbone shortcuts. It runs at 50 Hz and hits over 90% success on a physical pick-and-place task.
PRANA stands for Perception-conditioned Robotic Network with Attention. I designed and trained every component from scratch: the vision encoders, the language embedding, the transformer architecture, and the action chunking mechanism.
The model fuses two camera streams, a language instruction, and the robot's joint state into a single sequence. A transformer processes that sequence and outputs 50 future actions in one forward pass. The arm then executes that chunk at 50 Hz, so each chunk covers one second of motion.
I used LeRobot as the deployment harness and collected over 500 teleoperation episodes myself. The task is screwdriver retrieval in an unstructured environment: a small object, tight tolerances, and real consequences for bad predictions.
The core idea is straightforward. I encode vision, language, and robot state into tokens of the same dimension, concatenate them into one sequence, and let a transformer figure out what the arm should do next. Fifty learnable action queries read from that context and output the full 50-step action horizon in a single shot.
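To make the token layout concrete, here is a minimal NumPy sketch of the idea, not PRANA's actual code. It assumes a token width of 64, made-up token counts per modality, and a 7-dimensional action (one value per joint). It also replaces the full transformer with a single cross-attention step with no learned projections, and random arrays stand in for the encoder outputs and the learned queries. All names (`build_context`, `decode_actions`, `w_out`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 64     # shared token width (assumed value)
N_QUERIES = 50   # action horizon, as stated in the text
ACTION_DIM = 7   # assumed: one command per joint of the 7-DoF arm


def build_context(cam_a, cam_b, lang, state):
    """Concatenate per-modality tokens, each of shape (n_i, D_MODEL), into one sequence."""
    return np.concatenate([cam_a, cam_b, lang, state], axis=0)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def decode_actions(queries, context, w_out):
    """Single-head cross-attention: every query reads the whole context at once,
    then a linear head maps each attended vector to one action."""
    scores = queries @ context.T / np.sqrt(D_MODEL)  # (N_QUERIES, n_context)
    attended = softmax(scores) @ context             # (N_QUERIES, D_MODEL)
    return attended @ w_out                          # (N_QUERIES, ACTION_DIM)


# Random stand-ins for encoder outputs; the token counts are invented.
cam_a = rng.normal(size=(16, D_MODEL))
cam_b = rng.normal(size=(16, D_MODEL))
lang = rng.normal(size=(8, D_MODEL))
state = rng.normal(size=(1, D_MODEL))

queries = rng.normal(size=(N_QUERIES, D_MODEL))      # learnable parameters in a real model
w_out = rng.normal(size=(D_MODEL, ACTION_DIM)) * 0.1

context = build_context(cam_a, cam_b, lang, state)
actions = decode_actions(queries, context, w_out)
print(context.shape, actions.shape)
```

The point of the sketch is the shape flow: all 50 query rows are decoded in the same matrix multiply, so no action depends on a previously decoded one.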
This means no autoregressive decoding, no waiting. The whole chunk comes out at once and the arm executes it. That's how I get 50 Hz on real hardware without a GPU strapped to the robot.
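The execution side can be sketched as a fixed-rate loop that runs one forward pass per chunk and then streams the actions out every 20 ms. This is an illustrative sketch under assumptions, not the LeRobot API or PRANA's real control loop: `run_chunks`, `predict_chunk`, `get_observation`, and `send_action` are hypothetical names, and it assumes the whole chunk is executed open-loop before the next prediction. The clock and sleep functions are injectable so the demo can run against a fake clock instead of waiting in real time.

```python
import time

CONTROL_HZ = 50   # control rate, from the text
CHUNK_LEN = 50    # actions per forward pass, from the text


def run_chunks(predict_chunk, get_observation, send_action, n_chunks,
               hz=CONTROL_HZ, sleep=time.sleep, clock=time.perf_counter):
    """Predict a chunk once, then send its actions one per control period."""
    period = 1.0 / hz
    sent = 0
    for _ in range(n_chunks):
        chunk = predict_chunk(get_observation())  # one forward pass per chunk
        next_tick = clock()
        for action in chunk:
            send_action(action)
            sent += 1
            next_tick += period
            remaining = next_tick - clock()
            if remaining > 0:                     # skip the sleep if we are already late
                sleep(remaining)
    return sent


class FakeClock:
    """Simulated time source so the demo finishes instantly."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


fake = FakeClock()
sent_log = []
calls = {"predict": 0}


def dummy_policy(obs):
    """Stand-in for the model: returns CHUNK_LEN zero actions."""
    calls["predict"] += 1
    return [[0.0] * 7 for _ in range(CHUNK_LEN)]


n_sent = run_chunks(dummy_policy, lambda: None, sent_log.append,
                    n_chunks=2, sleep=fake.sleep, clock=fake.now)
print(n_sent, calls["predict"], round(fake.t, 3))
```

Under that open-loop assumption the policy is called once per 50 motor commands, which is why inference cost is amortized over a full second of motion.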
I collected every single training episode myself through teleoperation using the LeRobot recording stack. Each episode contains two synchronized camera streams, the proprioceptive joint state, and a fixed language instruction.
The key training design decision was action chunking. Instead of predicting one action per step, the model predicts the next 50 actions at once. With far fewer separate predictions per trajectory, there are fewer points where a small error can feed into the next observation and compound, and the robot can commit to smooth trajectories rather than reacting jerkily at every timestep.
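For training, action chunking means every timestep in a recorded episode needs a 50-step target rather than a single action. The sketch below shows one way to build those targets in plain Python. The padding rule (repeat the last action near the end of the episode and mask the padded entries out of the loss) is an assumption, as are the function name `make_chunk_targets` and the toy 120-step episode; the source does not say how PRANA handles episode ends.

```python
def make_chunk_targets(actions, horizon=50):
    """For each timestep t, return (chunk, mask).

    chunk: the `horizon` actions starting at t, padded by repeating the
           episode's final action when fewer than `horizon` remain.
    mask:  True for real actions, False for padding (to exclude from the loss).
    """
    targets = []
    for t in range(len(actions)):
        window = actions[t:t + horizon]
        n_pad = horizon - len(window)
        chunk = window + [actions[-1]] * n_pad
        mask = [True] * len(window) + [False] * n_pad
        targets.append((chunk, mask))
    return targets


# Toy episode: 120 steps of 7-dimensional actions (values are placeholders).
episode = [[float(t)] * 7 for t in range(120)]
targets = make_chunk_targets(episode, horizon=50)

chunk, mask = targets[100]
print(len(targets), len(chunk), sum(mask))
```

Each training sample then pairs the observation at step t with its chunk, so a 120-step episode yields 120 samples with heavily overlapping targets.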
Getting a trained policy onto real hardware is a different problem from training it. Here is exactly how PRANA goes from a checkpoint to motor commands on the physical arm.