Franka Panda Manipulation

Classical Control · Reinforcement Learning · MuJoCo

The Problem: Robotic Manipulation Beyond Scripts

Robotic pick-and-place tasks appear deceptively simple, yet they expose core challenges in manipulation: high-dimensional control, nonlinear kinematics, perception uncertainty, and the need for precise, repeatable interaction.

Classical pipelines can solve structured tasks reliably, but often lack robustness to variation. This project explores a progression from explicit, model-based control to learning-based manipulation, and ultimately toward hybrid systems.

Step 1: Classical Pick & Place Pipeline

The classical pipeline decomposes manipulation into perception, planning, and control, each solved explicitly using analytical or sampling-based methods.

  • HSV-based cube detection and centroid extraction
  • Coordinate transformation using ROS 2 TF
  • Sampling-based motion planning with OMPL (RRTConnect)
  • Jacobian-based inverse kinematics using Pinocchio
  • Explicit state-machine sequencing (sketched in code after the pipeline diagram)
Camera → HSV Segmentation
              ↓
         2D Centroid
              ↓
        TF Transform
              ↓
     RRTConnect Planner
              ↓
     Jacobian IK Control
              ↓
       Joint Commands
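The explicit sequencing from the component list can be captured in a compact state machine. A minimal sketch, with hypothetical stage names and a context object standing in for the real node's perception, planning, and control interfaces:

```python
from enum import Enum, auto

class Stage(Enum):
    DETECT = auto()    # wait for a valid cube centroid
    PLAN = auto()      # request an RRTConnect plan to the pre-grasp pose
    APPROACH = auto()  # track the planned path with the IK controller
    GRASP = auto()     # close the gripper
    PLACE = auto()     # move to the place pose and release

def advance(stage: Stage, ctx) -> Stage:
    """Advance the pick-and-place sequence by one tick.

    `ctx` is a hypothetical context object exposing perception,
    planning, and control results; the actual node structure may differ.
    """
    if stage is Stage.DETECT and ctx.cube_pose is not None:
        return Stage.PLAN
    if stage is Stage.PLAN and ctx.plan_ready:
        return Stage.APPROACH
    if stage is Stage.APPROACH and ctx.at_pregrasp:
        return Stage.GRASP
    if stage is Stage.GRASP and ctx.gripper_closed:
        return Stage.PLACE
    return stage
```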

Perception: HSV Filtering and Object Localization

To maintain transparency and debuggability, object detection is performed using classical color segmentation in HSV space.

RGB images are converted into HSV and thresholded:

$$ M(x,y) = \begin{cases} 1 & H_{\min} \le H(x,y) \le H_{\max} \\ 0 & \text{otherwise} \end{cases} $$

Contours are extracted from the binary mask, and the dominant contour is selected. The centroid is computed using image moments:

$$ c_x = \frac{m_{10}}{m_{00}}, \quad c_y = \frac{m_{01}}{m_{00}} $$

The centroid is projected into 3D using known camera intrinsics and transformed into the robot base frame using TF.
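A minimal OpenCV sketch of this stage; the hue/saturation bounds and the pinhole back-projection helper are illustrative placeholders rather than the project's actual calibration:

```python
import cv2
import numpy as np

def detect_cube_centroid(bgr, h_min=20, h_max=35, s_min=80, v_min=80):
    """Return the (cx, cy) pixel centroid of the largest in-range blob, or None."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (h_min, s_min, v_min), (h_max, 255, 255))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] == 0:
        return None
    return m["m10"] / m["m00"], m["m01"] / m["m00"]

def pixel_to_camera_frame(cx, cy, depth, fx, fy, ppx, ppy):
    """Back-project a pixel to a 3D point in the camera frame (pinhole model)."""
    x = (cx - ppx) * depth / fx
    y = (cy - ppy) * depth / fy
    return np.array([x, y, depth])
```

The camera-frame point is then transformed into the robot base frame with a TF2 lookup (e.g. a `lookup_transform` between the camera and base frames).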

Inverse Kinematics with Pinocchio

Forward kinematics defines the nonlinear mapping from joint configuration to end-effector pose:

$$ \mathbf{x} = f(\mathbf{q}) $$

The differential relationship between joint space and task space is:

$$ \dot{\mathbf{x}} = \mathbf{J}(\mathbf{q}) \dot{\mathbf{q}} $$

Given a desired end-effector pose \( \mathbf{x}^* \), the task-space error \( \mathbf{x}^* - \mathbf{x} \) is resolved through a damped least-squares inverse:

$$ \dot{\mathbf{q}} = \mathbf{J}^\top (\mathbf{J}\mathbf{J}^\top + \lambda^2 \mathbf{I})^{-1} (\mathbf{x}^* - \mathbf{x}) $$

Pinocchio provides analytical Jacobians and efficient frame-level kinematics, enabling stable real-time inverse kinematics control.
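A condensed sketch of one damped least-squares step using Pinocchio's Python bindings, assuming a position-only (3-DoF) task and placeholder gains; the full 6-DoF version adds the orientation rows of the Jacobian and the log of the pose error:

```python
import numpy as np
import pinocchio as pin

def dls_ik_step(model, data, q, frame_id, p_desired, damping=1e-2, gain=1.0):
    """One damped least-squares step toward a desired end-effector position."""
    pin.forwardKinematics(model, data, q)
    pin.updateFramePlacements(model, data)
    p_current = data.oMf[frame_id].translation
    err = p_desired - p_current                       # task-space error
    J = pin.computeFrameJacobian(model, data, q, frame_id,
                                 pin.LOCAL_WORLD_ALIGNED)[:3, :]  # position rows
    # dq = J^T (J J^T + lambda^2 I)^{-1} e
    dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(3), gain * err)
    return pin.integrate(model, q, dq)  # step on the configuration manifold
```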

Motion Planning with OMPL (RRTConnect)

Motion planning is solved in joint configuration space \( \mathcal{C} \subset \mathbb{R}^7 \).

RRTConnect grows two rapidly-exploring random trees, one rooted at the start configuration and one at the goal, and repeatedly tries to connect them through collision-free extensions. A solution is found once a node of one tree can be extended to within a small tolerance of a node of the other:

$$ \|\mathbf{q}_a - \mathbf{q}_b\| < \varepsilon, \qquad \mathbf{q}_a \in \mathcal{T}_{\text{start}}, \; \mathbf{q}_b \in \mathcal{T}_{\text{goal}} $$

The resulting path is time-parameterized and executed through the IK controller.
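A sketch of the planning query through OMPL's Python bindings; the joint limits and the collision-checking callback are assumed inputs, and the real pipeline may configure the planner differently:

```python
from ompl import base as ob
from ompl import geometric as og

def plan_joint_path(q_start, q_goal, is_state_valid, joint_lower, joint_upper, timeout=1.0):
    """Plan a collision-free 7-DoF joint path with RRTConnect."""
    space = ob.RealVectorStateSpace(7)
    bounds = ob.RealVectorBounds(7)
    for i in range(7):
        bounds.setLow(i, joint_lower[i])
        bounds.setHigh(i, joint_upper[i])
    space.setBounds(bounds)

    ss = og.SimpleSetup(space)
    ss.setStateValidityChecker(ob.StateValidityCheckerFn(is_state_valid))

    start, goal = ob.State(space), ob.State(space)
    for i in range(7):
        start[i] = float(q_start[i])
        goal[i] = float(q_goal[i])
    ss.setStartAndGoalStates(start, goal)

    ss.setPlanner(og.RRTConnect(ss.getSpaceInformation()))
    if ss.solve(timeout):
        ss.simplifySolution()
        return ss.getSolutionPath()
    return None
```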

Step 2: PPO-Based Manipulation in MuJoCo

To introduce adaptability, a reinforcement learning environment was built in MuJoCo. The policy operates on incremental Cartesian actions rather than explicit joint targets.
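A small sketch of how such incremental actions could be turned into Cartesian targets inside the environment step; the action scale and workspace bounds are illustrative assumptions:

```python
import numpy as np

ACTION_SCALE = 0.02                                  # metres per step (assumed)
WORKSPACE_LOW = np.array([0.25, -0.35, 0.02])        # assumed reachable box
WORKSPACE_HIGH = np.array([0.75, 0.35, 0.50])

def apply_cartesian_delta(p_ee, action):
    """Convert a normalized policy action in [-1, 1]^3 into a new Cartesian target."""
    delta = ACTION_SCALE * np.clip(action, -1.0, 1.0)
    return np.clip(p_ee + delta, WORKSPACE_LOW, WORKSPACE_HIGH)
```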

The objective maximizes expected return:

$$ J(\theta) = \mathbb{E}_{\pi_\theta} \left[ \sum_{t=0}^{\infty} \gamma^t r_t \right] $$

The reward function combines distance, success, and smoothness terms:

$$ r_t = -\|\mathbf{p}_{ee} - \mathbf{p}_{obj}\| + r_{\text{success}} - \alpha \|\Delta \mathbf{a}_t\|^2 $$
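The same reward written as a function; the success radius, bonus, and smoothness weight \( \alpha \) are illustrative values, not the trained configuration:

```python
import numpy as np

def compute_reward(p_ee, p_obj, action, prev_action,
                   success_bonus=2.0, success_radius=0.03, alpha=0.01):
    """Distance shaping + sparse success bonus + action-smoothness penalty."""
    dist = np.linalg.norm(p_ee - p_obj)
    success = dist < success_radius
    smoothness = alpha * np.sum((action - prev_action) ** 2)
    return -dist + (success_bonus if success else 0.0) - smoothness, success
```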

PPO stabilizes training through clipped policy updates and Generalized Advantage Estimation (GAE).
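For reference, the clipped surrogate objective maximized at each update is

$$ L^{\text{CLIP}}(\theta) = \mathbb{E}_t\!\left[ \min\!\left( \rho_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\!\left(\rho_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right], \qquad \rho_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)} $$

where \( \hat{A}_t \) is the GAE advantage estimate and \( \epsilon \) the clipping range.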

Next Step: Integrating IK with Learning

The learning-based system intentionally excludes explicit IK to isolate policy behavior. Future work will integrate inverse kinematics constraints directly into the learning pipeline.