The Problem: Robotic Manipulation Beyond Scripts
Robotic pick-and-place tasks appear deceptively simple, yet they expose core challenges in manipulation: high-dimensional control, nonlinear kinematics, perception uncertainty, and the need for precise, repeatable interaction.
Classical pipelines can solve structured tasks reliably, but often lack robustness to variation. This project explores a progression from explicit, model-based control to learning-based manipulation, and ultimately toward hybrid systems.
Step 1: Classical Pick & Place Pipeline
The classical pipeline decomposes manipulation into perception, planning, and control, each solved explicitly using analytical or sampling-based methods.
- HSV-based cube detection and centroid extraction
- Coordinate transformation using ROS 2 TF
- Sampling-based motion planning with OMPL (RRTConnect)
- Jacobian-based inverse kinematics using Pinocchio
- Explicit state-machine sequencing (a minimal sketch follows the diagram below)
```
Camera → HSV Segmentation
        ↓
   2D Centroid
        ↓
   TF Transform
        ↓
RRTConnect Planner
        ↓
Jacobian IK Control
        ↓
  Joint Commands
```
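The stages above are coordinated by an explicit state machine. The following is a minimal sketch in plain Python; the phase names and the callback-based interface are illustrative, not the actual node structure.

```python
from enum import Enum, auto

class Phase(Enum):
    DETECT = auto()
    PLAN = auto()
    MOVE_TO_GRASP = auto()
    GRASP = auto()
    MOVE_TO_PLACE = auto()
    RELEASE = auto()
    DONE = auto()

# Hypothetical transition table: each phase maps to the phase that follows
# once its action reports success.
TRANSITIONS = {
    Phase.DETECT: Phase.PLAN,
    Phase.PLAN: Phase.MOVE_TO_GRASP,
    Phase.MOVE_TO_GRASP: Phase.GRASP,
    Phase.GRASP: Phase.MOVE_TO_PLACE,
    Phase.MOVE_TO_PLACE: Phase.RELEASE,
    Phase.RELEASE: Phase.DONE,
}

def run_pick_and_place(actions):
    """Step through the pipeline; `actions` maps each phase to a callable
    that performs the corresponding work and returns True on success."""
    phase = Phase.DETECT
    while phase is not Phase.DONE:
        if not actions[phase]():
            raise RuntimeError(f"Phase {phase.name} failed")
        phase = TRANSITIONS[phase]
```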
Perception: HSV Filtering and Object Localization
To maintain transparency and debuggability, object detection is performed using classical color segmentation in HSV space.
RGB images are converted to HSV, and the hue channel is thresholded to produce a binary mask (in practice, saturation and value bounds are applied as well):
$$ M(x,y) = \begin{cases} 1 & H_{\min} \le H(x,y) \le H_{\max} \\ 0 & \text{otherwise} \end{cases} $$
Contours are extracted from the binary mask, and the dominant contour is selected. The centroid is computed using image moments:
$$ c_x = \frac{m_{10}}{m_{00}}, \quad c_y = \frac{m_{01}}{m_{00}} $$
The centroid is projected into 3D using known camera intrinsics and transformed into the robot base frame using TF.
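A minimal sketch of this stage with OpenCV is shown below. The HSV bounds, camera intrinsics, and depth value are placeholders, and the TF lookup into the robot base frame is omitted.

```python
import cv2
import numpy as np

# Placeholder HSV bounds (e.g., for a red cube); tune per object and lighting.
HSV_LOW = np.array([0, 120, 70])
HSV_HIGH = np.array([10, 255, 255])

def detect_centroid(bgr_image):
    """Return the (u, v) pixel centroid of the dominant colored blob, or None."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, HSV_LOW, HSV_HIGH)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] == 0:
        return None
    return m["m10"] / m["m00"], m["m01"] / m["m00"]

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel into the camera frame using the pinhole model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])
```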
Inverse Kinematics with Pinocchio
Forward kinematics define a nonlinear mapping:
$$ \mathbf{x} = f(\mathbf{q}) $$
The differential relationship between joint space and task space is:
$$ \dot{\mathbf{x}} = \mathbf{J}(\mathbf{q}) \dot{\mathbf{q}} $$
Given a desired end-effector pose \( \mathbf{x}^* \), a task-space error is computed and resolved via damped least-squares inverse:
$$ \dot{\mathbf{q}} = \mathbf{J}^\top (\mathbf{J}\mathbf{J}^\top + \lambda^2 \mathbf{I})^{-1} (\mathbf{x}^* - \mathbf{x}) $$
Pinocchio provides analytical Jacobians and efficient frame-level kinematics, enabling stable real-time inverse kinematics control.
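A position-only sketch of one damped least-squares step with Pinocchio's Python bindings is shown below; the URDF path, end-effector frame name, damping, and step size are placeholders, and orientation error is omitted for brevity.

```python
import numpy as np
import pinocchio as pin

# Placeholders: adjust the URDF path and end-effector frame name for the robot.
model = pin.buildModelFromUrdf("robot.urdf")
data = model.createData()
ee_frame = model.getFrameId("ee_link")

def dls_ik_step(q, p_des, damping=1e-2, dt=1e-2):
    """One damped least-squares IK step toward a desired end-effector position."""
    pin.forwardKinematics(model, data, q)
    pin.updateFramePlacements(model, data)
    p_cur = data.oMf[ee_frame].translation
    # Positional rows of the frame Jacobian, expressed in the world-aligned frame.
    J = pin.computeFrameJacobian(model, data, q, ee_frame, pin.LOCAL_WORLD_ALIGNED)[:3, :]
    err = p_des - p_cur
    # dq = J^T (J J^T + lambda^2 I)^{-1} (x* - x)
    dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(3), err)
    return pin.integrate(model, q, dq * dt), np.linalg.norm(err)
```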
Motion Planning with OMPL (RRTConnect)
Motion planning is solved in joint configuration space \( \mathcal{C} \subset \mathbb{R}^7 \).
RRTConnect grows two rapidly-exploring random trees, one rooted at the start configuration and one at the goal configuration, until a pair of nodes from opposite trees can be joined by a collision-free segment:
$$ \|\mathbf{q}_a - \mathbf{q}_b\| < \varepsilon, \qquad \mathbf{q}_a \in \mathcal{T}_{\text{start}}, \; \mathbf{q}_b \in \mathcal{T}_{\text{goal}} $$
The resulting path is time-parameterized and executed through the IK controller.
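A sketch of how this planning problem can be posed through OMPL's Python bindings follows; the joint bounds and the state validity check are placeholders for the robot-specific collision checking used in the pipeline.

```python
from ompl import base as ob
from ompl import geometric as og

N_JOINTS = 7

def is_state_valid(state):
    """Placeholder collision check; the real pipeline queries the planning scene."""
    return True

def plan_rrt_connect(q_start, q_goal, timeout=5.0):
    space = ob.RealVectorStateSpace(N_JOINTS)
    bounds = ob.RealVectorBounds(N_JOINTS)
    bounds.setLow(-3.0)   # placeholder joint limits (rad)
    bounds.setHigh(3.0)
    space.setBounds(bounds)

    ss = og.SimpleSetup(space)
    ss.setStateValidityChecker(ob.StateValidityCheckerFn(is_state_valid))
    ss.setPlanner(og.RRTConnect(ss.getSpaceInformation()))

    start, goal = ob.State(space), ob.State(space)
    for i in range(N_JOINTS):
        start[i], goal[i] = q_start[i], q_goal[i]
    ss.setStartAndGoalStates(start, goal)

    if ss.solve(timeout):
        path = ss.getSolutionPath()
        path.interpolate(50)  # densify before time-parameterization
        return path
    return None
```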
Step 2: PPO-Based Manipulation in MuJoCo
To introduce adaptability, a reinforcement learning environment was built in MuJoCo. The policy operates on incremental Cartesian actions rather than explicit joint targets.
The objective maximizes expected return:
$$ J(\theta) = \mathbb{E}_{\pi_\theta} \left[ \sum_{t=0}^{\infty} \gamma^t r_t \right] $$
The reward function combines distance, success, and smoothness terms:
$$ r_t = -\|\mathbf{p}_{ee} - \mathbf{p}_{obj}\| + r_{\text{success}} - \alpha \|\Delta \mathbf{a}_t\|^2 $$
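As an illustration, the reward terms map to code roughly as follows; the success bonus, grasp tolerance, and smoothness weight are placeholder values rather than the tuned ones.

```python
import numpy as np

SUCCESS_BONUS = 1.0       # placeholder r_success
SUCCESS_RADIUS = 0.02     # placeholder grasp tolerance (m)
SMOOTHNESS_WEIGHT = 0.01  # placeholder alpha

def compute_reward(p_ee, p_obj, action, prev_action):
    """Distance, success, and action-smoothness terms from the equation above."""
    dist = np.linalg.norm(p_ee - p_obj)
    success = SUCCESS_BONUS if dist < SUCCESS_RADIUS else 0.0
    smoothness = SMOOTHNESS_WEIGHT * np.sum((action - prev_action) ** 2)
    return -dist + success - smoothness
```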
PPO optimizes a clipped surrogate objective and uses generalized advantage estimation (GAE) to keep policy updates stable.
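For reference, the clipped surrogate at the heart of the update can be sketched as below; the log-probabilities and advantages would come from the rollout buffer and GAE, and the clip range is the usual default rather than a tuned value.

```python
import numpy as np

def ppo_clip_objective(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective averaged over a batch of transitions."""
    ratio = np.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))
```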
Next Step: Integrating IK with Learning
The learning-based system intentionally excludes explicit IK to isolate policy behavior. Future work will integrate inverse kinematics constraints directly into the learning pipeline.
- Hybrid PPO + IK control
- Improved stability near kinematic limits
- More reliable grasp execution