Student at MIT currently exploring embodied intelligence and physical AI


About Me

Hey, I’m Rishi Shiv, a sophomore at MIT studying computer science, hopefully with a math double major. I spent most of my childhood as a competitive fencer, which shaped a lot of who I am today, but since coming to MIT I’ve started to focus on exploring my academic/research interests. Outside of all that, I love watching soccer and reading sci-fi. Website is a WIP.


My Work

Up until the 2025-2026 academic year my work was all over the place, mostly exploring different interests to see what stuck. Robotics ended up catching my eye. More specifically, I’m interested in embodied AI and multisensory perception: enabling agents to perceive, interact, and evolve in the physical world just like we do. I’m also interested in data collection/creation. Past projects explored signal processing, agent orchestration, and more. Currently spending most of my time on multimodal robotics research with the Multisensory Intelligence and CDFG groups. Some of my work is below, much more to come!


Multimodal policy for tactile-aware dextrous manipulation in sim

Goal: Demonstrate that the tactile modality can improve zero-shot sim-to-real transfer for dextrous manipulation tasks. Ideally will also use the Neural Tactile Sensor below.

Making an end-to-end pipeline, including both data collection and training, for tactile-aware dextrous manipulation.

  • Constructed the simulation below

  • Developed a complete teleoperation pipeline using ROS 1 Noetic (a bit dated, but the lab’s robot is already set up with it) for a Franka arm + xhand, both in simulation and in the real world.

    • The teleop uses Manus gloves for hand landmark pose targets, and a rigged Vive Ultimate Tracker for the arm pose target. Implemented inverse kinematics to retarget the robot joints based on targets streamed from the Manus gloves and Vive tracker.

  • Using a grasp generation policy to generate large amounts of synthetic examples of the hand picking up objects.

  • Pretraining a multimodal (vision, pose, tactile) policy on the synthetically generated data and finetuning with teleop data collected doing tool manipulation.
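The retargeting step in the teleop can be illustrated on a toy problem. This is only a sketch, not the actual pipeline: it runs damped least-squares IK (one common choice, not necessarily the solver used here) on a planar 2-link arm instead of the Franka arm + hand. The link lengths, damping, and step limit are made-up values, and in a real teleop loop `target` would be whatever pose the tracker streams in.

```python
import math

# Hypothetical link lengths in meters for a planar 2-link arm.
LINK_1, LINK_2 = 0.4, 0.3


def forward_kinematics(q1, q2):
    """End-effector (x, y) for joint angles q1, q2 in radians."""
    x = LINK_1 * math.cos(q1) + LINK_2 * math.cos(q1 + q2)
    y = LINK_1 * math.sin(q1) + LINK_2 * math.sin(q1 + q2)
    return x, y


def ik_step(q1, q2, target, damping=0.05, max_step=0.2):
    """One damped least-squares update: dq = J^T (J J^T + damping^2 I)^-1 e."""
    x, y = forward_kinematics(q1, q2)
    ex, ey = target[0] - x, target[1] - y

    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    # Jacobian of (x, y) with respect to (q1, q2).
    j11, j12 = -LINK_1 * s1 - LINK_2 * s12, -LINK_2 * s12
    j21, j22 = LINK_1 * c1 + LINK_2 * c12, LINK_2 * c12

    # A = J J^T + damping^2 I is symmetric 2x2, so invert it in closed form.
    a11 = j11 * j11 + j12 * j12 + damping ** 2
    a12 = j11 * j21 + j12 * j22
    a22 = j21 * j21 + j22 * j22 + damping ** 2
    det = a11 * a22 - a12 * a12
    vx = (a22 * ex - a12 * ey) / det
    vy = (-a12 * ex + a11 * ey) / det

    # dq = J^T v, clamped so a far-away target cannot cause a joint jump.
    dq1 = max(-max_step, min(max_step, j11 * vx + j21 * vy))
    dq2 = max(-max_step, min(max_step, j12 * vx + j22 * vy))
    return q1 + dq1, q2 + dq2


def solve_ik(q_init, target, iters=200):
    """Iterate ik_step from q_init toward a fixed target position."""
    q1, q2 = q_init
    for _ in range(iters):
        q1, q2 = ik_step(q1, q2, target)
    return q1, q2
```

The damping term keeps the update bounded near singular configurations (arm fully stretched or folded), which matters when the targets come from a human hand that can easily ask for unreachable poses.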


Neural Tactile Sensor

Goal: Enable tactile sensors in simulation to closely mimic the non-idealities of real-world tactile sensors, so that policies trained in simulation are more accurate in the real world.

Making a high-fidelity simulation using MuJoCo MJX that leverages JAX-based differentiable physics for gradient-based parameter optimization. Testing with a variety of different tactile sensors/plugins available in MuJoCo.

  • Implemented spatial projection of 2D taxel grids onto complex robot geometries (e.g. dextrous hands) using ray-casting.

  • Designed a Residual Neural Network to bridge up to a ~30% gap between simulation and real-world tactile sensors. It adjusts tactile data collected from simulation to more accurately match a corresponding real-world tactile sensor reading.

    • Experimenting with MLP, Transformer, GAN, and other architectures

  • Setting up the real-world and simulation teleops to be synchronized for paired data collection of both hands (real and sim) doing the same action.
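The residual idea above can be shown in a few lines. This is a forward-pass-only sketch under assumed details, not the actual model: a one-hidden-layer MLP in plain Python that predicts a correction which is added to the simulated reading. The layer sizes, the `tanh` activation, and the zero-initialized output layer are all illustrative assumptions; training on paired sim/real readings is left out.

```python
import math
import random


def init_residual_mlp(n_taxels, hidden=16, seed=0):
    """Random hidden layer, zero output layer (hypothetical sizes)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_taxels)
    return {
        "w1": [[rng.gauss(0.0, scale) for _ in range(n_taxels)]
               for _ in range(hidden)],
        "b1": [0.0] * hidden,
        # Zero-initialized output layer: the correction starts at exactly
        # zero, so an untrained model passes the sim reading through as-is.
        "w2": [[0.0] * hidden for _ in range(n_taxels)],
        "b2": [0.0] * n_taxels,
    }


def affine(weights, bias, x):
    """Compute W x + b for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def correct_reading(params, sim_reading):
    """Return sim_reading + predicted residual, one value per taxel."""
    hidden = [math.tanh(h)
              for h in affine(params["w1"], params["b1"], sim_reading)]
    residual = affine(params["w2"], params["b2"], hidden)
    return [s + r for s, r in zip(sim_reading, residual)]
```

Predicting only the residual means the network has to learn the sim-to-real gap rather than the full sensor response, which is the usual motivation for this kind of setup.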


Deep Residual Embeddings for Emotion-Based Art Recommendation

Initially wanted an art-music recommendation system that could recommend paintings based on a song I liked. Tried training an autoencoder to find a shared latent space between audio and image embeddings using emotion labels as a bridge, but kept running into mode collapse. My guess is the problem was a combination of dataset size disparity (the audio dataset was much smaller) and the original embeddings (from CLIP and CLAP encoders) being too dissimilar. Pivoted to start with an art recommendation engine, planning to revisit audio (maybe eventually a multi-media recommendation engine would be nice).

  • Used a ResNet-50 backbone to extract high-dimensional latent representations of paintings.

  • Processed and made DataLoaders for the ArtEmis dataset. Some artworks were annotated with different emotions, so filtered based on consensus and quantified label uncertainty in order to have high-quality supervision for emotion classification.

    • The CNN only reached around 60% accuracy for emotion classification. That seems reasonable, though: humans aren’t very consistent with emotions, and even the artists labeling the works only reached consensus ~60% of the time.

  • Also made a recommendation layer that uses the extracted embeddings. Made the retrieval algorithm take the emotion of the art, embedding cosine similarity, and diversity factors into account.
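One way the three retrieval factors could be combined is a greedy re-ranking loop. This is a sketch under assumptions, not the actual scoring: the dict keys (`id`, `embedding`, `emotion`), the additive score, and the weights `w_emotion` and `w_diversity` are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(item, query, selected, w_emotion, w_diversity):
    """Similarity to the query, plus an emotion bonus, minus redundancy."""
    relevance = cosine(query["embedding"], item["embedding"])
    emotion_bonus = w_emotion if item["emotion"] == query["emotion"] else 0.0
    # Redundancy: how close this item is to anything already recommended.
    redundancy = max(
        (cosine(item["embedding"], s["embedding"]) for s in selected),
        default=0.0,
    )
    return relevance + emotion_bonus - w_diversity * redundancy


def recommend(query, catalog, k=3, w_emotion=0.3, w_diversity=0.5):
    """Greedily pick k items, re-scoring after each pick for diversity."""
    candidates = [item for item in catalog if item["id"] != query["id"]]
    selected = []
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda item: score(item, query, selected,
                                   w_emotion, w_diversity),
        )
        selected.append(best)
        candidates.remove(best)
    return [item["id"] for item in selected]
```

With `w_diversity=0` this reduces to a plain similarity-plus-emotion ranking, so near-duplicates of the top hit tend to fill the list. Raising it trades some similarity for variety.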

To an extent, it feels like artworks get recommended based on similarity and not emotion, but then again similar artworks would evoke similar emotions.


CrewOptimizer: Multi-Agent Meta-Prompting for SLM Orchestration

Made an agentic meta-prompting framework that enhanced the performance of Small Language Models (SLMs) like Llama 3.1-8b and Qwen 2.5-0.5b in specialized domains.

  • Designed an optimization pipeline where specialized agents iteratively refine discrete parts of a system prompt. For example, the task optimizer agent focuses on improving the clarity of the prompt instructions, the example optimizer agent generates sets of example input-output pairs for the prompt, and the guidance optimizer agent generates detailed reasoning paths for the prompt.

  • Using this pipeline, optimized meta-prompts achieved an absolute performance improvement of 69.6% for Llama 3.1-8b on a financial agent-routing dataset. Optimized the pipeline to run in 2 minutes for less than 20 cents.


Optical Flow-Based Slippage Detection for Autonomous Underwater Vehicles

Early prototype with an LED.

The MIT Autonomous Underwater Vehicles lab had a cable-mounted robot that was being used for coastal water surveying. The robot often stalled underwater, which messed with localization on the cable. Previous versions tried to measure current for stall detection, but this didn’t work when the robot was underwater and the clamps were physically slipping on the cable. So I used an optical flow sensor instead.

  • Integrated high-precision optical flow sensors (PAA5160E1 and PMW3901) into the robot design. Wrote firmware using I2C and SPI to process pixel-shift data and track X/Y displacement at a high frequency. Compared real-time displacement deltas to calibrated baselines to detect slippage.

  • Also made a mounting for the sensors with AutoCAD.
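The baseline comparison can be sketched as follows. This is written in Python for readability rather than as firmware, and the rule itself is an assumption: flag slippage when the per-sample displacement magnitude deviates from the calibrated baseline by more than a tolerance for several samples in a row. `baseline`, `tolerance`, and `min_consecutive` are hypothetical parameters, and the units are whatever counts the sensor reports.

```python
import math


def detect_slippage(deltas, baseline, tolerance, min_consecutive=3):
    """Return the sample index where slippage is declared, or None.

    deltas: iterable of (dx, dy) pixel shifts, one per sensor read.
    baseline: expected displacement magnitude per read during normal motion.
    """
    run = 0  # length of the current streak of out-of-tolerance samples
    for i, (dx, dy) in enumerate(deltas):
        deviation = abs(math.hypot(dx, dy) - baseline)
        run = run + 1 if deviation > tolerance else 0
        if run >= min_consecutive:
            return i
    return None
```

Requiring a short streak instead of a single bad sample is a cheap way to ignore one-off noisy reads.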

First time working with hardware sensor integration and CAD, but it was fun.


Evolutionary Optimization for Wavelet-Based EEG Denoising

Me Learning about Exploration vs. Exploitation in real time

Was going down an EEG/BCI rabbit hole and evolutionary algorithms were interesting, so tried to combine the two. Wavelet transform can be used for denoising EEG signals, but input parameter choices (e.g. mother wavelet, thresholding mode, threshold selection rule, etc.) are often made via visual analysis of the signal. I’m no expert at visually analyzing EEG signals, so I tried to use a Genetic Algorithm (GA) to find good input parameters.

  • Coded the genetic algorithm from scratch with a custom evolutionary loop (Single-Point Crossover, Random Resetting Mutation, and Elitism) to preserve alleles with high fitness scores across generations. Compared roulette wheel (provided higher selection pressure) and tournament selection (avoided premature convergence better) methods.

  • The GA aimed to optimize a set of inputs to a wavelet transform function for optimal denoising. Used a fitness function that balanced normalized mean squared error (NMSE), for preserving signal integrity, and decomposition level, for enforcing effective denoising.
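The evolutionary loop described above can be sketched in plain Python. Everything concrete here is an assumption for illustration: the option lists in `SEARCH_SPACE`, the population size, mutation rate, and tournament size. `toy_fitness` is only a stand-in. The real fitness balanced NMSE against decomposition level and needs an actual EEG signal and wavelet library.

```python
import random

# Hypothetical search space: each gene is an index into one option list.
SEARCH_SPACE = [
    ["db4", "sym8", "coif3"],           # mother wavelet
    ["soft", "hard"],                   # thresholding mode
    ["universal", "sure", "minimax"],   # threshold selection rule
    [2, 3, 4, 5, 6],                    # decomposition level
]


def random_chromosome(rng):
    return [rng.randrange(len(options)) for options in SEARCH_SPACE]


def tournament_select(population, fitness, rng, size=3):
    """Pick the fittest of `size` randomly drawn individuals."""
    return max(rng.sample(population, size), key=fitness)


def single_point_crossover(parent_a, parent_b, rng):
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def random_reset_mutation(chromosome, rng, rate=0.1):
    """With probability `rate`, replace a gene with a fresh random option."""
    return [rng.randrange(len(options)) if rng.random() < rate else gene
            for gene, options in zip(chromosome, SEARCH_SPACE)]


def evolve(fitness, generations=30, pop_size=20, n_elite=2, seed=0):
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        # Elitism: copy the top individuals into the next generation as-is.
        next_population = [list(c) for c in ranked[:n_elite]]
        while len(next_population) < pop_size:
            a = tournament_select(population, fitness, rng)
            b = tournament_select(population, fitness, rng)
            child = single_point_crossover(a, b, rng)
            next_population.append(random_reset_mutation(child, rng))
        population = next_population
    return max(population, key=fitness), best_per_generation


def toy_fitness(chromosome):
    """Stand-in fitness: count genes matching an arbitrary target."""
    target = [1, 0, 2, 3]
    return sum(g == t for g, t in zip(chromosome, target))
```

Calling `evolve(toy_fitness)` returns the best chromosome and the best fitness per generation. Because of elitism that history never decreases, which is a handy sanity check when changing the selection or mutation operators.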

Played around with different parts of the GA to understand how it works. Fun project that got me interested in EEG signal analysis.


Ultrasound Wave Propagation & Tissue Phantom Modeling in Sim

Developed a simulation environment that used the k-Wave MATLAB toolbox to model ultrasound wave propagation through the body. It was ultimately meant to generate a synthetic dataset for training a model to measure blood pressure with ultrasound.

  • Made digital tissue phantoms for the carotid artery with varying acoustic impedances and attenuation coefficients to accurately represent the human body. Used Time-Reversal algorithms to reconstruct acoustic sources in scattering environments, effectively reproducing ultrasound in a MATLAB simulation.


Personalized Java-Based Health Tracker

A bit of app dev in Java using Swing. A family friend was pre-diabetic, so I made a personalized app to help them track their diet and other health factors. Did end up reducing their A1C levels!