PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models

Weights · Paper · Demo · Discord

PersonaPlex is a real-time, full-duplex speech-to-speech conversational model that enables persona control through text-based role prompts and audio-based voice conditioning. Trained on a combination of synthetic and real conversations, it produces natural, low-latency spoken interactions with a consistent persona. PersonaPlex is based on the Moshi architecture and weights.

Figure: PersonaPlex model architecture.

What This Fork Adds

FP8 weight quantization and inference optimizations for Blackwell GPUs (DGX Spark, Jetson Thor). Sub-80ms frame latency, entirely on-device. See README-optimizations.md for the full performance breakdown.
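The core idea behind FP8 weight quantization can be sketched in plain Python. This is a simplified simulation of E4M3 rounding with per-tensor scaling, not the fork's actual kernels; all names below are illustrative:

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format

def quantize_e4m3(x):
    """Round x to the nearest E4M3-representable value (sketch:
    3 explicit mantissa bits; subnormals and NaN handling omitted)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= |m| < 1
    m = round(m * 16) / 16          # keep 1 implicit + 3 explicit bits
    return max(-E4M3_MAX, min(E4M3_MAX, m * 2.0 ** e))

def fp8_quantize_tensor(weights):
    """Per-tensor scaling: map the weight range onto E4M3's dynamic
    range, keeping one float scale for dequantization."""
    amax = max(abs(w) for w in weights) or 1.0
    scale = E4M3_MAX / amax
    return [quantize_e4m3(w * scale) for w in weights], scale

def fp8_dequantize(quantized, scale):
    return [q / scale for q in quantized]
```

Per-tensor scaling is what lets weight tensors tolerate FP8's narrow dynamic range; values outside it saturate at ±448.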


Running on DGX Spark

Prerequisites

  • DGX Spark with CUDA 13 drivers (for the cu130 PyTorch wheels)
  • Python 3
  • libopus (sudo apt install libopus-dev)
  • HuggingFace token with model license accepted

Setup

Note: If conda is active, deactivate it first (conda deactivate) so the venv is created from the system Python.

python3 -m venv ~/personaplex-venv
source ~/personaplex-venv/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
git clone https://github.com/amarrmb/personaplex.git && cd personaplex
pip install -e moshi/.
export HF_TOKEN=<YOUR_TOKEN>

Run

source ~/personaplex-venv/bin/activate
SSL_DIR=$(mktemp -d)
python -m moshi.server --fp8 --ssl "$SSL_DIR"

Open https://<DGX_IP>:8998 in your browser. Expected frame time: ~74ms.


Running on Jetson Thor

Prerequisites

  • Jetson Thor with JetPack 7.1 (L4T R38.4.0)
  • Python 3.12
  • libopus (sudo apt install libopus-dev)
  • Rust toolchain (for building sphn)
  • sudo access (first run only, for power mode)
  • HuggingFace token with model license accepted

Setup

Important: If conda or miniforge is active, deactivate it first (conda deactivate). The venv must be created from the system Python, not conda's Python, or you'll get _ctypes ABI errors at import time.

/usr/bin/python3.12 -m venv ~/personaplex-venv
source ~/personaplex-venv/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
git clone https://github.com/amarrmb/personaplex.git && cd personaplex
pip install -e moshi/.
export HF_TOKEN=<YOUR_TOKEN>

Fix Triton ptxas for Jetson Thor

The Triton package bundles a ptxas-blackwell binary that doesn't recognize Jetson Thor's sm_110a GPU architecture. Replace it with the system ptxas from JetPack:

TRITON_BIN=~/personaplex-venv/lib/python3.12/site-packages/triton/backends/nvidia/bin
mv "$TRITON_BIN/ptxas-blackwell" "$TRITON_BIN/ptxas-blackwell.orig"
ln -s /usr/local/cuda/bin/ptxas "$TRITON_BIN/ptxas-blackwell"

This is not needed on DGX Spark (sm_121 is supported by the bundled binary).

Run

# First run only — set max power mode
sudo nvpmodel -m 0 && sudo jetson_clocks

source ~/personaplex-venv/bin/activate
SSL_DIR=$(mktemp -d)
taskset -c 4-13 python -m moshi.server --fp8 --ssl "$SSL_DIR"

Open https://<JETSON_IP>:8998 in your browser. Expected frame time: ~78ms.


Building libopus from source

If you don't have sudo (e.g. shared DGX):

git clone https://github.com/xiph/opus.git && cd opus
./autogen.sh && ./configure --prefix=$HOME/.local && make -j$(nproc) && make install
export PKG_CONFIG_PATH=$HOME/.local/lib/pkgconfig:$PKG_CONFIG_PATH
export LD_LIBRARY_PATH=$HOME/.local/lib:$LD_LIBRARY_PATH

Usage

Prerequisites

Install the Opus audio codec development library:

# Ubuntu/Debian
sudo apt install libopus-dev

# Fedora/RHEL
sudo dnf install opus-devel

# macOS
brew install opus

Installation

Download this repository and install with:

pip install moshi/.

Extra step for Blackwell-based GPUs (see NVIDIA#2):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130

Accept Model License

Log in to your HuggingFace account and accept the PersonaPlex model license on the model page.
Then set up your HuggingFace authentication:

export HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN>

Launch Server

Launch server for live interaction (temporary SSL certs for https):

SSL_DIR=$(mktemp -d); python -m moshi.server --ssl "$SSL_DIR"

FP8 (Blackwell GPUs): Add --fp8 for ~1.4x faster inference:

SSL_DIR=$(mktemp -d); python -m moshi.server --fp8 --ssl "$SSL_DIR"

CPU Offload: If your GPU has insufficient memory, use the --cpu-offload flag to offload model layers to CPU. This requires the accelerate package (pip install accelerate):

SSL_DIR=$(mktemp -d); python -m moshi.server --ssl "$SSL_DIR" --cpu-offload

Access the Web UI from a browser at https://localhost:8998 if running locally; otherwise look for the access link printed by the script:

Access the Web UI directly at https://11.54.401.33:8998

Offline Evaluation

For offline evaluation, use the offline script, which streams an input wav file and writes the captured output stream to an output wav file. The output file has the same duration as the input file.

Add --cpu-offload to any command below if your GPU has insufficient memory (requires the accelerate package). Alternatively, install CPU-only PyTorch to run offline evaluation entirely on CPU.

Assistant example:

HF_TOKEN=<TOKEN> \
python -m moshi.offline \
  --voice-prompt "NATF2.pt" \
  --input-wav "assets/test/input_assistant.wav" \
  --seed 42424242 \
  --output-wav "output.wav" \
  --output-text "output.json"

Service example:

HF_TOKEN=<TOKEN> \
python -m moshi.offline \
  --voice-prompt "NATM1.pt" \
  --text-prompt "$(cat assets/test/prompt_service.txt)" \
  --input-wav "assets/test/input_service.wav" \
  --seed 42424242 \
  --output-wav "output.wav" \
  --output-text "output.json"
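Since the output wav matches the input's duration, a quick stdlib check can confirm the lengths agree. The demo below writes one second of silence to a temporary file and measures it (paths and names are illustrative):

```python
import os
import tempfile
import wave

def wav_duration_seconds(path):
    """Duration of a PCM wav file using only the stdlib wave module."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

# Demo: write one second of 24 kHz mono silence and measure it.
path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)               # 16-bit samples
    w.setframerate(24000)
    w.writeframes(b"\x00\x00" * 24000)

duration = wav_duration_seconds(path)  # 1.0
```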

Voices

PersonaPlex supports a wide range of voices; we pre-package embeddings for voices that sound more natural and conversational (NAT) and others that are more varied (VAR). The fixed set of voices is labeled:

Natural(female): NATF0, NATF1, NATF2, NATF3
Natural(male):   NATM0, NATM1, NATM2, NATM3
Variety(female): VARF0, VARF1, VARF2, VARF3, VARF4
Variety(male):   VARM0, VARM1, VARM2, VARM3, VARM4
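For scripting, the fixed voice set can be enumerated programmatically; a hypothetical helper matching the table above:

```python
def voice_names():
    """All packaged voice-embedding names: 4 NAT and 5 VAR voices
    per gender, as listed in the table above."""
    names = []
    for style, count in (("NAT", 4), ("VAR", 5)):
        for gender in "FM":
            names += [f"{style}{gender}{i}" for i in range(count)]
    return names
```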

Prompting Guide

The model is trained on synthetic conversations for a fixed assistant role and varying customer service roles.

Assistant Role

The assistant role has the prompt:

You are a wise and friendly teacher. Answer questions or provide advice in a clear and engaging way.

Use this prompt for the QA-assistant-focused "User Interruption" evaluation category in FullDuplexBench.

Customer Service Roles

The customer service roles support a variety of prompts. Here are some examples for prompting style reference:

You work for CitySan Services which is a waste management and your name is Ayelen Lucero. Information: Verify customer name Omar Torres. Current schedule: every other week. Upcoming pickup: April 12th. Compost bin service available for $8/month add-on.
You work for Jerusalem Shakshuka which is a restaurant and your name is Owen Foster. Information: There are two shakshuka options: Classic (poached eggs, $9.50) and Spicy (scrambled eggs with jalapenos, $10.25). Sides include warm pita ($2.50) and Israeli salad ($3). No combo offers. Available for drive-through until 9 PM.
You work for AeroRentals Pro which is a drone rental company and your name is Tomaz Novak. Information: AeroRentals Pro has the following availability: PhoenixDrone X ($65/4 hours, $110/8 hours), and the premium SpectraDrone 9 ($95/4 hours, $160/8 hours). Deposit required: $150 for standard models, $300 for premium.
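The service prompts above all follow one template; a hypothetical builder for composing them from structured fields (the template is inferred from the examples, not an official API):

```python
def build_service_prompt(business, business_type, agent, facts):
    """Compose a customer-service role prompt in the style of the
    examples above."""
    return (f"You work for {business} which is a {business_type} "
            f"and your name is {agent}. Information: {facts}")
```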

Casual Conversations

The model is also trained on real conversations from the Fisher English Corpus with LLM-labeled prompts for open-ended conversations. Here are some example prompts for casual conversations:

You enjoy having a good conversation.
You enjoy having a good conversation. Have a casual discussion about eating at home versus dining out.
You enjoy having a good conversation. Have an empathetic discussion about the meaning of family amid uncertainty.
You enjoy having a good conversation. Have a reflective conversation about career changes and feeling of home. You have lived in California for 21 years and consider San Francisco your home. You work as a teacher and have traveled a lot. You dislike meetings.
You enjoy having a good conversation. Have a casual conversation about favorite foods and cooking experiences. You are David Green, a former baker now living in Boston. You enjoy cooking diverse international dishes and appreciate many ethnic restaurants.

Use the prompt You enjoy having a good conversation. for the "Pause Handling", "Backchannel" and "Smooth Turn Taking" evaluation categories of FullDuplexBench.

Generalization

PersonaPlex fine-tunes Moshi and benefits from the generalization capabilities of the underlying Helium LLM. Thanks to the backbone's broad training corpus, the model responds plausibly to out-of-distribution prompts, leading to unexpected or fun conversations. We encourage experimentation with different prompts to test the model's emergent ability to handle scenarios outside its training distribution. As inspiration, we feature the following astronaut prompt in the WebUI:

You enjoy having a good conversation. Have a technical discussion about fixing a reactor core on a spaceship to Mars. You are an astronaut on a Mars mission. Your name is Alex. You are already dealing with a reactor core meltdown on a Mars mission. Several ship systems are failing, and continued instability will lead to catastrophic failure. You explain what is happening and you urgently ask for help thinking through how to stabilize the reactor.

License

The present code is provided under the MIT license. The weights for the models are released under the NVIDIA Open Model license.

Citation

If you use PersonaPlex in your research, please cite our paper:

@misc{roy2026personaplexvoicerolecontrol,
      title={PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models},
      author={Rajarshi Roy and Jonathan Raiman and Sang-gil Lee and Teodor-Dumitru Ene and Robert Kirby and Sungwon Kim and Jaehyeon Kim and Bryan Catanzaro},
      year={2026},
      eprint={2602.06053},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.06053},
}

