Using markerless motion capture for music generation, music creativity and music-dance interactions (closed)

Project title/description

Using markerless motion capture for music generation, music creativity and music-dance interactions

More detailed description of the project

This is a new exploratory project that aims to use recent advances in real-time markerless motion/pose capture software to create a working framework in which movements are translated into sound with low latency, enabling, for example, gesture-driven new musical instruments and dancers who control their own music while dancing. SAT's LivePose features pose-detection backends implemented with Google MediaPipe, OpenMMLab MMPose, and NVIDIA trt_pose with TensorRT, and potentially Jarvis / DeepLabCut. The project requires familiarity with Python and the ability to interface with external packages such as MediaPipe and Jarvis. Familiarity with low-latency sound generation, image processing, and audiovisual displays is an advantage, though not necessary. The development of such tools will facilitate both artistic creation and scientific exploration of multiple areas, for example how people engage interactively with vision, sound, and movement and combine their respective latent creative spaces. Such a tool will also have therapeutic/rehabilitative applications for populations with limited ability to generate music, in whom agency and creativity in producing music have been shown to have beneficial effects.
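As a minimal sketch of the movement-to-sound idea: markerless trackers such as MediaPipe report keypoints with normalized x/y coordinates in [0, 1], which can be mapped directly onto synthesis parameters. The function and parameter names below are illustrative assumptions, not part of the LivePose API.

```python
# Hypothetical mapping from one normalized pose keypoint (x, y in [0, 1],
# as produced by markerless trackers such as MediaPipe) to two
# sound-synthesis parameters: pitch and stereo pan.

def keypoint_to_sound_params(x, y, f_min=110.0, f_max=880.0):
    """Map vertical position to pitch and horizontal position to pan."""
    x = min(max(x, 0.0), 1.0)  # clamp to the normalized range
    y = min(max(y, 0.0), 1.0)
    # Higher on screen (smaller y) -> higher pitch, interpolated
    # geometrically so equal movements give equal musical intervals.
    frequency = f_min * (f_max / f_min) ** (1.0 - y)
    pan = 2.0 * x - 1.0  # -1 = hard left, +1 = hard right
    return frequency, pan

freq, pan = keypoint_to_sound_params(0.5, 0.0)  # wrist at top centre
# freq == 880.0 Hz, pan == 0.0 (centre)
```

In a real pipeline this mapping would run per frame on the tracker's output, with the resulting parameters handed to a low-latency synthesis engine.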

Expected outcomes

  • Evaluate which pose backend, implemented or to be implemented in LivePose (Google MediaPipe, OpenMMLab MMPose, NVIDIA trt_pose with TensorRT, and potentially Jarvis / DeepLabCut), is best suited for low-latency audiovisual generation
  • Implement or optimize the chosen pose backend (Python)
  • Implement an audiovisual generation engine (Python or C++)
  • Implement a LivePose output filter that either embeds audiovisual generation directly (Python) or communicates with an external engine via interoperability protocols (libmapper, OSC, WebSocket)
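For the external-engine path, the sketch below shows what sending a keypoint over OSC involves at the wire level. A real output filter would likely use a library such as python-osc rather than encoding messages by hand; the address pattern "/pose/0/wrist" and port 9000 are illustrative assumptions, not LivePose conventions.

```python
# Hand-rolled OSC message encoder and UDP sender, to illustrate the
# interoperability path. OSC strings are null-terminated and padded to
# a multiple of 4 bytes; float arguments are big-endian float32.
import socket
import struct

def osc_pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, per the OSC spec."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)

def osc_message(address: str, *floats: float) -> bytes:
    """Build an OSC message carrying float32 arguments."""
    msg = osc_pad(address.encode("ascii"))
    msg += osc_pad(("," + "f" * len(floats)).encode("ascii"))  # type tags
    for value in floats:
        msg += struct.pack(">f", value)
    return msg

def send_keypoint(sock, host, port, x, y, confidence):
    sock.sendto(osc_message("/pose/0/wrist", x, y, confidence), (host, port))

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_keypoint(sock, "127.0.0.1", 9000, 0.42, 0.17, 0.95)
```

The same filter structure applies to libmapper or WebSocket transports; only the encoding and session setup change.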

Skills required/preferred

  • required: comfortable with Python
  • required: experience with image/video processing and using deep-learning based image-processing models
  • preferred: familiarity with C/C++ programming, low-latency sound generation, image processing, and audiovisual displays, as well as with MediaPipe or other markerless pose/motion capture tools

Possible mentors

Suresh Krishna (McGill m2b3 Lab), Christian Frisson (LivePose), Michał Seta (low-latency sound synthesis)

Expected size of project

350 hours

Rating of difficulty

hard