Voice-change, Lip-sync, Text-to-speech, Music/Audio tools for projects

mindsong Posts: 1,701
edited June 2019 in Carrara Discussion

A place to collect thoughts, tips, and tools on voice-changing, lip-sync, text-to-speech, and audio processing for our projects (esp. animations).

Add your own tips, links, resources, etc. below, or send them to me and we'll maintain a knowledgebase here...

Because of the overlap between tools and workflows, I'd say that anything you know of in this domain is useful, whether it's Carrara-specific or uses any one of the standard Carrara plugins (Poser, DS, iClone, etc. :)

Standalone utils and scripts are always relevant as well. Anything related to making a 3D mouth move, or generating the sound that goes with it, is welcome.




  • mindsong Posts: 1,701
    edited June 2019

    Voice Changing tools:

    These are standalone tools that work in batch or realtime to alter incoming sound streams (usually voice) from one frequency range and/or timbre to another - e.g. male to female, or child to adult, etc.

    Most also have settings for producing cartoony or robotic voices. In my experience, most of the outputs end up sounding a bit synthetic, but if you are willing to 'bend' your own voice at the microphone toward the target sound (e.g. a male trying to sound more female), and use each application's sound adjustments with restraint, some pretty compelling outputs can be produced, and presets can be saved for these settings. The results have no copyright constraints (assuming the inputs aren't copyrighted...). Once a preset is saved, some of these tools allow batch conversion, so you can pre-record all the voice sequences for an animation with multiple characters and convert them consistently.
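    The commercial tools below use far more sophisticated (formant-aware) processing, but the crudest possible way to move a voice into a different frequency range is to play it back faster or slower. A minimal Python sketch of that idea, assuming 16-bit mono WAV input; the function names are hypothetical and not taken from any of the products listed:

```python
import array
import sys
import wave


def resample_pitch(samples, factor):
    """Naive pitch shift: advance through the input 'factor' samples per output sample.

    factor > 1 raises the pitch and shortens the clip; factor < 1 lowers
    the pitch and lengthens it. Linear interpolation fills in between samples.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    out = []
    pos = 0.0
    last = len(samples) - 1
    while pos <= last:
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i < last else samples[i]
        out.append(int(round(samples[i] * (1 - frac) + nxt * frac)))
        pos += factor
    return out


def shift_wav(in_path, out_path, factor):
    """Apply resample_pitch to a 16-bit mono WAV file, keeping its sample rate."""
    with wave.open(in_path, "rb") as src:
        if src.getsampwidth() != 2 or src.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    shifted = array.array("h", resample_pitch(samples, factor))
    if sys.byteorder == "big":
        shifted.byteswap()
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(shifted.tobytes())
```

    A factor of 2 ** (semitones / 12) shifts by that many semitones. Because the formants move along with the pitch and the duration changes too, this gives the familiar 'chipmunk' or 'slowed tape' sound, which is exactly what the real voice changers work to avoid. Looping shift_wav over a folder of takes with one fixed factor is only a rough stand-in for their preset-based batch conversion.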

    The inputs and outputs can generally be standard soundfiles (WAVE, MP3, AAC, etc.), that can come from microphones, sound-files, audio-streams, etc., then be processed (in many ways) and used with lip-sync tools for our 3D efforts, and also inserted into the final video-edits.

    Any products/tools mentioned below are NOT endorsed, but simply available, and I have no affiliation with any of these products or companies other than possibly being an owner/user. YMMV

    Commercial Voice Changing Software

    Product: Screaming Bee's Morphvox Series

    Source: https://screamingbee.com

    Platform(s): Windows

    Notes: Free and paid versions for both realtime and batch voice conversion. Some good multi-voice and script-writing utilities as well. Presets available for realistic and cartoon/fantasy voices.

    Product: Audio4Fun Voice Changer Series

    Source: https://www.audio4fun.com/voice-changer.htm

    Platform(s): Windows

    Notes: Various versions for realtime and batch conversion. Presets available for realistic and cartoon/fantasy voices.


  • mindsong Posts: 1,701
    edited June 2019

    text-to-speech (TTS) and speech-to-text:

    These are tools that attempt to convert text to audio, and audio to text.

    In our 3D domain (esp. animation), text-to-speech is probably the most relevant, as sound is typically the most useful end product. That said, any tool that lets us rework our data toward our creative goals is worth knowing about, so all tools and techniques are welcome and encouraged.

    Almost all mainstream computer environments have basic text-to-speech capabilities built in, usually as an accessibility feature for users with disabilities. Similarly, speech-to-text is available in the form of Apple's 'Siri' and Microsoft's 'Cortana'.

    As inexpensive computing capacity becomes available, these tools are becoming increasingly sophisticated and more sensitive to linguistic and idiomatic differences, but this also makes them more complex to use.

    As we return (technologically) to our story-telling roots, these tools will become more widespread, capable, and interesting to us in our creative endeavors.

    Text-to-speech tools:

    Microsoft Windows 'Text-to-speech' (built-in, with extensions):

    Speech-to-text tools:

    Apple's Siri - native to current macOS/iOS devices

    Microsoft's Cortana - native to Windows devices

    Nuance: Dragon Dictate Series: https://www.nuance.com/dragon.html

    IBM's speech-to-text: https://www.ibm.com/watson/services/speech-to-text/

    Google's text-to-speech: https://cloud.google.com/text-to-speech/ - from an interesting thread here

    related (google) from REIVAX

    DNA Software (almost all Japanese) free TTS application: http://dnasoft.web.fc2.com/soft/texttowav/index.html (from the same discussion thread above)

  • mindsong Posts: 1,701
    edited July 2019

    Lip-Sync Resources:

    mcjaudiomation by MCasual: (free, but donate!) https://sites.google.com/site/mcasualsdazscripts2/mcjaudiomation DAZ/Poser animation controlled by sound-file contents. This little gem creates Poser-style PZ2 animation streams (tied to any figure sliders you like), based on the ongoing energy levels in sound files. While the examples in the documentation map sounds to VU meters, lights, speaker movement, etc., it can also drive a cartoon mouth or emotion sliders with elegance. Windows w/ DS scripts.

    DAZ Inc. sound-to-motion mapping tools for lip-sync:

    These work with any figures that have an available '*.DMC' viseme/slider mapping file. Most DAZ figures have some form of DMC file available, and DMC files for many non-DAZ figures are available on sites like sharecg.com.

    Mimic Pro for Carrara: microphone input to figure viseme (defined mouth shape) motions.

    Mimic Live: (DAZ Studio, but can be exported to Carrara) microphone input to figure viseme (defined mouth shape) motions. Windows?

    Mimic Lite: No longer available 'lite' version of the standalone Mimic Pro utility (also no longer available? toolfarm.com?) for Poser/DAZ figures. Windows

    Mimic Pro: No longer available standalone sound to viseme mapping tool. Exports PZ2 files for conversion/import to other tools. Last known to be available at www.toolfarm.com. Windows

    DAZ Studio 4.x 32-bit - 'lip-sync' (built-in plugin) - only found in the 32-bit versions of DAZ Studio (a Carrara plugin :), this plugin leverages the early DAZ lip-sync tool libraries to enable sound-to-viseme mapping in DAZ figures that have so-called DMC mapping files available. Results can be exported as PZ2 pose presets, or duf files for importing into Carrara.

    Papagayo lip-sync tool - http://www.lostmarble.com/papagayo/ and the Python version: https://morevnaproject.org/papagayo-ng/ - I don't know much about this one, but it's been around for a long time and might be useful in your workflow. Outputs to Moho (2D animation tool) and Blender. macOS and Windows. Update: it looks like a DS script has been written to import Papagayo outputs into DS, available at sharecg: Papagayo to DS Importer. Forum thread with instructions: https://www.daz3d.com/forums/discussion/336526/alternative-audio-based-lipsinc-for-daz-studio
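    As far as I know, Papagayo's Moho export is a plain-text 'switch' file: a MohoSwitch1 header line, then one line per key with a frame number and a phoneme from the Preston Blair set (AI, E, O, U, etc, L, WQ, MBP, FV, rest). Assuming that layout (check it against one of your own exports), a minimal Python sketch of reading it and deciding which viseme slider should be fully on at a given frame. The slider names are hypothetical placeholders for your figure's own morphs:

```python
import bisect

# Hypothetical mapping from Papagayo's phoneme names to figure slider names.
# Replace the values with the viseme morphs your figure actually has.
PHONEME_TO_SLIDER = {
    "AI": "Viseme_AI", "E": "Viseme_E", "O": "Viseme_O", "U": "Viseme_U",
    "etc": "Viseme_etc", "L": "Viseme_L", "WQ": "Viseme_WQ",
    "MBP": "Viseme_MBP", "FV": "Viseme_FV", "rest": None,  # rest = no viseme
}


def parse_moho_switch(text):
    """Parse an (assumed) MohoSwitch1 export into sorted (frame, phoneme) keys."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "MohoSwitch1":
        raise ValueError("missing MohoSwitch1 header")
    keys = []
    for ln in lines[1:]:
        frame, phoneme = ln.split(None, 1)
        keys.append((int(frame), phoneme.strip()))
    keys.sort()
    return keys


def slider_at(keys, frame):
    """Slider to hold at 'frame': each key stays active until the next one."""
    frames = [f for f, _ in keys]
    i = bisect.bisect_right(frames, frame) - 1
    if i < 0:
        return None  # before the first key
    return PHONEME_TO_SLIDER.get(keys[i][1])
```

    Switch layers hold each mouth shape until the next key rather than blending, which is why slider_at looks up the most recent key instead of interpolating; a DS or Carrara importer would typically add short ramps between shapes.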

    Relevant Links (forums/discussions/tutorials - anything that can eventually be used in Carrara):


    which mentions:





  • mindsong Posts: 1,701
    edited June 2019

    Other Audio Tools (music, MIDI, sound editors, DAWs, video-sound, etc.):

    Sound Editors: (there are zillions of these, but a few stand out for popularity/price/etc.)

    Audacity - Free/Open Source sound editor: https://www.audacityteam.org/
    Mature full-featured sound recorder, editor, and conversion tool.
    Windows, MacOS, and Linux

    Magix Music Maker (and other DAWs): https://www.magix.com/us/music/ - Free base software with lo-res loops; paid add-on instruments and hi-res sound-loop collections. Kind of like the DAZ Studio software/content model, but for music. Note that there are both personal-use and commercial-use license limitations on these sound loops, with prices to match...! Works like AniBlocks or NLA blocks, but with sounds (MIDI and sound samples).

    IK Multimedia's series, especially 'SampleTank': https://www.ikmultimedia.com/ - Full range of sample/MIDI composition tools with beginner-to-pro versions and sound-sample collections for sale. I believe these samples are all assumed to be usable in professional commercial outputs. (Anyone know otherwise?)

    Music Notation and lyrics to audio/sound files:

    Myriad Software - Musical Notation to audio tools: https://www.myriad-online.com/en/products/virtualsinger.htm

  • REIVAX Posts: 70
    edited June 2019

    Hello mindsong

    Here's something fun in VBScript: it reads the computer's current time aloud. Save this text as xxxx.vbs:

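    Dim texte, lecture
    ' SAPI.SpVoice is the Windows built-in text-to-speech engine (a COM object)
    Set lecture = CreateObject("SAPI.SpVoice")
    ' "Il est " is French for "It is "; Time() returns the current time
    texte = "Il est " & Time()
    lecture.Speak texte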

    Then double-click the file to run it.

    PS: sorry, this is in French.


    And now a speech-to-text tool; you can try it, it's free.


    And a virtual singer. The creators are French, but the software is available in many languages, for Windows (32/64-bit) and Mac.





  • @REIVAX: Virtual Singer - Stephen Hawking's voice singing "Strangers in the Night" - that's really funny!

    I've never heard of virtual singer - but it seems like a lot of fun! The example of Strangers in the night sounds like Stephen Hawking!


  • WendyLuvsCatz Posts: 38,076
    Selinita said:

    @REIVAX: Virtual Singer - Stephen Hawking's voice singing "Strangers in the Night" - that's really funny!

    I've never heard of virtual singer - but it seems like a lot of fun! The example of Strangers in the night sounds like Stephen Hawking!


    I have used Virtual Singer in Myriad Melody Assistant for probably 10 years,

    but not as frequently now, as I tend to do music without lyrics.

  • Here's a fun one to try...

    MUSIC: g(4)¦c(8)b(8)f(8)g(8)a(4)f(4)¦e(4)f(4)c(4)e(4)¦f(8)e(8)d(4)a(4)g(8)a(8)¦e(2+4)
    WORDS: I ¦have to do a lit-tle ¦house-work ba-by, ¦when I feel an-gry at ¦you.

    ¦ - barline
    (4) crotchet, quarter note
    (8) quaver, eighth note
    (2+4) dotted minim, 3 beats
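    The notation above maps mechanically onto note lengths: (n) lasts 4/n quarter-note beats, and '+' adds lengths together, so (2+4) = 2 + 1 = 3 beats. A short Python sketch (helper names are hypothetical, and only plain pitch letters a-g are handled) that parses the MUSIC line, totals each bar, and counts syllables per bar in the WORDS line so you can check that every note gets exactly one syllable:

```python
import re
from fractions import Fraction

# One note: a pitch letter followed by a duration such as (4), (8) or (2+4).
NOTE = re.compile(r"([a-g])\((\d+(?:\+\d+)*)\)")


def note_beats(duration):
    """Convert '4', '8' or '2+4' to a length in quarter-note beats."""
    return sum(Fraction(4, int(part)) for part in duration.split("+"))


def parse_music(line):
    """Split a MUSIC line on the bar character and return bars of (pitch, beats)."""
    bars = []
    for bar in line.split("¦"):
        notes = [(pitch, note_beats(dur)) for pitch, dur in NOTE.findall(bar)]
        if notes:
            bars.append(notes)
    return bars


def syllables_per_bar(words):
    """Count syllables per bar: words are split on spaces and hyphens."""
    return [len(re.findall(r"[^\s\-]+", bar)) for bar in words.split("¦")]
```

    For the example above, the bar totals come out as 1, 4, 4, 4 and 3 beats (a one-beat pickup that completes the final bar in 4/4), and both the notes and the syllables per bar come out as 1, 6, 4, 6, 1, so the lyrics line up one syllable per note.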

  • WendyLuvsCatz Posts: 38,076
    Selinita said:

    Here's a fun one to try...

    MUSIC: g(4)¦c(8)b(8)f(8)g(8)a(4)f(4)¦e(4)f(4)c(4)e(4)¦f(8)e(8)d(4)a(4)g(8)a(8)¦e(2+4)
    WORDS: I ¦have to do a lit-tle ¦house-work ba-by, ¦when I feel an-gry at ¦you.

    ¦ - barline
    (4) crotchet, quarter note
    (8) quaver, eighth note
    (2+4) dotted minim, 3 beats

    If that was a music score I could sight-read and sing it, but with letters and numbers I would really have to think about it, and I'm not on my PC right now. I guess most people are used to piano rolls and numbers now; having learned piano from age 7, I need bar lines and just cannot cope with other DAW software at all.

  • The music is Troika from Lieutenant Kije

  • WendyLuvsCatz Posts: 38,076
    edited June 2019

    An early video

    I literally have hundreds of them BTW

    A more recent one 

  • That's fab, Wendy - couldn't pick out a single word of what was being 'sung' but liked the note sliding. Maybe 'Mmmmmmm' would work better?

  • Like the Booty Fall Doll - very arty!

  • I downloaded the trial and here's my first attempt - straight out the box, seems pretty easy to use...



    Angry Baby.zip
  • Now with piano accompaniment (takes me back to A-Level Music, where we had to harmonise [Bach] chorales) and continuing lyrics...


    Angry Baby.zip
  • mindsong Posts: 1,701

    Thanks to all for the great inputs already. It looks like this thread is already striking a chord...

    I'll try to coordinate the contents in the TOC/header notes as time goes on.



  • REIVAX Posts: 70
    edited June 2019

    hello all

    Speech breathing


    the pdf


    Perhaps you don't know DANCE, from Ari Shapiro: a character animation and simulation system, with physics.



  • WendyLuvsCatz Posts: 38,076
    REIVAX said:

    hello all

    Speech breathing


    the pdf



    Somebody give the man a virtual inhaler ;)

  • mindsong Posts: 1,701
    REIVAX said:

    hello all

    Speech breathing


    the pdf


    Perhaps you don't know DANCE, from Ari Shapiro: a character animation and simulation system, with physics.



    Interesting when pointed out... I haven't thought about it, but it does have an impact on the continuity/realism of the speech.

    Someone pointed out that humans generally blink before they change/redirect their gaze. Once you notice it, it's kind of distracting when you see it everywhere...

    cool links!
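    To try the blink-before-gaze-shift observation in an animation, blink keys can be generated from the frames where a character's gaze target changes. A minimal Python sketch; the 0..1 blink slider is hypothetical, and the 2-frame lead and 2-frame close/open times are assumptions to tune by eye:

```python
def blink_keys(gaze_change_frames, lead=2, half=2):
    """Return (frame, value) keys for a hypothetical 0..1 'blink' slider.

    For each frame where the gaze target changes, the lids are fully closed
    'lead' frames earlier and take 'half' frames to close and to reopen.
    Blinks that would start before frame 0 or overlap the previous blink
    are skipped.
    """
    keys = []
    last_open = None
    for f in sorted(gaze_change_frames):
        closed = f - lead
        start, end = closed - half, closed + half
        if start < 0 or (last_open is not None and start < last_open):
            continue
        keys += [(start, 0.0), (closed, 1.0), (end, 0.0)]
        last_open = end
    return keys
```

    With the defaults, a gaze change at frame 10 produces lids open at 6, closed at 8 and open again at 10, so the eyes reopen just as the new gaze direction takes over; a second gaze change two frames later gets no extra blink.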


  • REIVAX Posts: 70
    edited July 2019

    Hi Wendy,

    It uses Python (pygubu).

  • mindsong Posts: 1,701
    edited June 2019

    I added this tool to the reference posts (above), but it bears explicit mention:

    MCasual, our local DS freebie script hero, wrote some scripts and a sound-analysis utility (Windows) that binds soundfile characteristics (energy levels) to arbitrary Poser/DAZ sliders of any sort. It's called 'mcjaudiomation' (free, but donate!) from https://sites.google.com/site/mcasualsdazscripts2/mcjaudiomation.

    This little gem creates Poser-style PZ2 animation streams (tied to any figure sliders you like), based on the ongoing energy levels in sound files. While the examples in the documentation tie sounds to things like VU meters, lights, speaker movement, etc., it can also drive a figure's mouth or emotion sliders with a certain elegance; it works really well for cartoon vocals.

    The results can be imported into Carrara or used in any other workflow that starts with PZ2 'pose preset' or duf files. I have Mimic Pro, which does sophisticated sound analysis to map visemes and the like, and I find that this far more basic approach works pretty darned well in comparison.

    I presume someone could use it to drive any motion by simply making well-timed sounds (vocal or otherwise) to drive arbitrary sliders, e.g. saying "tick tock tick tock" to control a clock pendulum.
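    mcjaudiomation's own analysis isn't described here, so the following is only a sketch of the general idea (energy per animation frame mapped onto a slider), not its actual algorithm. It assumes 16-bit mono WAV input and 30 fps; the function names and the 'floor' gate value are hypothetical:

```python
import array
import math
import sys
import wave


def frame_energies(samples, sample_rate, fps=30):
    """RMS level of the audio under each animation frame."""
    per_frame = sample_rate / fps
    n_frames = int(math.ceil(len(samples) / per_frame))
    out = []
    for k in range(n_frames):
        chunk = samples[int(k * per_frame):int((k + 1) * per_frame)]
        if not chunk:
            out.append(0.0)
            continue
        out.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return out


def to_slider(energies, floor=0.1):
    """Normalise to 0..1 against the loudest frame; zero anything below 'floor'."""
    peak = max(energies, default=0.0)
    if peak == 0:
        return [0.0 for _ in energies]
    values = [e / peak for e in energies]
    return [v if v >= floor else 0.0 for v in values]


def wav_to_slider(path, fps=30, floor=0.1):
    """16-bit mono WAV -> one slider value (e.g. 'mouth open') per frame."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        rate = w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return to_slider(frame_energies(samples, rate, fps), floor)
```

    Each value would then become one keyframe on whatever slider you choose (jaw, mouth-open, or a pendulum rotation for the "tick tock" idea); writing those keys out as a PZ2 or duf file is left to the real tools.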


  • Mistara Posts: 38,675
    edited October 2019

    After all this time I finally bought my microphone. Went with USB:

    Blue Microphones - Snowball USB Cardioid and Omnidirectional Electret Condenser Vocal Microphone


    ahem  mee mee mee maa maa maa moh moh moh muuu muu moo
    doh ray mee fah soh lah tee dohh

  • WendyLuvsCatz Posts: 38,076
    edited October 2019

    Windows 10 has piles of text-to-speech voices you can get for Narrator, including Australian accents, but without a registry hack you can only access 4 of them through other apps (e.g. Balabolka, iClone): American and British English, one of each gender.

    The registry hack scares me too much to try.

    There is another hack for Cortana too.


    For now I have used Narrator

    with prepared text,

    and Audacity set to use Stereo Mix as my microphone, and my non-existent motherboard sound output for playback, as I have the Nvidia one on my monitor speaker.

  • Dartanbeck Posts: 21,331

    Cool stuff!

    It's been a long time since I've used Mimic Pro for Carrara, and even longer since using Mimic Pro (standalone) for Poser.

    The latter creates PZ2 animated poses (or is it the face files?) with the sound injected into them, so when you apply the pose (or is it the FC2 expression?), the sound comes with it. Pretty cool. It's also a workshop for tweaking the visemes, expressions, and extra motion to your liking before writing the final file.

    The thing that I really love about Mimic Pro for Carrara (besides the fact that it works directly in Carrara and works really well) is that we can create our own viseme shapes as NLA poses individually for any given character - so I can make Rosie talk like Rosie (visually), Dart talk like... well... me, the bad guy talk like the bad guy, etc.

    Okay, after saying all of that, I am eager to try mCasual's plugin!

  • Mistara Posts: 38,675

    I see there is an Audacity Pro version; undecided on it.

  • WendyLuvsCatz Posts: 38,076
    Mystarra said:

    I see there is an Audacity Pro version; undecided on it.


    It's a rip-off, like the people who sell a version of Blender 3D.

    It's open-source software.


  • Mistara Posts: 38,675

    The feature I really need is background noise removal.

    My place has a noisy refrigerator.

    The stores don't plug in the refrigerators, so you can't hear them before buying.
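    The free Audacity already includes a Noise Reduction effect: select a stretch of 'silence' that contains only the hum, use Get Noise Profile, then apply the effect to the whole track; it works in the frequency domain. A much cruder technique, a noise gate, only silences the quiet gaps between phrases. A minimal Python sketch on a list of 16-bit sample values, with the window size and threshold as assumed values to tune for your room:

```python
import math


def noise_gate(samples, window=256, threshold=500.0):
    """Silence every window whose RMS level is below 'threshold'.

    Gaps between words (where only the fridge hum remains) become true
    silence; hum underneath the speech itself is left untouched.
    """
    out = list(samples)
    for start in range(0, len(out), window):
        chunk = out[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if rms < threshold:
            out[start:start + window] = [0] * len(chunk)
    return out
```

    Hard on/off switching like this can produce audible clicks at window edges, which is one reason real gates add attack and release ramps, and why spectral noise reduction usually sounds better on hum that sits under the voice.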

  • Mistara Posts: 38,675

    Windows 10 has piles of text-to-speech voices you can get for Narrator, including Australian accents, but without a registry hack you can only access 4 of them through other apps (e.g. Balabolka, iClone): American and British English, one of each gender.

    The registry hack scares me too much to try.

    There is another hack for Cortana too.


    For now I have used Narrator

    with prepared text,

    and Audacity set to use Stereo Mix as my microphone, and my non-existent motherboard sound output for playback, as I have the Nvidia one on my monitor speaker.

    I would love to give my actors Australian accents.

  • Mistara Posts: 38,675

    What kind of accent would be good for a minotaur?

  • TangoAlpha Posts: 4,584

