FlipaClip (Voice Maker)

Mobile app

iOS + Android

Freemium subscription

An in-app voice tool for FlipaClip. Pick a voice, write dialogue, and generate audio without leaving the workflow. Built for every screen, from iPad to 320px Android.

FlipaClip Voice Maker — voice selection screen

FlipaClip Voice Maker — text input screen

FlipaClip Voice Maker — generate audio screen

Context

Voice Maker is a text-to-speech tool built inside FlipaClip. The goal was for it to feel as fast and natural as drawing: touch-first, with no friction between a script idea and a rendered clip.

Device support was non-negotiable, from tablets to small Android screens at 320×568. It had to work on all of them, not just the easy ones.

Voice Maker across device sizes — 320×568, 375×812, and 834×1266

Role

I was the sole product designer on a 6-week project, partnering with two co-founders, the development team, and QA to validate edge cases across devices before launch.

What I owned

End-to-end UX from concept to ship
Core flows and edge cases
High-fidelity UI across breakpoints
Specs and handoff
Final polish

Requirements

Before any design work began, I mapped the full feature scope with stakeholders. These requirements were defined at kickoff as a shared checklist to keep the team aligned throughout.

Voice Actor Selection

List of available voice actors: Display a variety of voice actors for users to choose from.
Filter and search function for voices: Allow users to filter and search for specific voices based on criteria like gender and accent.
Preview voice samples: Enable users to listen to short previews of each available voice.

Text-to-Speech Functionality

Input text field: Provide a space where users can type or paste text for conversion to speech.
Choose voice for rendering: Allow users to select a voice from the available list to use for text-to-speech.
Play and render voice option: Provide a button to render and play the input text with the chosen voice.
AI Helper / Assistant: An AI assistant helps users formulate text and offers suggestions while they type in the input field.
Generate writing ideas: Suggest alternative ways of writing or ideas for new text based on context.

Usage & Credits Tracking

Display remaining credits: Show users how many credits they have left for text-to-speech usage.
Usage stats: Provide details like monthly usage and character count for better credit tracking.

Previously Rendered Voices

List of saved voice renderings: Display a list of previously generated voices that users can revisit.
Play or re-render option: Enable users to play or modify previously rendered voices.
Manage saved voices: Provide options to delete or rename saved voice renderings.

Voice Recording Upload

Upload audio / voice samples: Allow users to upload their own voice recordings (up to 30 seconds) for analysis.
AI voice analysis tool: Use AI to analyze uploaded audio and prepare it for voice cloning based on the recording.
Voice Cloning: Enable users to create new voices based on their recordings or other audio inputs, with a trimming tool to edit audio as needed.

Decisions

The journey map shows the core friction: every audio tweak meant leaving the creation screen. The fix was keeping text, voice, and generation on one surface. Voice selection is trial and error, so browsing and preview had to feel lightweight across every device.

	Awareness	Onboarding	Exploring Options	Generate Audio	Previewing Audio	Editing & Refinement	Finalization
Actions	User opens the app to explore the text-to-speech (TTS) feature.	User begins by typing or pasting their text and choosing a voice actor from a pre-generated list.	User tries out different voice actors by selecting and changing them while updating text input.	User selects 'Generate Audio' and is taken out of the current screen into the animation timeline.	User listens to the generated audio in the animation timeline section.	User goes back to the previous screen, adjusts the text or selects another voice actor.	User finally generates audio, reviews it, and decides to keep it, proceeding to the timeline.
Expectations	Wants to convert written text to audio with a voice that matches their preferences.	Expects to find the right voice for the text without much friction.	Hopes to find the best voice match for their text and easily make adjustments.	Expects to hear the final audio immediately but must leave the original screen to hear it.	Expects the audio to be a good fit, but is unsure whether they will need to go back and make more edits.	Wants to quickly switch voices or refine text without restarting the entire process.	Expects the final version to match their vision and hopes not to make any more changes.
Thoughts	Curious about how text-to-speech works. Wondering how the voices will sound when narrating their text.	Excited but uncertain about the selected voice matching the tone they want.	Worried that they cannot preview the generated audio in real time without being taken to a new section (timeline).	Frustrated about having to leave the main screen, making it harder to make quick edits to the text or voice.	Contemplates whether further edits are necessary and dreads having to return to the previous screen.	Worries that the editing process will take too long, leading to a frustrating back-and-forth between sections.	Relieved that the process is over and pleased with the final outcome, though frustrated by the effort required.
Emotion	😊	😐	😕	😩	😕	😩	😅

Journey map

The Loop

Voice iteration is repetitive: write, pick a voice, generate, and adjust. The goal was to keep it all on one surface, inside FlipaClip.

What shipped

Write: compose and edit dialogue with optional AI writing assistance
Voice: browse, preview, and filter voice actors by style
Generate: render text-to-speech and iterate quickly
Library: save, rename, and reuse clips across sessions

Shipped

Before release, we ran an in-house QA pass across devices to catch layout breaks, flow issues, and state handling problems. I reviewed builds for visual consistency and interaction quality while engineering and QA validated bugs and edge cases.

The priority was small screens. A broken layout on a 320px Android undermines the whole product. If it worked there, it worked everywhere.

Learnings

Post-launch, the strongest signal was repeat behavior. Creators generated multiple clips in a session and returned to saved audio, suggesting Voice Maker had become part of the normal creation flow rather than a novelty.

The AI ethics split was sharper than expected. Most creators were practical about it. A vocal minority pushed back hard, arguing AI voice devalued real performance. We made it opt-in and clearly labeled.

Simplicity is what makes or breaks a tool like this. Every extra step is friction, and friction kills repeat use. A fast loop was the foundation. The library and cloning were built around it.

✨ Adoption: 31% opened Voice Maker in the first two weeks after launch

🗓️ Ongoing usage: 38% of users who tried it returned within 30 days

🎙️ Engagement: creators generated on average 4 clips in a single session
