
Speech Data Builder

The ultimate free tool for creating high-quality speech datasets for AI voice models, speech recognition, and voice synthesis projects.

Select a region on the waveform by clicking and dragging to enable the cutting tools. Press Space to play or pause and Ctrl+S to save.

Normalized text (LJSpeech) is typically lowercase, without special characters or numbers.
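
As an illustration of what a normalized transcript might look like, here is a minimal Python sketch. It is not the tool's own implementation: the function name `normalize_transcript` is hypothetical, and the rules (lowercase, keep letters and apostrophes, drop digits and punctuation) are one simple reading of the hint above. A real pipeline would usually spell numbers out rather than drop them.

```python
import re


def normalize_transcript(text: str) -> str:
    """Lowercase the text and keep only letters, apostrophes, and single spaces."""
    lowered = text.lower()
    # Replace punctuation, digits, and underscores with a space.
    # Letters from any script are kept, since \w is Unicode-aware in Python 3.
    cleaned = re.sub(r"[^\w\s']|[\d_]", " ", lowered)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(cleaned.split())


print(normalize_transcript("Hello, World! It's 3 PM."))
```

Dropping digits loses information ("3 PM" becomes "pm"), so this is best treated as a first pass before manual review.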

Create Professional TTS & STT Datasets

Powerful features that make speech dataset creation simple, fast, and professional

AI-Powered Transcription

Automatically transcribe your audio files with AI models from OpenAI (Whisper) and Google (Gemini), then review and correct the results in the editor.

  • Whisper/Gemini Integration
  • Multi-language Support
  • High Accuracy Results

Multiple Export Formats

Export your datasets in LJSpeech, CSV, JSON, and TXT formats for immediate use in your TTS and STT machine learning models.

  • LJSpeech Format
  • Common Voice Compatible
  • Custom Configurations
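
For readers who want to check or reproduce an LJSpeech-style export, the sketch below writes a `metadata.csv` in the layout used by the public LJ Speech dataset: one pipe-delimited line per clip (`id|transcript|normalized transcript`), no header row. The function name `write_ljspeech_metadata` and the clip IDs are hypothetical, and the tool's actual export may differ in details.

```python
from pathlib import Path


def write_ljspeech_metadata(clips, out_dir):
    """Write metadata.csv from (clip_id, transcript, normalized) tuples."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = out_dir / "metadata.csv"
    with metadata_path.open("w", encoding="utf-8", newline="\n") as f:
        for clip_id, transcript, normalized in clips:
            fields = (clip_id, transcript, normalized)
            # The format has no quoting, so a pipe or newline would corrupt the row.
            if any("|" in field or "\n" in field for field in fields):
                raise ValueError(f"field contains a pipe or newline: {clip_id!r}")
            f.write("|".join(fields) + "\n")
    return metadata_path


if __name__ == "__main__":
    path = write_ljspeech_metadata(
        [("clip_0001", "Hello, World!", "hello world")], "dataset"
    )
    print(path.read_text(encoding="utf-8"))
```

By LJ Speech convention, the audio for each ID is expected at `wavs/<id>.wav` next to `metadata.csv`.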

Audio Visualization

Advanced waveform display with interactive regions and precise playback controls for perfect audio-text alignment.

  • Waveform Display
  • Region Selection
  • Audio Trimming
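
Trimming a selected region can also be done offline. The following sketch uses only Python's standard `wave` module and assumes an uncompressed PCM WAV file; MP3, OGG, and FLAC would need a separate decoder. The function name `trim_wav` is hypothetical and this is not how the tool itself cuts audio.

```python
import wave


def trim_wav(src_path, dst_path, start_s, end_s):
    """Copy the region from start_s to end_s (in seconds) into a new WAV file.

    Returns the number of audio frames written.
    """
    with wave.open(str(src_path), "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        # Convert seconds to frame indices and clamp them to the file length.
        start = max(0, min(total, int(start_s * rate)))
        end = max(start, min(total, int(end_s * rate)))
        src.setpos(start)
        frames = src.readframes(end - start)
        params = src.getparams()
    with wave.open(str(dst_path), "wb") as dst:
        # Reuse channel count, sample width, and rate; the frame count in the
        # header is corrected automatically when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return end - start
```

For example, `trim_wav("in.wav", "out.wav", 0.25, 0.75)` would keep the half second between 0.25 s and 0.75 s.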

How Speech Data Builder Works

Create professional TTS and STT datasets in three simple steps

1. Upload Audio

Import your audio files in multiple formats including MP3, WAV, OGG, and FLAC.

2. Create Transcripts

Manually transcribe or use AI to automatically generate accurate transcriptions.

3. Export Dataset

Download your complete dataset in the format of your choice for model training.