Abogen: a tool for converting multiple text formats to audiobooks

General Introduction

Abogen is an open-source tool that quickly converts ePub, PDF, or plain text files into high-quality audio. It uses the Kokoro-82M model to generate natural, smooth speech and supports synchronized subtitle generation, making it well suited to audiobooks, video dubbing, and learning aids. Users can choose among multiple languages and male and female voices, adjust subtitle granularity, and mix different voice models to create unique voices. Abogen supports audio formats such as WAV, FLAC, MP3, and M4B, is easy to use, and runs on Windows, Linux, and macOS.

Function List

Accepts ePub, PDF, and TXT files as input and extracts the text automatically.
Generates high-quality, natural speech using the Kokoro-82M model.
Offers multiple languages and male and female voices, such as American English and British English.
Generates subtitles segmented by sentence, by word, or at a custom granularity.
Allows mixing different voice models to create personalized voices.
Outputs audio as WAV, FLAC, MP3, or M4B (with chapter support).
Provides a built-in text editor for entering or modifying text directly.
Supports Docker deployment to simplify installation and operation.
Lets you choose where to save the output file, such as the desktop or a custom folder.

Usage Guide

Installation process

Installing Abogen requires several dependencies, including Python and espeak-ng. The steps are as follows:

1. Install espeak-ng

Download the .msi installer from espeak-ng's latest release page (Windows), or install espeak-ng through a package manager (Linux/macOS).
Windows users: run the downloaded .msi file and follow the prompts to complete the installation.
Linux users: run sudo apt-get install espeak-ng (Ubuntu/Debian) or sudo yum install espeak-ng (CentOS).
macOS users: run brew install espeak-ng (requires Homebrew).

2. Installing Python and PyTorch

Make sure Python 3.8 or later is installed on your system.

Install PyTorch. An NVIDIA GPU is recommended for acceleration; the following command installs the CUDA 12.8 build:
```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
```

If you do not have an NVIDIA GPU, run the following command to install the CPU version:
```
pip install torch torchvision torchaudio
```
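
Before launching Abogen, it can help to confirm that the Python version meets the requirement and to see which PyTorch build was actually installed. The following is a minimal sketch and not part of Abogen itself: the function names `python_version_ok` and `describe_torch_backend` are hypothetical, and the only PyTorch calls used are `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
import importlib.util
import sys


def python_version_ok(version=None) -> bool:
    """Return True if the given (or running) Python version is 3.8 or later."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= (3, 8)


def describe_torch_backend() -> str:
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check works without PyTorch

    if torch.cuda.is_available():
        return f"cuda: {torch.cuda.get_device_name(0)}"
    return "cpu"


if __name__ == "__main__":
    print("Python 3.8+:", python_version_ok())
    print("PyTorch backend:", describe_torch_backend())
```

If the second line prints `cpu` on a machine with an NVIDIA GPU, the CPU build of PyTorch was probably installed, and reinstalling with the CUDA index URL shown earlier is worth trying.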

3. Installation of Abogen

Run the following command to install Abogen:
```
pip install abogen
```
After the installation is complete, run the abogen command to launch the graphical interface (GUI).

4. Using Docker (optional)

Running Abogen through Docker simplifies dependency management:

Ensure that Docker is installed.

Clone the Abogen repository:
```
git clone https://github.com/denizsafak/abogen.git
cd abogen
```

Build the Docker image:
```
docker build --progress plain -t abogen .
```

Run the Docker container:

Windows:
```
docker run --name abogen -v %CD%:/shared -p 5800:5800 -p 5900:5900 --gpus all abogen
```

Linux:
```
docker run --name abogen -v $(pwd):/shared -p 5800:5800 -p 5900:5900 --gpus all abogen
```

macOS:
```
docker run --name abogen -v $(pwd):/shared -p 5800:5800 -p 5900:5900 abogen
```

Access Abogen:
- In a browser, open http://localhost:5800.
- Or connect with a VNC client to localhost:5900.
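
If the browser or VNC client cannot connect, a quick check is whether the two published ports accept TCP connections at all. The sketch below is only an illustration that uses Python's standard `socket` module; the helper name `is_port_open` is hypothetical, and the ports are the ones mapped in the `docker run` commands above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in (("web UI", 5800), ("VNC", 5900)):
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port} is {state}")
```

A port that is not reachable usually means the container is not running or the `-p` mappings were omitted; `docker ps` shows the active mappings.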

Main Functions

1. Converting text to audio

After starting Abogen, the graphical interface opens.
Click the Select File button to upload an ePub, PDF, or TXT file, or use the built-in text editor to enter text.
Select the language and voice (e.g. a_m indicates an American English male voice, and b_f indicates a British English female voice).
Configure subtitle options: select "Sentence", "Sentence + comma" or split by number of words (e.g. 1 word, 2 words).
Click the Generate button and wait for processing to complete. Processing time depends on file size and hardware performance (e.g. 3000 characters of text takes about 11 seconds on an RTX 2060).
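
To get a rough sense of how long a larger file might take, the single reference point above (about 11 seconds for 3000 characters on an RTX 2060) can be extrapolated. The sketch below assumes processing time scales linearly with character count, which is an assumption and not a measured property of Abogen; the function name `estimate_seconds` is hypothetical, and actual times will vary with hardware and content.

```python
REF_CHARS = 3000      # reference text length quoted above
REF_SECONDS = 11.0    # approximate time for that text on an RTX 2060


def estimate_seconds(char_count: int,
                     ref_chars: int = REF_CHARS,
                     ref_seconds: float = REF_SECONDS) -> float:
    """Linearly extrapolate processing time from one reference measurement."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return char_count * ref_seconds / ref_chars


if __name__ == "__main__":
    book_chars = 300_000  # hypothetical book length
    seconds = estimate_seconds(book_chars)
    print(f"{book_chars} characters: roughly {seconds / 60:.0f} minutes")
```

Under this assumption, a 300,000-character book would take roughly 18 minutes on comparable hardware; treat the figure as an order-of-magnitude guide only.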

2. Customized speech

In the Voice Mixer, adjust the proportions of different voice models to create unique sound effects.
Save the mix configuration as a "voice profile" for easy reuse.
Test the voice effect: Click the "Preview" button to listen to the generated sound clip.

3. Output settings

Select the audio format: WAV (lossless), FLAC (compressed lossless), MP3 (universal) or M4B (audiobook format with chapter support).
Set the save location: select "Save to desktop", "Save next to input file", or a custom folder.
If you need subtitles, check "Generate subtitles" and select the output format (e.g. SRT).
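
For reference, SRT is a plain-text format in which each cue consists of a sequence number, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, and one or more lines of text, with a blank line between cues. The sketch below formats cues in that layout. It illustrates the file format only and is not Abogen's own subtitle code; the function names `srt_timestamp` and `srt_cue` are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT cue: sequence number, time range, then the subtitle text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


if __name__ == "__main__":
    cues = [
        srt_cue(1, 0.0, 2.5, "Chapter one."),
        srt_cue(2, 2.5, 6.0, "It was a bright, cold day."),
    ]
    # Joining with a newline leaves the required blank line between cues.
    print("\n".join(cues))
```

Knowing the layout makes it easier to spot-check or hand-edit a generated subtitle file in any text editor.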

4. Command-line mode

If the graphical interface has problems, run Abogen from the command line:
```
abogen --cli
```
Command line mode displays detailed error messages for easy troubleshooting.

Caveats

Ensure that the input file is formatted correctly. Text extraction from PDF files may be incomplete when the layout is complex.
GPU acceleration is recommended for faster processing; CPU-only processing is slower.
If you run into problems, check out the Issues page on GitHub or submit a new issue for help.

Application Scenarios

Production of audiobooks
Users can convert novels, textbooks, or documents into audiobooks for easy listening while commuting or exercising. Abogen's M4B output supports chapters, which suits long-form content.
Video dubbing
Content creators can generate natural voice-overs for YouTube, TikTok, or Instagram videos, with synchronized subtitles to make the videos look more professional.
Learning Assistance
Students can convert PDF textbooks or handouts to audio and pair it with subtitles, which helps language learners and visually impaired users listen and learn.
Podcast production
Podcast producers can convert scripts to audio, quickly generate audition clips, and adjust voice styles to match program themes.

Q&A

What file formats does Abogen support?
Abogen accepts ePub, PDF, and TXT files as input. Output audio formats include WAV, FLAC, MP3, and M4B, and subtitles are available in SRT format.
How can I improve the accuracy of text extraction?
For PDF files, use documents with a simple layout. If extraction is inaccurate, convert the PDF to a TXT file before loading it.
Do I need a GPU to run Abogen?
No, but an NVIDIA GPU can significantly speed up processing. A CPU also works, but more slowly.
How do I contribute code or report a problem?
Visit the GitHub repository and submit a Pull Request to contribute code, or report a problem on the Issues page with detailed error information.
