Simplified Chinese / Donate to this project / Discord / QQ Group 902124277
Voice Recognition to Text Tool
This is an offline, local speech-to-text tool based on the open-source faster-whisper model. It recognizes human speech in video and audio files and converts it to text, which it can output as JSON, as SRT subtitles with timestamps, or as plain text. Once self-deployed, it can replace the speech recognition interface of OpenAI, Baidu Voice Recognition, and similar services. Its accuracy is basically the same as that of the official OpenAI API.
After deployment or download, double-click start.exe; the local browser opens the local web page automatically.
Drag in or click to select the audio or video file to recognize, then choose the spoken language, the output text format, and the model to use (the base model is built in). Click to start recognition; when it finishes, the result appears on the current page in the selected format.
The entire process requires no internet connection. It runs entirely locally and can be deployed on an intranet.
faster-whisper provides the base, small, medium, and large-v3 models; only base is built in. Recognition quality improves from base to large-v3, but so do the computing resources required. Download other models as needed and unzip them into the models directory.
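As a rough illustration of the models directory layout, the sketch below lists which models are already unpacked. The folder names are the ones given in the API section of this README; the helper name `installed_models` is hypothetical and not part of the project.

```python
from pathlib import Path

# Folder names as listed in the API section of this README.
MODEL_DIRS = {
    "base": "models--Systran--faster-whisper-base",
    "small": "models--Systran--faster-whisper-small",
    "medium": "models--Systran--faster-whisper-medium",
    "large-v3": "models--Systran--faster-whisper-large-v3",
}


def installed_models(models_root):
    """Return the model names whose folder exists under models_root."""
    root = Path(models_root)
    return [name for name, folder in MODEL_DIRS.items() if (root / folder).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the project root.
    print(installed_models("models"))
```

A model that does not show up in the returned list still needs to be downloaded and unzipped before it can be selected.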
Video Demonstration
https://github.com/jianchang512/stt/assets/3378335/d716acb6-c20c-4174-9620-f574a7ff095d
Precompiled Win Version Usage Method / Linux and Mac Source Code Deployment
- Click here to go to the Releases page and download the precompiled file.
- After downloading, unzip it somewhere, such as E:/stt.
- Double-click start.exe and wait for the browser window to open automatically.
- Click the upload area on the page and pick the audio or video file you want to recognize in the pop-up window, or drag the file directly onto the upload area. Then select the spoken language, the text output format, and the model to use, and click "Start Recognition Immediately". After a short wait, the text box at the bottom displays the recognition result in the selected format.
- If the computer has an Nvidia GPU and the CUDA environment is correctly configured, CUDA acceleration is used automatically.
Source Code Deployment (Linux / Mac / Windows)
- Python 3.9 to 3.11 is required.
- Create an empty directory, such as E:/stt, and open a cmd window in it: type `cmd` in the address bar and press Enter. Then use git to pull the source code into the current directory: `git clone git@github.com:jianchang512/stt.git .`
- Create a virtual environment: `python -m venv venv`
- Activate the environment. On Windows the command is `%cd%/venv/scripts/activate`; on Linux and Mac it is `source ./venv/bin/activate`. To use CUDA, also run `pip uninstall -y torch` and then `pip install torch --index-url https://download.pytorch.org/whl/cu121`
- Install dependencies: `pip install -r requirements.txt`. If a version conflict error is reported, run `pip install -r requirements.txt --no-deps` instead.
- On Windows, decompress ffmpeg.7z and put the `ffmpeg.exe` and `ffprobe.exe` it contains in the project directory. On Linux and Mac, download the matching ffmpeg build from the official ffmpeg website, unzip the `ffmpeg` and `ffprobe` binaries, and put them in the project root.
- Download the model archive for each model you need, then put the folder inside the archive into the models folder in the project root.
- Run `python start.py` and wait for the local browser window to open automatically.
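Before running `python start.py`, the prerequisites listed above can be checked with a small script. This is a minimal sketch rather than part of the project: the helper names `python_version_ok` and `find_tool` are hypothetical, and falling back to the system PATH when the binaries are not in the project root is an assumption about what is acceptable.

```python
import shutil
import sys
from pathlib import Path


def python_version_ok(version_info=sys.version_info):
    """True for Python 3.9, 3.10, or 3.11, the range this README asks for."""
    return (3, 9) <= tuple(version_info[:2]) <= (3, 11)


def find_tool(name, project_root="."):
    """Look for a binary in the project root first, then on PATH (assumed fallback)."""
    for candidate in (name, name + ".exe"):
        path = Path(project_root) / candidate
        if path.is_file():
            return str(path)
    return shutil.which(name)


if __name__ == "__main__":
    # Assumes the script is run from the project root.
    print("Python version OK:", python_version_ok())
    for tool in ("ffmpeg", "ffprobe"):
        print(tool, "->", find_tool(tool) or "not found")
```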
API Interface
Interface address: http://127.0.0.1:9977/api
Request method: POST
Request parameters:
language: language code, one of the following
>
> Chinese: zh
> English: en
> French: fr
> German: de
> Japanese: ja
> Korean: ko
> Russian: ru
> Spanish: es
> Thai: th
> Italian: it
> Portuguese: pt
> Vietnamese: vi
> Arabic: ar
> Turkish: tr
>
model: model name, one of the following
>
> base corresponds to models/models--Systran--faster-whisper-base
> small corresponds to models/models--Systran--faster-whisper-small
> medium corresponds to models/models--Systran--faster-whisper-medium
> large-v3 corresponds to models/models--Systran--faster-whisper-large-v3
>
response_format: the format of the returned text, one of text|json|srt
file: the audio or video file, uploaded as binary
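The parameter values listed above can be validated on the client before a request is sent. The sketch below is one possible way to do that; `build_form_data` is a hypothetical helper, and how the server itself reacts to an unsupported value is not documented here.

```python
# Allowed values, copied from the parameter list in this README.
LANGUAGES = {"zh", "en", "fr", "de", "ja", "ko", "ru", "es", "th", "it", "pt", "vi", "ar", "tr"}
MODELS = {"base", "small", "medium", "large-v3"}
RESPONSE_FORMATS = {"text", "json", "srt"}


def build_form_data(language, model, response_format):
    """Return the form fields for a request, or raise ValueError on an unsupported value."""
    checks = (
        ("language", language, LANGUAGES),
        ("model", model, MODELS),
        ("response_format", response_format, RESPONSE_FORMATS),
    )
    for field, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{field}={value!r} is not one of {sorted(allowed)}")
    return {"language": language, "model": model, "response_format": response_format}


if __name__ == "__main__":
    print(build_form_data("en", "base", "srt"))
```

The returned dictionary can be passed as the `data` argument of a POST request, alongside the uploaded file.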
API request example
```python
import requests

# Request address
url = "http://127.0.0.1:9977/api"

# Request parameters: file (the audio or video file), language (language code),
# model (model name), response_format (text|json|srt).
# On success the response has code == 0 and msg == "success"; otherwise msg gives
# the failure reason. data holds the recognized text.
data = {"language": "zh", "model": "base", "response_format": "json"}

# Open the file in a with-block so the handle is closed after the upload.
with open("C:\\Users\\c1\\Videos\\2.wav", "rb") as f:
    response = requests.post(url, timeout=600, data=data, files={"file": f})

print(response.json())
```
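The comments in the example describe the response fields `code`, `msg`, and `data`. A minimal sketch of checking them is shown below; `RecognitionError` and `extract_data` are hypothetical names, and the sample payload is made up for illustration rather than captured from a real server.

```python
class RecognitionError(RuntimeError):
    """Raised when the API reports a failure (hypothetical helper, not part of the project)."""


def extract_data(payload):
    """Return payload["data"] when code == 0, otherwise raise with the reported msg."""
    if payload.get("code") != 0:
        raise RecognitionError(payload.get("msg", "unknown error"))
    return payload["data"]


if __name__ == "__main__":
    # Made-up sample payload following the field names described above.
    sample = {"code": 0, "msg": "success", "data": "hello world"}
    print(extract_data(sample))
```

In a real client, `payload` would be the result of `response.json()`.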
CUDA Acceleration Support
Install CUDA Tools: Detailed installation method
If your computer has an Nvidia graphics card, first upgrade the graphics driver to the latest version, then install the matching CUDA Toolkit and cuDNN for CUDA 11.x.
After installation, press Win + R, type `cmd`, and press Enter. In the window that opens, type `nvcc --version` and confirm that version information is displayed.
Then type `nvidia-smi` and confirm that it produces output showing the CUDA version number.
Then run `python testcuda.py`. If it reports success, the installation is correct; otherwise, check the setup carefully and reinstall.
CPU is used by default. If you are sure you have an Nvidia graphics card and a configured CUDA environment, change devtype=CPU to devtype=CUDA in set.ini and restart to enable CUDA acceleration.
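The set.ini change can also be scripted. The sketch below works on the file's text and assumes only that set.ini contains a line of the form `devtype=CPU`, as described above; the rest of the file's layout is unknown, so the sample text in the demo is made up, and `set_devtype` is a hypothetical helper.

```python
import re


def set_devtype(ini_text, devtype):
    """Return ini_text with the value of the devtype line replaced by devtype."""
    if devtype not in ("CPU", "CUDA"):
        raise ValueError("devtype must be 'CPU' or 'CUDA'")
    # Match a whole "devtype = value" line, keeping the key and spacing as written.
    pattern = r"(?mi)^([ \t]*devtype[ \t]*=[ \t]*)[^\r\n]*"
    new_text, count = re.subn(pattern, lambda m: m.group(1) + devtype, ini_text)
    if count == 0:
        raise ValueError("no devtype line found")
    return new_text


if __name__ == "__main__":
    sample = "devtype=CPU\n"  # made-up sample; the real set.ini may hold other keys
    print(set_devtype(sample, "CUDA"))
```

To apply it, read set.ini from the project root, pass its contents through `set_devtype`, write the result back, and restart the program.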
Notices
- If you do not have an Nvidia graphics card, or the CUDA environment is not properly configured, do not use the large/large-v3 model; it may exhaust memory and crash.
- In some cases, Chinese output will use traditional characters.
- If you encounter the error "cublasxx.dll does not exist", click to download cuBLAS, decompress it, and copy the dll files inside to C:/Windows/System32.
Related Projects
Video translation dubbing tool: translate subtitles and dub
Voice Cloning Tool: synthesize speech in any voice timbre
Acknowledgement
The other projects that this project mainly depends on are