Pages in topic:   < [1 2 3 4 5] >
Is OpenAI’s Whisper better than Dragon?
Thread poster: Hans Lenting
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 18:54
Member (2009)
Dutch to English
+ ...
‘Is OpenAI’s Whisper better than Dragon?’ YES, if used inside Talon Voice! Jan 23

So, I have been using Talon Voice now for a few months, and have just uninstalled Dragon. Finally, after all these years, I finally found something that works reliably, both for dictation and voice commands.

Also, Talon can so some crazy stuff when used together with various Python powered extensions. For example, have a look at what you can achieve with Talon Voice + Python + ChatGPT
... See more
So, I have been using Talon Voice now for a few months, and have just uninstalled Dragon. Finally, after all these years, I finally found something that works reliably, both for dictation and voice commands.

Also, Talon can so some crazy stuff when used together with various Python powered extensions. For example, have a look at what you can achieve with Talon Voice + Python + ChatGPT!

https://talonvoice.slack.com/files/U059C2GP9PB/F06BMQAQXGC/2023-12-25_19-59-32.mp4 (make sure to watch this video!)

Talon-AI-Tools

The ChatGPT integration comes from a little something called "Talon-AI-Tools"​

https://github.com/C-Loftus/talon-ai-tools
https://github.com/C-Loftus/talon-ai-tools/tree/main/GPT
https://github.com/C-Loftus/talon-ai-tools/blob/main/GPT/staticPrompt.talon-list

Here are some of the example voice prompts. That is, you select some text on yr computer and just say these commands, and Talon uses ChatGPT in the background to do stuff to your selected text or in your current text field!

Talon-AI-Tools-example voice prompts

Also have a look at my recent post "Wow, I'm uninstalling Dragon. No longer need it as I am 100% happy with Talon voice!" @ https://forums.knowbrainer.com/forum/third-party-command-utilities-vocola-unimacro-voicepower-python/404-wow-i-m-uninstalling-dragon-no-longer-need-it-as-i-am-100-happy-with-talon-voice (on the new KnowBrainer forum)

If you haven't tried Talon yet, I highly recommend having a look at it! The possibilities are limitless. I have it on all day and it doesn't slow down my computer at all, like Dragon used to. I now do half my clicking in memoQ by voice (and am adding new voice commands every day), dictate all my emails, and have even started integrating voice commands into my little AutoHotkey-powered app project, "Beijer.bot".

Talon-Beijer.bot

See: https://beijer.bot/

Some useful information about Talon can be found at:
https://talonvoice.com/
https://app.slack.com/client/T7FPSMV8F/C9MHQ4AGP
https://talon.wiki/
https://chaosparrot.github.io/talon_practice/
https://www.joshwcomeau.com/blog/hands-free-coding/
https://handsfreecoding.org/2021/12/12/talon-in-depth-review/
https://xeiaso.net/blog/voice-control-talon/#:~:text=All%20the%20details%20float%20away,acts%20kind%20of%20like%20vim.





[Edited at 2024-01-23 22:11 GMT]
Collapse


 
Milan Condak
Milan Condak  Identity Verified
Local time: 19:54
English to Czech
Speech Translate Jan 23

I replaced the software that uses the Whisper models.
I have been using "Speech Translate" for more than half a year.

GitHub - Dadangdut33/Speech-Translate: A realtime speech transcription…

https://github.com/Dadangdut33/Speech-Translate/

I first used the Whisper V2 model, now I am using the V3.

I use it for various languages, b
... See more
I replaced the software that uses the Whisper models.
I have been using "Speech Translate" for more than half a year.

GitHub - Dadangdut33/Speech-Translate: A realtime speech transcription…

https://github.com/Dadangdut33/Speech-Translate/

I first used the Whisper V2 model, now I am using the V3.

I use it for various languages, but especially for Czech transcription.

Milan
Collapse


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 18:54
Member (2009)
Dutch to English
+ ...
thanks! (+ idea for a Dual-engine, bilingual speech and command solution!) Jan 23

Milan Condak wrote:

I replaced the software that uses the Whisper models.
I have been using "Speech Translate" for more than half a year.

GitHub - Dadangdut33/Speech-Translate: A realtime speech transcription…

https://github.com/Dadangdut33/Speech-Translate/

I first used the Whisper V2 model, now I am using the V3.

I use it for various languages, but especially for Czech transcription.

Milan


Although Talon allows you two switch between various languages, its main language is English. To dictate in Dutch (my second language), I need to switch from Talon's main language to one of its secondary languages (powered by the WebSpeech multingual engine), which doesn't always work quickly or reliably. However, reading your post gave me an idea: why not run two dictation systems, one always on, for English and commands, and the other asleep, waiting to be triggered, for Dutch.

• Talon for English (commands and dictation)(awake)
• Speech Translate for Dutch (dictation)(asleep)

That way, you would effectively have your own Dual-engine, bilingual speech and command solution!

Also, I'm not sure which version of Whisper Talon uses. I know it's some kind of special hybrid engine, which combines the developer's own engine ‘Conformer D’ with a version of Whisper.

whisper-in-Talon


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 18:54
Member (2009)
Dutch to English
+ ...
dictating in 2 languages already possible in Talon (with optional WebSpeech multilingual engine)! Jan 23

Just ran some tests, and it is actually pretty easy to already do this with Talon alone.

Normally, this is how things work:

Saying ‘dictation mode’ switches to English dictation mode
Saying ‘command mode’ switches to English command mode

However, if you activate the optional (beta-only!**) WebSpeech engine, you can add in secondary languages, like this (for Dutch):

- Saying "dutch mode" puts Talon in (WebSpeech-powered) Dutch mode
... See more
Just ran some tests, and it is actually pretty easy to already do this with Talon alone.

Normally, this is how things work:

Saying ‘dictation mode’ switches to English dictation mode
Saying ‘command mode’ switches to English command mode

However, if you activate the optional (beta-only!**) WebSpeech engine, you can add in secondary languages, like this (for Dutch):

- Saying "dutch mode" puts Talon in (WebSpeech-powered) Dutch mode.
- When in Dutch mode, I can then say "opdrachtmodus", to put Talon back in English (command) mode.

~

In an ideal world, the following would also work:
- normally, the English engine (Conformer D + Whisper) is on, normally in command mode.
- if I want to dictate in English, I precede my dictation by ‘Say’
- if I want to dictate in Dutch, I precede my dictation by ‘Say Dutch’ (or even just ‘Dutch’)

I really shouldn't complain. It's already amazing that this work at all. I was never able to switch between Dutch and English back when I was still using Dragon! And I am super grateful that Ryan (the developer of Talon) has created such a fantastic tool, which has really changed the way I use dictation in my daily work. I always gave up on Dragon after a few weeks of frustrating slow-downs, crashes, etc. I briefly enjoyed Vocola and KnowBrainer, but since they both ran on top of Dragon, they never lasted long either. Talon is a real game changer!

** https://talonvoice.com/dl/latest/changelog.html

0.4.0 beta-only features:
New: Whisper hybrid speech recognition engine.
New: Talon Menu -> Scripting -> Debug Window.
New: deck() support for Elgato Stream Deck in .talon files.
New: hotkey_wait setting to pause after complex hotkeys are pressed.
New: "selection lists" API via ctx.selections["user.listname"] = "string of text for selection"
"Mixed Mode" simultaneous command and dictation mode.
Faster speech recognition.
Parrot noise recognition.
Vosk multilingual engine.
WebSpeech multingual engine.
Mac: face expression input.


Webspeech engine


[Edited at 2024-01-23 23:41 GMT]

[Edited at 2024-01-23 23:52 GMT]
Collapse


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 18:54
Member (2009)
Dutch to English
+ ...
The WebSpeech multilingual engine comes with 14 additional languages! Jan 24

(plus English)

Webspeech engine-14 languages


 
Milan Condak
Milan Condak  Identity Verified
Local time: 19:54
English to Czech
Whisper has more than 100 languages Jan 24

The languages in Whisper models:

https://github.com/openai/whisper?tab=readme-ov-file

The releases:

https://github.com/openai/whisper/releases/

Last release 20231117 Nov 17, 2023... See more
The languages in Whisper models:

https://github.com/openai/whisper?tab=readme-ov-file

The releases:

https://github.com/openai/whisper/releases/

Last release 20231117 Nov 17, 2023

Home page:

https://github.com/openai/whisper
Collapse


 
Dan Lucas
Dan Lucas  Identity Verified
United Kingdom
Local time: 18:54
Member (2014)
Japanese to English
Heartening Jan 24

Thanks for these updates Michael, I'm very glad to see the progress being made. The sticking point for me would be the ability to select text and correct it, as I do that quite a lot. But maybe if Talon were to make fewer mistakes than Dragon I would not need to use the select function as much? Anyway, it gives us more options which has to be great.

I'm thinking of installing Talon, but in a few weeks, as I don't want to switch to a new system for production during a busy period. I
... See more
Thanks for these updates Michael, I'm very glad to see the progress being made. The sticking point for me would be the ability to select text and correct it, as I do that quite a lot. But maybe if Talon were to make fewer mistakes than Dragon I would not need to use the select function as much? Anyway, it gives us more options which has to be great.

I'm thinking of installing Talon, but in a few weeks, as I don't want to switch to a new system for production during a busy period. I don't imagine there'll be any problem having the two installed at once?

Dan
Collapse


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 18:54
Member (2009)
Dutch to English
+ ...
selecting/correcting text not yet possible with Talon, but check out "Talon-AI-Tools" Jan 24

Hi Dan,

Installing Talon alongside Dragon is totally fine. Talon is very light; you won't even notice it running.

Dan Lucas wrote:

Thanks for these updates Michael, I'm very glad to see the progress being made. The sticking point for me would be the ability to select text and correct it, as I do that quite a lot. But maybe if Talon were to make fewer mistakes than Dragon I would not need to use the select function as much? Anyway, it gives us more options which has to be great.

I'm thinking of installing Talon, but in a few weeks, as I don't want to switch to a new system for production during a busy period. I don't imagine there'll be any problem having the two installed at once?

Dan


Yes, selecting text and correcting it remains something that can't be done. However, there are various workarounds, one of my favourites (but which doesn't allow you to select the incorrect word by voice) is to use ChatGPT to quickly correct any mistakes in what you just dictated, by voice. This can be achieved by installing "Talon-AI-Tools" (https://github.com/C-Loftus/talon-ai-tools ) alongside Talon, which allows you to do all kinds of crazy stuff by voice using ChatGPT. Basically anything you can dream up prompt-wise, but using your voice.

Here is a peak at some stuff that comes with Talon-AI-Tools out of the box:


mode: command

-
# Ask a question in the voice command and the AI will answer it.
model ask $:
result = user.gpt_answer_question(text)
user.paste(result)

# Runs a model prompt on the selected text and pastes the result.
model {user.staticPrompt} [this]$:
text = edit.selected_text()
result = user.gpt_apply_prompt(user.staticPrompt, text)
user.paste(result)

# Runs a model prompt on the selected text and sets the result to the clipboard
model clip {user.staticPrompt} [this]$:
text = edit.selected_text()
result = user.gpt_apply_prompt(user.staticPrompt, text)
clip.set_text(result)

# Say your prompt directly and the AI will apply it to the selected text
model please $:
prompt = user.text
txt = edit.selected_text()
result = user.gpt_apply_prompt(prompt, txt)
user.paste(result)

# TODO: make this less verbose in output
# Shows the list of available prompts
model help$:
user.gpt_help()


and

# Use static prompts for detailed instructions without needing to say them every time

# These all operate upon the currently selected text

## FIXES

fix grammar formally: Fix any mistakes or irregularities in grammar, spelling, or formatting. Use a professional business tone. The text was created used voice dictation. Thus, there is likely to be issues regarding homophones and other misrecognitions. Do not change the original structure of the text.

fix grammar: Fix any mistakes or irregularities in grammar, spelling, or formatting. The text was created used voice dictation. Thus, there is likely to be issues regarding homophones and other misrecognitions. Do not change the tone. Do not change the original structure of the text.

fix grammar casually: Fix any mistakes or irregularities in grammar, spelling, or formatting. Keep a casual tone appropriate for a blog. Do not change the original structure of the text. The text was created used voice dictation. Thus, there is likely to be issues regarding homophones and other misrecognitions.

## FORMATTING

format table: The following markdown text is raw data. The first row is the header. There is no index. Return the text in a markdown table format. Each row has a new line in the original data.

format bullets: Convert the every paragraph into a one heading with series of bullet points underneath it. Each paragraph is separated by a new line. Separate paragraphs should not have combined bullet points. This should all be done in markdown syntax. If it is a small paragraph than you can just leave it as a heading and not add bullet points. Do not reduce content, only reduce things that would be redundant. These bullet points should be in a useful format for notes for those who want to quickly look at it. If there is a citation in the markdown original then keep the citation just at the top and not within every individual bullet point.

format mermaid: convert the following plain text into the text syntax for a mermaid diagram.

format gant: convert the following plain text into the text syntax for a gant chart within mermaid.

## TEXT GENERATION

explain: explain this text in a way that is easier to understand for a layman without technical knowledge

summarize: Summarize this text into a format suitable for project notes

add context: Add additional text to the sentence that would be appropriate to the situation and useful in a consulting project.

auto generate schema: the given text is from responses to a survey. They are open ended and have no inherent structure. Map each of these responses to a schema that would be useful for summarizing all the responses and doing categorical analysis. Generate this schema on the fly by grouping responses. Return the responses in a list with a _ separating each item. Do not make the schema overly specific. Do not make the schema category names longer than a sentence a maximum.

answer: generate text that satisfies the question or request given in the input

shell: generate a unix shell script that performs the following actions. Output only the command. Do not output any comments or explanations.

## CONVERSIONS

convert to jason: Convert the following data into a json format.

convert to markdown: Convert the following plain text into a markdown format.

convert from jason to python: Convert the following json into the syntax for a python dictionary. This is essentially just serializing it into a native python format.

convert from jason to markdown: Convert the following json into the syntax for a markdown list.

## ACCESSIBILITY

describe code: explain what the following code does in natural language at a high level without getting into the specifics of the syntax.

describe structure: describe the structure of the following text. Do not describe the content. Describe the structure of the text.

describe summary: Condense the following text into a summary that is less verbose but still gives me the gist of the content. Do not describe the formatting.

check grammar: Check the grammar of the following text. Return a list of all potential errors.

check spelling: Check the spelling of the following text. Return a list of all potential errors.

check structure: I want you to skim the structure and layout of the following text. Tell me if the structure and order of my writing is correct. If it is not correct or flows poorly then tell me what might be wrong with it. If it is all correct then say it looks good. Do not describe the formatting. Do not comment on the specific content only its ordering and layout.

## TRANSLATIONS

translate spanish: Translate the following text from Spanish to English.

translate french: Translate the following text from French to English.


PS: it can also use local models:


settings():
user.llm_provider = 'OPENAI'
# user.llm_provider = "LOCAL_LLAMA"

# Change to 'gpt-4' for GPT-4
# Note, you may not have access to GPT-4 yet
# https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4
user.openai_model = 'gpt-3.5-turbo'



 
Quentin NEVEN
Quentin NEVEN  Identity Verified
Belgium
Local time: 19:54
Member (Jan 2024)
English to French
+ ...
I really like Whisper Jan 24

Hi,

I've never used Dragon but one of my university teachers showed us how it works a few years ago, and I was no really convinced.

I tried Whipser recently, and I find it really accurate, although a bit slow.

I had 0 experience in Python, or coding in general. Now, I feel like I am a pro because I managed to make it work hahaha


 
Dan Lucas
Dan Lucas  Identity Verified
United Kingdom
Local time: 18:54
Member (2014)
Japanese to English
Scratch Jan 24

Cheers, useful. I often find (and I suspect you have also) that it is quicker to re-dictate something than to try to edit it. In Talon can you undo what you just wrote, sort of like using "scratch that" in Dragon?

Dan


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 18:54
Member (2009)
Dutch to English
+ ...
yup Jan 24

Dan Lucas wrote:

Cheers, useful. I often find (and I suspect you have also) that it is quicker to re-dictate something than to try to edit it. In Talon can you undo what you just wrote, sort of like using "scratch that" in Dragon?

Dan


‘Scratch that’ works exactly like in Dragon.

Another workaround I just thought of, for example, if you're working in memoQ and you're in the target text box, is to say something like:

‘Select fourth word’, and then either just redictate it, or even better, use one of the Talon-AI-Tools commands to have ChatGPT quickly fix it. e.g. ‘model fix grammar’. btw, all these voice commands can be edited/shortened. they are all just .talon text files or Python code

talon-beijer.bot-memoQ


Dan Lucas
 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 18:54
Member (2009)
Dutch to English
+ ...
need for speed Jan 24

Quentin NEVEN wrote:

Hi,

I've never used Dragon but one of my university teachers showed us how it works a few years ago, and I was no really convinced.

I tried Whisper recently, and I find it really accurate, although a bit slow.

I had 0 experience in Python, or coding in general. Now, I feel like I am a pro because I managed to make it work hahaha


By the way, the Whisper model in Talon is better than vanilla Whisper, since it's a hybrid model. Ryan the developer improved it by somehow combining it with his own model, to form a hybrid model, "Conformer D + Whisper". Apparently, he wasn't happy with the quality of Whisper on its own, so he augmented it with a ‘supervisory model’ or something.

supervisory-model-talon

Regarding speed, dictating in Talon (with the beta engine) seems about as fast as Dragon used to. God knows how he did it, seeing as how Dragon probably have hundreds of people working on their team and Ryan is one person. Having said that, in Talon a ton of stuff is outsourced as its plugin system is open source. When you install Talon, it can't do very much until you install the so-called ‘community’ voice command set (https://github.com/talonhub/community ).

He's currently also working on an even better engine called Wisp, but that's still in beta.


Dan Lucas
 
Christopher Schröder
Christopher Schröder
United Kingdom
Member (2011)
Swedish to English
+ ...
Sounds interesting but... Jan 24

... is there a version of this thing that doesn't require a PhD in computer science?

Like one where you click on install and then just dictate?


neilmac
 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 18:54
Member (2009)
Dutch to English
+ ...
Installation and usage are actually pretty simple. Jan 24

Christopher Schröder wrote:

... is there a version of this thing that doesn't require a PhD in computer science?

Like one where you click on install and then just dictate?


It's really not as complicated as I may have made it look. A lot of my screenshots show the underlying code, which may have made it look like you need to be a programmer or tech wizard. To begin with, all you need to do is download the installer from the main website, which is available for Windows, macOS and Linux. Also, if you ever have a question, it usually gets answered within minutes on the Slack channel.


Christopher Schröder
 
Milan Condak
Milan Condak  Identity Verified
Local time: 19:54
English to Czech
It listens, transcribes and translates Jan 24

Christopher Schröder wrote:

... is there a version of this thing that doesn't require a PhD in computer science?

Like one where you click on install and then just dictate?


My HW: PC with Windows 11, 32 GM RAM and 10 GB Nvidia GPU

https://github.com/Dadangdut33/Speech-Translate/releases

From Assets 9

Installer.SpeechTranslate.1.3.10.CPU.Only.exe
341 MB 2024-01-07T14:33:56Z
Installer.SpeechTranslate.1.3.10.CPU.Only.exe.SHA256
66 Bytes 2024-01-07T14:40:16Z
Installer.SpeechTranslate.1.3.10.CUDA.11.8.exe
1.65 GB 2024-01-07T14:37:36Z
Installer.SpeechTranslate.1.3.10.CUDA.11.8.exe.SHA256
66 Bytes 2024-01-07T14:37:35Z
Portable.SpeechTranslate.1.3.10.CPU.Only.zip
371 MB 2024-01-07T14:37:00Z
Portable.SpeechTranslate.1.3.10.CUDA.11.8.zip
1.6 GB 2024-01-07T14:34:28Z
SpeechTranslate.exe.SHA256
64 Bytes 2024-01-07T14:33:52Z
Source code (zip)
2024-01-07T10:21:59Z
Source code (tar.gz)
----
I always choose and download the portable version. The GPU file requires CUDA, so I download the version, e.g. Portable.SpeechTranslate.1.3.10.CUDA.11.8.zip. I unzip it and I can use it. That's the whole science. I use only Large model.


Christopher Schröder
 
Pages in topic:   < [1 2 3 4 5] >


To report site rules violations or get help, contact a site moderator:


You can also contact site staff by submitting a support request »

Is OpenAI’s Whisper better than Dragon?






Wordfast Pro
Translation Memory Software for Any Platform

Exclusive discount for ProZ.com users! Save over 13% when purchasing Wordfast Pro through ProZ.com. Wordfast is the world's #1 provider of platform-independent Translation Memory software. Consistently ranked the most user-friendly and highest value

Buy now! »
Trados Studio 2022 Freelance
The leading translation software used by over 270,000 translators.

Designed with your feedback in mind, Trados Studio 2022 delivers an unrivalled, powerful desktop and cloud solution, empowering you to work in the most efficient and cost-effective way.

More info »