One of the ways we price our software is by the hour: you send me an hour of audio files, I’ll put them into a “JumpTo” format, and then charge you for an hour. Job done.
So, every day, I get asked a variation on this question: “I have xx GB of voice data, how much?” or, if I am very lucky, “I have xx GB of voice data, and yy files, how much?”
Audio files come in all shapes and sizes. Two audio files of the same length can vary wildly in size, depending on the compression rate. A 5-minute audio file? Anywhere between 400 MB (honest, see here) down to 0.5 MB for a heavily compressed GSM format file.
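To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is only an illustration: the function names are made up for this post, the 16-bit sample depth for the uncompressed file is an assumption, and file headers are ignored, so real files will differ a little.

```python
# Rough audio size arithmetic. Container/header overhead is ignored.

def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Byte rate of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample / 8 * channels


def size_mb(bytes_per_second: float, seconds: float) -> float:
    """Approximate size in megabytes (1 MB = 1,000,000 bytes)."""
    return bytes_per_second * seconds / 1_000_000


def hours_per_gb(bytes_per_second: float) -> float:
    """Hours of audio that fit in 1 GB (1,000,000,000 bytes)."""
    return 1_000_000_000 / bytes_per_second / 3600


FIVE_MINUTES = 5 * 60  # seconds

# GSM 06.10 full-rate speech codec: 13 kbit/s, mono, 8 kHz
GSM_RATE = 13_000 / 8
# Uncompressed stereo 16 kHz PCM; 16 bits per sample is an assumption
WAV_RATE = pcm_bytes_per_second(16_000, 16, 2)

gsm_mb = size_mb(GSM_RATE, FIVE_MINUTES)
wav_mb = size_mb(WAV_RATE, FIVE_MINUTES)

print(f"5 minutes of GSM: {gsm_mb:.1f} MB")
print(f"5 minutes of WAV: {wav_mb:.1f} MB")
print(f"1 GB of GSM: {hours_per_gb(GSM_RATE):.0f} hours")
print(f"1 GB of WAV: {hours_per_gb(WAV_RATE):.1f} hours")
```

Under these assumptions, five minutes comes to roughly 0.5 MB as GSM and about 19 MB as uncompressed stereo, and a single gigabyte could hold anything from around 4 hours to around 170 hours of audio. That spread is why a figure in gigabytes on its own tells me so little.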
I often seem to cause offence when I ask people how long their audio job is. So I try to explain it: it is like saying to me “I have 75 blue snakes: how long are they all?” The fact is, I know what a snake is, and I know that there are big ones and little ones. But telling me they are blue doesn’t really help me work out the average length. If you told me they were all boa constrictors, that might help, but even then there are baby ones, adult ones, and so on.
Having gone round this loop, I am then asked “Why don’t you charge by the GB, like other data sources?” A good question, and an easy one to answer.
The higher the quality of the input file, the better the results. A stereo 16 kHz uncompressed WAV file is much better than an 8 kHz mono WAV file encoded using GSM. You will get better quality text, period. But if I told you that on a per-GB basis one would be 10x more expensive than the other, you might just be tempted to scrimp a little here and there, sacrificing quality for a smaller file size.
The fact is, we have to stick to our guns. This whole speech-to-text business is very new in the litigation space, and so it will take a while before people understand the charging mechanism, and why we think it is so important. Until then, I may have to repeat my blue snake story a few more times still…