Wednesday, December 7, 2022
Show HN: Port of OpenAI's Whisper model in C/C++ https://ift.tt/Kjtu3H4
Hi HN, OpenAI recently released a model for automatic speech recognition called Whisper [0]. I decided to reimplement the inference of the model from scratch using C/C++. To achieve this I implemented a minimalistic tensor library in C and ported the high-level architecture of the model to C++. The entire code is fewer than 8000 lines and is contained in just 2 source files without any third-party dependencies. The GitHub project is here: https://ift.tt/XOH36j9

With this implementation I can very easily build and run the model with `make base.en`. It also allows me to run it on a wide range of devices. For example, I have provided examples of running the model on an iPhone, a Raspberry Pi 4 and even in a web page via WebAssembly!

The implementation runs fully on the CPU and utilizes FP16, AVX intrinsics on x86 architectures and NEON + the Accelerate framework on Apple Silicon. The latter is especially efficient: I observe that inference is about 2-3 times faster than the current PyTorch implementation provided by OpenAI when running on my MacBook M1 Pro. The WASM port utilizes 128-bit SIMD intrinsics, a feature supported in some modern web browsers [1].

I am very happy with the performance that I observe on Apple Silicon devices. I didn't expect the Accelerate framework [2] (i.e. CBLAS) to offer such a dramatic performance boost for matrix multiplications, so I was very pleasantly surprised! To enable the framework in your C/C++ projects, all you have to do is add `-framework Accelerate` to your clang command-line flags.

This entire exercise of implementing the Whisper model was very interesting to me and helped me understand a lot about how the transformer architecture works. I also got a lot of positive feedback from people finding and using my project.
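
To make the Accelerate remark concrete, here is a minimal sketch of a single-precision matrix multiplication that calls the framework's CBLAS routine `cblas_sgemm` on Apple platforms and falls back to a plain triple loop elsewhere. The helper name `matmul_f32` and the row-major layout are assumptions made for illustration; this is not code taken from the project.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __APPLE__
#include <Accelerate/Accelerate.h> /* link with: -framework Accelerate */
#endif

/*
 * Hypothetical helper (not part of the project):
 * computes C = A * B, where A is m x k, B is k x n and C is m x n.
 * All matrices are dense, row-major, single precision.
 */
void matmul_f32(int m, int n, int k,
                const float *a, const float *b, float *c) {
    assert(m > 0 && n > 0 && k > 0);
#ifdef __APPLE__
    /* alpha = 1, beta = 0: C is overwritten with A * B.
       Leading dimensions are the row lengths: k for A, n for B and C. */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0f, a, k, b, n, 0.0f, c, n);
#else
    /* Portable reference path for platforms without Accelerate. */
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int p = 0; p < k; p++) {
                sum += a[(size_t)i * k + p] * b[(size_t)p * n + j];
            }
            c[(size_t)i * n + j] = sum;
        }
    }
#endif
}
```

On macOS, a program that calls this helper could be built with something like `clang -O2 main.c -framework Accelerate`, where `main.c` is a placeholder file name. On other systems the fallback loop is compiled instead, so the same source builds without the framework.
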
We brainstormed a lot of interesting tools that could be built with this library (such as a speech-to-text plugin for Vim, an RPi4 voice assistant, a WASM chat bot, etc). If interested, check out the “Examples” section and the “Show and tell” discussions for some ideas!

Would love to know what you think about this project and about your experience with using the Accelerate framework in any of your projects. Cheers!

[0] https://ift.tt/jp1Rom6
[1] https://ift.tt/ed6wrXg
[2] https://ift.tt/WkxHtUZ

https://ift.tt/XOH36j9 December 6, 2022 at 04:16PM