A few weeks ago, I stumbled across a number of health influencers in Germany giving tips on what organic food to eat or what to do to feel better throughout the day.
Unfortunately, while I want to do more for my health, I don't have the time to watch every video and write down all the advice they give.
So why not use AI to do the job?
I came up with the idea of automating the process of watching the videos and taking notes.
In this newsletter post I will show you how to do this using Python.
We will build a CLI application with typer that takes a TikTok username, downloads and transcribes the audio of the videos using Groq and Whisper, and then extracts the advice mentioned in the video using Groq and LLM models.
GitHub Repo: https://github.com/baniasbaabe/short-video-wisdom
Workflow
Our process is as follows:
Download audios of the videos from a selected TikTok account. I chose audios because they are easier to download and we don't need visuals for our project
Send audio to Groq, where we will use Whisper to transcribe it
Get the important tips from the transcript, powered by Groq and Llama
Save the results to markdown
Let’s go!
Set Up
Make sure to register for Groq and generate your API key. Luckily, Groq has a generous free tier to use various kind of AI models.
Install the necessary dependencies
groq==0.18.0
tiktokapi==7.0.0
typer==0.15.2
We need Groq for interacting with the AI models.
We need typer for building our CLI app. typer is the modern way of building CLI apps.
We need tiktokapi to interact with TikTok. Of course, it is not the most reliable API since it uses playwright under the hood. So, no guarantee that everything will work all the time.
Get your MS Token. I mentioned that the tiktokapi uses playwright. Since tiktok detects you easily as a bot, we need to grab a token to tell TikTok that we are safe. For that, go to TikTok, go to your Developer Tools of your browser and go to the session storage to see all cookies and stuff. It should look like this:
Create the entrypoint of our app
In order to let the user interact with the CLI, we need to create the main function.
Let’s go step by step:
We need to create a Typer instance where we can add our command to it
To tell typer that we have a command (function) wich can be run by the user, we need to decorate it with app.command()
We have optional and mandatory inputs we will take from the user
ms_token, groq_key, and username are mandatory. Therefore they don’t have any default value.
output_dir (directory, where the videos should be saved), language (the language of the videos for whisper to know), video_limit (how many videos to download from tiktok), browser (the browser used by tiktokapi), and headless (if the tiktok scraping should be headless) are optional.
We set the Groq API key as environment variable and initialize an instance
We create the directory where videos should be saved
typer.echo just prints something into the terminal
We will call our function to download videos (shown later)
We will process the videos (shown later)
Now, try to run the app in your terminal:
$ python app.py
And you will immediately get an error:
Error: Missing option '--ms-token'.
Like intended :D
If you want to provide arguments, just run it like any other script
python app.py --ms_token "BLABLA" --groq_key "BLABLA" --username "BLABLA"
This will obviously not work since we need to add a few functions.
Download TikTok Videos
Here is how we downlaod tiktok videos with tiktokapi:
Create the directory for saving the videos
Create a TikTok session
Go through the videos of the TikTok creator
The TikTok video.as_dict gives you so many information. I already went through the hassle of finding what we need. In the playUrl field, we find an URL which points to an audio file of the video. Since we will just transcribe the video, we don’t need any visuals so this is sufficient.
Save the audio of the video
Transcribe the Videos
Now we have downloaded the audios of the TikTok videos.
We need to transcribe them to know what they are talking about in their videos.
Grab all video paths
Create a markdown file to save our results later
Transcribe the audio with Groq and Whisper
Generate a summary with an LLM (shown later)
Append the generated summary to the markdown file
Generate Summary
Here, we will use llama-3.2-3b-preview, but you are free to choose any LLM
Our prompt (not perfect though) to tell the LLM we just want the quintessence of the transcript
Run the CLI app
Since we have all the components we need, let’s run our app.
I will choose a german health influencer (boran_nr1) but feel free to choose whoever you want.
Run the app:
python app.py --ms-token "REPLACEME" --groq-key "REPLACEME" --username "boran_nr1"
Now, check your video_summaries.md
Mine looks something like:
# Video Summaries
- Eggs can only be proven to contain synthetic vitamins such as vitamin B12 cyanocobalamin.
- Natural forms of vitamin B12 such as hydroxycobalamin and adenosylcobalamin cannot be present.
- Liver contains natural forms of vitamin B12 and not man-made versions.
---
- Regular sauna use reduces the risk of cardiovascular disease by 50%.
- Dementia and Alzheimer's are reduced by 66%.
- Saunas improve the skin.
- Saunas remove toxins from the body.
- Taking a sauna boosts the immune system.
- Light sauna sessions can also be taken during the cold season.
- A cold shower after a sauna session is crucial.
---
- A healthy diet that works is not about wasting money on products that have no place in our bodies.
- Simply eating nutritious foods can lead to a strong and healthy body.
- Look for foods with the following nutrients:
* Melatonin: kiwi, dates, bee bread
* Frozen exercise: Blueberries, iron lovelies
* Alkalising vegetables: broccoli
* Sweet potatoes with olive oil and rock salt
* Avocado (source of fat, testosterone-boosting vegetable)
* Eggs (most bioavailable protein)
... (and so on)
Nice!! Instead of mindlessly scrolling through all videos and take notes, we leveraged Whisper and LLama to do the work.
Conclusion
You learned how you can make your life easier by summarizing TikTok videos by AI in under ~150 lines of code.
Room for improvement is always there. You can try out more capable models for transcribing and summarizing. And you can adjust the prompt to reduce the possibility to output some unnecessary information.