Introduction
Recently I saw an Instagram reel about a guy who built quite a popular account by just copying Reddit posts, running them through AI text-to-speech and transcription, adding subtitles and pasting it all over a Subway Surfers or GTA 5 mega ramp video. Basically something like this Instagram account.
He explained how he did it manually: going onto Reddit, copying the text of a post, adding it to CapCut to add TTS, transcribing it and adding subtitles, and then manually posting it on Instagram (or TikTok). I saw this and I thought: this can easily be automated. So I got to work. I used Python for obvious reasons.
This will be a full walkthrough of the code, how it works and what the results look like. I've divided everything into separate modules that are connected by the main Python file, so it's easy to swap out a module for something else (like a different TTS engine).
Getting a Reddit post
Step 1 is to get the text of a post from Reddit so it can be processed into a video. We want the video to read the title and the body text of a post, so we can model a post as a class. A Reddit post contains these elements:
- Post ID, unique for each post
- Post title
- Post description or the text
- Comments
- Subreddit
import stringutils

class RedditPost:
    def __init__(self, id, title, description, comments, subreddit) -> None:
        self.id = id
        self.title = title
        self.description = description
        self.comments = comments
        self.subreddit = subreddit
        self.sanitize_post()

    def into_text(self) -> str:
        # Title and body text, joined into the text that gets spoken
        return self.title + ".\n" + self.description

    def __str__(self) -> str:
        return "Id: " + self.id + "\nTitle: " + self.title + "\nDescription: " + self.description + "\nComments: " + str(len(self.comments))
The into_text method will be used to get the text of the post and convert it into speech.
Sometimes Reddit posts contain text that is hard to turn into speech, and we want to censor "bad" words because otherwise IG won't push the video as much. We also want to remove any extra periods because the TTS AI will read them out literally. I added some (crude) sanitization to the RedditPost class to handle this:
    def sanitize_post(self):
        self.description = self.description.replace("LGBTQ","L G B T Q")
        self.description = self.description.replace("+","plus")
        self.description = self.description.replace("/"," slash ")
        self.description = self.description.replace("TLDR","To summarize: ")
        self.description = self.description.replace("&", "and")
        self.description = self.description.replace("ä", "ae")
        self.description = self.description.replace("ö", "oe")
        self.description = self.description.replace("ü", "ue")
        self.description = self.description.replace("ß", "ss")
        self.description = self.description.replace("*","")
        self.description = self.description.replace("_","")
        self.description = self.description.replace('"'," ")
        # profanities
        self.description = self.description.replace("fuck", "frick")
        self.description = self.description.replace("Fuck", "Frick")
        self.description = self.description.replace("Shit", "Shot")
        self.description = self.description.replace("shit", "shot")
        # "asshole" has to be replaced before " ass", otherwise " asshole" turns into " butthole"
        self.description = self.description.replace("asshole", "a-hole")
        self.description = self.description.replace("Asshole", "A-hole")
        self.description = self.description.replace(" ass", " butt")
        self.description = self.description.replace(" Ass", " Butt")
        self.description = self.description.replace(" buttum", " assum") # Assume also contains "Ass"
        self.description = self.description.replace(" Buttum", " Assum")
        self.description = self.description.replace("kill", "unalive")
        self.description = self.description.replace("Kill", "Unalive")
        self.description = self.description.replace("death", "unalive")
        self.description = self.description.replace("Death", "Unalive")
        self.description = self.description.replace("murder", "unalive")
        self.description = self.description.replace("Murder", "Unalive")
        self.description = self.description.replace("suicide", "self unalive")
        self.description = self.description.replace("Suicide", "Self unalive")
        self.description = self.description.replace("pedophile", "pdf ile")
        self.description = self.description.replace("Pedophile", "Pdf ile")
        self.description = self.description.replace("sex", "s*x")
        self.description = self.description.replace("Sex", "s*x")
        self.title = self.title.replace("fuck", "frick")
        self.title = self.title.replace("Fuck", "Frick")
        self.title = self.title.replace("Shit", "Shot")
        self.title = self.title.replace("shit", "shot")
        # AmITheAsshole
        self.description = self.description.replace("AITA","Am I the a-hole")
        self.title = self.title.replace("AITA","Am I the a-hole")
        # tifu
        self.description = self.description.replace("TIFU","Today I fricked up")
        self.title = self.title.replace("TIFU","Today I fricked up")
        # lifeProTips
        self.description = self.description.replace("LPT","Life pro tip")
        self.title = self.title.replace("LPT","Life pro tip")
        self.description = stringutils.remove_trailing_periods(self.description)
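Plain substring replacement is what makes the "assume" fix-up necessary. As an alternative, here is a minimal sketch that matches whole words only, using regex word boundaries. The REPLACEMENTS table and the censor function are hypothetical names for illustration and are not part of the project's code.

```python
import re

# Hypothetical replacement table; the words mirror a few of the ones used above.
REPLACEMENTS = {
    "asshole": "a-hole",
    "ass": "butt",
    "shit": "shot",
    "kill": "unalive",
}

# \b only matches at word boundaries, so "assume" and "skills" are left alone.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in REPLACEMENTS) + r")\b",
    re.IGNORECASE,
)

def censor(text: str) -> str:
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = REPLACEMENTS[word.lower()]
        # Keep a leading capital letter if the original word had one.
        return replacement.capitalize() if word[0].isupper() else replacement
    return _PATTERN.sub(swap, text)
```

The trade-off is that inflected forms such as "killed" are no longer caught, so those would need their own entries in the table.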
The stringutils module contains some functionality for processing text:
stringutils.py
import logging

logger = logging.getLogger(__name__)

alphabet = "qwertyuiopasdfghjklzxcvbnm"

def remove_trailing_periods(text: str) -> str:
    # Remove every period that directly follows a non-letter (e.g. "..", "!.")
    i = 0
    while i < len(text) - 1:
        if text[i].lower() not in alphabet and text[i + 1] == ".":
            # remove the period after this one, then check the same position again
            text = text[:i + 1] + text[i + 2:]
            logger.info("removing extra period at index " + str(i + 1))
        else:
            i += 1
    return text

def remove_period_after(character: str, text: str) -> str:
    # Remove every period that directly follows the given character
    i = 0
    while i < len(text) - 1:  # stop one early so text[i + 1] always exists
        if text[i] == character and text[i + 1] == ".":
            text = text[:i + 1] + text[i + 2:]
            logger.info("removing extra period at index " + str(i + 1))
        else:
            i += 1
    return text

def remove_repeating_periods(text: str) -> str:
    return remove_period_after(".", text)
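For comparison, the same rule (drop a period when the character before it is not a letter) can be sketched as a single regex substitution. This is an illustrative alternative, not the code the project uses; collapse_periods is a hypothetical name.

```python
import re

# A period is dropped when the character right before it is not an ASCII letter.
# The lookbehind needs a preceding character, so a period at index 0 is kept.
_EXTRA_PERIOD = re.compile(r"(?<=[^A-Za-z])\.")

def collapse_periods(text: str) -> str:
    return _EXTRA_PERIOD.sub("", text)
```

For example, collapse_periods("Wait... what?. Ok.") gives "Wait. what? Ok.". Like the loop version, it also strips decimal points, so "3.5" becomes "35".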
Now that we have a class we can use to represent a Reddit post, we need to actually retrieve posts. For this, I used the PRAW Python package. I put the functionality for this into a RedditEngine class:
import logging
from typing import List

import praw

logger = logging.getLogger(__name__)

class RedditEngine:
    MAX_IG_SHORT_LENGTH = 1620 # in characters; max video length is 1:30, this is about that
    REDDIT_IDS_FILENAME = "reddit_ids"
    TTS_FOLDER_NAME = "tts"
    SUBREDDITS_STORIES_FILENAME = "subreddits_stories"
    DEFAULT_POST_AMOUNT = 30

    def __init__(self) -> None:
        clientid = "your reddit client id"
        secret = "your reddit secret"
        user_agent = "praw_scaper_1.0"
        self.reddit = praw.Reddit(username='your username',password='your password',client_id=clientid,client_secret=secret,user_agent=user_agent)
        self.posts: List[RedditPost] = []
        self.already_used_ids = []
        with open(RedditEngine.REDDIT_IDS_FILENAME,"r") as reddit_ids:
            for line in reddit_ids:
                self.already_used_ids.append(line.strip())
        logger.info("IDs already used:")
        logger.info(self.already_used_ids)

    def get_posts(self, subreddit_name, limit):
        subreddit = self.reddit.subreddit(subreddit_name)
        logger.info("getting hot " + str(limit) + " posts for subreddit: " + subreddit.display_name)
        for submission in subreddit.hot(limit=limit):
            if RedditEngine.check_post(subreddit_name, submission) and ((len(submission.title) + len(submission.selftext)) < RedditEngine.MAX_IG_SHORT_LENGTH):
                self.posts.append(RedditPost(submission.id, submission.title, submission.selftext, submission.comments, subreddit_name))

    @staticmethod
    def check_post(subreddit_name, submission):
        title = submission.title
        if "UPDATE" in title or "(Part" in title:
            return False
        if subreddit_name == "AmITheAsshole" and "Monthly Open" in title:
            return False
        # Parentheses added so these exclusions only apply to talesfromtechsupport
        # (assumed intent; "and" binds tighter than "or" without them)
        if subreddit_name == "talesfromtechsupport" and ("POSTING RULES" in title or "Mr_Cartographer" in title or submission.id == "16u1gxn"):
            return False
        return True

    def choose_id(self, id: str) -> bool:
        """
        Checks if the post with the given ID is in the already used ids or not
        Parameters:
        id: id of the post to check
        Returns:
        True if the post has not yet been used, false otherwise
        """
        return id not in self.already_used_ids

    def exclude_id(self, id: str):
        """
        Adds the ID to the already used ids file.
        """
        self.already_used_ids.append(id)
        with open(RedditEngine.REDDIT_IDS_FILENAME, "a") as f:
            f.write(id + "\n")
Let's break that down. I always want to get the 30 hot posts for a specific subreddit, but I don't want to use the same post twice. That's where the REDDIT_IDS_FILENAME comes in. It's a file that the ID of every used post gets written to. Each line contains the ID of a post that has already been used. For example:
1eig16p
1ei8uz8
1eiqnyi
1ehi5il
1ejghtc
1ejjj51
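The ID file format above (one ID per line) is simple enough to handle with two small helpers. This is a minimal sketch with hypothetical function names; it uses a set for fast lookups and tolerates a missing file on the first run, which the constructor shown earlier does not.

```python
from pathlib import Path

def load_used_ids(path: str) -> set:
    """Read one post ID per line; a missing file just means nothing was used yet."""
    file = Path(path)
    if not file.exists():
        return set()
    return {line.strip() for line in file.read_text().splitlines() if line.strip()}

def mark_used(path: str, post_id: str, used_ids: set) -> None:
    """Remember the ID in memory and append it to the file."""
    used_ids.add(post_id)
    with open(path, "a") as f:
        f.write(post_id + "\n")
```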
The TTS_FOLDER_NAME is the folder the main script saves the generated text-to-speech files to. The SUBREDDITS_STORIES_FILENAME is a file that contains the names of all subreddits that a post can be taken from. This is also used by the main script. It looks like this:
tifu
nosleep
relationships
LifeProTips
pettyrevenge
talesfromtechsupport
confessions
AmITheAsshole
TrueOffMyChest
To be able to scrape reddit for posts, you need to give PRAW access to your account by entering a client ID and secret. To do that, you need to create an app and get the client id and secret from it.
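Hardcoding credentials in the source is fine for a quick test, but it's easy to leak them by accident. Here is a minimal sketch that collects the same keyword arguments from environment variables instead. The variable names (REDDIT_CLIENT_ID and so on) and the function name are assumptions made for this example.

```python
import os

def reddit_credentials_from_env() -> dict:
    """Collect praw.Reddit keyword arguments from (hypothetical) environment variables."""
    required = {
        "client_id": "REDDIT_CLIENT_ID",
        "client_secret": "REDDIT_CLIENT_SECRET",
        "username": "REDDIT_USERNAME",
        "password": "REDDIT_PASSWORD",
    }
    missing = [var for var in required.values() if var not in os.environ]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    kwargs = {key: os.environ[var] for key, var in required.items()}
    kwargs["user_agent"] = "praw_scaper_1.0"
    return kwargs

# Usage, assuming praw is installed:
#   reddit = praw.Reddit(**reddit_credentials_from_env())
```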
In the constructor, PRAW gets initialized and the already used post IDs are read into a list. The get_posts method gets the hot 30 posts for a subreddit so that one can be chosen. The check_post method checks that a post is not an announcement or update post, because we only want stand-alone posts to make a video out of. The choose_id method checks that the given ID has not been used yet, and the exclude_id method adds a post ID to the list of already used IDs.
In the script that generates one video, a random subreddit is chosen from the file and the hot 30 posts for it are gathered. From those posts, the first one that is not yet in the list of used posts gets chosen. This is visible in the auto_post_video and generate_video_for_subreddit functions:
def generate_video_for_subreddit(subreddit: str, reddit_engine: get_reddit_posts.RedditEngine) -> bool:
    reddit_engine.get_posts(subreddit, get_reddit_posts.RedditEngine.DEFAULT_POST_AMOUNT)
    id_accepted = False
    i = 0
    post = None
    while not id_accepted:
        if i == len(reddit_engine.posts):
            # every post has been used already
            return False
        post = reddit_engine.posts[i]
        can_use_post = reddit_engine.choose_id(post.id)
        if can_use_post:
            id_accepted = True
        else:
            i += 1
    generate_story_video_for_post(post, reddit_engine)
    return True

def auto_post_video():
    reddit_engine = get_reddit_posts.RedditEngine()
    subreddits = []
    with open(get_reddit_posts.RedditEngine.SUBREDDITS_STORIES_FILENAME, "r") as f:
        # strip the trailing newline, otherwise it ends up in the subreddit name
        subreddits = [line.strip() for line in f if line.strip()]
    subreddit = random.choice(subreddits)
    logger.info("getting post from subreddit " + subreddit)
    video_result = generate_video_for_subreddit(subreddit, reddit_engine)
    if not video_result:
        logger.warning("should use another subreddit")
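The "first post that has not been used yet" search can also be written more compactly. This is a minimal sketch, assuming only that each post has an id attribute; first_unused is a hypothetical helper and not part of the script above.

```python
from typing import Iterable, Set

def first_unused(posts: Iterable, used_ids: Set[str]):
    """Return the first post whose id is not in used_ids, or None if all are used."""
    return next((post for post in posts if post.id not in used_ids), None)
```

A None result corresponds to the case where generate_video_for_subreddit returns False, meaning another subreddit should be tried.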
After having chosen a Reddit post, it is further processed into a video.
Converting the text to speech
After a Reddit post is chosen, the next step is to convert the text of the post into speech. I wanted to do this using AI because it's incredibly easy to use nowadays. The first thing I tried was ElevenLabs. The results it generates are great, but unfortunately there's a character limit, and I'm not gonna pay for any of this.
text_to_speech_elevenlabs.py
import requests
import random
import logging

logger = logging.getLogger(__name__)

class ElevenLabsVoice:
    def __init__(self, voice_id, name):
        self.voice_id = voice_id
        self.name = name

    def __str__(self) -> str:
        return "Voice ID: " + self.voice_id + ", Name: " + self.name

class ElevenLabsTTS:
    API_KEY = "your API key"
    CHUNK_SIZE = 1024

    def __init__(self, api_key):
        self.api_key = api_key
        self.all_voices = []
        self.current_voice = None

    def get_all_voices(self):
        logger.info("retrieving all voices...")
        url = "https://api.elevenlabs.io/v1/voices"
        headers = {
            "Accept": "application/json",
            "xi-api-key": self.api_key
        }
        response = requests.get(url, headers=headers)
        for voice in response.json()["voices"]:
            self.all_voices.append(ElevenLabsVoice(voice["voice_id"], voice["name"]))

    def select_random_voice(self):
        self.current_voice = random.choice(self.all_voices)
        logger.info("Selected random voice: " + self.current_voice.name)

    def write_to_file(self, filename, text) -> bool:
        if self.current_voice is None:
            raise Exception("No voice selected")
        logger.info("writing text to file " + filename + "...")
        logger.info(text)
        logger.info("using voice: " + self.current_voice.name)
        url = "https://api.elevenlabs.io/v1/text-to-speech/" + self.current_voice.voice_id
        headers = {
            "Accept": "audio/mpeg",
            "Content-Type": "application/json",
            "xi-api-key": self.api_key  # use the key passed to the constructor, like get_all_voices does
        }
        data = {
            "text": text,
            "voice_settings": {
                "stability": 0.3,
                "similarity_boost": 0.5
            }}
        response = requests.post(url, json=data, headers=headers)
        logger.info("GOT RESPONSE")
        logger.info(response)
        logger.info(response.headers)
        if response.status_code != 200:
            # the body is only readable text when the request failed
            logger.info(response.text)
            return False
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(chunk_size=ElevenLabsTTS.CHUNK_SIZE):
                if chunk:
                    f.write(chunk)
        logger.info("Done writing to file!")
        return True
The next thing I tried was running a TTS AI model locally on the VM that will upload these videos. I looked at Coqui TTS: a text-to-speech toolkit that's pretty easy to use. I got it working fairly quickly, but I wasn't satisfied with the results.
text_to_speech_coqui_tts.py
import torch
from TTS.api import TTS
from pydub import AudioSegment
import os
import stringutils
import time
import logging

logger = logging.getLogger(__name__)

class CoquiTTSEngine:
    def __init__(self):
        model_name = "tts_models/en/ljspeech/fast_pitch"
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tts = TTS(model_name=model_name, progress_bar=True).to(self.device)

    def synthesize_speech(self, text: str, file_path: str) -> bool:
        logger.info(" >>>>> Synthesizing text\n" + text + "\n >>>>> to file " + file_path)
        new_text = text.replace("\\","").replace("*","")
        new_text = stringutils.remove_trailing_periods(new_text)
        logger.info("text after processing a little: " + new_text)
        tmp_file = "tmp_audio.mp3"
        logger.info("Synthesizing speech...")
        try:
            self.tts.tts_to_file(text=new_text, file_path=tmp_file)
        except Exception as e:
            logger.error(e)
            return False
        time.sleep(1) # wait a little before reading the file
        logger.info("speeding up audio file...")
        orig_file = AudioSegment.from_file(tmp_file)
        sped_up_file = orig_file.speedup(1.3)
        sped_up_file.export(file_path,format="mp3")
        os.remove(tmp_file)
        return True
After some more searching, I came across the TikTok TTS API. I didn't know it existed, and since it's used by almost all reels and TikToks that use AI TTS, it was the perfect choice. It also does not have a character limit as far as I know. There are multiple voices to choose from, so I made the script choose a random English one every time a video gets made.
text_to_speech_tiktok_api.py
import sys
sys.path.append("TikTok-Voice-TTS")
from tiktokvoice import tts
import random
import logging

logger = logging.getLogger(__name__)

voices_en = [
    # ENGLISH VOICES
    'en_au_001', # English AU - Female
    'en_au_002', # English AU - Male
    'en_uk_001', # English UK - Male 1
    'en_uk_003', # English UK - Male 2
    'en_us_001', # English US - Female (Int. 1)
    'en_us_002', # English US - Female (Int. 2)
    'en_us_006', # English US - Male 1
    'en_us_007', # English US - Male 2
    'en_us_009', # English US - Male 3
    'en_us_010', # English US - Male 4
]

class TiktokTTSApi:
    @staticmethod
    def choose_random_voice() -> str:
        chosen_voice = random.choice(voices_en)
        logger.info("choosing random tiktok voice " + chosen_voice)
        return chosen_voice

    def tts(self, text: str, filename: str) -> str:
        logger.info("converting text to speech!")
        voice = TiktokTTSApi.choose_random_voice()
        tts(text, voice, filename)  # the module-level tts function imported from tiktokvoice
        return voice
This is then used in the script to generate a single video:
def generate_story_video_for_post(post: get_reddit_posts.RedditPost, reddit_engine: get_reddit_posts.RedditEngine):
    mp3_filename = post.id + ".mp3"
    reddit_id_tts_file = os.path.join(os.getcwd(),get_reddit_posts.RedditEngine.TTS_FOLDER_NAME, mp3_filename)
    tiktok_tts_api = text_to_speech_tiktok_api.TiktokTTSApi()
    voice = tiktok_tts_api.tts(post.into_text(),reddit_id_tts_file)
    ...
Transcribing
After generating a TTS mp3 file for a Reddit post, the next step is to transcribe the spoken text, so we know when each word is spoken. This tells us which word to show on the screen at which moment. To do this, we can use another AI called Whisper. It's made by OpenAI (from ChatGPT, duh) and it works very well. It's also free to use and you can run it locally by downloading the model yourself. It can be used as a command line tool or as a Python package, perfect for this use case.
Using it in Python is very straightforward. You load the model you want, pass in the filename of an mp3 file you want to transcribe and Bob's your uncle🥳. I put the functionality into a class so it stays modular:
whisper_transcribe.py
import whisper
import logging

logger = logging.getLogger(__name__)

class WhisperTranscriber:
    def __init__(self) -> None:
        logger.info("loading whisper model base.en")
        self.model = whisper.load_model("base.en") # english-only base model
        self.text_array = []
        self.fps = 0

    def transcribe(self, audio_filename: str) -> dict:
        logger.info("transcribing " + audio_filename)
        return self.model.transcribe(audio_filename,fp16=False,word_timestamps=True) # using CPU, FP32 must be used
Note that, because I run this on a VM (and I don't have GPU passthrough set up for this VM), I need to use the fp16=False parameter to force FP32. The parameter word_timestamps=True is also very useful, as it gives us the timestamp for each word, rather than for each sentence. This will come in handy later when we create the actual video.
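To give an idea of what comes back: with word_timestamps=True, the result dictionary holds a list of segments, and each segment holds a list of words with a start and end time in seconds. Here is a minimal sketch that flattens this into one list of words. The function name word_timings is made up for this example.

```python
from typing import List, Tuple

def word_timings(result: dict) -> List[Tuple[str, float, float]]:
    """Flatten a Whisper result into (word, start_second, end_second) tuples."""
    words = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            # Whisper usually prefixes each word with a space, so strip it
            words.append((word["word"].strip(), word["start"], word["end"]))
    return words
```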
After having made the class, it can be added to the method to generate a video for a story:
def generate_story_video_for_post(post: get_reddit_posts.RedditPost, reddit_engine: get_reddit_posts.RedditEngine):
    ...
    transcriber = whisper_transcribe.WhisperTranscriber()
    result = transcriber.transcribe(reddit_id_tts_file)
    ...
Generating the video
After transcribing, it's time to do some video editing. The difficult part is figuring out when to display what part of a sentence. Luckily, we have the timestamps of each word thanks to that handy-dandy word_timestamps parameter from Whisper. I found a video that explains a bit about how to go about creating subtitles with moviepy, but I didn't really like this guy's implementation, so I modified it a bit.
We begin with (of course) a Video class📽️:
class Video:
    def __init__(self, filename, width, height, duration, fps = 0, clip = None) -> None:
        self.filename = filename
        self.width = width
        self.height = height
        self.duration = duration
        self.fps = fps
        self.transcribed_text = []
        self.clip = clip
It contains a filename to which to save it, the size of the video, the duration in seconds, FPS, a list of the sentences that were transcribed and a reference to a moviepy clip. The first part of creating the video is to crop it to the correct aspect ratio for Instagram, remove the original audio and add the TTS audio. In the process of making a video, these things happen:
- create a video and audio clip
- crop the video to a 9:16 aspect ratio
- select a random start time for the video
- clip it to the length of the TTS audio
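The arithmetic behind the crop and the random start time in the list above can be sketched on its own, without moviepy. Both function names are hypothetical and only used for illustration.

```python
import random
from typing import Tuple

def center_crop_9_16(width: float, height: float) -> Tuple[float, float]:
    """Return (x1, new_width) for a centered 9:16 crop that keeps the full height."""
    new_width = height / 16.0 * 9.0
    x1 = width / 2.0 - new_width / 2.0
    return x1, new_width

def random_start(video_duration: float, audio_duration: float) -> int:
    """Pick a start second so that start + audio_duration still fits in the video."""
    latest = int(video_duration - audio_duration)
    if latest < 0:
        raise ValueError("background video is shorter than the audio")
    return random.randrange(0, latest + 1)
```

For a 1920x1080 background video, center_crop_9_16(1920, 1080) gives (656.25, 607.5): the crop starts 656.25 pixels from the left and is 607.5 pixels wide.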
All of this happens in the add_audio method:
import logging
import random

import moviepy.editor as mpe
import moviepy.video.fx.all

logger = logging.getLogger(__name__)

def add_audio(video_path, audio_path, output_path) -> Video:
    logger.info("adding audio file " + audio_path + " to video file " + video_path + " and saving to " + output_path)
    video = mpe.VideoFileClip(video_path)
    audio = mpe.AudioFileClip(audio_path)
    # calculate width to make video 9:16 aspect ratio
    W,H = video.size
    new_width = (float(H)/16.0)*9.0
    new_width_start = (float(W)/2.0) - new_width/2.0
    new_width_end = new_width_start + new_width
    logger.info("Width of original video is " + str(W) + ". Setting width to " + str(new_width))
    logger.info("cropping width from " + str(new_width_start) + " to " + str(new_width_end))
    # make video as long as the audio
    audio_duration = audio.duration # duration in seconds
    video_duration = video.duration
    logger.info("audio is " + str(audio.duration) + " seconds, video is " + str(video.duration) + " seconds")
    # random start point in video; randrange raises on an empty range, so fall back to 0
    max_start = int(video_duration - audio_duration)
    start = random.randrange(0, max_start) if max_start > 0 else 0
    logger.info("clipping video from " + str(start) + " seconds to " + str(start + audio_duration))
    clip = video.subclip(start, start + audio_duration).without_audio().set_audio(audio)
    cropped_clip = moviepy.video.fx.all.crop(clip,x1=new_width_start,width=new_width)
    if cropped_clip.fps > 60:
        # set_fps returns a new clip, so the result has to be assigned
        cropped_clip = cropped_clip.set_fps(60)
    logger.info("FPS IS " + str(cropped_clip.fps))
    return Video(output_path,new_width,H,audio_duration,cropped_clip.fps,cropped_clip)
This returns a Video object that is further used to add the subtitles for the transcribed text. To represent a transcribed line of text, I made a TranscribedLineInfo class. This represents a line with a duration (either in seconds or frames):
class TranscribedLineInfo:
    def __init__(self, line: str, fps: float, in_seconds: bool, start_frame: int = 0, end_frame: int = 0, start_second = 0, end_second = 0) -> None:
        self.text = line
        self.start_second = 0
        self.end_second = 0
        self.start_frame = start_frame
        self.end_frame = end_frame
        if in_seconds:
            self.start_second = start_second
            self.end_second = end_second
        else:
            self.start_second = start_frame/fps
            self.end_second = end_frame/fps
        self.duration = (self.end_second - self.start_second)
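To show how word timestamps could end up as timed lines like these, here is a rough sketch that groups words into short subtitle lines and converts seconds to frames. How the words are actually grouped is not shown above, so the fixed group size, the tuple layout and both function names are assumptions for illustration.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_second, end_second)

def group_words(words: List[Word], max_words: int = 3) -> List[Word]:
    """Join every max_words words into one line spanning first start to last end."""
    lines = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(word[0] for word in chunk)
        lines.append((text, chunk[0][1], chunk[-1][2]))
    return lines

def to_frames(seconds: float, fps: float) -> int:
    """Convert a timestamp in seconds to the nearest frame number."""
    return round(seconds * fps)
```

Each resulting tuple maps directly onto the text, start_second and end_second arguments of TranscribedLineInfo when in_seconds is True.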