Lilia Quotes Compilation Video Part 2

I just published a video with another 73 Lilia quotes on my YouTube channel. It’s the same kind of video I wrote about in a previous blog post, but this time I got a lot of help from YogSo (on Discord), who tracked down most of the quotes.

YogSo gave me direct video links with timestamps for many of those quotes, and for others YogSo told me which livestreams contained which quotes. This allowed me to focus my search efforts on a limited number of videos. Extremely helpful.

Still, I did need to do some text searching, and I’d pretty much hit the limit of what I could find with my previous techniques. I explained the problems in my earlier post, but just to recap:

  • the written quote may not match what Lilia actually says verbatim
  • the transcription of the quote could be partly or completely wrong/missing

I decided to build a database with two distinct parts: A) the plain text content of each subtitle file, and B) a map that could tell me when every word in the plain text was spoken.

That way I could search the plain text for a quote and then, if I got a match, map the first matching word to the correct timestamp. But I still had to improve the text search somehow.
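To make the idea concrete, here’s a minimal sketch of that kind of two-part index. It’s not the actual code behind the project: the function names (build_index, find_quote) and the (start_seconds, text) cue format are assumptions made up for illustration, and each word simply gets the start time of the subtitle cue it appears in.

```python
import bisect
import re

WORD_RE = re.compile(r"[a-z0-9']+")


def build_index(cues):
    """Build the two parts from a list of (start_seconds, text) subtitle cues.

    Returns (plain_text, offsets, word_times) where offsets[i] is the character
    position of word i in plain_text and word_times[i] is when it was spoken.
    """
    words = []
    word_times = []
    for start, text in cues:
        for word in WORD_RE.findall(text.lower()):
            words.append(word)
            word_times.append(start)  # cue start time, an approximation

    offsets = []
    pos = 0
    for word in words:
        offsets.append(pos)
        pos += len(word) + 1  # +1 for the space that joins the words
    return " ".join(words), offsets, word_times


def find_quote(plain_text, offsets, word_times, quote):
    """Return the timestamp of the first matching word, or None if no match."""
    needle = " ".join(WORD_RE.findall(quote.lower()))
    if not needle:
        return None
    hit = plain_text.find(needle)
    if hit == -1:
        return None
    # Map the character offset back to the word that contains it.
    word_index = bisect.bisect_right(offsets, hit) - 1
    return word_times[word_index]
```

Part A is the plain_text string and part B is the offsets/word_times pair. The search is still an exact substring match, so it has the same weakness as before whenever the written quote and the transcription differ.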

When the written quote differed too much from the transcription, I had no chance of finding it, but if they were only slightly different, fuzzy string matching could work. And it might have worked, but the Python package I used (fuzzywuzzy) suffered from horrible memory leaks, so in the end I gave up on that approach.
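For anyone curious what fuzzy matching over a transcript can look like, here’s a small sketch that uses difflib from the Python standard library instead of fuzzywuzzy. This is not the code I gave up on; the fuzzy_find name, the word-window approach and the 0.8 threshold are all assumptions picked for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_find(words, quote_words, threshold=0.8):
    """Slide a quote-sized window over the transcript words.

    Returns (score, word_index) for the best window scoring at or above
    threshold, or None if no window is similar enough.
    """
    n = len(quote_words)
    target = " ".join(quote_words)
    best = None
    for i in range(max(1, len(words) - n + 1)):
        window = " ".join(words[i:i + n])
        score = SequenceMatcher(None, target, window).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, i)
    return best
```

Comparing every window like this is slow on long livestream transcripts, so it’s a demonstration of the idea rather than something tuned for a full database.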

I didn’t want to try and implement my own fuzzy search tool, so I decided to divide the quote into smaller chunks and search for each chunk. This gave me a ton of search results which I had to review manually, but it actually did sort of work.
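The chunk search can be sketched roughly like this. The function names and the chunk size of three words are assumptions for the example, not the exact values used for the video.

```python
def split_into_chunks(quote_words, chunk_size=3):
    """Split the quote into consecutive chunks of chunk_size words."""
    return [quote_words[i:i + chunk_size]
            for i in range(0, len(quote_words), chunk_size)]


def chunk_hits(words, quote_words, chunk_size=3):
    """Return (chunk_text, word_index) for every exact chunk match.

    Every hit is only a candidate; it still has to be reviewed by hand.
    """
    hits = []
    for chunk in split_into_chunks(quote_words, chunk_size):
        n = len(chunk)
        for i in range(len(words) - n + 1):
            if words[i:i + n] == chunk:
                hits.append((" ".join(chunk), i))
    return hits
```

A chunk only has to match on its own, so a quote with one or two wrong words can still be found through its other chunks. The price is noise: short chunks, especially a leftover one-word chunk at the end, match all over the place.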

I found 73 quotes in total, but like I said, I had a ton of help.

This time I didn’t bother using OpenShot to compile the snippets into one video; I just used ffmpeg all the way. As you can see if you watch the video, I haven’t figured out how to make nice transitions with ffmpeg yet.
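An ffmpeg-only pipeline of this kind can be sketched as two steps: cut each snippet, then join them with ffmpeg’s concat demuxer. The helper names and file names below are hypothetical and these aren’t the exact commands used for the video. The functions only build the argument lists, so nothing runs until they’re passed to something like subprocess.run.

```python
def cut_command(source, start, end, out_path):
    """ffmpeg arguments that cut the span from start to end (in seconds).

    No codec is specified, so ffmpeg re-encodes with its defaults.
    """
    duration = end - start
    return ["ffmpeg", "-ss", str(start), "-t", str(duration),
            "-i", source, out_path]


def concat_list_text(snippet_paths):
    """Contents of the list file that the concat demuxer reads.

    Paths containing single quotes are not handled in this sketch.
    """
    return "".join(f"file '{path}'\n" for path in snippet_paths)


def concat_command(list_path, out_path):
    """ffmpeg arguments that join the listed snippets without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", out_path]
```

Joining with "-c copy" only works cleanly when all snippets share the same codecs and parameters, and it produces hard cuts between snippets, which matches the lack of transitions in the video.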

The video quality’s still 720p. I’m tempted to try and boost that in the future. The issue’s probably that I’m not grabbing the highest quality source videos. (I mean, I thought I did, but here we are…)

The video with 73 Lilia quotes is on YouTube, and here’s a list of the quote IDs in the video, and their source videos:

This post is part of the Lilia Quotes project.