Read, Don't Watch: A YouTube Transcript Tool

in Synergy Builders, 14 hours ago

Hey everyone,

I have a confession to make: I often don't have the time or patience to sit through a whole YouTube video. I can read much faster than most people can talk, and I just want to get to the point. If you're like me, you've probably wished for an easy way to just grab the text from a video and skim it.

Just what it says...

That's why I created ytt.py, a simple command-line tool that does exactly that. It fetches the available transcript for any YouTube video and prints it right to your terminal.

How It Works

The script is pretty straightforward. It's built on top of the excellent youtube-transcript-api Python library. The main logic I added was a robust function to extract the video ID from all the various, weird YouTube URL formats you might encounter, from standard watch?v= links to shortened youtu.be links and even embed URLs.
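To make the URL handling concrete, here is a minimal sketch of how those three shapes (watch?v=, youtu.be, and embed) could be told apart. It is not the code from ytt.py: the function name sketch_extract_video_id, the list of accepted hosts, and the requirement that URLs include a scheme such as https:// are all assumptions made for illustration.

```python
import re
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from letters, digits, "_" and "-".
VIDEO_ID_RE = re.compile(r"[a-zA-Z0-9_-]{11}")

# Assumed host list for this sketch; the real script may accept more or fewer.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def sketch_extract_video_id(url_or_id: str) -> str:
    """Hypothetical helper: return the video ID from a bare ID or a common URL shape."""
    # A bare ID is returned unchanged.
    if VIDEO_ID_RE.fullmatch(url_or_id):
        return url_or_id

    parsed = urlparse(url_or_id)
    host = (parsed.hostname or "").lower()
    candidate = None

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter.
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith("/embed/"):
            # Embed links carry the ID right after "/embed/".
            candidate = parsed.path.split("/")[2]

    if not candidate or not VIDEO_ID_RE.fullmatch(candidate):
        raise ValueError(f"Invalid YouTube URL or video ID: {url_or_id}")
    return candidate


if __name__ == "__main__":
    for sample in (
        "dQw4w9WgXcQ",
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=10s",
        "https://youtu.be/dQw4w9WgXcQ",
        "https://www.youtube.com/embed/dQw4w9WgXcQ",
    ):
        print(sample, "->", sketch_extract_video_id(sample))
```

Checking the host before looking at the path keeps look-alike URLs on other domains from being accepted just because they happen to have a v= parameter.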

Once it has the video ID, it simply requests the transcript from the API (defaulting to English, but you can specify other languages) and prints the clean, plain-text version.
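As a rough illustration of the "English by default, other languages on request" behavior, a command-line front end could look something like the sketch below. The post doesn't show the real option names, so build_parser, the positional video argument, and the -l/--lang flag are assumptions, not the actual ytt.py interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI definition; the real ytt.py options may differ."""
    parser = argparse.ArgumentParser(
        prog="ytt.py",
        description="Print a YouTube transcript as plain text.",
    )
    # Accept either a full URL or a bare 11-character video ID.
    parser.add_argument("video", help="YouTube URL or video ID")
    # Language code passed on to the transcript lookup; English unless overridden.
    parser.add_argument("-l", "--lang", default="en", help="transcript language code")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"would fetch the {args.lang!r} transcript for {args.video}")
```

With a layout like this, the language code only needs to be typed when something other than English is wanted.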

The Code

The script is self-contained and easy to use. Here's a look at the core logic for extracting the video ID and fetching the transcript:

import logging
import re
from urllib.parse import parse_qs, urlparse

from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter

logger = logging.getLogger(__name__)


def extract_video_id(url_or_id: str) -> str:
    """Extract YouTube video ID from URL or return ID if already provided."""
    # Check if it's already a valid video ID format (11 URL-safe characters)
    if re.match(r"^[a-zA-Z0-9_-]{11}$", url_or_id):
        return url_or_id

    try:
        parsed = urlparse(url_or_id)
        # ... logic to handle various youtube domains and URL patterns ...
        # Standard watch URL format: the ID is in the "v" query parameter
        query_params = parse_qs(parsed.query)
        video_id = query_params.get("v", [None])[0]

        if not video_id or not re.match(r"^[a-zA-Z0-9_-]{11}$", video_id):
            raise ValueError(f"Invalid video ID format in URL: {url_or_id}")

        return video_id
    except Exception as e:
        # Any failure above is re-raised as a ValueError with the original error chained
        raise ValueError(f"Invalid YouTube URL or video ID: {url_or_id}") from e


async def fetch_youtube_transcript(video_id: str, lang: str = "en") -> str:
    """Get YouTube transcript for a video ID."""
    try:
        # Note: fetch() is a synchronous call; nothing is awaited in this coroutine
        transcript = YouTubeTranscriptApi().fetch(video_id, languages=[lang])
        # TextFormatter drops timestamps and returns plain text
        formatter = TextFormatter()
        return formatter.format_transcript(transcript)
    except Exception as e:
        logger.error(f"Failed to get transcript for video {video_id}: {e}")
        raise
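One thing worth noting about the async function: the library call inside it is, as far as I can tell, an ordinary blocking call, so the event loop waits on it. If several transcripts need to be fetched at once, one option is to push the blocking work onto a worker thread with asyncio.to_thread (Python 3.9+). The sketch below uses a stand-in blocking_fetch function instead of the real network call so it runs anywhere; blocking_fetch, fetch_in_thread, and the placeholder IDs are made up for illustration.

```python
import asyncio


def blocking_fetch(video_id: str, lang: str = "en") -> str:
    """Stand-in for the blocking library call; returns canned text instead of a transcript."""
    return f"[{lang}] transcript for {video_id}"


async def fetch_in_thread(video_id: str, lang: str = "en") -> str:
    """Run the blocking fetch in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(blocking_fetch, video_id, lang)


async def main() -> list:
    # Fetch two (placeholder) videos concurrently; results keep the call order.
    return await asyncio.gather(
        fetch_in_thread("aaaaaaaaaaa"),
        fetch_in_thread("bbbbbbbbbbb", "de"),
    )


if __name__ == "__main__":
    for text in asyncio.run(main()):
        print(text)
```

For a single video on the command line this makes no practical difference; it only matters when the function is called from a larger async program.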

Get The Tool

You can grab the full script from the Gist I created for it. It includes instructions on how to install the single dependency it needs to run.

It's a simple tool, but it's one I use all the time. Hopefully, some of you will find it just as useful for saving time and getting information more efficiently.

As always,
Michael Garcia a.k.a. TheCrazyGM


yt-dlp has built-in support for downloading transcripts with or without timestamps, but I went the same route you did, since yt-dlp doesn't like VPNs while the YouTube Transcript API just works.

I still use yt-dlp for things like grabbing just the audio and, of course, grabbing the video itself (but not as often as I would have thought).

Now that's a cool tool! I'm usually very much the same way, and I often don't have the time to watch long YouTube videos, like all the awesome interviews with fellow Hivers, on which I'm massively backlogged...lol! 😁 🙏 💚 ✨ 🤙

Amazing little tool! Beyond ease of use, I could see piping this into further use cases. We should totally add it to the project builder!
