Scraper – A Powerful Python Script That Lets You Scrape Messages And Media From Telegram Channels Using The Telethon Library

April 11, 2025

A powerful Python script that lets you scrape messages and media from Telegram channels using the Telethon library. Features include real-time continuous scraping, media downloading, and data export capabilities.

Features 🚀

Scrape messages from multiple Telegram channels
Download media files (photos, documents)
Real-time continuous scraping
Export data to JSON and CSV formats
SQLite database storage
Resume functionality (saves progress)
Media reprocessing for failed downloads
Progress monitoring
Interactive menu interface

Prerequisites 📋

Before running the script, you'll need:

Python 3.7 or higher
Telegram account
API credentials from Telegram

Required Python packages

pip install -r requirements.txt

Contents of requirements.txt:

telethon
aiohttp
asyncio

Getting Telegram API Credentials 🔑

Go to https://my.telegram.org/auth
Log in with your phone number
Click on “API development tools”
Fill in the form:
App title: Your app name
Short name: Your app short name
Platform: Can be left as “Desktop”
Description: Brief description of your app
Click “Create application”
You'll receive:
api_id: A number
api_hash: A string of letters and numbers

Keep these credentials safe; you'll need them to run the script!

Setup and Running 🔧

Clone the repository:

git clone https://github.com/unnohwn/telegram-scraper.git
cd telegram-scraper

Install requirements:

pip install -r requirements.txt

Run the script:

python telegram-scraper.py

On first run, you'll be prompted to enter:
Your API ID
Your API Hash
Your phone number (with country code)
Your phone number (with country code) or bot; use the phone number option when prompted the second time
Verification code (sent to your Telegram)

Initial Scraping Behavior 🕒

When scraping a channel for the first time, please note:

The script will attempt to retrieve the entire channel history, starting from the oldest messages
Initial scraping can take several minutes or even hours, depending on:
The total number of messages in the channel
Whether media downloading is enabled
The size and number of media files
Your internet connection speed
Telegram’s rate limiting
The script uses pagination and maintains state, so if interrupted, it can resume from where it left off
Progress percentage is displayed in real time to track the scraping status
Messages are stored in the database as they’re scraped, so you can start analyzing available data even before the scraping is complete
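
The resume behavior described above can be illustrated with a small checkpointing loop. This is only a sketch of the general technique, not the script's actual code: the state file name, the `fetch_batch` callable, and the batch size are all assumptions made for the example.

```python
import json
from pathlib import Path

STATE_FILE = Path("state.json")  # hypothetical file name, not taken from the script


def load_state(path=STATE_FILE):
    """Return {channel: last_message_id}, or an empty dict on first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_state(state, path=STATE_FILE):
    """Write to a temp file first, then swap it in, to avoid half-written state."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state), encoding="utf-8")
    tmp.replace(path)


def scrape_batches(channel, fetch_batch, state, path=STATE_FILE, batch_size=100):
    """Page through messages oldest-first, checkpointing after each batch.

    fetch_batch(channel, after_id, limit) is a hypothetical callable that
    returns (message_id, text) tuples with ids greater than after_id,
    sorted in ascending order.
    """
    last_id = state.get(channel, 0)
    while True:
        batch = fetch_batch(channel, last_id, batch_size)
        if not batch:
            break
        for message_id, text in batch:
            yield message_id, text
        # Record progress only after the whole batch has been consumed.
        last_id = batch[-1][0]
        state[channel] = last_id
        save_state(state, path)
```

Because the checkpoint is written only after a full batch, an interruption mid-batch means that batch is fetched again on the next run, so the storage layer should tolerate repeated message IDs.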

Usage 📝

The script provides an interactive menu with the following options:

[A] Add new channel
Enter the channel ID or channelname
[R] Remove channel
Remove a channel from the scraping list
[S] Scrape all channels
One-time scraping of all configured channels
[M] Toggle media scraping
Enable/disable downloading of media files
[C] Continuous scraping
Real-time monitoring of channels for new messages
[E] Export data
Export to JSON and CSV formats
[V] View saved channels
List all saved channels
[L] List account channels
List all channels with IDs for the account
[Q] Quit

Channel IDs 📢

You can use either:

Channel username (e.g., channelname)
Channel ID (e.g., -1001234567890)
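
Since a channel can be given as either a username or a numeric ID, typed input has to be told apart somehow. The helper below is a hypothetical sketch of one way to do that; `parse_channel` is not a function from the script, and stripping a leading `@` is an assumption.

```python
def parse_channel(identifier: str):
    """Return an int for numeric channel IDs, otherwise a username string.

    Hypothetical helper for illustration only.
    """
    text = identifier.strip()
    try:
        # Numeric IDs such as "-1001234567890" become integers.
        return int(text)
    except ValueError:
        # Anything else is treated as a username; a leading "@" is dropped.
        return text.lstrip("@")
```

For example, `parse_channel("-1001234567890")` gives an integer, while `parse_channel("@channelname")` gives the string `channelname`.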

Data Storage 💾

Database Structure

Data is stored in SQLite databases, one per channel:

Location: ./channelname/channelname.db
Table: messages

Columns of the messages table:

id: Primary key
message_id: Telegram message ID
date: Message timestamp
sender_id: Sender’s Telegram ID
first_name: Sender’s first name
last_name: Sender’s last name
username: Sender’s username
message: Message text
media_type: Type of media (if any)
media_path: Local path to downloaded media
reply_to: ID of replied message (if any)
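
To make the layout concrete, here is a minimal sketch of a table with the columns listed above, built with Python's standard `sqlite3` module. The column types, the `UNIQUE` constraint on `message_id`, and the helper names `open_db` and `save_message` are assumptions for illustration; the script's real schema may differ.

```python
import sqlite3

# Column names follow the list above; types and UNIQUE are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    message_id INTEGER UNIQUE,
    date TEXT,
    sender_id INTEGER,
    first_name TEXT,
    last_name TEXT,
    username TEXT,
    message TEXT,
    media_type TEXT,
    media_path TEXT,
    reply_to INTEGER
)
"""


def open_db(path):
    """Open (or create) a channel database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_message(conn, row):
    """Insert one message dict; rows with an already stored message_id are skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO messages "
        "(message_id, date, sender_id, first_name, last_name, username, "
        "message, media_type, media_path, reply_to) "
        "VALUES (:message_id, :date, :sender_id, :first_name, :last_name, "
        ":username, :message, :media_type, :media_path, :reply_to)",
        row,
    )
    conn.commit()
```

Using `INSERT OR IGNORE` together with a unique `message_id` is one simple way to make re-scraping after an interruption safe against duplicate rows.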

Media Storage 📁

Media files are stored in:

Location: ./channelname/media/
Files are named using message ID or original filename

Exported Data 📊

Data can be exported in two formats:

CSV: ./channelname/channelname.csv
Human-readable spreadsheet format
Easy to import into Excel/Google Sheets

JSON: ./channelname/channelname.json
Structured data format
Ideal for programmatic processing
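
As a rough illustration of what such an export involves, the sketch below reads a `messages` table and writes it to both formats using only the standard library. The function name `export_messages` and the choice to order by `message_id` are assumptions, not details taken from the script.

```python
import csv
import json
import sqlite3


def export_messages(db_path, csv_path, json_path):
    """Dump every row of the messages table to a CSV file and a JSON file.

    Returns the number of exported rows. Hypothetical helper for illustration.
    """
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute("SELECT * FROM messages ORDER BY message_id")
        # Take the column names from the query itself so the header is
        # written even when the table is empty.
        columns = [col[0] for col in cursor.description]
        rows = [dict(zip(columns, record)) for record in cursor]
    finally:
        conn.close()

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)

    return len(rows)
```

The CSV file suits spreadsheets, while the JSON file keeps `None` values as `null` and is easier to load back into a program.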

Features in Detail 🔍

Continuous Scraping

The continuous scraping feature ([C] option) allows you to:

Monitor channels in real time
Automatically download new messages
Download media as it’s posted
Run indefinitely until interrupted (Ctrl+C)
Maintain state between runs

Media Handling

The script can download:

Photos
Documents
Other media types supported by Telegram

It automatically retries failed downloads and skips existing files to avoid duplicates.

Error Handling 🛠️

The script includes:

Automatic retry mechanism for failed media downloads
State preservation in case of interruption
Flood control compliance
Error logging for failed operations
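
A retry mechanism of this kind is often written as a loop with a growing delay between attempts. The sketch below shows that general pattern under stated assumptions: the helper name, the attempt count, and the exponential backoff are illustrative and not taken from the script.

```python
import time


def download_with_retry(download, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call download() up to `attempts` times, doubling the wait after each failure.

    Returns the download result, or None if every attempt failed, so the
    caller can log the failure and reprocess the item later.
    `download` is any zero-argument callable (hypothetical).
    """
    for attempt in range(attempts):
        try:
            return download()
        except Exception as exc:  # broad on purpose for this sketch
            if attempt == attempts - 1:
                print(f"download failed after {attempts} attempts: {exc}")
                return None
            # Wait 1x, 2x, 4x ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function easy to test without real delays. A production version would likely catch narrower exception types.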

Limitations ⚠️

Respects Telegram’s rate limits
Can only access public channels or channels you're a member of
Media download size limits apply as per Telegram’s restrictions

Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

License 📄

This project is licensed under the MIT License; see the LICENSE file for details.

Disclaimer ⚖️

This tool is for educational purposes only. Make sure to:

Respect Telegram’s Terms of Service
Obtain necessary permissions before scraping
Use responsibly and ethically
Comply with data protection regulations