Wednesday, April 16, 2025

Scraper – A Powerful Python Script That Allows You To Scrape Messages And Media From Telegram Channels Using The Telethon Library




A powerful Python script that allows you to scrape messages and media from Telegram channels using the Telethon library. Features include real-time continuous scraping, media downloading, and data export capabilities.


Features 🚀

  • Scrape messages from multiple Telegram channels
  • Download media files (photos, documents)
  • Real-time continuous scraping
  • Export data to JSON and CSV formats
  • SQLite database storage
  • Resume functionality (saves progress)
  • Media reprocessing for failed downloads
  • Progress monitoring
  • Interactive menu interface

Prerequisites 📋

Before running the script, you’ll need:

  • Python 3.7 or higher
  • Telegram account
  • API credentials from Telegram

Required Python packages

pip install -r requirements.txt

Contents of requirements.txt:

telethon
aiohttp
asyncio

Getting Telegram API Credentials 🔑

  1. Go to https://my.telegram.org/auth
  2. Log in with your phone number
  3. Click on “API development tools”
  4. Fill in the form:
    • App title: Your app title
    • Short name: Your app short name
    • Platform: Can be left as “Desktop”
    • Description: Brief description of your app
  5. Click “Create application”
  6. You’ll receive:
    • api_id: A number
    • api_hash: A string of letters and numbers

Keep these credentials safe; you’ll need them to run the script!
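
The script itself prompts for these values on first run. If you reuse the credentials in your own code, one way to keep them out of source files is to read them from environment variables. The sketch below assumes two variable names, TELEGRAM_API_ID and TELEGRAM_API_HASH, which are hypothetical and not defined by the script:

```python
import os


def load_credentials(env=None):
    """Read api_id and api_hash from environment variables.

    TELEGRAM_API_ID and TELEGRAM_API_HASH are hypothetical names chosen
    for this sketch; the scraper itself does not define them.
    """
    env = os.environ if env is None else env
    raw_id = env.get("TELEGRAM_API_ID", "").strip()
    api_hash = env.get("TELEGRAM_API_HASH", "").strip()
    if not raw_id.isdigit():
        raise ValueError("TELEGRAM_API_ID must be a number")
    if not api_hash:
        raise ValueError("TELEGRAM_API_HASH is missing")
    return int(raw_id), api_hash


if __name__ == "__main__":
    try:
        api_id, _api_hash = load_credentials()
        print("Loaded credentials for api_id", api_id)
    except ValueError as exc:
        print("Credentials not configured:", exc)
```

Failing early with a clear message is usually easier to debug than a rejected login later on.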

Setup and Running 🔧

  1. Clone the repository:
git clone https://github.com/unnohwn/telegram-scraper.git
cd telegram-scraper
  2. Install requirements:
pip install -r requirements.txt
  3. Run the script:
python telegram-scraper.py
  4. On first run, you’ll be prompted to enter:
    • Your API ID
    • Your API Hash
    • Your phone number (with country code)
    • Your phone number (with country code) or bot token; choose the phone number option when prompted the second time
    • Verification code (sent to your Telegram)

Initial Scraping Behavior 🕒

When scraping a channel for the first time, please note:

  • The script will attempt to retrieve the entire channel history, starting from the oldest messages
  • Initial scraping can take several minutes or even hours, depending on:
    • The total number of messages in the channel
    • Whether media downloading is enabled
    • The size and number of media files
    • Your internet connection speed
    • Telegram’s rate limiting
  • The script uses pagination and maintains state, so if interrupted, it can resume from where it left off
  • Progress percentage is displayed in real time to track the scraping status
  • Messages are stored in the database as they’re scraped, so you can start analyzing available data even before the scraping is complete
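
The resume behavior can be illustrated with a minimal sketch. It is not the script's actual code: it assumes a JSON state file holding the last processed message ID per channel, and it takes messages as plain (id, text) pairs rather than Telethon objects. The names load_state and scrape_batch are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_state(state_file):
    """Return the saved {channel: last_message_id} mapping, or {} if none."""
    path = Path(state_file)
    return json.loads(path.read_text()) if path.exists() else {}


def scrape_batch(messages, state_file, channel, store):
    """Store messages newer than the saved ID, updating state after each one."""
    state = load_state(state_file)
    last_id = state.get(channel, 0)
    for message_id, text in messages:  # assumed to arrive oldest first
        if message_id <= last_id:
            continue  # already handled in an earlier run
        store(message_id, text)
        last_id = message_id
        state[channel] = last_id
        Path(state_file).write_text(json.dumps(state))
    return last_id


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state_file = Path(tmp) / "state.json"
        saved = []
        history = [(1, "first"), (2, "second"), (3, "third")]
        # First run is "interrupted" after two messages.
        scrape_batch(history[:2], state_file, "channelname", lambda i, t: saved.append(i))
        # Second run sees the full history but only stores what is new.
        scrape_batch(history, state_file, "channelname", lambda i, t: saved.append(i))
        print(saved)  # prints [1, 2, 3]
```

Writing the state after every message costs some disk I/O, but an interruption then loses at most the message in flight.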

Usage 📝

The script provides an interactive menu with the following options:

  • [A] Add new channel
    • Enter the channel ID or channel name
  • [R] Remove channel
    • Remove a channel from the scraping list
  • [S] Scrape all channels
    • One-time scraping of all configured channels
  • [M] Toggle media scraping
    • Enable/disable downloading of media files
  • [C] Continuous scraping
    • Real-time monitoring of channels for new messages
  • [E] Export data
    • Export to JSON and CSV formats
  • [V] View saved channels
    • List all saved channels
  • [L] List account channels
    • List all channels with IDs for the account
  • [Q] Quit

Channel IDs 📢

You can use either:

  • Channel username (e.g., channelname)
  • Channel ID (e.g., -1001234567890)
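
If you handle channel references in your own code, it helps to tell the two forms apart up front. The helper below is a hypothetical sketch, not part of the script: it returns an integer for numeric IDs and a bare username otherwise. Telethon generally expects numeric IDs as integers rather than strings, though that is worth confirming against its documentation.

```python
import re


def parse_channel_ref(text):
    """Return an int for a numeric channel ID, otherwise the username without '@'."""
    ref = text.strip()
    if not ref:
        raise ValueError("channel reference is empty")
    if re.fullmatch(r"-?\d+", ref):
        return int(ref)  # e.g. -1001234567890
    return ref.lstrip("@")  # e.g. channelname


if __name__ == "__main__":
    print(parse_channel_ref("-1001234567890"))
    print(parse_channel_ref("@channelname"))
```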

Data Storage 💾

Database Structure

Data is stored in SQLite databases, one per channel:

  • Location: ./channelname/channelname.db
  • Table: messages
    • id: Primary key
    • message_id: Telegram message ID
    • date: Message timestamp
    • sender_id: Sender’s Telegram ID
    • first_name: Sender’s first name
    • last_name: Sender’s last name
    • username: Sender’s username
    • message: Message text
    • media_type: Type of media (if any)
    • media_path: Local path to downloaded media
    • reply_to: ID of replied message (if any)
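
A table with these columns could be created roughly as follows. This is a sketch only: the column names come from the list above, but the column types and the UNIQUE constraint on message_id are assumptions, and open_db and save_message are hypothetical helper names.

```python
import sqlite3

COLUMNS = (
    "message_id", "date", "sender_id", "first_name", "last_name",
    "username", "message", "media_type", "media_path", "reply_to",
)

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id INTEGER UNIQUE,  -- UNIQUE is an assumption, used to skip duplicates
    date TEXT,
    sender_id INTEGER,
    first_name TEXT,
    last_name TEXT,
    username TEXT,
    message TEXT,
    media_type TEXT,
    media_path TEXT,
    reply_to INTEGER
)
"""


def open_db(path=":memory:"):
    """Open (or create) a database and make sure the messages table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_message(conn, **fields):
    """Insert one message; a repeated message_id is silently ignored."""
    row = {name: fields.get(name) for name in COLUMNS}
    placeholders = ", ".join(":" + name for name in COLUMNS)
    conn.execute(
        f"INSERT OR IGNORE INTO messages ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        row,
    )
    conn.commit()


if __name__ == "__main__":
    db = open_db()
    save_message(db, message_id=10, date="2025-04-16T12:00:00", message="hello")
    print(db.execute("SELECT message_id, message FROM messages").fetchall())
```

Committing after each insert matches the idea that data is available for analysis while scraping is still running.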

Media Storage 📁

Media files are stored in:

  • Location: ./channelname/media/
  • Files are named using message ID or original filename
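
As an illustration of this layout, the sketch below builds a target path for a media file. It assumes the original filename is preferred when one is available and the message ID is used otherwise; the script's exact naming rule may differ, and media_path and should_download are hypothetical names.

```python
from pathlib import Path


def media_path(channel, message_id, original_name=None, root="."):
    """Build <root>/<channel>/media/<name> from the original filename or the message ID."""
    # Keep only the final path component so a crafted filename cannot escape the folder.
    name = Path(original_name).name if original_name else str(message_id)
    return Path(root) / channel / "media" / name


def should_download(path):
    """Skip files that already exist to avoid duplicate downloads."""
    return not Path(path).exists()


if __name__ == "__main__":
    target = media_path("channelname", 42, "report.pdf")
    print(target.as_posix(), should_download(target))
```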

Exported Data 📊

Data can be exported in two formats:

  1. CSV: ./channelname/channelname.csv
    • Human-readable spreadsheet format
    • Easy to import into Excel/Google Sheets
  2. JSON: ./channelname/channelname.json
    • Structured data format
    • Ideal for programmatic processing
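
An export of this kind can be sketched with the standard library alone. The function below is an illustration under assumptions, not the script's own exporter: it reads every row of a messages table and writes both files into a channel folder. The name export_messages is hypothetical, and the demo uses a reduced two-column table to stay short.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path


def export_messages(conn, out_dir, channel):
    """Write <channel>.csv and <channel>.json into out_dir; return the row count."""
    cur = conn.execute("SELECT * FROM messages ORDER BY message_id")
    columns = [col[0] for col in cur.description]
    rows = cur.fetchall()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / f"{channel}.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(columns)  # header row
        writer.writerows(rows)
    with open(out / f"{channel}.json", "w", encoding="utf-8") as fh:
        records = [dict(zip(columns, row)) for row in rows]
        json.dump(records, fh, ensure_ascii=False, indent=2)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Reduced table for the demo; the real table has more columns.
    conn.execute("CREATE TABLE messages (message_id INTEGER, message TEXT)")
    conn.executemany("INSERT INTO messages VALUES (?, ?)", [(1, "hello"), (2, "world")])
    with tempfile.TemporaryDirectory() as tmp:
        print(export_messages(conn, Path(tmp) / "channelname", "channelname"))
```

Setting ensure_ascii=False keeps non-Latin message text readable in the JSON file.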

Features in Detail 🔍

Continuous Scraping

The continuous scraping feature ([C] option) allows you to:

  • Monitor channels in real time
  • Automatically download new messages
  • Download media as it’s posted
  • Run indefinitely until interrupted (Ctrl+C)

State is maintained between runs.
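
One common way to build this kind of monitoring is a polling loop. The sketch below shows the general idea only; how the script itself watches channels is not stated here, and watch, fake_fetch, and FAKE_FEED are hypothetical names. The fetcher is a stand-in that returns (id, text) pairs instead of calling Telegram.

```python
import asyncio


async def watch(fetch_new, handle, interval=5.0, max_cycles=None):
    """Poll fetch_new(last_id) and pass every new (id, text) pair to handle."""
    last_id = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for message_id, text in await fetch_new(last_id):
            handle(message_id, text)
            last_id = max(last_id, message_id)
        cycles += 1
        await asyncio.sleep(interval)  # pause between polls
    return last_id


FAKE_FEED = [(1, "first"), (2, "second"), (3, "third")]  # stand-in for a channel


async def fake_fetch(last_id):
    """Hypothetical fetcher: returns at most two unseen messages per poll."""
    return [m for m in FAKE_FEED if m[0] > last_id][:2]


if __name__ == "__main__":
    try:
        # max_cycles keeps the demo finite; leave it as None to run until Ctrl+C.
        asyncio.run(watch(fake_fetch, lambda i, t: print(i, t), interval=0.1, max_cycles=3))
    except KeyboardInterrupt:
        print("Stopped by user")
```

Tracking the highest ID seen so far is what lets each poll ask only for messages it has not handled yet.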

Media Handling

The script can download:

  • Photos
  • Documents
  • Other media types supported by Telegram

It also automatically retries failed downloads and skips existing files to avoid duplicates.

Error Handling 🛠️

The script includes:

  • Automatic retry mechanism for failed media downloads
  • State preservation in case of interruption
  • Flood control compliance
  • Error logging for failed operations
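
A retry mechanism along these lines can be sketched as a small wrapper around any async download call. This is an illustration, not the script's code: download_with_retry is a hypothetical name, and the attempt count and delays are assumed values. Telethon's FloodWaitError carries a seconds attribute, which the sketch honors when an exception provides one.

```python
import asyncio
import logging

logger = logging.getLogger("scraper_sketch")


async def download_with_retry(download, attempts=3, base_delay=1.0):
    """Await download() until it succeeds; return None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return await download()
        except Exception as exc:
            logger.warning("attempt %d of %d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                return None  # caller can record the message for later reprocessing
            # Use the server-requested wait if the error carries one,
            # otherwise back off exponentially.
            wait = getattr(exc, "seconds", base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(wait)


if __name__ == "__main__":
    state = {"calls": 0}

    async def flaky_download():
        """Stand-in that fails twice before succeeding."""
        state["calls"] += 1
        if state["calls"] < 3:
            raise OSError("temporary failure")
        return "channelname/media/42"

    print(asyncio.run(download_with_retry(flaky_download, base_delay=0.1)))
```

Returning None instead of raising lets a scraping loop continue with the next message and revisit failures afterwards.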

Limitations ⚠️

  • Respects Telegram’s rate limits
  • Can only access public channels or channels you’re a member of
  • Media download size limits apply as per Telegram’s restrictions

Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

License 📄

This project is licensed under the MIT License; see the LICENSE file for details.

Disclaimer ⚖️

This tool is for educational purposes only. Make sure to:

  • Respect Telegram’s Terms of Service
  • Obtain necessary permissions before scraping
  • Use responsibly and ethically
  • Comply with data protection regulations


