To get started, you need the requests library for the downloads and concurrent.futures for the threading logic.
For large batches, use the tqdm library. It integrates easily with ThreadPoolExecutor to show a real-time progress bar in your terminal.
Threading vs. Asyncio
While ThreadPoolExecutor is the easiest to implement, asyncio with aiohttp is technically more efficient for massive scales (10,000+ images). However, for most developers, the simplicity and readability of threads make ThreadPoolExecutor the go-to choice for quick, reliable image scraping.
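For comparison, here is a minimal sketch of the asyncio pattern. It is not a full aiohttp downloader: the network call is simulated with asyncio.sleep so the example runs with the standard library alone, and the names fake_fetch, fetch_all, and limit are hypothetical. In real use, an aiohttp request would take the place of fake_fetch.

```python
import asyncio


async def fake_fetch(url, delay=0.01):
    """Stand-in for an aiohttp request; sleeps instead of touching the network."""
    await asyncio.sleep(delay)
    return f"bytes-from:{url}"


async def fetch_all(urls, limit=100):
    """Fetch every URL concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        # The semaphore caps how many fetches run at the same time.
        async with semaphore:
            return await fake_fetch(url)

    # gather returns results in the same order as the input awaitables.
    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/img{i}.jpg" for i in range(1000)]
    results = asyncio.run(fetch_all(urls))
    print(len(results))
```

A single thread drives all of the pending requests here, which is why this approach tends to scale further than a thread pool. The cost is that every function in the call chain has to be async.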
For max_workers, a value between 5 and 20 is usually appropriate for standard web scraping.
2. Handling Exceptions
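One dead link should not stop the rest of a batch. As a variation on putting try-except inside the download function, the sketch below catches errors where each result is collected, using submit and as_completed. The names run_all and flaky_task are hypothetical, and flaky_task stands in for a real download so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(task, urls, max_workers=10):
    """Run task(url) for every URL; collect successes and failures separately."""
    succeeded, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each Future back to the URL it was submitted for.
        future_to_url = {executor.submit(task, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                # result() re-raises any exception raised inside the task.
                succeeded[url] = future.result()
            except Exception as exc:
                failed[url] = exc
    return succeeded, failed


def flaky_task(url):
    """Hypothetical stand-in for a download: fails for URLs containing 'dead'."""
    if "dead" in url:
        raise ConnectionError(f"cannot reach {url}")
    return len(url)


if __name__ == "__main__":
    ok, bad = run_all(flaky_task, ["https://a.example/x.png", "https://dead.example/y.png"])
    print(f"{len(ok)} succeeded, {len(bad)} failed")
```

Keeping the failures in their own dictionary makes it straightforward to retry only the URLs that failed.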
```python
import os
from concurrent.futures import ThreadPoolExecutor

import requests


def download_image(url):
    """Download one image into the images/ directory and report the outcome."""
    try:
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            filename = os.path.join("images", url.split("/")[-1])
            with open(filename, "wb") as f:
                f.write(response.content)
            return f"Success: {url}"
        # Report non-200 responses instead of silently returning None
        return f"Error: {url} - HTTP {response.status_code}"
    except Exception as e:
        return f"Error: {url} - {e}"


image_urls = ["https://example.com", "https://example.com"]  # Your list

# Create directory if it doesn't exist
os.makedirs("images", exist_ok=True)

# Using ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(download_image, image_urls))
```

Key Optimization Tips
Network requests fail often. Always wrap your download logic in a try-except block to ensure one dead link doesn't crash the entire script.
3. Using Sessions
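Assuming "sessions" here refers to requests.Session, which reuses TCP connections across requests to the same host, a minimal sketch of one session per worker thread might look like the following. The helper names get_session and fetch_bytes are hypothetical. Giving each thread its own session is a conservative choice, since sharing a single Session across threads is not clearly documented as safe.

```python
import threading

import requests

# Thread-local storage: each worker thread lazily creates its own Session.
_thread_local = threading.local()


def get_session():
    """Return the requests.Session owned by the calling thread."""
    if not hasattr(_thread_local, "session"):
        _thread_local.session = requests.Session()
    return _thread_local.session


def fetch_bytes(url, timeout=10):
    """Download url with the calling thread's session and return the raw body."""
    response = get_session().get(url, timeout=timeout)
    response.raise_for_status()  # turn HTTP 4xx/5xx responses into exceptions
    return response.content
```

A function such as fetch_bytes can be passed to executor.map in place of one that calls requests.get directly, so that repeated downloads from the same host can reuse an open connection.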
Downloading images is "I/O-bound." Your script waits for the server to send data.
Efficiently downloading hundreds or thousands of images requires more than a simple loop. If you download files one by one, your program spends most of its time waiting for network responses rather than using your CPU or bandwidth.
ThreadPoolExecutor is perfect for I/O tasks. It manages a "pool" of threads that work simultaneously.
The max_workers parameter determines how many threads run at once. If the value is too low, you aren't fully utilizing your connection.
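To make the effect of max_workers concrete, the sketch below replaces real downloads with time.sleep, which is an assumption made so the comparison runs without network access. The names fake_download, run_sequential, and run_threaded are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item_id, delay=0.05):
    """Stand-in for a network request: sleep to simulate waiting on a server."""
    time.sleep(delay)
    return item_id


def run_sequential(items):
    """Process items one by one; return the results and the elapsed seconds."""
    start = time.perf_counter()
    results = [fake_download(i) for i in items]
    return results, time.perf_counter() - start


def run_threaded(items, max_workers):
    """Process items with a thread pool; return the results and the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the inputs.
        results = list(executor.map(fake_download, items))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    items = list(range(8))
    _, t_seq = run_sequential(items)
    _, t_pool = run_threaded(items, max_workers=4)
    print(f"sequential: {t_seq:.2f}s, 4 workers: {t_pool:.2f}s")
```

Because the simulated work is pure waiting, the pooled run should finish in roughly the sequential time divided by the number of workers. Real downloads will not scale as cleanly, since bandwidth and server limits also apply.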