Question

All my links work; I have tested them in the browser. When downloading the images, I get the errors below.

an error occurred while fetching: "http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_1.jpeg"
an error occurred while fetching: "http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_2.jpeg"
an error occurred while fetching: "http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_3.jpeg"
an error occurred while fetching: "http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_4.jpeg"

import time
import urllib.request
from urllib.error import URLError  # the docs say this is the base error you need to catch

start_time = time.time()
m = time.strftime("%m")
d = time.strftime("%d")
Y = time.strftime("%Y")

for i in range(1, 5):
    issue_id1 = str(i)
    url = "http://epaperlokmat.in/eNewspaper/News/LOK/MULK/" + Y + "/" + m + "/" + d + "/" + Y + m + d + "_" + str(i) + ".jpeg"
    try:
        s = urllib.request.urlopen(url)
        contents = s.read()
    except URLError:
        print('an error occurred while fetching: "{}"'.format(url))
        continue
    with open("D:/IMAGES/" + issue_id1 + ".jpeg", "wb") as file:
        file.write(contents)

Akshay Poklekar · Answer 1 · 29 April 2020

Here is the code from your second example.

import requests
import datetime,time

start_time = time.time()
today=time.strftime("%Y%m%d")
month=today=time.strftime("%m")
day=today=time.strftime("%d")
year=today=time.strftime("%Y")

url = "http://epaperlokmat.in/eNewspaper/News/LOK/MULK/{year}/{month}/{day}/{year}{month}{day}_{issue_id}.jpeg"
path = "D:/IMAGES/{issue_id}.jpeg"

fetched_images = []

for issue_id in range(1, 15):
    try:
        # Let's create the url for the given issue.
        issue_url = url.format(
            year=year,
            month=month,
            day=day,
            issue_id=issue_id)

        # GET the url content
        req = urllib.request.Request(url, 
            headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'}

        # Add the image to your list
        fetched_images.append(url)

        # Save to file if successful and close the file when done.
        with open(path.format(issue_id=issue_id), 'wb') as f:
            f.write(req.content)
    except Exception as e:
        # If something went wrong, just print the url and the error.
        print('Failed to fetch {url} with error {e}'.format(
            url=issue_url, e=e))

It gives this error:

  File "d:/test2.py", line 29
    fetched_images.append(url)
    ^
SyntaxError: invalid syntax

S.D. · Answer 2 · 29 April 2020

It seems that the host you are fetching the images from does not like the default headers that urllib sends.

This adjusted version appears to fetch your images correctly:

import time
import urllib.request
from urllib.error import URLError  # the docs say this is the base error you need to catch

start_time = time.time()
m = time.strftime("%m")
d = time.strftime("%d")
Y = time.strftime("%Y")

fetched_images = []

for i in range(1,5):
    issue_id1=str(i)
    url = "http://epaperlokmat.in/eNewspaper/News/LOK/MULK/"+str(Y) +"/"+str(m)+"/"+str(d)+"/"+str(Y+m+d)+"_"+str(i)+".jpeg"
    try:
        # First build the request, and adjust the headers to something else.
        req = urllib.request.Request(url, 
            headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'}
        )

        # Secondly fetch your image
        s = urllib.request.urlopen(req)
        contents = s.read()

        # Append to your image-list
        fetched_images.append(url)
    except URLError:
        print(url)
        print('an error occurred while fetching: "{}"'.format(url))
        continue
    # Write the image and close the file when done.
    with open("D:/IMAGES/" + issue_id1 + ".jpeg", "wb") as file:
        file.write(contents)

To clarify: first build the request with the adjusted headers. Only then open the URL, passing req to urlopen.
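As a small illustration of that point, here is a sketch that compares the User-Agent urllib sends by default with the one attached to a custom request. The helper names `default_user_agent` and `build_request` are made up for this example, and the short `"Mozilla/5.0"` value is an assumption; whether this particular host accepts it has not been checked.

```python
import urllib.request

# Assumption: a short browser-like value; the code above uses a full Chrome string.
BROWSER_USER_AGENT = "Mozilla/5.0"


def default_user_agent():
    """Return the User-Agent urllib sends when none is supplied."""
    return dict(urllib.request.build_opener().addheaders)["User-agent"]


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Describe a GET request with a custom User-Agent; nothing is sent yet."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Typically prints something like "Python-urllib/3.x".
    print(default_user_agent())
    req = build_request(
        "http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_1.jpeg"
    )
    # Request stores header names capitalized, hence "User-agent".
    print(req.get_header("User-agent"))
```

Nothing here touches the network; the request is only sent once it is passed to `urllib.request.urlopen`.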

Another way to go is to use requests. In your case it actually works out of the box. Before running it, you will need to install the requests package:

pip install requests

import requests
import time

start_time = time.time()
month = time.strftime("%m")
day = time.strftime("%d")
year = time.strftime("%Y")

url = "http://epaperlokmat.in/eNewspaper/News/LOK/MULK/{year}/{month}/{day}/{year}{month}{day}_{issue_id}.jpeg"
path = "D:/IMAGES/{issue_id}.jpeg"

fetched_images = []

for issue_id in range(1, 5):
    try:
        # Let's create the url for the given issue.
        issue_url = url.format(
            year=year,
            month=month,
            day=day,
            issue_id=issue_id)

        # GET the url content
        req = requests.get(issue_url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})

        # Add the image to your list
        fetched_images.append(issue_url)

        # Save to file if successful and close the file when done.
        with open(path.format(issue_id=issue_id), 'wb') as f:
            f.write(req.content)
    except Exception as e:
        # If something went wrong, just print the url and the error.
        print('Failed to fetch {url} with error {e}'.format(
            url=issue_url, e=e))

Akshay Poklekar · Answer 3 · 29 April 2020

I ran the code below.

import requests
import urllib.request
import datetime,time

start_time = time.time()
today=time.strftime("%Y%m%d")
month=today=time.strftime("%m")
day=today=time.strftime("%d")
year=today=time.strftime("%Y")

url = "http://epaperlokmat.in/eNewspaper/News/LOK/MULK/{year}/{month}/{day}/{year}{month}{day}_{issue_id}.jpeg"
path = "D:/IMAGES/{issue_id}.jpeg"

fetched_images = []

for issue_id in range(1, 15):
    try:
        # Let's create the url for the given issue.
        issue_url = url.format(
            year=year,
            month=month,
            day=day,
            issue_id=issue_id)

        # GET the url content
        req = urllib.request.Request(url, 
            headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})

        # Add the image to your list
        fetched_images.append(url)

        # Save to file if successful and close the file when done.
        with open(path.format(issue_id=issue_id), 'wb') as f:
            f.write(req.content)
    except Exception as e:
        # If something went wrong, just print the url and the error.
        print('Failed to fetch {url} with error {e}'.format(
            url=issue_url, e=e))

And now I get the errors below:

Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_1.jpeg with error 'Request' object has no attribute 'content'
Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_2.jpeg with error 'Request' object has no attribute 'content'
Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_3.jpeg with error 'Request' object has no attribute 'content'
Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_4.jpeg with error 'Request' object has no attribute 'content'
Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_5.jpeg with error 'Request' object has no attribute 'content'
Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_6.jpeg with error 'Request' object has no attribute 'content'
Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_7.jpeg with error 'Request' object has no attribute 'content'
Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_8.jpeg with error 'Request' object has no attribute 'content'
Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_9.jpeg with error 'Request' object has no attribute 'content'
Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_10.jpeg with error 'Request' object has no attribute 'content'
Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_11.jpeg with error 'Request' object has no attribute 'content'
Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_12.jpeg with error 'Request' object has no attribute 'content'
Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_13.jpeg with error 'Request' object has no attribute 'content'
Failed to fetch http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_14.jpeg with error 'Request' object has no attribute 'content'
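A likely reading of this error: `urllib.request.Request` only describes a request and never holds a body, while `.content` belongs to the response object that `requests.get` returns. The sketch below shows the urllib side of that difference. The `fetch_body` helper and the `data:` URL are illustrative assumptions, used so the example runs without network access.

```python
import urllib.request

# Assumption: a data: URL stands in for the real image URL so this runs offline.
DEMO_URL = "data:text/plain;base64,aGVsbG8="


def fetch_body(url):
    """Build a Request, send it with urlopen(), and return the response bytes."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # urlopen() is what actually performs the fetch; read() returns the body.
    with urllib.request.urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    request = urllib.request.Request(DEMO_URL)
    # A Request object has no body, so this prints False.
    print(hasattr(request, "content"))
    # The bytes come from the response returned by urlopen().
    print(fetch_body(DEMO_URL))
```

With urllib, the bytes to write to the file are therefore `urlopen(req).read()`, not `req.content`.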

Akshay Poklekar · Answer 4 · 29 April 2020

Now I can download the images with the code below.
But it still saves a file of unknown type/extension/format when the URL is not found (404 Not Found).

import requests
import urllib.request
from urllib.error import URLError # the docs say this is the base error you need to catch
import time
import datetime,time
from PIL import Image
start_time = time.time()
today=time.strftime("%Y%m%d")
m=today=time.strftime("%m")
d=today=time.strftime("%d")
Y=today=time.strftime("%Y")
A=today=time.strftime("%b")

for i in range(1,10):
    issue_id1=str(i)
    try:
        url = "http://epaperlokmat.in/eNewspaper/News/LOK/MULK/"+str(Y) +"/"+str(m)+"/"+str(d)+"/"+str(Y+m+d)+"_"+str(i)+".jpeg"    
        myfile=requests.get(url)
    except URLError:
        print('an error occurred while fetching: "{}"'.format(url))
        continue
    open("D:/IMAGES/"+issue_id1+".jpeg", "wb").write(myfile.content)

How can I download images from URLs and skip the images that do not exist, in Python?
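One possible approach, sketched with the standard library only: `urllib.request.urlopen` raises `HTTPError` for a 404, so a missing page can be skipped before anything is written. With requests, by contrast, `requests.get` does not raise on a 404 and `except URLError` does not catch its exceptions; there the usual check is `response.status_code` or `response.raise_for_status()`. The function name `download_existing_images`, the `{page}` placeholder, and the short User-Agent value are assumptions made for this example.

```python
import os
import urllib.request
from urllib.error import HTTPError, URLError

# Assumption: a browser-like User-Agent is enough for the host.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def download_existing_images(url_template, page_numbers, out_dir, timeout=10):
    """Save every page that exists as an image; return the page numbers saved."""
    saved = []
    for page in page_numbers:
        url = url_template.format(page=page)
        request = urllib.request.Request(url, headers=HEADERS)
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                content_type = response.headers.get_content_type()
                body = response.read()
        except HTTPError as err:
            # Raised for 404 and other HTTP error statuses.
            print('skipping "{}": HTTP {}'.format(url, err.code))
            continue
        except URLError as err:
            # Raised when the resource cannot be reached at all.
            print('skipping "{}": {}'.format(url, err.reason))
            continue
        if not content_type.startswith("image/"):
            # Guard against a 200 response that carries an error page instead.
            print('skipping "{}": got {}'.format(url, content_type))
            continue
        with open(os.path.join(out_dir, "{}.jpeg".format(page)), "wb") as f:
            f.write(body)
        saved.append(page)
    return saved
```

For the URLs in the question, the template could be `"http://epaperlokmat.in/eNewspaper/News/LOK/MULK/2020/04/29/20200429_{page}.jpeg"` with `range(1, 10)` as the page numbers and `"D:/IMAGES"` as the output directory.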
