Curl: save only if not 404

Question

I'm writing a Python program to download some photos of the students at my school.

Here is my code:

import os
count = 0
max_c = 1000000
while max_c >= count:
    os.system("curl http://www.tjoernegaard.dk/Faelles/ElevFotos/"+str(count)+".jpg > "+str(count)+".jpg")
    count=count+1


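The problem is that I only want to save the jpg if the image exists on the server (i.e. the response is not a 404). I don't know the names of all the images on the server, so I have to send a request for every number between 0 and 1000000, but not every image in that range exists. How can I save an image only when it exists on the server (on Ubuntu)?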

Thanks in advance.

Answer 1

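You can use the -f option so that curl fails silently on HTTP errors (it does not output the server's error page and exits with a non-zero status), for example: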

curl -f site.com/file.jpg
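The `-f` option can be combined with the question's loop. The following is a minimal sketch in Python that assumes curl is installed and on the PATH. It does not use shell redirection (`> N.jpg`), because the shell opens the file before curl runs, so a file is created for every number. Instead, curl writes the file itself via `-o`, and the sketch checks curl's exit status: with `-f`, HTTP errors such as 404 produce exit code 22. The helper names `curl_command` and `save_if_exists` are illustrative and are not part of curl or any library.

```python
import os
import subprocess

# URL pattern from the question; {} is replaced by the image number.
BASE_URL = "http://www.tjoernegaard.dk/Faelles/ElevFotos/{}.jpg"


def curl_command(url, dest):
    """Build the curl argument list (hypothetical helper).

    -f  fail on HTTP errors such as 404: no error page is written, exit code 22
    -s  hide the progress meter
    -o  let curl write the file itself instead of relying on shell redirection
    """
    return ["curl", "-f", "-s", "-o", dest, url]


def save_if_exists(number):
    """Try to download image `number`; return True if it was saved."""
    dest = "{}.jpg".format(number)
    result = subprocess.run(curl_command(BASE_URL.format(number), dest))
    if result.returncode != 0:
        # Defensive clean-up in case the failed transfer left a file behind.
        if os.path.exists(dest):
            os.remove(dest)
        return False
    return True
```

Calling `save_if_exists(n)` for `n` in `range(1000001)` covers the same 0 to 1000000 range as the question's `while` loop.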

Answer 2

import urllib2
import sys

for i in range(1000000):
  try:
    pic = urllib2.urlopen("http://www.tjoernegaard.dk/Faelles/ElevFotos/"+str(i)+".jpg").read()
    with open(str(i).zfill(7)+".jpg", "wb") as f:  # "wb": open for writing binary image data
      f.write(pic)
    print "SUCCESS "+str(i)
  except KeyboardInterrupt:
    sys.exit(1)
  except urllib2.HTTPError, e:
    print "ERROR("+str(e.code)+") "+str(i)

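This should work: a 404 makes urllib2.urlopen raise urllib2.HTTPError, which the except clause catches, so nothing is saved for missing images.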
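The code above is Python 2 (`urllib2`, the `print` statement, `except ..., e`). The following is a sketch of the same idea for Python 3, assuming only the standard-library `urllib.request` module. Here `urlopen` raises `urllib.error.HTTPError` for a 404, so the file is written only when the request succeeds. The function names `download` and `main` and the `base_url` parameter are illustrative additions, not part of the original answer.

```python
import sys
import urllib.error
import urllib.request

# URL pattern from the question; {} is replaced by the image number.
BASE_URL = "http://www.tjoernegaard.dk/Faelles/ElevFotos/{}.jpg"


def download(number, base_url=BASE_URL):
    """Save image `number` under a zero-padded name; return None or the HTTP error code."""
    try:
        with urllib.request.urlopen(base_url.format(number)) as resp:
            pic = resp.read()
    except urllib.error.HTTPError as e:
        # 4xx/5xx responses (including 404) raise HTTPError, so nothing is written
        print("ERROR({}) {}".format(e.code, number))
        return e.code
    # "wb": image data is binary and the file must be opened for writing
    with open(str(number).zfill(7) + ".jpg", "wb") as f:
        f.write(pic)
    print("SUCCESS {}".format(number))
    return None


def main(limit=1000000):
    """Try numbers 0 .. limit-1, like the answer's range(1000000) loop."""
    try:
        for i in range(limit):
            download(i)
    except KeyboardInterrupt:
        sys.exit(1)
```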

Answer 3

I suggest using Python's built-in urllib library for this:

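import urllib  # Python 2: urllib.urlopen returns the response even for a 404

count = 0
max_c = 1000000
while max_c >= count:
    resp = urllib.urlopen("http://www.tjoernegaard.dk/Faelles/ElevFotos/"+str(count)+".jpg")
    if resp.getcode() == 404:
        # the image does not exist on the server: do nothing
        pass
    else:
        # the image exists: save it
        with open(str(count)+".jpg", "wb") as f:
            f.write(resp.read())
    count = count+1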
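In Python 3, the status-code check needs a small change. `urllib.request.urlopen` raises `urllib.error.HTTPError` for a 404 instead of returning the response, so the code has to be read from the exception. The following is a sketch under that assumption. It sends a `HEAD` request first, so a missing image is detected without downloading a body, and it fetches the image only when the status is not 404. This assumes the server answers `HEAD`, as typical static file servers do. `status_code` and `save_unless_404` are hypothetical helper names.

```python
import urllib.error
import urllib.request

# URL pattern from the question; {} is replaced by the image number.
BASE_URL = "http://www.tjoernegaard.dk/Faelles/ElevFotos/{}.jpg"


def status_code(url):
    """Return the HTTP status of a HEAD request to `url`."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # Python 3 raises for 4xx/5xx; the status code is on the exception
        return e.code


def save_unless_404(number, base_url=BASE_URL):
    """Mirror of the answer's if/else; return True if the image was saved."""
    url = base_url.format(number)
    if status_code(url) == 404:
        return False  # the image does not exist on the server: do nothing
    # The image exists: download and save it (other HTTP errors still raise here)
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    with open("{}.jpg".format(number), "wb") as f:
        f.write(data)
    return True
```

The design trade-off is one extra request for every image that does exist, in exchange for never transferring a body for the (probably many) numbers that do not.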

Answer 4

I think the simplest way is to use wget instead of curl, since wget automatically discards 404 responses (it does not save the error page).
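The following is a sketch of the wget variant driven from the question's Python loop, assuming wget is installed. wget exits with status 8 when the server returns an error response such as a 404, and by default it does not keep the error page. The sketch deliberately avoids `-O`, which the wget manual likens to shell redirection, so the output file can be created even when the download fails. Without `-O`, wget names the file after the last part of the URL, i.e. `N.jpg`. `wget_command` and `fetch_with_wget` are hypothetical names.

```python
import subprocess

# URL pattern from the question; {} is replaced by the image number.
BASE_URL = "http://www.tjoernegaard.dk/Faelles/ElevFotos/{}.jpg"


def wget_command(url, directory="."):
    """Build the wget argument list.

    -q  quiet: no progress output
    -P  directory to save into; the file name comes from the URL (e.g. 42.jpg)
    """
    return ["wget", "-q", "-P", directory, url]


def fetch_with_wget(number, directory="."):
    """Return True if image `number` was saved.

    wget exits non-zero on failure (status 8 for a server error response such
    as 404), in which case no image file is kept.
    """
    result = subprocess.run(wget_command(BASE_URL.format(number), directory))
    return result.returncode == 0
```

If `N.jpg` already exists in the target directory, wget saves the new copy as `N.jpg.1` rather than overwriting it.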

Answer 5

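This is old, but I found that in bash you can use --fail (the long form of curl's -f option), and curl will fail silently: if the page returns an error, the error page is not downloaded.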
