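I want to scrape the Midjourney website. As usual, I reached for requests-html, which I had previously used on a well-known dynamic website called Digikala. The problem is that rendering fails and I can't select the images!
With requests_html.HTMLSession: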
from requests_html import HTMLSession
session = HTMLSession()
response = session.get('https://midjourney.com/showcase/top/')
response.html.render(timeout=60, sleep=5)  # sync session: render(); arender() is a coroutine and must be awaited
print(response.html.xpath('//img')) # Output: []
With requests_html.AsyncHTMLSession:
from requests_html import AsyncHTMLSession
asession = AsyncHTMLSession()
response = await asession.get('https://midjourney.com/showcase/top/')
await response.html.arender(timeout=60, sleep=5)
print(response.html.xpath('//img')) # Output: []
I tried various approaches, including the one in this issue:
https://github.com/psf/requests-html/issues/294
With selenium, the result looks like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Firefox() # Also tested on Chrome
driver.get('https://midjourney.com/showcase/top/')
print(driver.find_elements(By.XPATH, '//img'))
# Output:
# [<selenium.webdriver.remote.webelement.WebElement (session="b475751b-5ab3-4da8-b68b-523ceaa1ad5e", element="62660cb6-495b-434b-9bf0-80ff7a7df544")>,
# <selenium.webdriver.remote.webelement.WebElement (session="b475751b-5ab3-4da8-b68b-523ceaa1ad5e", element="8584c640-460c-422e-bd56-41327a745cee")>,
# <selenium.webdriver.remote.webelement.WebElement (session="b475751b-5ab3-4da8-b68b-523ceaa1ad5e", element="90f29838-1a88-4f2b-b4f7-b143af549a0b")>,
# <selenium.webdriver.remote.webelement.WebElement (session="b475751b-5ab3-4da8-b68b-523ceaa1ad5e", element="7eca7e52-d807-4a1c-9b4b-cc9d4c98d728")>,
# ...]
I am looking for a solution that works with requests-html...
1 Answer
Here is one way to get that data with Requests (*not* requests-html, which is deprecated).
Final result:
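As a rough illustration of the general idea only (this is not the answerer's code): JavaScript-rendered pages often ship their data as JSON inside a <script type="application/json"> tag, and plain Requests can download that without any rendering. The sketch below assumes such a tag exists on the target page, which has not been verified for Midjourney. The names ScriptJSONParser, extract_embedded_json and fetch_embedded_json are hypothetical helpers made up for this example.

```python
import json
from html.parser import HTMLParser


class ScriptJSONParser(HTMLParser):
    """Collect the raw text of every <script type="application/json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._buffer = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            self.payloads.append("".join(self._buffer))


def extract_embedded_json(html):
    """Return the decoded JSON payloads embedded in an HTML string."""
    parser = ScriptJSONParser()
    parser.feed(html)
    parser.close()
    results = []
    for text in parser.payloads:
        try:
            results.append(json.loads(text))
        except json.JSONDecodeError:
            continue  # skip script tags that do not hold valid JSON
    return results


def fetch_embedded_json(url):
    """Download a page with Requests and decode any embedded JSON.

    Assumption: the page embeds its data this way and serves it to a
    plain HTTP client. Neither is confirmed for any particular site.
    """
    import requests  # imported lazily so the parser works without it

    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()
    return extract_embedded_json(response.text)
```

If the embedded JSON is present, the image URLs would have to be located by inspecting the returned structure by hand, since its layout is site-specific.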