Scrapy returns the same value on every iteration

Posted by umuewwlo 5 months ago in Other

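I am using Scrapy to extract information from a website. My goal is to pull golf club names, prices, and so on, track the cost over the winter, and buy what I want when the price drops.
So far it pulls the club name, but it prints the same name 38 times. (There are 38 clubs on the first page.)
Why does it print the same name instead of moving on to the next one? I based this on an example from a course I took. The first code block below is from the course; the second is mine.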
import scrapy

class Spiderbook0Spider(scrapy.Spider):
    name = "spiderbook0"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com"]

    def parse(self, response):
        books = response.css('article.product_pod')  # Get all the books on the first page
        for book in books:  # Get a single book
            print(book.css('h3 a::text').get())


My code:
import scrapy

class WedgepriceSpider(scrapy.Spider):
    name = "wedgeprice"
    allowed_domains = ["golftown.com"]
    start_urls = ["https://golftown.com/en-CA/clubs/wedges/"]
 

    def parse(self, response):
        wedges = response.css("div.product-tile-top > div.product-image > a.thumb-link ")
        print("***********************************")
        print("***********************************")
        print(wedges)
        for wedge in wedges:
            print(response.xpath("//*[@class = 'name-link']/@title").get())
        print("***********************************")
        print("***********************************")

Answer #1, by xxe27gdn

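This happens because, inside the for loop, every iteration runs the XPath query from the root of the HTML document, so .get() returns the first match on the page each time.
Instead, first query for a parent element that recurs the same number of times as the child element you are trying to print. Then, in a second expression, use an XPath expression relative to that parent (note the leading dot in .//) to get the value and print it to the terminal.
For example: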

import scrapy

class WedgepriceSpider(scrapy.Spider):
    name = "wedgeprice"
    allowed_domains = ["golftown.com"]
    start_urls = ["https://golftown.com/en-CA/clubs/wedges/"]

    def parse(self, response):
        print("***********************************")
        print("***********************************")
        for tile in response.css(".product-tile"):
            print(tile.xpath(".//*[@class = 'name-link']/@title").get())
        print("***********************************")
        print("***********************************")

Output:

2023-11-19 21:55:23 [scrapy.core.engine] INFO: Spider opened
2023-11-19 21:55:23 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-11-19 21:55:23 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-11-19 21:55:23 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.golftown.com/en-CA/clubs/wedges/> from <GET https://golftown.com/en-CA/clubs/wedges/>
2023-11-19 21:55:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.golftown.com/en-CA/clubs/wedges/> (referer: None)
***********************************
***********************************
Milled Grind 4 Wedge with Steel Shaft
Glide 4.0 Wedge with Steel Shaft
RTX 4.0 Tour Satin Wedge with Steel Shaft
RTX 6 ZipCore Tour Satin Wedge with Steel Shaft
Milled Grind 3 Black Wedge with Steel Shaft
Milled Grind Wedge with Steel Shaft
JAWS RAW Chrome Wedge with Steel Shafts
Milled Grind 2 Hi-Toe Raw Wedge
RTX 6 ZipCore Black Satin Wedge with Steel Shaft
Mack Daddy Cavity Back Wedge with Steel Shaft
Staff Model Wedge with Steel Shaft
JAWS MD5 Platinum Chrome Wedge with Steel Shaft
Milled Grind 3 Chrome Wedge with Steel Shaft
CBX Full-Face 2 Tour Satin with Steel Shaft
SM9 Brushed Steel Wedge with Steel Shaft
King Cobra Snake Bite Wedge with Steel Shaft
SM9 Tour Chrome Wedge with Steel Shaft
PUR-S Black Wedge with Steel Shaft
JAWS RAW Chrome Wedge with Graphite Shafts
JAWS RAW Black Wedge with Steel Shafts
King Cobra Black Snake Bite Wedge with Steel Shaft
ChipR Wedge with Steel Shaft
T22 Blue Ion Wedge with Steel Shaft
S23 Copper Cobalt Wedge with Steel Shaft
S23 Satin Chrome Wedge with Steel Shaft
Smart Sole 4 S Black Wedge with Graphite Shaft
Smart Sole 4 G Black Wedge with Graphite Shaft
Smart Sole 4 C Black Wedge with Graphite Shaft
Smart Sole 4 S Black Wedge with Steel Shaft
Smart Sole 4 G Black Wedge with Steel Shaft
RTX Full-Face Black Wedge with Steel Shaft
CBX Zipcore Tour Satin Wedge with Graphite Shaft
CBX Zipcore Tour Satin Wedge with Steel Shaft
Women's CBX Zipcore Wedge with Graphite Shaft
Ladies X Act Chipper
***********************************
***********************************
2023-11-19 21:55:25 [scrapy.core.engine] INFO: Closing spider (finished)
2023-11-19 21:55:25 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 724,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 28484,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/301': 1,
 'elapsed_time_seconds': 2.699416,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2023, 11, 20, 5, 55, 25, 901973),
 'httpcompression/response_bytes': 263357,
 'httpcompression/response_count': 1,
 'log_count/DEBUG': 3,
 'log_count/INFO': 10,
 'response_received_count': 1,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2023, 11, 20, 5, 55, 23, 202557)}
