Retrieving data from a web page with Selenium: not all data is retrieved

bpsygsoo, posted on 2023-05-29 in Other

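I am trying to retrieve data (coin name, price, market cap, and circulating supply) from coinmarketcap.com, but when I run the code below I only get 11 coin names. I also cannot retrieve the other fields. I have tried several alternatives, none of which worked. My goal is to store the data in a DataFrame so that I can analyze it.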

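from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome(r'C:\Users\Ejer\PycharmProjects\pythonProject\chromedriver')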
driver.get('https://coinmarketcap.com/')

Crypto = driver.find_elements_by_xpath("//div[contains(concat(' ', normalize-space(@class), ' '), 'sc-16r8icm-0 sc-1teo54s-1 lgwUsc')]")
#price = driver.find_elements_by_xpath('//td[@class="cmc-link"]')
#coincap = driver.find_elements_by_xpath('//td[@class="DAY"]')

CMC_list = []
for c in range(len(Crypto)):
    CMC_list.append(Crypto[c].text)
print(CMC_list)

#driver.get('https://coinmarketcap.com/')
#print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[contains(@class, 'cmc-table')]//tbody//tr//td/a//p[@color='text']")))[:50]])

driver.close()

xxslljrj (answer #1)

Try the following line of code to get all the values on the page:

from selenium.webdriver.common.by import By  # Selenium 4 removed the find_elements_by_* helpers
cryptos = [name.text for name in driver.find_elements(By.XPATH, '//td[3]/a[@class="cmc-link" and starts-with(@href, "/currencies/")]//p[@color="text"]')]
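
The question reports only 11 names. One possible explanation, assumed here and not confirmed anywhere in this thread, is that the page renders further table rows only as they scroll into view, in which case the XPath above can only match rows that are already rendered. The sketch below scrolls through the page in steps before collecting the names. `scroll_through_page` and `collect_names` are hypothetical helper names, the step size and pause are arbitrary values, and the `find_elements(By.XPATH, ...)` form assumes Selenium 4.

```python
import time


def scroll_through_page(driver, step_px=800, pause=0.3, max_steps=200):
    """Scroll down in fixed steps until the bottom is reached.

    Returns the number of scroll steps performed (at most max_steps).
    """
    for step in range(1, max_steps + 1):
        driver.execute_script("window.scrollBy(0, arguments[0]);", step_px)
        time.sleep(pause)  # give the page time to render newly visible rows
        at_bottom = driver.execute_script(
            "return window.innerHeight + window.scrollY"
            " >= document.body.scrollHeight - 2;"
        )
        if at_bottom:
            return step
    return max_steps


def collect_names(url="https://coinmarketcap.com/"):
    """Open the page, scroll through it, and return the coin names found."""
    # Imported lazily so the scroll helper can be reused without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumes a chromedriver that Selenium can locate
    try:
        driver.get(url)
        scroll_through_page(driver)
        # Same XPath as in the answer above.
        xpath = ('//td[3]/a[@class="cmc-link" and '
                 'starts-with(@href, "/currencies/")]//p[@color="text"]')
        return [element.text for element in driver.find_elements(By.XPATH, xpath)]
    finally:
        driver.quit()
```

Calling `collect_names()` launches a real browser, so it is best run interactively. If the list is still short, increasing `pause` gives the page more time to render each batch of rows.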

tsm1rwdh (answer #2)

Try using BeautifulSoup to scrape the CoinMarketCap data:

import requests
from bs4 import BeautifulSoup

data_list = []
crypto_count = 0
for page in range(1, 100):
    url = f'https://coinmarketcap.com/?page={page}'
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
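    # The generated class names are copied from the page and may change; update them if the lookup returns None.
    rows = soup.find('table', {'class': 'sc-beb003d5-3 ieTeVa cmc-table'}).find('tbody').find_all('tr')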

    crypto_list = []
    for row in rows:
        dic = {}
        cells = row.find_all('td')
        if len(cells) >= 10:
            dic['Name'] = cells[2].text.strip()
            dic['Price'] = cells[3].text.strip().replace(',', '')
            dic['OneH'] = cells[4].text.strip()
            dic['TwentyfourH'] = cells[5].text.strip()
            dic['SevenD'] = cells[6].text.strip()
            dic['MarketCap'] = cells[7].text.strip().replace(',', '')
            dic['Volume'] = cells[8].text.strip().replace(',', '')
            dic['CirculatingSupply'] = cells[9].text.strip().replace(',', '')

            crypto_list.append(dic)
            crypto_count += 1
            if crypto_count == 1000:
                break

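    # Keep one flat list of row dictionaries so it can be passed straight to a DataFrame.
    data_list.extend(crypto_list)
    # The break above only leaves the row loop, so stop requesting further pages as well.
    if crypto_count >= 1000:
        break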
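
The loop above stores every cell as text, with only the thousands separators removed, so values such as the price still carry a currency symbol. Since the stated goal is analysis in a DataFrame, a minimal sketch of converting those strings to numbers follows. `to_number` and `clean_row` are hypothetical helpers, and the sample strings in the usage note are made up for illustration; they are not taken from the site.

```python
import re


def to_number(text):
    """Return the first number found in text as a float, or None if there is none.

    Thousands separators are dropped and any currency symbol or unit is ignored.
    """
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))


def clean_row(row, numeric_keys=("Price", "MarketCap", "Volume", "CirculatingSupply")):
    """Return a copy of row with the listed keys converted to floats."""
    cleaned = dict(row)
    for key in numeric_keys:
        if key in cleaned:
            cleaned[key] = to_number(cleaned[key])
    return cleaned
```

For example, `clean_row({"Name": "Bitcoin", "Price": "$27,123.45"})` keeps the name and turns the price into the float `27123.45`. Given a flat list of row dictionaries, `pandas.DataFrame([clean_row(r) for r in rows])` then builds a table with numeric columns. Note that `to_number` only reads the first number in a cell, so a cell that contains several figures needs extra handling.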
