Usage and code examples of the edu.uci.ics.crawler4j.url.WebURL.getDomain() method

Reposted by x33g5p2x on 2022-02-03 under Other

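This article collects code examples for the Java method edu.uci.ics.crawler4j.url.WebURL.getDomain() and shows how WebURL.getDomain() is used in practice. The examples come mainly from GitHub, Stack Overflow, Maven and similar platforms, and were extracted from selected open-source projects, so they should serve as a useful reference. The details of WebURL.getDomain() are as follows:
Fully qualified class name: edu.uci.ics.crawler4j.url.WebURL
Class name: WebURL
Method name: getDomain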

About WebURL.getDomain

If the WebURL was provided with a TLDList, the domain is the privately registered domain, that is, the immediate child of an effective top-level domain as defined at publicsuffix.org. Otherwise it is the entire domain.
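To make the rule above concrete, here is a minimal sketch of how a registered domain can be derived from a host name given a list of effective top-level domains. It is not crawler4j's implementation: the class DomainSketch, its methods, and the four-entry SUFFIXES set are all hypothetical stand-ins, and the real list at publicsuffix.org is far larger and also has wildcard and exception rules that are ignored here.

```java
import java.util.Arrays;
import java.util.Set;

// Illustrative sketch only; this is NOT crawler4j's implementation.
class DomainSketch {
    // Hypothetical, tiny stand-in for the public suffix list.
    static final Set<String> SUFFIXES = Set.of("com", "org", "uk", "co.uk");

    /** Returns the longest known suffix plus the one label to its left. */
    static String registeredDomain(String host) {
        String h = host.toLowerCase();
        String[] labels = h.split("\\.");
        // Scan from the left so that the longest matching suffix wins.
        for (int i = 1; i < labels.length; i++) {
            String candidate = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (SUFFIXES.contains(candidate)) {
                return labels[i - 1] + "." + candidate;
            }
        }
        // No known suffix: fall back to the entire host.
        return h;
    }

    /** Returns the part of the host to the left of the registered domain, or "" if none. */
    static String subDomain(String host) {
        String h = host.toLowerCase();
        String domain = registeredDomain(h);
        if (h.length() == domain.length()) {
            return "";
        }
        // Drop the registered domain and the dot that precedes it.
        return h.substring(0, h.length() - domain.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(registeredDomain("news.bbc.co.uk")); // bbc.co.uk
        System.out.println(subDomain("news.bbc.co.uk"));        // news
        System.out.println(registeredDomain("localhost"));      // localhost
    }
}
```

With this toy list, news.bbc.co.uk splits into the registered domain bbc.co.uk and the subdomain news, while a host with no known suffix is returned whole, mirroring the "entire domain" fallback described above.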

Code examples

Code example source: biezhi/java-library-examples

int docid = page.getWebURL().getDocid();
String url = page.getWebURL().getURL();
String domain = page.getWebURL().getDomain();
String path = page.getWebURL().getPath();
String subDomain = page.getWebURL().getSubDomain();

Code example source: tjake/stormscraper

if (stayOnDomain && !baseURL.getDomain().equals(curURL.getDomain()))
  // ...
Long lastCrawl = niceTracker.get(curURL.getDomain());
long now = System.currentTimeMillis();
if (lastCrawl != null && (now - lastCrawl) < throttlePauseMs)
  logger.info("Slowing down crawler to {}, sleeping for {}ms", curURL.getDomain(), now - lastCrawl);
  Utils.sleep(now - lastCrawl);
  niceTracker.put(curURL.getDomain(), System.currentTimeMillis());
  pageTracker.put(curURL.getURL(),"hit");
// ...
if (depth > 0 && (!stayOnDomain || baseURL.getDomain().equals(webURL.getDomain())))
  // ...
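The excerpt above uses the domain string for two things: as a key for per-domain throttling and as a stay-on-domain filter. The following self-contained sketch shows both ideas without depending on crawler4j or Storm. The class DomainThrottle and all of its method names are hypothetical, and the domain strings are assumed to be whatever getDomain() would return. Unlike the excerpt, which passes the elapsed time to Utils.sleep, this sketch computes the remaining part of the pause.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; class and method names are hypothetical.
class DomainThrottle {
    // Time of the last recorded fetch per domain, in milliseconds.
    private final Map<String, Long> lastHit = new HashMap<>();
    private final long pauseMs;

    DomainThrottle(long pauseMs) {
        this.pauseMs = pauseMs;
    }

    /** Milliseconds still to wait before the domain may be fetched again at time nowMs. */
    long remainingWait(String domain, long nowMs) {
        Long last = lastHit.get(domain);
        if (last == null) {
            return 0L; // never fetched: no wait needed
        }
        long elapsed = nowMs - last;
        return elapsed >= pauseMs ? 0L : pauseMs - elapsed;
    }

    /** Records a fetch of the domain at time nowMs. */
    void recordHit(String domain, long nowMs) {
        lastHit.put(domain, nowMs);
    }

    /** Follow a link if crawling is unrestricted or the link stays on the base domain. */
    static boolean shouldFollow(boolean stayOnDomain, String baseDomain, String linkDomain) {
        return !stayOnDomain || baseDomain.equals(linkDomain);
    }

    public static void main(String[] args) {
        DomainThrottle throttle = new DomainThrottle(1000L);
        throttle.recordHit("example.com", 5000L);
        System.out.println(throttle.remainingWait("example.com", 5300L)); // 700
        System.out.println(shouldFollow(true, "example.com", "other.org")); // false
    }
}
```

Passing the current time in as a parameter, instead of calling System.currentTimeMillis() inside the class, keeps the logic deterministic and easy to test; a caller would sleep for the value returned by remainingWait and then call recordHit.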
