Text file question

lo8azlld  posted on 2021-06-30  in Java

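I'm not sure how well I can phrase this question, but given a text file, I need to parse it and extract the productId values into a HashSet, the userId values into a HashSet, and the review/score values into an ArrayList. They also need to be used to build a graph in which each productId is connected by an edge to a userId.
The data can be found here: http://snap.stanford.edu/data/web-finefoods.html. You can ignore the review/time, review/helpfulness, review/summary, and review/text fields; they do not need to be stored in memory.
My current code looks like this: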

import java.io.*;
import java.util.*;
import java.nio.charset.*;

public class Reviews
{
    String fileName = "newfinefoods.txt";
    GraphType<String> foodReview;
    HashSet<String> productID;
    HashSet<String> userID;
    ArrayList<String> review;

    int counter; //was using this to make sure I'm counting all the lines which I think I am

    public Reviews(){
        foodReview = new GraphType<>();
        productID = new HashSet<>();
        userID = new HashSet<>();
        review = new ArrayList<>();
        counter = 0;
    }

    public int numReviews(){
        return review.size();
    }

    public int numProducts(){
        return productID.size();
    }

    public int numUsers(){
        return userID.size();
    }

    public void setupGraph(){
        Scanner fileScanner;
        String line = "";
        try{
            fileScanner = new Scanner (new File (fileName), "UTF-8");
            String pr = "";
            while(fileScanner.hasNextLine()){
                line = fileScanner.nextLine();
                String[] reviewInfo = line.split(": ");
                String productInfo = reviewInfo[1];
                System.out.println(productInfo);
            }
        }

        catch (IOException e){
            System.out.println(e);
        }
    }

    public static void main(String[] args){
        Reviews review = new Reviews();
        review.setupGraph();
        System.out.println("Number of Reviews:" + review.numReviews());
        System.out.println("Number of Products:" + review.numProducts());
        System.out.println("Number of Users:" + review.numUsers());

    }
}

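Whenever I run the code reading index 1 of the reviewInfo array, it prints only one set of data, but if I change the index to 0 it seems to print all of the information (just not the information I need). I need to build this graph and pull the information out of the data, but I'm really just super stuck; any tips or help would be greatly appreciated!
Here is a sample of the data: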

product/productId: B001E4KFG0
review/userId: A3SGXH7AUHU8GW
review/profileName: delmartian
review/helpfulness: 1/1
review/score: 5.0
review/time: 1303862400
review/summary: Good Quality Dog Food
review/text: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.

product/productId: B00813GRG4
review/userId: A1D87F6ZCVE5NK
review/profileName: dll pa
review/helpfulness: 0/0
review/score: 1.0
review/time: 1346976000
review/summary: Not as Advertised
review/text: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".

product/productId: B000LQOCH0
review/userId: ABXLMWJIXXAIN
review/profileName: Natalia Corres "Natalia Corres"
review/helpfulness: 1/1
review/score: 4.0
review/time: 1219017600
review/summary: "Delight" says it all
review/text: This is a confection that has been around a few centuries.  It is a light, pillowy citrus gelatin with nuts - in this case Filberts. And it is cut into tiny squares and then liberally coated with powdered sugar.  And it is a tiny mouthful of heaven.  Not too chewy, and very flavorful.  I highly recommend this yummy treat.  If you are familiar with the story of C.S. Lewis' "The Lion, The Witch, and The Wardrobe" - this is the treat that seduces Edmund into selling out his Brother and Sisters to the Witch.

product/productId: B000UA0QIQ

Answer #1, by eeq64g8w

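Your initial approach to the design is right, but you should give it more structure:
The setupGraph method should be split into a few specific, parameterized methods:
Since the users and products are part of the class's state, I think it is best for the class constructor to receive the Scanner as an input parameter. Then, after initializing the state variables, it should call setupGraph (which should be private), passing it the input scanner. setupGraph should receive the input scanner and be responsible for reading lines from it, handling any exceptions that may arise. For each line, it should only call another private method that processes the line just read. If you want to count every line read, this is where the increment belongs.
The line-processing method should receive the input string and be responsible for deciding whether it contains product data, user data, score data, or no data at all. This must be done by parsing its content correctly. This is where you can use String.split() to get the name and the value of each line, and then evaluate the name to decide where to store the value. If you want to count every line processed, this is where the increment belongs.
Finally, the main method should be responsible for instantiating the scanner and passing it in when constructing the object. That way you can receive the file name as an input argument from the command line, which makes your program flexible.
Keep in mind that the only public methods of the class should be the constructor and the getters. The state variables should be private.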
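
The structure described above could be sketched roughly as follows. This is only a minimal illustration under some assumptions: the class name ReviewGraph, the numLinesRead getter, and the edges map are hypothetical, and the map is merely a stand-in for the asker's GraphType class, whose API is not shown in the question. The sketch splits each line with split(": ", 2) and checks the array length before reading index 1. A blank separator line yields a one-element array, so reading index 1 unconditionally throws ArrayIndexOutOfBoundsException there, which may well be why the original code stops after the first record.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;

class ReviewGraph {
    private final Set<String> productIds = new HashSet<>();
    private final Set<String> userIds = new HashSet<>();
    private final List<String> scores = new ArrayList<>();
    // Assumption: stand-in for GraphType; maps a product id to the users who reviewed it.
    private final Map<String, Set<String>> edges = new HashMap<>();
    private String currentProduct; // product id of the record currently being read
    private int linesRead;

    public ReviewGraph(Scanner in) {
        setupGraph(in);
    }

    // Reads every line and delegates the parsing to processLine.
    private void setupGraph(Scanner in) {
        while (in.hasNextLine()) {
            linesRead++;
            processLine(in.nextLine());
        }
    }

    // Decides whether a line carries product, user, or score data, or nothing of interest.
    private void processLine(String line) {
        // Limit 2 keeps any ": " inside the value (e.g. in review/text) intact.
        String[] parts = line.split(": ", 2);
        if (parts.length < 2) {
            return; // blank separator line or a line without "name: value"
        }
        String value = parts[1].trim();
        switch (parts[0]) {
            case "product/productId":
                currentProduct = value;
                productIds.add(value);
                break;
            case "review/userId":
                userIds.add(value);
                if (currentProduct != null) {
                    edges.computeIfAbsent(currentProduct, k -> new HashSet<>()).add(value);
                }
                break;
            case "review/score":
                scores.add(value);
                break;
            default:
                break; // fields that do not need to be kept in memory
        }
    }

    public int numReviews() { return scores.size(); }
    public int numProducts() { return productIds.size(); }
    public int numUsers() { return userIds.size(); }
    public int numLinesRead() { return linesRead; }

    // True if an edge between the given product and user was recorded.
    public boolean hasEdge(String productId, String userId) {
        Set<String> users = edges.get(productId);
        return users != null && users.contains(userId);
    }

    public static void main(String[] args) throws FileNotFoundException {
        if (args.length != 1) {
            System.err.println("usage: java ReviewGraph <file>");
            return;
        }
        try (Scanner in = new Scanner(new File(args[0]), "UTF-8")) {
            ReviewGraph graph = new ReviewGraph(in);
            System.out.println("Number of Reviews: " + graph.numReviews());
            System.out.println("Number of Products: " + graph.numProducts());
            System.out.println("Number of Users: " + graph.numUsers());
        }
    }
}
```

Because the constructor takes a Scanner rather than a file name, the same class can be exercised with a Scanner over an in-memory String, which makes it easy to try on a few sample records before running it on the full data file.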
