Erlang: the proper way to parse multiple items

vsikbqxv  posted on 2022-12-16 in Erlang

I have an input file with multiple lines, each containing multiple fields separated by spaces. My definition files are:
scanner.xrl

Definitions.

DIGIT = [0-9]
ALPHANUM = [0-9a-zA-Z_]

Rules.

(\s|\t)+ : skip_token.
\n : {end_token, {new_line, TokenLine}}.
{ALPHANUM}+ : {token, {string, TokenLine, TokenChars}}.

Erlang code.

parser.yrl

Nonterminals line.

Terminals string.

Rootsymbol line.

Endsymbol new_line.

line -> string : ['$1'].
line -> string line: ['$1'|'$2'].

Erlang code.

When run as-is, the first line is parsed and then parsing stops:

1> A = <<"a b c\nd e\nf\n">>.

2> {ok, T, _} = scanner:string(binary_to_list(A)).
{ok,[{string,1,"a"},
     {string,1,"b"},
     {string,1,"c"},
     {new_line,1},
     {string,2,"d"},
     {string,2,"e"},
     {new_line,2},
     {string,3,"f"},
     {new_line,3}],
    4}
3> parser:parse(T).
{ok,[{string,1,"a"},{string,1,"b"},{string,1,"c"}]}

If I remove the Endsymbol line from parser.yrl and change the scanner.xrl file to:

Definitions.

DIGIT = [0-9]
ALPHANUM = [0-9a-zA-Z_]

Rules.

(\s|\t|\n)+ : skip_token.
{ALPHANUM}+ : {token, {string, TokenLine, TokenChars}}.

Erlang code.

then all of my lines are parsed as a single item:

1> A = <<"a b c\nd e\nf\n">>.
<<"a b c\nd e\nf\n">>
2> {ok, T, _} = scanner:string(binary_to_list(A)).
{ok,[{string,1,"a"},
     {string,1,"b"},
     {string,1,"c"},
     {string,2,"d"},
     {string,2,"e"},
     {string,3,"f"}],
    4}
3> parser:parse(T).
{ok,[{string,1,"a"},
     {string,1,"b"},
     {string,1,"c"},
     {string,2,"d"},
     {string,2,"e"},
     {string,3,"f"}]}

What is the proper way to tell the parser that each line should be treated as a separate item? I would like my result to look like this:

{ok,[[{string,1,"a"},
     {string,1,"b"},
     {string,1,"c"}],
     [{string,2,"d"},
     {string,2,"e"}],
     [{string,3,"f"}]]}

w1jd8yoj  #1

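Here is a working lexer/parser pair that does the job with only one shift/reduce conflict. I think it will solve your problem; you will just need to clean up the tokens to your liking.
I am fairly sure there are simpler and faster ways to accomplish this, but during my own "lexer fighting days" it was hard to find any information at all, so I hope this gives you an idea of how to proceed with parsing in Erlang.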

scanner.xrl

Definitions.

DIGIT = [0-9]
ALPHANUM = [0-9a-zA-Z_]

Rules.

(\s|\t)+ : skip_token.
\n : {token, {line, TokenLine}}.
{ALPHANUM}+ : {token, {string, TokenLine, TokenChars}}.

Erlang code.

parser.yrl

Nonterminals 
    Lines
    Line
    Strings.

Terminals string line.

Rootsymbol Lines.

Lines -> Line Lines : lists:flatten(['$1', '$2']).
Lines -> Line : lists:flatten(['$1']).

Line -> Strings line : {line, lists:flatten(['$1'])}.
Line -> Strings : {line, lists:flatten(['$1'])}.

Strings -> string Strings : lists:append(['$1'], '$2').
Strings -> string : lists:flatten(['$1']).

Erlang code.

Output

{ok,[{line,[{string,1,"a"},{string,1,"b"},{string,1,"c"}]},
     {line,[{string,2,"d"},{string,2,"e"}]},
     {line,[{string,3,"f"}]}]}

The parser flow is as follows:

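  • The root symbol is defined as the abstract "Lines"
  • "Lines" consists of "Line Lines" or simply "Line", which gives the loop
  • "Line" consists of "Strings line" or simply "Strings" (when it is the end of the file)
  • "Strings" consists of "string", or of "string Strings" when several strings are provided
  • "line" is the "\n" symbol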
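The same flow can be written as a smaller grammar that returns the nested list asked for in the question. This is only a sketch under stated assumptions, not a replacement for the grammar above: the nonterminal names `lines` and `row` are made up for illustration, it relies on the scanner emitting `{line, TokenLine}` for every "\n", and it assumes every input line (including the last one) ends with a newline and that there are no blank lines. Requiring the trailing `line` token is what removes the shift/reduce conflict.

```erlang
%% parser_sketch.yrl (hypothetical file name)
%% Assumes the scanner emits {string, Line, Chars} and {line, Line} tokens.

Nonterminals lines row strings.

Terminals string line.

Rootsymbol lines.

%% One or more rows; each row becomes one element of the result list.
lines -> row : ['$1'].
lines -> row lines : ['$1' | '$2'].

%% A row is its strings followed by the newline token, which is dropped.
row -> strings line : '$1'.

%% One or more string tokens, kept in input order.
strings -> string : ['$1'].
strings -> string strings : ['$1' | '$2'].
```

With the tokens produced for "a b c\nd e\nf\n", this sketch should yield `{ok, [[{string,1,"a"},{string,1,"b"},{string,1,"c"}], [{string,2,"d"},{string,2,"e"}], [{string,3,"f"}]]}`. Input without a final newline, or with empty lines, would need extra rules.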

Please allow me a few comments on the problems I found in the original code.

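  • You should treat the whole file as a nested array rather than parsing it line by line, which is why the Lines/Line abstractions are provided
  • "Terminals" are tokens that are not analyzed any further for other tokens they might contain; "Nonterminals" are evaluated further, and they represent the complex data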
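As one of the simpler routes mentioned earlier, the grouping can also be done in plain Erlang on the token list, without a grammar. The following is a minimal sketch, assuming newline tokens have the shape `{line, LineNo}` as in the scanner.xrl of this answer; the module name `line_split` and the function `split_lines/1` are hypothetical names chosen for illustration.

```erlang
-module(line_split).
-export([split_lines/1]).

%% Split a flat leex token list into one list of tokens per input line.
%% Newline tokens ({line, LineNo}) act as separators and are dropped.
split_lines(Tokens) ->
    split_lines(Tokens, [], []).

%% Cur holds the current line in reverse order,
%% Acc holds the finished lines in reverse order.
split_lines([], [], Acc) ->
    lists:reverse(Acc);
split_lines([], Cur, Acc) ->
    %% The last line had no trailing newline token.
    lists:reverse([lists:reverse(Cur) | Acc]);
split_lines([{line, _} | Rest], [], Acc) ->
    %% Blank line: nothing was collected, so skip it.
    split_lines(Rest, [], Acc);
split_lines([{line, _} | Rest], Cur, Acc) ->
    split_lines(Rest, [], [lists:reverse(Cur) | Acc]);
split_lines([Tok | Rest], Cur, Acc) ->
    split_lines(Rest, [Tok | Cur], Acc).
```

Calling `line_split:split_lines(T)` on the token list `T` returned by `scanner:string/1` gives a list of token lists, one per line. Each inner list could then be passed to a per-line parser if more structure is needed.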
