Fetch query params from S3 access log using Athena

Posted by sdnqo3pr on 2021-06-26 in Hive


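I want to use Athena to get a map of the query parameters from S3 access logs.
E.g., for the following example log line:

    283e.. foo [17/Jun/2017:23:00:49 +0000] 76.117.221.205 - 1D0.. REST.GET.OBJECT 1x1.gif "GET /foo.bar/1x1.gif?placement_tag_id=0&r=574&placement_hash=12345... HTTP/1.1" 200 ... "Mozilla/5.0"

I would like to get a map queryParams[k, v]:

    placement_tag_id, 0
    r, 574
    placement_hash, 12345

so that I can run queries such as:

    select * from accessLogs where queryParams.placement_tag_id=0 and X.r>=500

The number and content of the query parameters differ from request to request, so I cannot use a static regex pattern.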
I used serde2.RegexSerDe in the Athena create table query below to do a basic split of the log, but I have not found a way to achieve what I want. I considered using multiple delimiters, but Athena does not support that.
Are there any suggestions on how to do this?

    CREATE EXTERNAL TABLE IF NOT EXISTS elb_db.accessLogs (
      timestamp string,
      request string,
      http_status string,
      user_agent string
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      'serialization.format' = '1',
      'input.regex' = '[^ ]* [^ ]* \\[(.*)\\] [^ ]* [^ ]* [^ ]* [^ ]* [^ ]* "(.*?)" ([^ ]*) [^ ]* [^ ]* [^ ]* [^ ]* [^ ]* ".*?" "(.*?)" [^ ]*'
    )
    LOCATION 's3://output/bucket'
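
One possible direction, offered as an untested sketch rather than a confirmed solution: leave the table definition as it is and build the map at query time from the request column, using Presto functions that Athena exposes (regexp_extract, split_to_map, element_at). The CTE name parsed is made up for illustration. The sketch assumes the request column looks like 'GET /path?k1=v1&k2=v2 HTTP/1.1'. Map values stay raw strings (not URL-decoded), and a map is read with element_at or a subscript rather than dot notation.

```sql
-- Sketch only: assumes the elb_db.accessLogs table from the question.
WITH parsed AS (
  SELECT
    *,
    -- regexp_extract pulls the text between '?' and the next space
    -- (NULL when the request has no query string).
    -- split_to_map turns 'k1=v1&k2=v2' into map(varchar, varchar).
    -- TRY yields NULL instead of failing the query when an entry has
    -- no '=' or a key is repeated.
    TRY(
      split_to_map(regexp_extract(request, '\?([^ ]+)', 1), '&', '=')
    ) AS queryParams
  FROM elb_db.accessLogs
)
SELECT *
FROM parsed
WHERE element_at(queryParams, 'placement_tag_id') = '0'
  -- values are strings, so cast before comparing numerically
  AND TRY_CAST(element_at(queryParams, 'r') AS integer) >= 500;
```

Because the map is computed per row, requests with different numbers of parameters need no changes to the regex in the SerDe. Rows whose query string cannot be split simply get a NULL map and are filtered out by the WHERE clause.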

Hive hive-serde amazon-athena

Source: https://stackoverflow.com/questions/45061784/fetch-query-params-from-s3-access-log-using-athena