druid查询详解
in bigdata with 0 comment

druid查询详解

in bigdata with 0 comment

查询的方式

druid的查询是主要是通过的curl提交相关的请求到broker,broker通过将请求发送给实时以及离线的节点,最后将结果进行merge并返回相关的结果。

当然目前也有sql的相关方式不过该项目尚未成熟。

druid支持的查询种类

聚合查询

包含groupby topN ,timeseries

元数据查询

Time Boundary 时间范围

Segment Metadata segment信息

Datasource Metadata 数据源的信息

搜索

search(select)

json查询相关参数的解析

json填补的相关参数主要有以下几个queryType、dataSource、granularity、filter、aggregator

示例的json文件如下:

{
  "queryType": "timeseries",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "descending": "true",
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" },
      { "type": "or",
        "fields": [
          { "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" },
          { "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" }
        ]
      }
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" },
    { "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" }
  ],
  "postAggregations": [
    { "type": "arithmetic",
      "name": "sample_divide",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "name": "postAgg__sample_name1", "fieldName": "sample_name1" },
        { "type": "fieldAccess", "name": "postAgg__sample_name2", "fieldName": "sample_name2" }
      ]
    }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]
}

queryType解析

queryType之查询的类型:timeseries、topN、groupBy

dataSource

dataSource类似你查询的数据库里的表

granularity 粒度

1. 简单聚合粒度 - 支持字符串值有:all、none、second、minute、fifteen_minute、thirty_minute、hour、day、week、month、quarter、year
   (1) all - 将所有块变成一块
   (2) none - 不使用块数据(它实际上是使用最小索引的粒度,none意味着为毫秒级的粒度);按时间序列化查询时不建议使用none,因为所有的毫秒不存在,系统也将尝试生成0值,这往往是很多。
2. 时间段聚合粒度 - Druid指定一精确的持续时间(毫秒)和时间缀返回UTC(世界标准时间)。
3. 常用时间段聚合粒度 - 与时间段聚合粒度差不多,但是常用时间指平时我们常用时间段,如年、月、周、小时等。

假设我们有以下这些数据:

{"timestamp": "2013-08-31T01:02:33Z", "page": "AAA", "language" : "en"}  
{"timestamp": "2013-09-01T01:02:33Z", "page": "BBB", "language" : "en"}  
{"timestamp": "2013-09-02T23:32:45Z", "page": "CCC", "language" : "en"}  
{"timestamp": "2013-09-03T03:32:45Z", "page": "DDD", "language" : "en"}

我们按照小时进行聚合粒度进行查询:

{  
   "queryType":"groupBy",  
   "dataSource":"dataSource",  
   "granularity":"hour",  
   "dimensions":[  
      "language"  
   ],  
   "aggregations":[  
      {  
         "type":"count",  
         "name":"count"  
      }  
   ],  
   "intervals":[  
      "2000-01-01T00:00Z/3000-01-01T00:00Z"  
   ]  
}  

我们按照时间范围进行查询

{  
   "queryType":"groupBy",  
   "dataSource":"dataSource",  
   "granularity":{"type": "duration", "duration": 3600000, "origin": "2012-01-01T00:30:00Z"} ,  
   "dimensions":[  
      "language"  
   ],  
   "aggregations":[  
      {  
         "type":"count",  
         "name":"count"  
      }  
   ],  
   "intervals":[  
      "2000-01-01T00:00Z/3000-01-01T00:00Z"  
   ]  
}

filter

与sql where xxx=xxx一致

"filter": { "type": "selector", "dimension": <dimension_string>, "value": <dimension_value_string> }

filter中还有and or not 三种类型

"filter": { "type": "and/or/not", "fields": [<filter>, <filter>, ...] } 

in

{  
    "type": "in",  
    "dimension": "outlaw",  
    "values": ["Good", "Bad", "Ugly"]  
}  

bound

{  
    "type": "bound",  
    "dimension": "age",  
    "lower": "21",  
    "lowerStrict": true,  
    "upper": "31" ,  
    "upperStrict": true,  
    "alphaNumeric": true  
}  

聚合函数

聚合函数包含以下Count aggregator、Sum aggregators、Min / Max aggregators、Approximate Aggregations、Miscellaneous Aggregations 几种。这里着重讲下Approximate Aggregations。

Approximate

Cardinality aggregator

计算Druid多种维度基数,Cardinality aggregator使用HyperLogLog评估基数,这种聚合比带有索引的hyperUnique聚合慢,运行在一个维度列。

单维度同

SELECT COUNT(DISTINCT(dimension)) FROM <datasource>   一致。

多维度同

SELECT COUNT(DISTINCT(value)) FROM (  
  SELECT dim_1 as value FROM <datasource>  
  UNION  
  SELECT dim_2 as value FROM <datasource>  
  UNION  
  SELECT dim_3 as value FROM <datasource>  
)  


{  
  "type": "cardinality",  
  "name": "distinct_countries",  
  "fieldNames": [ "coutry_of_origin", "country_of_residence" ]  
}  

HyperUnique aggregator

已经被hyperunique在创建索引时聚合的维度值使用HyperLogLog计算估计

{ "type" : "hyperUnique", "name" : <output_name>, "fieldName" : <metric_name> } 

其余的查询详细可以参考官网的例子链接

Responses