7.3.26. select
¶
7.3.26.1. Summary¶
select
searches records that are matched to specified conditions
from a table and then outputs them.
select
is the most important command in groonga. You need to
understand select
to use the full power of groonga.
7.3.26.2. Syntax¶
select
has many parameters. The required parameter is only
table
and others are optional:
select table
[match_columns=null]
[query=null]
[filter=null]
[scorer=null]
[sortby=null]
[output_columns="_id, _key, *"]
[offset=0]
[limit=10]
[drilldown=null]
[drilldown_sortby=null]
[drilldown_output_columns="_key, _nsubrecs"]
[drilldown_offset=0]
[drilldown_limit=10]
[cache=yes]
[match_escalation_threshold=0]
[query_expansion=null]
[query_flags=ALLOW_PRAGMA|ALLOW_COLUMN|ALLOW_UPDATE|ALLOW_LEADING_NOT|NONE]
[query_expander=null]
[adjuster=null]
7.3.26.3. Usage¶
Let's learn about select
usage with examples. This section shows
many popular usages.
Here are a schema definition and sample data to show usage.
Execution example:
table_create Entries TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY|KEY_NORMALIZE ShortText --default_tokenizer TokenBigram
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"_key": "The first post!",
"content": "Welcome! This is my first post!",
"n_likes": 5},
{"_key": "Groonga",
"content": "I started to use groonga. It's very fast!",
"n_likes": 10},
{"_key": "Mroonga",
"content": "I also started to use mroonga. It's also very fast! Really fast!",
"n_likes": 15},
{"_key": "Good-bye Senna",
"content": "I migrated all Senna system!",
"n_likes": 3},
{"_key": "Good-bye Tritonn",
"content": "I also migrated all Tritonn system!",
"n_likes": 3}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]
There is a table, Entries
, for blog entries. An entry has title,
content and the number of likes for the entry. Title is key of
Entries
. Content is value of Entries.content
column. The
number of likes is value of Entries.n_likes
column.
Entries._key
column and Entries.content
column are indexed
using TokenBigram
tokenizer. So both Entries._key
and
Entries.content
are fulltext search ready.
OK. The schema and data for examples are ready.
7.3.26.3.1. Simple usage¶
Here is the most simple usage with the above schema and data. It outputs
all records in Entries
table.
Execution example:
select Entries
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ],
# [
# 2,
# "Groonga",
# "I started to use groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use mroonga. It's also very fast! Really fast!",
# 15
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ]
# ]
# ]
# ]
Why does the command output all records? There are two reasons. The
first reason is that the command doesn't specify any search
conditions. No search condition means all records are matched. The
second reason is that the number of all records is 5. select
command outputs 10 records at a maximum by default. There are only 5
records. It is less than 10. So the command outputs all records.
7.3.26.3.2. Search conditions¶
Search conditions are specified by query
or filter
. You can
also specify both query
and filter
. It means that selected
records must be matched against both query
and filter
.
7.3.26.3.2.1. Search condition: query
¶
query
is designed for search box in Web page. Imagine a search box
in google.com. You specify search conditions for query
as space
separated keywords. For example, search engine
means a matched
record should contain two words, search
and engine
.
Normally, query
parameter is used for specifying fulltext search
conditions. It can be used for non fulltext search conditions but
filter
is used for the propose.
query
parameter is used with match_columns
parameter when
query
parameter is used for specifying fulltext search
conditions. match_columns
specifies which columnes and indexes are
matched against query
.
Here is a simple query
usage example.
Execution example:
select Entries --match_columns content --query fast
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The select
command searches records that contain a word fast
in content
column value from Entries
table.
query
has query syntax but its deatils aren't described here. See
Query syntax for datails.
7.3.26.3.2.2. Search condition: filter
¶
filter
is designed for complex search conditions. You specify
search conditions for filter
as ECMAScript like syntax.
Here is a simple filter
usage example.
Execution example:
select Entries --filter 'content @ "fast" && _key == "Groonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]
The select
command searches records that contain a word fast
in content
column value and has Groonga
as _key
from
Entries
table. There are three operators in the command, @
,
&&
and ==
. @
is fulltext search operator. &&
and
==
are the same as ECMAScript. &&
is logical AND operator and
==
is equality operator.
filter
has more operators and syntax like grouping by (...)
its deatils aren't described here. See Script syntax for
datails.
7.3.26.3.3. Paging¶
You can specify range of outputted records by offset
and limit
.
Here is an example to output only the 2nd record.
Execution example:
select Entries --offset 1 --limit 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]
offset
is zero-origin. --offset 1
means output range is
started from the 2nd record.
limit
specifies the max number of output records. --limit 1
means the number of output records is 1 at a maximium. If no records
are matched, select
command outputs no records.
7.3.26.3.4. The total number of records¶
You can use --limit 0
to retrieve the total number of recrods
without any contents of records.
Execution example:
select Entries --limit 0
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ]
# ]
# ]
# ]
--limit 0
is also useful for retrieving only the number of matched
records.
7.3.26.4. Parameters¶
This section describes all parameters. Parameters are categorized.
7.3.26.4.1. Required parameter¶
There is a required parameter, table
.
7.3.26.4.1.1. table
¶
It specifies a table to be searched. table
must be specified.
If nonexistent table is specified, an error is returned.
Execution example:
select Nonexistent
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "invalid table name: <Nonexistent>",
# [
# [
# "grn_select",
# "proc.c",
# 778
# ]
# ]
# ]
# ]
7.3.26.4.3. Advanced search parameters¶
7.3.26.4.3.1. match_escalation_threshold
¶
It specifies threshold to determine whether search storategy escalation is used or not. The threshold is compared against the number of matched records. If the number of matched records is equal to or less than the threshold, the search storategy escalation is used. See 検索 about the search storategy escalation.
The default threshold is 0. It means that search storategy escalation is used only when no records are matched.
The default threshold can be customized by one of the followings.
--with-match-escalation-threshold
option of configure--match-escalation-threshold
option of groogna commandmatch-escalation-threshold
configuration item in configuration file
Here is a simple match_escalation_threshold
usage example. The
first select
doesn't have match_escalation_threshold
parameter. The second select
has match_escalation_threshold
parameter.
Execution example:
select Entries --match_columns content --query groo
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]
select Entries --match_columns content --query groo --match_escalation_threshold -1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ]
# ]
# ]
# ]
The first select
command searches records that contain a word
groo
in content
column value from Entries
table. But no
records are matched because the TokenBigram
tokenizer tokenizes
groonga
to groonga
not gr|ro|oo|on|ng|ga
. (The
TokenBigramSplitSymbolAlpha
tokenizer tokenizes groonga
to
gr|ro|oo|on|ng|ga
. See Tokenizers for details.)
It means that groonga
is indexed but groo
isn't indexed. So no
records are matched against groo
by exact match. In the case, the
search storategy escalation is used because the number of matched
records (0) is equal to match_escalation_threshold
(0). One record
is matched against groo
by unsplit search.
The second select
command also searches records that contain a
word groo
in content
column value from Entries
table. And
it also doesn't found matched records. In this case, the search
storategy escalation is not used because the number of matched
records (0) is larger than match_escalation_threshold
(-1). So no
more searches aren't executed. And no records are matched.
7.3.26.4.3.2. query_expansion
¶
Deprecated. Use query_expander instead.
7.3.26.4.3.3. query_flags
¶
It customs query
parameter syntax. You cannot update column value
by query
parameter by default. But if you specify
ALLOW_COLUMN|ALLOW_UPDATE
as query_flags
, you can update
column value by query
.
Here are available values:
ALLOW_PRAGMA
ALLOW_COLUMN
ALLOW_UPDATE
ALLOW_LEADING_NOT
NONE
ALLOW_PRAGMA
enables pragma at the head of query
. This is not
implemented yet.
ALLOW_COLUMN
enables search againt columns that are not included
in match_columns
. To specify column, there are COLUMN:...
syntaxes.
ALLOW_UPDATE
enables column update by query
with
COLUMN:=NEW_VALUE
syntax. ALLOW_COLUMN
is also required to
update column because the column update syntax specifies column.
ALLOW_LEADING_NOT
enables leading NOT condition with -WORD
syntax. The query searches records that doesn't match
WORD
. Leading NOT condition query is heavy query in many cases
because it matches many records. So this flag is disabled by
default. Be careful about it when you use the flag.
NONE
is just ignores. You can use NONE
for specifying no flags.
They can be combined by separated |
such as
ALLOW_COLUMN|ALLOW_UPDATE
.
The default value is ALLOW_PRAGMA|ALLOW_COLUMN
.
Here is a usage example of ALLOW_COLUMN
.
Execution example:
select Entries --query content:@mroonga --query_flags ALLOW_COLUMN
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The select
command searches records that contain mroonga
in
content
column value from Entries
table.
Here is a usage example of ALLOW_UPDATE
.
Execution example:
table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users age COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "alice", "age": 18},
{"_key": "bob", "age": 20}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select Users --query age:=19 --query_flags ALLOW_COLUMN|ALLOW_UPDATE
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt32"
# ]
# ],
# [
# 1,
# "alice",
# 19
# ],
# [
# 2,
# "bob",
# 19
# ]
# ]
# ]
# ]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt32"
# ]
# ],
# [
# 1,
# "alice",
# 19
# ],
# [
# 2,
# "bob",
# 19
# ]
# ]
# ]
# ]
The first select
command sets age
column value of all records
to 19
. The second select
command outputs updated age
column values.
Here is a usage example of ALLOW_LEADING_NOT
.
Execution example:
select Entries --match_columns content --query -mroonga --query_flags ALLOW_LEADING_NOT
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ],
# [
# 2,
# "Groonga",
# "I started to use groonga. It's very fast!",
# 10
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ]
# ]
# ]
# ]
The select
command searches records that don't contain mroonga
in content
column value from Entries
table.
Here is a usage example of NONE
.
Execution example:
select Entries --match_columns content --query 'mroonga OR _key:Groonga' --query_flags NONE
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The select
command searches records that contain one of two words
mroonga
or _key:Groonga
in content
from Entries
table.
Note that _key:Groonga
doesn't mean that the value of _key
column is equal to Groonga
. Because ALLOW_COLUMN
flag is not
specified.
See also Query syntax.
7.3.26.4.3.4. query_expander
¶
It's for query expansion. Query expansion substitutes specific words to another words in query. Nomally, it's used for synonym search.
It specifies a column that is used to substitute query
parameter
value. The format of this parameter value is
"${TABLE}.${COLUMN}
". For example, "Terms.synonym
" specifies
synonym
column in Terms
table.
Table for query expansion is called "substitution table". Substitution
table's key must be ShortText
. So array table (TABLE_NO_KEY
)
can't be used for query expansion. Because array table doesn't have
key.
Column for query expansion is called "substitution
column". Substitution column's value type must be
ShortText
. Column type must be vector (COLUMN_VECTOR
).
Query expansion substitutes key of substitution table in query with
values in substitution column. If a word in query
is a key of
substitution table, the word is substituted with substitution column
value that is associated with the key. Substition isn't performed
recursively. It means that substitution target words in substituted
query aren't substituted.
Here is a sample substitution table to show a simple
query_expander
usage example.
Execution example:
table_create Thesaurus TABLE_PAT_KEY|KEY_NORMALIZE ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Thesaurus synonym COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Thesaurus
[
{"_key": "mroonga", "synonym": ["mroonga", "tritonn", "groonga mysql"]},
{"_key": "groonga", "synonym": ["groonga", "senna"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
Thesaurus
substitution table has two synonyms, "mroonga"
and
"groonga"
. If an user searches with "mroonga"
, groonga
searches with "((mroonga) OR (tritonn) OR (groonga mysql))"
. If an
user searches with "groonga"
, groonga searches with "((groonga)
OR (senna))"
. Nomrally, it's good idea that substitution table has
KEY_NORMALIZE
flag. If the flag is used, substitute target word is
matched in case insensitive manner.
Note that those synonym values include the key value such as
"mroonga"
and "groonga"
. It's recommended that you include the
key value. If you don't include key value, substituted value doesn't
include the original substitute target value. Normally, including the
original value is better search result. If you have a word that you
don't want to be searched, you should not include the original
word. For example, you can implement "stop words" by an empty vector
value.
Here is a simple query_expander
usage example.
Execution example:
select Entries --match_columns content --query "mroonga"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
select Entries --match_columns content --query "mroonga" --query_expander Thesaurus.synonym
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use mroonga. It's also very fast! Really fast!",
# 15
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ]
# ]
# ]
# ]
select Entries --match_columns content --query "((mroonga) OR (tritonn) OR (groonga mysql))"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use mroonga. It's also very fast! Really fast!",
# 15
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ]
# ]
# ]
# ]
The first select
command doesn't use query expansion. So a record
that has "tritonn"
isn't found. The second select
command uses
query expansion. So a record that has "tritonn"
is found. The
third select
command doesn't use query expansion but it is same as
the second select
command. The third one uses expanded query.
Each substitute value can contain any Query syntax syntax
such as (...)
and OR
. You can use complex substitution by
using those syntax.
Here is a complex substitution usage example that uses query syntax.
Execution example:
load --table Thesaurus
[
{"_key": "popular", "synonym": ["popular", "n_likes:>=10"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select Entries --match_columns content --query "popular" --query_expander Thesaurus.synonym
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The load
command registers a new synonym "popular"
. It is
substituted with ((popular) OR (n_likes:>=10))
. The substituted
query means that "popular" is containing the word "popular" or 10 or
more liked entries.
The select
command outputs records that n_likes
column value
is equal to or more than 10
from Entries
table.
7.3.26.5. 返値¶
TODO: write in English and add example.
以下のようなjson形式で値が返却されます。
[[リターンコード, 処理開始時間, 処理時間], [検索結果, ドリルダウン結果]]
リターンコード
grn_rcに対応する数値が返されます。0(GRN_SUCCESS)以外の場合は、続いてエラー内容を示す 文字列が返されます。
処理開始時間
処理を開始した時間について、1970年1月1日0時0分0秒を起点とした秒数を小数で返します。
処理時間
処理にかかった秒数を返します。
検索結果
drilldown条件が実行される前の検索結果が以下のように出力されます。:
[[ヒット数], [[カラム名1,カラム型1],..], 検索結果1,..]
ヒット数
検索条件にヒットしたレコードの数が出力されます。--limit
オプションで出力件数を制限した場合は出力するレコード数と一致しません。ヒット数
は--limit
オプションに関係なく常にヒットしたレコードの数になります。
カラム名n
output_columnsに指定された条件に従って、対象となるカラム名が出力されます。
カラム型n
output_columnsに指定された条件に従って、対象となるカラム型が出力されます。
検索結果n
output_columns, offset, limitによって指定された条件に従って各レコードの値が出力されます。
drilldown結果
drilldown処理の結果が以下のように出力されます。:
[[[件数], [[カラム名1,カラム型1],..], 検索結果1,..],..]
件数
drilldownに指定されたカラムの値の異なり数が出力されます。
カラム名n
drilldown_output_columnsに指定された条件に従って、対象となるカラム名が出力されます。
カラム型n
drilldown_output_columnsに指定された条件に従って、対象となるカラム型が出力されます。
ドリルダウン結果n
drilldown_output_columns, drilldown_offset, drilldown_limitによって指定された条件に従って各レコードの値が出力されます。