Location: kallithea/CONTRIBUTORS @ caef0be39948
search: make "repository:" condition work as expected
Before this revision, a "repository:foo" condition, when searching for
"File contents" or "File names", matched files in all of the following
repositories:
- foo
- foo/bar
- foo-bar
- and so on ...
With a plain TEXT field, the Whoosh library (used to parse text for
indexing and searching):
- treats almost all non-alphanumeric characters as delimiters, both when
  indexing search items and when parsing search conditions
- indexes each field of a search item under multiple values (tokens)
For example, files in the "foo/bar" repository are indexed under both
"foo" and "bar" in the "repository" field. Because of this tokenization,
a "repository:foo" search condition also matches files in the "foo/bar"
repository.
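The tokenization described above can be approximated without Whoosh. The sketch below is a simplified model, not Whoosh's implementation. It assumes a pattern similar to the default of Whoosh's RegexTokenizer (\w+(\.?\w+)*) and lowercases each token. It treats a query as matching when every query token appears among the indexed tokens. The names tokenize and matches are hypothetical.

```python
import re

# Assumed approximation of Whoosh's default RegexTokenizer pattern:
# runs of word characters, optionally joined by single dots.
TOKEN_PATTERN = re.compile(r"\w+(\.?\w+)*", re.UNICODE)


def tokenize(text):
    """Split text into lowercase tokens, roughly like a plain TEXT field."""
    return [m.group(0).lower() for m in TOKEN_PATTERN.finditer(text)]


def matches(query_value, indexed_value):
    """Simplified: match if every query token is among the indexed tokens."""
    indexed_tokens = set(tokenize(indexed_value))
    return all(tok in indexed_tokens for tok in tokenize(query_value))


repos = ["foo", "foo/bar", "foo-bar", "bar"]
print([r for r in repos if matches("foo", r)])  # ['foo', 'foo/bar', 'foo-bar']
```

Because "/" and "-" act as delimiters, "foo/bar" and "foo-bar" both contain the token "foo". The query therefore cannot tell them apart from the "foo" repository.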
In addition, using plain TEXT also causes "stop words" in search
conditions to be ignored unintentionally. For example, "this", "a",
"you", and so on are dropped at both indexing and parsing, because they
are too generic (from the point of view of general text search).
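The stop-word problem can be shown with a small sketch. It tokenizes, lowercases, then drops stop words, as a generic text analyzer does. The stop list is a small hypothetical subset chosen for illustration; Whoosh ships its own, larger default list. The name analyze is also hypothetical.

```python
import re

# Hypothetical subset of a generic English stop list (illustration only).
STOP_WORDS = frozenset({"a", "an", "the", "this", "you", "to", "of"})
TOKEN_PATTERN = re.compile(r"\w+(\.?\w+)*", re.UNICODE)


def analyze(text, stoplist=STOP_WORDS):
    """Tokenize, lowercase, then remove stop words."""
    tokens = (m.group(0).lower() for m in TOKEN_PATTERN.finditer(text))
    return [tok for tok in tokens if tok not in stoplist]


print(analyze("this"))           # []
print(analyze("you/this-repo"))  # ['repo']
```

A repository named "this" produces no indexed terms at all, so a "repository:this" condition has nothing left to match against.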
This issue can't be resolved by using ID instead of TEXT for the
"repository" field of SCHEMA (as previous revisions did for
JOURNAL_SCHEMA), because:
- highlighting file content requires SCHEMA to support the "positions"
  feature, but using ID instead of TEXT disables it
- using ID violates the current case-insensitive search policy, because
  it preserves the case of text
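The case-sensitivity objection can be illustrated with a toy model of an ID-style field. The model assumes, as stated above, that ID indexes the whole value as one term and keeps its case. Positions are not modeled, and id_tokens and id_matches are hypothetical names.

```python
# Toy model (assumption) of an ID-style field: the entire value is one
# term and its case is preserved. Positions are not modeled here.


def id_tokens(value):
    """Index the entire value as a single, case-preserving token."""
    return [value]


def id_matches(query_value, indexed_value):
    """Exact, case-sensitive comparison of the single tokens."""
    return id_tokens(query_value) == id_tokens(indexed_value)


repos = ["foo", "Foo", "foo/bar", "foo-bar"]
print([r for r in repos if id_matches("foo", r)])  # ['foo']
print([r for r in repos if id_matches("FOO", r)])  # []
```

Tokenization is no longer a problem, but "FOO" no longer finds "foo". This conflicts with the case-insensitive search policy.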
To make the "repository:" condition work as expected, this revision
explicitly specifies an "analyzer" that:
- avoids tokenization
- matches case-insensitively
- does not remove "stop words" from text
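The three properties listed above can be modeled in plain Python. In Whoosh itself, an analyzer with these properties can be composed as IDTokenizer() | LowercaseFilter() and passed via TEXT(analyzer=...). Whether this revision uses exactly that combination is an assumption. The sketch below only emulates the behaviour, and icase_id_analyze and repo_matches are hypothetical names.

```python
def icase_id_analyze(value):
    """Keep the whole value as one token, lowercased; no stop-word removal."""
    return [value.lower()]


def repo_matches(query_value, indexed_value):
    """Match whole repository names, ignoring case."""
    return icase_id_analyze(query_value) == icase_id_analyze(indexed_value)


repos = ["foo", "Foo", "foo/bar", "foo-bar", "this"]
print([r for r in repos if repo_matches("foo", r)])   # ['foo', 'Foo']
print([r for r in repos if repo_matches("THIS", r)])  # ['this']
```

Compared with the plain TEXT behaviour, "foo/bar" and "foo-bar" no longer match "repository:foo". A stop word such as "this" still survives as a searchable term.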
This revision requires fully rebuilding the index tables, because the
indexing schema has changed.
BTW, a "repository:" condition when searching for "Commit messages" uses
CHGSETS_SCHEMA instead of SCHEMA. The former uses ID for "repository",
which:
- avoids the issues caused by tokenization and "stop words" removal
- disables the "positions" feature of CHGSETS_SCHEMA
  But highlighting file content isn't needed when searching for
  "Commit messages", so this can be ignored.
- preserves the case of text
  This violates the current case-insensitive search policy. This issue
  will be fixed by a subsequent revision, because fixing it isn't simple.
List of contributors to Kallithea project:
Mads Kiilerich <madski@unity3d.com> 2012-2016
Takumi IINO <trot.thunder@gmail.com> 2012-2016
Unity Technologies 2012-2016
Andrew Shadura <andrew@shadura.me> 2012 2014-2016
Dominik Ruf <dominikruf@gmail.com> 2012 2014-2016
Thomas De Schampheleire <thomas.de.schampheleire@gmail.com> 2014-2016
Étienne Gilli <etienne.gilli@gmail.com> 2015-2016
Jan Heylen <heyleke@gmail.com> 2015-2016
Robert Martinez <ntttq@inboxen.org> 2015-2016
Robert Rauch <mail@robertrauch.de> 2015-2016
Søren Løvborg <sorenl@unity3d.com> 2015-2016
Angel Ezquerra <angel.ezquerra@gmail.com> 2016
Asterios Dimitriou <steve@pci.gr> 2016
Kateryna Musina <kateryna@unity3d.com> 2016
Konstantin Veretennicov <kveretennicov@gmail.com> 2016
Oscar Curero <oscar@naiandei.net> 2016
Robert James Dennington <tinytimrob@googlemail.com> 2016
timeless@gmail.com 2016
YFdyh000 <yfdyh000@gmail.com> 2016
Aras Pranckevičius <aras@unity3d.com> 2012-2013 2015
Sean Farley <sean.michael.farley@gmail.com> 2013-2015
Christian Oyarzun <oyarzun@gmail.com> 2014-2015
Joseph Rivera <rivera.d.joseph@gmail.com> 2014-2015
Michal Čihař <michal@cihar.com> 2014-2015
Anatoly Bubenkov <bubenkoff@gmail.com> 2015
Andrew Bartlett <abartlet@catalyst.net.nz> 2015
Balázs Úr <urbalazs@gmail.com> 2015
Ben Finney <ben@benfinney.id.au> 2015
Branko Majic <branko@majic.rs> 2015
Daniel Hobley <danielh@unity3d.com> 2015
David Avigni <david.avigni@ankapi.com> 2015
Denis Blanchette <dblanchette@coveo.com> 2015
duanhongyi <duanhongyi@doopai.com> 2015
EriCSN Chang <ericsning@gmail.com> 2015
Grzegorz Krason <grzegorz.krason@gmail.com> 2015
Jiří Suchan <yed@vanyli.net> 2015
Kazunari Kobayashi <kobanari@nifty.com> 2015
Kevin Bullock <kbullock@ringworld.org> 2015
kobanari <kobanari@nifty.com> 2015
Marc Abramowitz <marc@marc-abramowitz.com> 2015
Marc Villetard <marc.villetard@gmail.com> 2015
Matthias Zilk <matthias.zilk@gmail.com> 2015
Michael Pohl <michael@mipapo.de> 2015
Michael V. DePalatis <mike@depalatis.net> 2015
Morten Skaaning <mortens@unity3d.com> 2015
Nick High <nick@silverchip.org> 2015
Niemand Jedermann <predatorix@web.de> 2015
Peter Vitt <petervitt@web.de> 2015
Ronny Pfannschmidt <opensource@ronnypfannschmidt.de> 2015
Sam Jaques <sam.jaques@me.com> 2015
Tuux <tuxa@galaxie.eu.org> 2015
Viktar Palstsiuk <vipals@gmail.com> 2015
Ante Ilic <ante@unity3d.com> 2014
Bradley M. Kuhn <bkuhn@sfconservancy.org> 2014
Calinou <calinou@opmbx.org> 2014
Daniel Anderson <daniel@dattrix.com> 2014
Henrik Stuart <hg@hstuart.dk> 2014
Ingo von Borstel <kallithea@planetmaker.de> 2014
Jelmer Vernooij <jelmer@samba.org> 2014
Jim Hague <jim.hague@acm.org> 2014
Matt Fellows <kallithea@matt-fellows.me.uk> 2014
Max Roman <max@choloclos.se> 2014
Na'Tosha Bard <natosha@unity3d.com> 2014
Rasmus Selsmark <rasmuss@unity3d.com> 2014
Tim Freund <tim@freunds.net> 2014
Travis Burtrum <android@moparisthebest.com> 2014
Zoltan Gyarmati <mr.zoltan.gyarmati@gmail.com> 2014
Marcin Kuźmiński <marcin@python-works.com> 2010-2013
xpol <xpolife@gmail.com> 2012-2013
Aparkar <aparkar@icloud.com> 2013
Dennis Brakhane <brakhane@googlemail.com> 2013
Grzegorz Rożniecki <xaerxess@gmail.com> 2013
Jonathan Sternberg <jonathansternberg@gmail.com> 2013
Leonardo Carneiro <leonardo@unity3d.com> 2013
Magnus Ericmats <magnus.ericmats@gmail.com> 2013
Martin Vium <martinv@unity3d.com> 2013
Simon Lopez <simon.lopez@slopez.org> 2013
Ton Plomp <tcplomp@gmail.com> 2013
Augusto Herrmann <augusto.herrmann@planejamento.gov.br> 2011-2012
Dan Sheridan <djs@adelard.com> 2012
Dies Koper <diesk@fast.au.fujitsu.com> 2012
Erwin Kroon <e.kroon@smartmetersolutions.nl> 2012
H Waldo G <gwaldo@gmail.com> 2012
hppj <hppj@postmage.biz> 2012
Indra Talip <indra.talip@gmail.com> 2012
mikespook 2012
nansenat16 <nansenat16@null.tw> 2012
Philip Jameson <philip.j@hostdime.com> 2012
Raoul Thill <raoul.thill@gmail.com> 2012
Stefan Engel <mail@engel-stefan.de> 2012
Tony Bussieres <t.bussieres@gmail.com> 2012
Vincent Caron <vcaron@bearstech.com> 2012
Vincent Duvert <vincent@duvert.net> 2012
Vladislav Poluhin <nuklea@gmail.com> 2012
Zachary Auclair <zach101@gmail.com> 2012
Ankit Solanki <ankit.solanki@gmail.com> 2011
Dmitri Kuznetsov 2011
Jared Bunting <jared.bunting@peachjean.com> 2011
Jason Harris <jason@jasonfharris.com> 2011
Les Peabody <lpeabody@gmail.com> 2011
Liad Shani <liadff@gmail.com> 2011
Lorenzo M. Catucci <lorenzo@sancho.ccd.uniroma2.it> 2011
Matt Zuba <matt.zuba@goodwillaz.org> 2011
Nicolas VINOT <aeris@imirhil.fr> 2011
Shawn K. O'Shea <shawn@eth0.net> 2011
Thayne Harbaugh <thayne@fusionio.com> 2011
Łukasz Balcerzak <lukaszbalcerzak@gmail.com> 2010
Andrew Kesterson <andrew@aklabs.net>
cejones
David A. Sjøen <david.sjoen@westcon.no>
James Rhodes <jrhodes@redpointsoftware.com.au>
Jonas Oberschweiber <jonas.oberschweiber@d-velop.de>
larikale
RhodeCode GmbH
Sebastian Kreutzberger <sebastian@rhodecode.com>
Steve Romanow <slestak989@gmail.com>
SteveCohen
Thomas <thomas@rhodecode.com>
Thomas Waldmann <tw-public@gmx.de>