Changeset - 8b7c0ef62427
[Not reviewed]
default
FUJIWARA Katsunori - 2017-01-22 18:17:38
foozy@lares.dti.ne.jp
search: make "repository:" condition work case-insensitively as expected

Before this revision, the "repository:" condition, when searching
"Commit messages", never shows revisions in a repository whose name
contains an upper case letter.

Using ID for "repository" of CHGSETS_SCHEMA preserves the case of the
repository name at indexing time. On the other hand, the search
condition itself is forcibly lowercased before parsing:

- files in repository "FOO" are indexed with "FOO" in the "repository" field
- the "repository:FOO" condition is treated as "repository:foo"

The index search itself is then executed case-sensitively. Therefore,
a "repository:FOO" condition never shows revisions in repository "FOO".

But just making "repository" of CHGSETS_SCHEMA case-insensitive isn't
enough, because it breaks the assumptions below whenever two
repository names collide case-insensitively, even though Kallithea
itself can manage such repositories at the same time.

- the combination of "raw_id" (= revision hash ID) and "repository" is
  unique across all revisions known to Kallithea

  CHGSETS_SCHEMA assumes this.

  This uniqueness is required by the Whoosh library to determine
  whether the index table should be updated or not for that repository.

- searching in a repository shows only revisions of that repository

  Before this revision, this filtering was achieved by a "repository:"
  condition built from the case-preserved repository name in the
  requested URL.
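
For example, if repositories "FOO" and "foo" coexisted (forks even
share changeset hashes) and "repository" were simply lowercased, two
distinct repositories would produce indistinguishable index documents
(a sketch with hypothetical values):

    # the same changeset hash can exist in both repositories
    doc_a = {'raw_id': u'8b7c0ef62427', 'repository': u'FOO'.lower()}
    doc_b = {'raw_id': u'8b7c0ef62427', 'repository': u'foo'.lower()}
    assert doc_a == doc_b  # ("raw_id", "repository") is no longer unique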

To make "repository:" search condition work case-insensitively as
expected (without any violation of assumptions above), this revision
does:

- make "repository" of CHGSETS_SCHEMA case-insensitive by
"analyzer=ICASEIDANALYZER"

- introduce "repository_rawname" into SCHEMA and CHGSETS_SCHEMA, to
ensure assumptions described above, by preserving case of
repository name

"repository_rawname" of SCHEMA uses not ID but TEXT, because the
former disable "positions" feature, which is required for
highlight-ing file content (see previous revision for detail).
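
The two analyzers differ only by a LowercaseFilter; a quick sketch of
their token output:

    from whoosh.analysis import IDTokenizer, LowercaseFilter

    IDANALYZER = IDTokenizer()                           # raw string, case kept
    ICASEIDANALYZER = IDTokenizer() | LowercaseFilter()  # raw string, lowercased

    print([t.text for t in IDANALYZER(u'indexing_test-FOO')])
    # [u'indexing_test-FOO'] -> exact, case-sensitive matching
    print([t.text for t in ICASEIDANALYZER(u'indexing_test-FOO')])
    # [u'indexing_test-foo'] -> matches the forcibly lowercased condition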

This revision requires fully re-building the index tables, because
the indexing schemas have changed.
4 files changed with 22 insertions and 5 deletions:
0 comments (0 inline, 0 general)
kallithea/controllers/search.py
 
# -*- coding: utf-8 -*-
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
"""
kallithea.controllers.search
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Search controller for Kallithea

This file was forked by the Kallithea project in July 2014.
Original author and date, and relevant copyright and licensing information is below:
:created_on: Aug 7, 2010
:author: marcink
:copyright: (c) 2013 RhodeCode GmbH, and others.
:license: GPLv3, see LICENSE.md for more details.
"""

import logging
import traceback
import urllib

from pylons.i18n.translation import _
from pylons import request, config, tmpl_context as c

from whoosh.index import open_dir, EmptyIndexError
from whoosh.qparser import QueryParser, QueryParserError
from whoosh.query import Phrase, Prefix
from webhelpers.util import update_params

from kallithea.lib.auth import LoginRequired
from kallithea.lib.base import BaseRepoController, render
from kallithea.lib.indexers import CHGSETS_SCHEMA, SCHEMA, CHGSET_IDX_NAME, \
    IDX_NAME, WhooshResultWrapper
from kallithea.lib.page import Page
from kallithea.lib.utils2 import safe_str, safe_int
from kallithea.model.repo import RepoModel

log = logging.getLogger(__name__)


class SearchController(BaseRepoController):

    def __before__(self):
        super(SearchController, self).__before__()

    @LoginRequired()
    def index(self, repo_name=None):
        c.repo_name = repo_name
        c.formated_results = []
        c.runtime = ''
        c.cur_query = request.GET.get('q', None)
        c.cur_type = request.GET.get('type', 'content')
        c.cur_search = search_type = {'content': 'content',
                                      'commit': 'message',
                                      'path': 'path',
                                      'repository': 'repository'
                                      }.get(c.cur_type, 'content')

        index_name = {
            'content': IDX_NAME,
            'commit': CHGSET_IDX_NAME,
            'path': IDX_NAME
        }.get(c.cur_type, IDX_NAME)

        schema_defn = {
            'content': SCHEMA,
            'commit': CHGSETS_SCHEMA,
            'path': SCHEMA
        }.get(c.cur_type, SCHEMA)

        log.debug('IDX: %s', index_name)
        log.debug('SCHEMA: %s', schema_defn)

        if c.cur_query:
            cur_query = c.cur_query.lower()
            log.debug(cur_query)

        if c.cur_query:
            p = safe_int(request.GET.get('page'), 1)
            highlight_items = set()
            try:
                idx = open_dir(config['app_conf']['index_dir'],
                               indexname=index_name)
                searcher = idx.searcher()

                qp = QueryParser(search_type, schema=schema_defn)
                if c.repo_name:
-                    cur_query = u'repository:%s %s' % (c.repo_name, cur_query)
+                    # use "repository_rawname:" instead of "repository:"
+                    # for case-sensitive matching
+                    cur_query = u'repository_rawname:%s %s' % (c.repo_name, cur_query)
                try:
                    query = qp.parse(unicode(cur_query))
                    # extract words for highlight
                    if isinstance(query, Phrase):
                        highlight_items.update(query.words)
                    elif isinstance(query, Prefix):
                        highlight_items.add(query.text)
                    else:
                        for i in query.all_terms():
                            if i[0] in ['content', 'message']:
                                highlight_items.add(i[1])

                    matcher = query.matcher(searcher)

                    log.debug('query: %s', query)
                    log.debug('hl terms: %s', highlight_items)
                    results = searcher.search(query)
                    res_ln = len(results)
                    c.runtime = '%s results (%.3f seconds)' % (
                        res_ln, results.runtime
                    )

                    def url_generator(**kw):
                        q = urllib.quote(safe_str(c.cur_query))
                        return update_params("?q=%s&type=%s" \
                        % (q, safe_str(c.cur_type)), **kw)
                    repo_location = RepoModel().repos_path
                    c.formated_results = Page(
                        WhooshResultWrapper(search_type, searcher, matcher,
                                            highlight_items, repo_location),
                        page=p,
                        item_count=res_ln,
                        items_per_page=10,
                        url=url_generator
                    )

                except QueryParserError:
                    c.runtime = _('Invalid search query. Try quoting it.')
                searcher.close()
            except (EmptyIndexError, IOError):
                log.error(traceback.format_exc())
                log.error('Empty Index data')
                c.runtime = _('There is no index to search in. '
                              'Please run whoosh indexer')
            except Exception:
                log.error(traceback.format_exc())
                c.runtime = _('An error occurred during search operation.')

        # Return a rendered template
        return render('/search/search.html')
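
For a search scoped to one repository, the rewritten condition now
carries the case-preserved name from the URL; roughly (a sketch with
hypothetical values, mirroring the controller code above):

    repo_name = u'indexing_test-FOO'  # case-preserved name from the requested URL
    cur_query = u'bug'.lower()        # the user query, forcibly lowercased
    cur_query = u'repository_rawname:%s %s' % (repo_name, cur_query)
    # -> u'repository_rawname:indexing_test-FOO bug'
    # matched case-sensitively, so only that exact repository is searched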
kallithea/lib/indexers/__init__.py
 
# -*- coding: utf-8 -*-
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
"""
kallithea.lib.indexers
~~~~~~~~~~~~~~~~~~~~~~

Whoosh indexing module for Kallithea

This file was forked by the Kallithea project in July 2014.
Original author and date, and relevant copyright and licensing information is below:
:created_on: Aug 17, 2010
:author: marcink
:copyright: (c) 2013 RhodeCode GmbH, and others.
:license: GPLv3, see LICENSE.md for more details.
"""

import os
import sys
import logging
from os.path import dirname

# Add location of top level folder to sys.path
sys.path.append(dirname(dirname(dirname(os.path.realpath(__file__)))))

from whoosh.analysis import RegexTokenizer, LowercaseFilter, IDTokenizer
from whoosh.fields import TEXT, ID, STORED, NUMERIC, BOOLEAN, Schema, FieldType, DATETIME
from whoosh.formats import Characters
from whoosh.highlight import highlight as whoosh_highlight, HtmlFormatter, ContextFragmenter
from kallithea.lib.utils2 import LazyProperty

log = logging.getLogger(__name__)

# CUSTOM ANALYZER wordsplit + lowercase filter
ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter()

# CUSTOM ANALYZER raw-string + lowercase filter
#
# This is useful to:
# - avoid tokenization
# - avoid removing "stop words" from text
# - search case-insensitively
#
ICASEIDANALYZER = IDTokenizer() | LowercaseFilter()

# CUSTOM ANALYZER raw-string
#
# This is useful to:
# - avoid tokenization
# - avoid removing "stop words" from text
#
IDANALYZER = IDTokenizer()

#INDEX SCHEMA DEFINITION
SCHEMA = Schema(
    fileid=ID(unique=True),
    owner=TEXT(),
 
+    # this field preserves case of repository name for exact matching
+    repository_rawname=TEXT(analyzer=IDANALYZER),
    repository=TEXT(stored=True, analyzer=ICASEIDANALYZER),
 
    path=TEXT(stored=True),
    content=FieldType(format=Characters(), analyzer=ANALYZER,
                      scorable=True, stored=True),
    modtime=STORED(),
    extension=TEXT(stored=True)
)

IDX_NAME = 'HG_INDEX'
FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n')
FRAGMENTER = ContextFragmenter(200)

CHGSETS_SCHEMA = Schema(
    raw_id=ID(unique=True, stored=True),
    date=NUMERIC(stored=True),
    last=BOOLEAN(),
    owner=TEXT(),
-    repository=ID(unique=True, stored=True),
+    # this field preserves case of repository name for exact matching
+    # and unique-ness in index table
+    repository_rawname=ID(unique=True),
+    repository=ID(stored=True, analyzer=ICASEIDANALYZER),
 
    author=TEXT(stored=True),
    message=FieldType(format=Characters(), analyzer=ANALYZER,
                      scorable=True, stored=True),
    parents=TEXT(),
    added=TEXT(),
    removed=TEXT(),
    changed=TEXT(),
)

CHGSET_IDX_NAME = 'CHGSET_INDEX'

# used only to generate queries in journal
JOURNAL_SCHEMA = Schema(
    username=ID(),
    date=DATETIME(),
    action=TEXT(),
    repository=ID(),
    ip=TEXT(),
)


class WhooshResultWrapper(object):
    def __init__(self, search_type, searcher, matcher, highlight_items,
                 repo_location):
        self.search_type = search_type
        self.searcher = searcher
        self.matcher = matcher
        self.highlight_items = highlight_items
        self.fragment_size = 200
        self.repo_location = repo_location

    @LazyProperty
    def doc_ids(self):
        docs_id = []
        while self.matcher.is_active():
            docnum = self.matcher.id()
            chunks = [offsets for offsets in self.get_chunks()]
            docs_id.append([docnum, chunks])
            self.matcher.next()
        return docs_id

    def __str__(self):
        return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids))

    def __repr__(self):
        return self.__str__()

    def __len__(self):
        return len(self.doc_ids)

    def __iter__(self):
        """
        Allows Iteration over results,and lazy generate content

        *Requires* implementation of ``__getitem__`` method.
        """
        for docid in self.doc_ids:
            yield self.get_full_content(docid)

    def __getitem__(self, key):
        """
        Slicing of resultWrapper
        """
        i, j = key.start, key.stop

        slices = []
        for docid in self.doc_ids[i:j]:
            slices.append(self.get_full_content(docid))
        return slices

    def get_full_content(self, docid):
        res = self.searcher.stored_fields(docid[0])
        log.debug('result: %s', res)
        if self.search_type == 'content':
            full_repo_path = os.path.join(self.repo_location, res['repository'])
            f_path = res['path'].split(full_repo_path)[-1]
            f_path = f_path.lstrip(os.sep)
            content_short = self.get_short_content(res, docid[1])
            res.update({'content_short': content_short,
                        'content_short_hl': self.highlight(content_short),
                        'f_path': f_path
            })
        elif self.search_type == 'path':
            full_repo_path = os.path.join(self.repo_location, res['repository'])
            f_path = res['path'].split(full_repo_path)[-1]
            f_path = f_path.lstrip(os.sep)
            res.update({'f_path': f_path})
        elif self.search_type == 'message':
            res.update({'message_hl': self.highlight(res['message'])})

        log.debug('result: %s', res)

        return res

    def get_short_content(self, res, chunks):
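
How the two fields cooperate can be sketched end to end with an
in-memory index (an illustration of the new schema fields, not code
from this changeset):

    from whoosh.analysis import IDTokenizer, LowercaseFilter
    from whoosh.fields import ID, Schema
    from whoosh.filedb.filestore import RamStorage
    from whoosh.qparser import QueryParser

    ICASEIDANALYZER = IDTokenizer() | LowercaseFilter()
    schema = Schema(repository=ID(stored=True, analyzer=ICASEIDANALYZER),
                    repository_rawname=ID(unique=True))
    ix = RamStorage().create_index(schema)
    writer = ix.writer()
    writer.add_document(repository=u'indexing_test-FOO',
                        repository_rawname=u'indexing_test-FOO')
    writer.commit()

    qp = QueryParser('repository', schema=schema)
    with ix.searcher() as searcher:
        # "repository:" now matches case-insensitively ...
        print(len(searcher.search(qp.parse(u'repository:indexing_test-foo'))))          # 1
        # ... while "repository_rawname:" still requires the exact case
        print(len(searcher.search(qp.parse(u'repository_rawname:indexing_test-foo'))))  # 0
        print(len(searcher.search(qp.parse(u'repository_rawname:indexing_test-FOO'))))  # 1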
 

	
kallithea/lib/indexers/daemon.py
 
@@ -110,230 +110,232 @@ class WhooshIndexingDaemon(object):
                     'index does not exist')
        else:
            self.initial = False

    def _get_index_revision(self, repo):
        db_repo = Repository.get_by_repo_name(repo.name_unicode)
        landing_rev = 'tip'
        if db_repo:
            _rev_type, _rev = db_repo.landing_rev
            landing_rev = _rev
        return landing_rev

    def _get_index_changeset(self, repo, index_rev=None):
        if not index_rev:
            index_rev = self._get_index_revision(repo)
        cs = repo.get_changeset(index_rev)
        return cs

    def get_paths(self, repo):
        """
        recursive walk in root dir and return a set of all path in that dir
        based on repository walk function
        """
        index_paths_ = set()
        try:
            cs = self._get_index_changeset(repo)
            for _topnode, _dirs, files in cs.walk('/'):
                for f in files:
                    index_paths_.add(os.path.join(safe_str(repo.path), safe_str(f.path)))

        except RepositoryError:
            log.debug(traceback.format_exc())
            pass
        return index_paths_

    def get_node(self, repo, path, index_rev=None):
        """
        gets a filenode based on given full path. It operates on string for
        hg git compatibility.

        :param repo: scm repo instance
        :param path: full path including root location
        :return: FileNode
        """
        # FIXME: paths should be normalized ... or even better: don't include repo.path
        path = safe_str(path)
        repo_path = safe_str(repo.path)
        assert path.startswith(repo_path)
        assert path[len(repo_path)] in (os.path.sep, os.path.altsep)
        node_path = path[len(repo_path) + 1:]
        cs = self._get_index_changeset(repo, index_rev=index_rev)
        node = cs.get_node(node_path)
        return node

    def is_indexable_node(self, node):
        """
        Just index the content of chosen files, skipping binary files
        """
        return (node.extension in INDEX_EXTENSIONS or node.name in INDEX_FILENAMES) and \
               not node.is_binary

    def get_node_mtime(self, node):
        return mktime(node.last_changeset.date.timetuple())

    def add_doc(self, writer, path, repo, repo_name, index_rev=None):
        """
        Adding doc to writer this function itself fetches data from
        the instance of vcs backend
        """
        try:
            node = self.get_node(repo, path, index_rev)
        except (ChangesetError, NodeDoesNotExistError):
            log.debug("couldn't add doc - %s did not have %r at %s", repo, path, index_rev)
            return 0, 0

        indexed = indexed_w_content = 0
        if self.is_indexable_node(node):
            u_content = node.content
            if not isinstance(u_content, unicode):
                log.warning('  >> %s Could not get this content as unicode '
                            'replacing with empty content' % path)
                u_content = u''
            else:
                log.debug('    >> %s [WITH CONTENT]', path)
                indexed_w_content += 1

        else:
            log.debug('    >> %s', path)
            # just index file name without it's content
            u_content = u''
            indexed += 1

        p = safe_unicode(path)
        writer.add_document(
            fileid=p,
            owner=unicode(repo.contact),
+            repository_rawname=repo.name_unicode,
            repository=safe_unicode(repo_name),
            path=p,
            content=u_content,
            modtime=self.get_node_mtime(node),
            extension=node.extension
        )
        return indexed, indexed_w_content

    def index_changesets(self, writer, repo_name, repo, start_rev=None):
        """
        Add all changeset in the vcs repo starting at start_rev
        to the index writer

        :param writer: the whoosh index writer to add to
        :param repo_name: name of the repository from whence the
          changeset originates including the repository group
        :param repo: the vcs repository instance to index changesets for,
          the presumption is the repo has changesets to index
        :param start_rev=None: the full sha id to start indexing from
          if start_rev is None then index from the first changeset in
          the repo
        """

        if start_rev is None:
            start_rev = repo[0].raw_id

        log.debug('indexing changesets in %s starting at rev: %s',
                  repo_name, start_rev)

        indexed = 0
        cs_iter = repo.get_changesets(start=start_rev)
        total = len(cs_iter)
        for cs in cs_iter:
            log.debug('    >> %s/%s', cs, total)
            writer.add_document(
                raw_id=unicode(cs.raw_id),
                owner=unicode(repo.contact),
                date=cs._timestamp,
+                repository_rawname=repo.name_unicode,
                repository=safe_unicode(repo_name),
                author=cs.author,
                message=cs.message,
                last=cs.last,
                added=u' '.join([safe_unicode(node.path) for node in cs.added]).lower(),
                removed=u' '.join([safe_unicode(node.path) for node in cs.removed]).lower(),
                changed=u' '.join([safe_unicode(node.path) for node in cs.changed]).lower(),
                parents=u' '.join([cs.raw_id for cs in cs.parents]),
            )
            indexed += 1

        log.debug('indexed %d changesets for repo %s', indexed, repo_name)
        return indexed

    def index_files(self, file_idx_writer, repo_name, repo):
        """
        Index files for given repo_name

        :param file_idx_writer: the whoosh index writer to add to
        :param repo_name: name of the repository we're indexing
        :param repo: instance of vcs repo
        """
        i_cnt = iwc_cnt = 0
        log.debug('building index for %s @revision:%s', repo.path,
                                                self._get_index_revision(repo))
        index_rev = self._get_index_revision(repo)
        for idx_path in self.get_paths(repo):
            i, iwc = self.add_doc(file_idx_writer, idx_path, repo, repo_name, index_rev)
            i_cnt += i
            iwc_cnt += iwc

        log.debug('added %s files %s with content for repo %s',
                  i_cnt + iwc_cnt, iwc_cnt, repo.path)
        return i_cnt, iwc_cnt

    def update_changeset_index(self):
        idx = open_dir(self.index_location, indexname=CHGSET_IDX_NAME)

        with idx.searcher() as searcher:
            writer = idx.writer()
            writer_is_dirty = False
            try:
                indexed_total = 0
                repo_name = None
                for repo_name, repo in self.repo_paths.items():
                    # skip indexing if there aren't any revs in the repo
                    num_of_revs = len(repo)
                    if num_of_revs < 1:
                        continue

                    qp = QueryParser('repository', schema=CHGSETS_SCHEMA)
                    q = qp.parse(u"last:t AND %s" % repo_name)

                    results = searcher.search(q)

                    # default to scanning the entire repo
                    last_rev = 0
                    start_id = None

                    if len(results) > 0:
                        # assuming that there is only one result, if not this
                        # may require a full re-index.
                        start_id = results[0]['raw_id']
                        last_rev = repo.get_changeset(revision=start_id).revision

                    # there are new changesets to index or a new repo to index
                    if last_rev == 0 or num_of_revs > last_rev + 1:
                        # delete the docs in the index for the previous
                        # last changeset(s)
                        for hit in results:
                            q = qp.parse(u"last:t AND %s AND raw_id:%s" %
                                            (repo_name, hit['raw_id']))
                            writer.delete_by_query(q)

                        # index from the previous last changeset + all new ones
                        indexed_total += self.index_changesets(writer,
                                                repo_name, repo, start_id)
                        writer_is_dirty = True
                log.debug('indexed %s changesets for repo %s',
                          indexed_total, repo_name
                )
            finally:
                if writer_is_dirty:
                    log.debug('>> COMMITING CHANGES TO CHANGESET INDEX<<')
                    writer.commit(merge=True)
                    log.debug('>>> FINISHED REBUILDING CHANGESET INDEX <<<')
                else:
                    log.debug('>> NOTHING TO COMMIT TO CHANGESET INDEX<<')

    def update_file_index(self):
        log.debug((u'STARTING INCREMENTAL INDEXING UPDATE FOR EXTENSIONS %s '
                   'AND REPOS %s') % (INDEX_EXTENSIONS, self.repo_paths.keys()))

        idx = open_dir(self.index_location, indexname=self.indexname)
        # The set of all paths in the index
        indexed_paths = set()
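
The "last:t" bookkeeping that update_changeset_index() relies on can be
shown in isolation (a self-contained sketch with made-up hashes): the
newest indexed changeset of each repository is flagged last=True, and a
single query recovers the point to resume from:

    from whoosh.fields import BOOLEAN, ID, Schema
    from whoosh.filedb.filestore import RamStorage
    from whoosh.qparser import QueryParser

    schema = Schema(raw_id=ID(unique=True, stored=True),
                    repository=ID(stored=True),
                    last=BOOLEAN())
    ix = RamStorage().create_index(schema)
    writer = ix.writer()
    writer.add_document(raw_id=u'aaa111', repository=u'repo', last=False)
    writer.add_document(raw_id=u'bbb222', repository=u'repo', last=True)
    writer.commit()

    qp = QueryParser('repository', schema=schema)
    with ix.searcher() as searcher:
        hits = searcher.search(qp.parse(u'last:t AND repo'))
        print([hit['raw_id'] for hit in hits])  # [u'bbb222'] -> resume after it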
kallithea/tests/functional/test_search_indexing.py
 
@@ -33,166 +33,166 @@ def init_stopword_test(repo):
                                 vcs_type='hg',
                                 parent=prev,
                                 newfile=True)

repos = [
    # reponame,              init func or fork base, groupname
    (u'indexing_test',       init_indexing_test,     None),
    (u'indexing_test-fork',  u'indexing_test',       None),
    (u'group/indexing_test', u'indexing_test',       u'group'),
    (u'this-is-it',          u'indexing_test',       None),
    (u'indexing_test-foo',   u'indexing_test',       None),
    (u'indexing_test-FOO',   u'indexing_test',       None),
    (u'stopword_test',       init_stopword_test,     None),
]

# map: name => id
repoids = {}
groupids = {}

def rebuild_index(full_index):
    with mock.patch('kallithea.lib.indexers.daemon.log.debug',
                    lambda *args, **kwargs: None):
        # The more revisions managed repositories have, the more
        # memory capturing "log.debug()" output in "indexers.daemon"
        # requires. This may cause unintentional failure of subsequent
        # tests, if ENOMEM at forking "git" prevents from rebuilding
        # index for search.
        # Therefore, "log.debug()" is disabled regardless of logging
        # level while rebuilding index.
        # (FYI, ENOMEM occurs at forking "git" with python 2.7.3,
        # Linux 3.2.78-1 x86_64, 3GB memory, and no ulimit
        # configuration for memory)
        create_test_index(TESTS_TMP_PATH, CONFIG, full_index=full_index)


class TestSearchControllerIndexing(TestController):
    @classmethod
    def setup_class(cls):
        for reponame, init_or_fork, groupname in repos:
            if groupname and groupname not in groupids:
                group = fixture.create_repo_group(groupname)
                groupids[groupname] = group.group_id
            if callable(init_or_fork):
                repo = fixture.create_repo(reponame,
                                           repo_group=groupname)
                init_or_fork(repo)
            else:
                repo = fixture.create_fork(init_or_fork, reponame,
                                           repo_group=groupname)
            repoids[reponame] = repo.repo_id

        # treat "it" as indexable filename
        filenames_mock = list(INDEX_FILENAMES)
        filenames_mock.append('it')
        with mock.patch('kallithea.lib.indexers.daemon.INDEX_FILENAMES',
                        filenames_mock):
            rebuild_index(full_index=False) # only for newly added repos

    @classmethod
    def teardown_class(cls):
        # delete in reversed order, to delete fork destination at first
        for reponame, init_or_fork, groupname in reversed(repos):
            RepoModel().delete(repoids[reponame])

        for reponame, init_or_fork, groupname in reversed(repos):
            if groupname in groupids:
                RepoGroupModel().delete(groupids.pop(groupname),
                                        force_delete=True)

        Session().commit()
        Session.remove()

        rebuild_index(full_index=True) # rebuild fully for subsequent tests

    @parametrize('reponame', [
        (u'indexing_test'),
        (u'indexing_test-fork'),
        (u'group/indexing_test'),
        (u'this-is-it'),
        (u'*-fork'),
        (u'group/*'),
    ])
    @parametrize('searchtype,query,hit', [
        ('content', 'this_should_be_unique_content', 1),
        ('commit', 'this_should_be_unique_commit_log', 1),
        ('path', 'this_should_be_unique_filename.txt', 1),
    ])
    def test_repository_tokenization(self, reponame, searchtype, query, hit):
        self.log_user()

        q = 'repository:%s %s' % (reponame, query)
        response = self.app.get(url(controller='search', action='index'),
                                {'q': q, 'type': searchtype})
        response.mustcontain('>%d results' % hit)

    @parametrize('searchtype,query,hit', [
 
-        ('content', 'this_should_be_unique_content', 2),
+        ('content', 'this_should_be_unique_content', 1),
        ('commit', 'this_should_be_unique_commit_log', 1),
-        ('path', 'this_should_be_unique_filename.txt', 2),
+        ('path', 'this_should_be_unique_filename.txt', 1),
    ])
    def test_repository_case_sensitivity(self, searchtype, query, hit):
        self.log_user()

        lname = u'indexing_test-foo'
        uname = u'indexing_test-FOO'

        # (1) "repository:REPONAME" condition should match against
        # repositories case-insensitively
        q = 'repository:%s %s' % (lname, query)
        response = self.app.get(url(controller='search', action='index'),
                                {'q': q, 'type': searchtype})
-        response.mustcontain('>%d results' % hit)
+        response.mustcontain('>%d results' % (hit * 2))

        # (2) on the other hand, searching under the specific
        # repository should return results only for that repository,
        # even if specified name matches against another repository
        # case-insensitively.
        response = self.app.get(url(controller='search', action='index',
                                    repo_name=uname),
                                {'q': query, 'type': searchtype})

        response.mustcontain('>%d results' % hit)

        # confirm that there is no matching against lower name repository
        assert uname in response
        #assert lname not in response

    @parametrize('searchtype,query,hit', [
        ('content', 'path:this/is/it def test', 37),
        ('commit', 'added:this/is/it bother to ask where', 4),
        # this condition matches against files below, because
        # "path:" condition is also applied on "repository path".
        # - "this/is/it" in "stopword_test" repo
        # - "this_should_be_unique_filename.txt" in "this-is-it" repo
        ('path', 'this/is/it', 0),

        ('content', 'extension:us', 0),
        ('path', 'extension:us', 0),
    ])
    def test_filename_stopword(self, searchtype, query, hit):
        response = self.app.get(url(controller='search', action='index'),
                                {'q': query, 'type': searchtype})

        response.mustcontain('>%d results' % hit)

    @parametrize('searchtype,query,hit', [
        # matching against both 2 files
        ('content', 'owner:"this is it"', 0),
        ('content', 'owner:this-is-it', 0),
        ('path', 'owner:"this is it"', 0),
        ('path', 'owner:this-is-it', 0),

        # matching against both 2 revisions
        ('commit', 'owner:"this is it"', 0),
        ('commit', 'owner:"this-is-it"', 0),

        # matching against only 1 revision
        ('commit', 'author:"this is it"', 0),
        ('commit', 'author:"this-is-it"', 0),
    ])
    def test_mailaddr_stopword(self, searchtype, query, hit):
        response = self.app.get(url(controller='search', action='index'),
                                {'q': query, 'type': searchtype})

        response.mustcontain('>%d results' % hit)