Changeset - 1fed3c9161bb
beta
Marcin Kuzminski - 2010-12-29 13:54:03
marcin@python-works.com
fixes #90 + docs update
3 files changed with 43 insertions and 14 deletions:
docs/setup.rst

.. _setup:

Setup
=====

Setting up the application
--------------------------

First you'll need to create a RhodeCode config file. Run the following command
to do this

::

 paster make-config RhodeCode production.ini

- This will create a `production.ini` config file inside the directory you
  run the command from. This config contains various settings for RhodeCode,
  e.g. proxy port, email settings, usage of static files, cache, celery
  settings and logging. A short excerpt is shown below.
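
The excerpt below sketches a few of the application settings one might tune.
The exact option names can differ between RhodeCode versions, so treat this as
illustrative and consult the generated `production.ini` itself::

 [app:main]
 use_celery = false
 index_dir = %(here)s/data/index
 sqlalchemy.db1.url = sqlite:///%(here)s/rhodecode.db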

Next we need to create the database

::

 paster setup-app production.ini

- This command will create all needed tables and an admin account.
  When asked for a path you can either use a new location or one with
  already existing repositories. RhodeCode will simply add all newly found
  repositories to its database. Also make sure you specify the correct path
  to your repositories.
- Remember that the given path for mercurial_ repositories must be write
  accessible for the application. This is very important: the RhodeCode web
  interface will work even without such access, but pushes will eventually
  fail with permission denied errors. One way to grant access is shown below.
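
For instance, assuming the application runs as a dedicated `rhodecode` system
user and the repositories live under `/srv/repos` (both names are purely
illustrative), write access could be granted like this::

 chown -R rhodecode /srv/repos
 chmod -R u+rwX /srv/repos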

You are ready to use RhodeCode. To run it simply execute

::

 paster serve production.ini

- This command runs the RhodeCode server; the app should be available at
  127.0.0.1:5000. This IP and port are configurable via the production.ini
  file created in the previous step (see the snippet after this list).
- Use the admin account you created to log in.
- The default permission on each repository is read, and the owner is admin.
  So remember to update these if needed. In the admin panel you can toggle
  ldap, anonymous and permissions settings, as well as edit more advanced
  options on users and repositories.
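
The relevant part of production.ini is the server section; a minimal sketch
using the standard Paste server options might look like::

 [server:main]
 host = 127.0.0.1
 port = 5000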

Setting up Whoosh full text search
----------------------------------

Starting from version 1.1 the whoosh index can be built using a paster
command. You have to specify the config file that stores the location of the
index, and the location of the repositories (`--repo-location`). Starting from
version 1.2 it is also possible to specify a comma separated list of
repositories (`--index-only`) to build the index only for the chosen
repositories, skipping any others found in the repositories location.

It is also possible to pass `-f` to enable a full index rebuild. Without it,
indexing will always run in incremental mode::

 paster make-index production.ini --repo-location=<location for repos>

For a full index rebuild you can use::

 paster make-index production.ini -f --repo-location=<location for repos>

Building the index just for chosen repositories is possible with such a
command::

 paster make-index production.ini --repo-location=<location for repos> --index-only=vcs,rhodecode

In order to do periodic index builds and keep your index always up to date,
it is recommended to add a crontab entry for incremental indexing.
An example entry might look like this

::

 /path/to/python/bin/paster make-index /path/to/rhodecode/production.ini --repo-location=<location for repos>
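
A complete crontab line adds the schedule fields in front of the command; for
instance, to run an incremental index build every night at 4 AM (the schedule
shown is only an example)::

 0 4 * * * /path/to/python/bin/paster make-index /path/to/rhodecode/production.ini --repo-location=<location for repos>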

When using incremental (the default) mode, whoosh will check the last
modification date of each file and add it for reindexing if a newer version of
the file is available. The indexing daemon also checks for removed files and
removes them from the index.

Sometimes you might want to rebuild the index from scratch. You can do that
using the `-f` flag passed to the paster command or, in the admin panel, by
checking the `build from scratch` flag.

Setting up LDAP support
-----------------------

RhodeCode starting from version 1.1 supports ldap authentication. In order
to use ldap, you have to install the python-ldap package. This package is
available via pypi, so you can install it by running either

::

 easy_install python-ldap

or

::

 pip install python-ldap

.. note::
   python-ldap requires certain libs on your system, so before installing
   it check that you have at least the `openldap` and `sasl` libraries.
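
On Debian or Ubuntu systems, for instance, the required development headers
are typically provided by packages like the following (package names may vary
by distribution)::

 apt-get install libldap2-dev libsasl2-dev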

The ldap settings are located in the admin->ldap section.

Here's a typical ldap setup::

 Enable ldap  = checked                 #controls if ldap access is enabled
 Host         = host.domain.org         #actual ldap server to connect to
 Port         = 389, or 636 for ldaps   #ldap server ports
 Enable LDAPS = unchecked               #enable/disable ldaps
 Account      = <account>               #access for ldap server (if required)
 Password     = <password>              #password for ldap server (if required)
 Base DN      = uid=%(user)s,CN=users,DC=host,DC=domain,DC=org

`Account` and `Password` are optional, and are used for two-phase ldap
authentication; they are the credentials used to access your ldap server if it
doesn't support anonymous search/user lookups.

Base DN must have the %(user)s template inside; it's a placeholder where the
uid used to log in goes. It allows admins to specify a non-standard schema for
the uid variable.
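
For example, a directory that keys users on `CN` rather than `uid` might use a
Base DN like the following (a purely illustrative schema)::

 CN=%(user)s,OU=people,DC=example,DC=com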

If all data is entered correctly, and `python-ldap` is properly installed,
users should be able to access RhodeCode with their ldap accounts. When
logging in for the first time a special ldap account is created inside
RhodeCode, so you have control over permissions even for ldap users. If such a
user already exists in the RhodeCode database, an ldap user with the same
username will not be able to access RhodeCode.

If you have problems with ldap access and believe you entered the correct
information, check out the RhodeCode logs; any error messages sent from ldap
will be saved there.

Setting Up Celery
-----------------

Since version 1.1 celery is configured by the rhodecode ini configuration
files. Simply set use_celery=true in the ini file, then add or change the
configuration variables inside the ini file.

Remember that the ini files use the format with '.' and not with '_' like
celery, so for example setting `BROKER_HOST` in celery means setting
`broker.host` in the config file.
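
A minimal sketch of such a block, assuming a RabbitMQ broker on localhost (the
values shown are illustrative)::

 use_celery = true
 broker.host = localhost
 broker.port = 5672
 broker.user = rabbit
 broker.password = secret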

In order to start using celery, run::

 paster celeryd <configfile.ini>

.. note::
   Make sure you run this command from the same virtualenv, and with the same
   user that rhodecode runs as.

Nginx virtual host example
--------------------------

Sample config for nginx using proxy::

 server {
    listen          80;
    server_name     hg.myserver.com;
rhodecode/lib/indexers/__init__.py
import os
import sys
import traceback
from os.path import dirname as dn, join as jn

#to get the rhodecode import
sys.path.append(dn(dn(dn(os.path.realpath(__file__)))))

from string import strip

from rhodecode.model import init_model
from rhodecode.model.scm import ScmModel
from rhodecode.config.environment import load_environment
from rhodecode.lib.utils import BasePasterCommand, Command, add_cache

from shutil import rmtree
from webhelpers.html.builder import escape
from vcs.utils.lazy import LazyProperty

from sqlalchemy import engine_from_config

from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter
from whoosh.fields import TEXT, ID, STORED, Schema, FieldType
from whoosh.index import create_in, open_dir
from whoosh.formats import Characters
from whoosh.highlight import highlight, SimpleFragmenter, HtmlFormatter

#EXTENSIONS WE WANT TO INDEX CONTENT OF
INDEX_EXTENSIONS = ['action', 'adp', 'ashx', 'asmx', 'aspx', 'asx', 'axd', 'c',
                    'cfg', 'cfm', 'cpp', 'cs', 'css', 'diff', 'do', 'el', 'erl',
                    'h', 'htm', 'html', 'ini', 'java', 'js', 'jsp', 'jspx', 'lisp',
                    'lua', 'm', 'mako', 'ml', 'pas', 'patch', 'php', 'php3',
                    'php4', 'phtml', 'pm', 'py', 'rb', 'rst', 's', 'sh', 'sql',
                    'tpl', 'txt', 'vim', 'wss', 'xhtml', 'xml', 'xsl', 'xslt',
                    'yaws']

#CUSTOM ANALYZER wordsplit + lowercase filter
ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter()

#INDEX SCHEMA DEFINITION
SCHEMA = Schema(owner=TEXT(),
                repository=TEXT(stored=True),
                path=TEXT(stored=True),
                content=FieldType(format=Characters(ANALYZER),
                                  scorable=True, stored=True),
                modtime=STORED(), extension=TEXT(stored=True))

IDX_NAME = 'HG_INDEX'
FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n')
FRAGMENTER = SimpleFragmenter(200)

class MakeIndex(BasePasterCommand):

    max_args = 1
    min_args = 1

    usage = "CONFIG_FILE"
    summary = "Creates index for full text search given configuration file"
    group_name = "RhodeCode"
    takes_config_file = -1
    parser = Command.standard_parser(verbose=True)

    def command(self):

        from pylons import config
        add_cache(config)
        engine = engine_from_config(config, 'sqlalchemy.db1.')
        init_model(engine)

        index_location = config['index_dir']
        repo_location = self.options.repo_location
        #--index-only is optional, so guard against it being absent
        repo_list = map(strip, self.options.repo_list.split(',')) \
            if self.options.repo_list else None

        #======================================================================
        # WHOOSH DAEMON
        #======================================================================
        from rhodecode.lib.pidlock import LockHeld, DaemonLock
        from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
        try:
            l = DaemonLock()
            WhooshIndexingDaemon(index_location=index_location,
                                 repo_location=repo_location,
                                 repo_list=repo_list)\
                .run(full_index=self.options.full_index)
            l.release()
        except LockHeld:
            sys.exit(1)

    def update_parser(self):
        self.parser.add_option('--repo-location',
                          action='store',
                          dest='repo_location',
                          help="Specifies repositories location to index REQUIRED",
                          )
        self.parser.add_option('--index-only',
                          action='store',
                          dest='repo_list',
                          help="Specifies a comma separated list of repositories "
                                "to build index on OPTIONAL",
                          )
        self.parser.add_option('-f',
                          action='store_true',
                          dest='full_index',
                          help="Specifies that index should be made full i.e."
                                " destroy old and build from scratch",
                          default=False)

class ResultWrapper(object):
    def __init__(self, search_type, searcher, matcher, highlight_items):
        self.search_type = search_type
        self.searcher = searcher
        self.matcher = matcher
        self.highlight_items = highlight_items
        self.fragment_size = 200 / 2

    @LazyProperty
    def doc_ids(self):
        docs_id = []
        while self.matcher.is_active():
            docnum = self.matcher.id()
            chunks = [offsets for offsets in self.get_chunks()]
            docs_id.append([docnum, chunks])
            self.matcher.next()
        return docs_id

    def __str__(self):
        return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids))

    def __repr__(self):
        return self.__str__()

    def __len__(self):
        return len(self.doc_ids)

    def __iter__(self):
        """
        Allows iteration over results, and lazily generates content

        *Requires* implementation of ``__getitem__`` method.
        """
        for docid in self.doc_ids:
            yield self.get_full_content(docid)

    def __getslice__(self, i, j):
        """
        Slicing of resultWrapper
        """
        slice = []
        for docid in self.doc_ids[i:j]:
            slice.append(self.get_full_content(docid))
        return slice

    def get_full_content(self, docid):
        res = self.searcher.stored_fields(docid[0])
        f_path = res['path'][res['path'].find(res['repository']) \
                             + len(res['repository']):].lstrip('/')

        content_short = self.get_short_content(res, docid[1])
        res.update({'content_short':content_short,
                    'content_short_hl':self.highlight(content_short),
                    'f_path':f_path})

        return res

    def get_short_content(self, res, chunks):
        return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks])

    def get_chunks(self):
        """
        Smart function that implements chunking the content
        but does not overlap chunks, so it doesn't highlight the same
        close occurrences twice.
        """
        memory = [(0, 0)]
        for span in self.matcher.spans():
            start = span.startchar or 0
            end = span.endchar or 0
            start_offseted = max(0, start - self.fragment_size)
            end_offseted = end + self.fragment_size

            #never start a chunk before the previous one ended, so that
            #overlapping matches are emitted as adjacent, disjoint chunks
            if start_offseted < memory[-1][1]:
                start_offseted = memory[-1][1]
            memory.append((start_offseted, end_offseted,))
            yield (start_offseted, end_offseted,)

    def highlight(self, content, top=5):
        if self.search_type != 'content':
            return ''
        hl = highlight(escape(content),
                 self.highlight_items,
                 analyzer=ANALYZER,
                 fragmenter=FRAGMENTER,
rhodecode/lib/indexers/daemon.py
# -*- coding: utf-8 -*-
"""
    rhodecode.lib.indexers.daemon
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    A daemon that builds and incrementally updates the whoosh full text index

    :created_on: Jan 26, 2010
    :author: marcink
    :copyright: (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
    :license: GPLv3, see COPYING for more details.
"""
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; version 2
# of the License or (at your option) any later version of the license.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA  02110-1301, USA.

import sys
import os
import traceback
from os.path import dirname as dn
from os.path import join as jn

#to get the rhodecode import
project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
sys.path.append(project_path)

from rhodecode.model.scm import ScmModel
from rhodecode.lib.helpers import safe_unicode
from whoosh.index import create_in, open_dir
from shutil import rmtree
from rhodecode.lib.indexers import INDEX_EXTENSIONS, SCHEMA, IDX_NAME

from time import mktime
from vcs.exceptions import ChangesetError, RepositoryError

import logging

log = logging.getLogger('whooshIndexer')
# create logger
log.setLevel(logging.DEBUG)
log.propagate = False
# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# create formatter
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

# add formatter to ch
ch.setFormatter(formatter)

# add ch to logger
log.addHandler(ch)

class WhooshIndexingDaemon(object):
    """
    Daemon for whoosh index build and incremental update jobs
    """

    def __init__(self, indexname='HG_INDEX', index_location=None,
                 repo_location=None, sa=None, repo_list=None):
        self.indexname = indexname

        self.index_location = index_location
        if not index_location:
            raise Exception('You have to provide index location')

        self.repo_location = repo_location
        if not repo_location:
            raise Exception('You have to provide repositories location')

        self.repo_paths = ScmModel(sa).repo_scan(self.repo_location, None)

        #if an --index-only list was given, narrow the scanned repositories
        #down to the requested ones
        if repo_list:
            filtered_repo_paths = {}
            for repo_name, repo in self.repo_paths.items():
                if repo_name in repo_list:
                    filtered_repo_paths[repo.name] = repo

            self.repo_paths = filtered_repo_paths

        self.initial = False
        if not os.path.isdir(self.index_location):
            os.makedirs(self.index_location)
            log.info('Cannot run incremental index since it does not'
                     ' yet exist; running full build')
            self.initial = True

    def get_paths(self, repo):
        """recursive walk in root dir and return a set of all paths in that
        dir based on repository walk function
        """
        index_paths_ = set()
        try:
            #repo.walk() recurses into subdirectories itself, so a single
            #loop over the yielded files visits every path in the repository
            for topnode, dirs, files in repo.walk('/', 'tip'):
                for f in files:
                    index_paths_.add(jn(repo.path, f.path))

        except RepositoryError:
            log.debug(traceback.format_exc())
        return index_paths_

    def get_node(self, repo, path):
        n_path = path[len(repo.path) + 1:]
        node = repo.get_changeset().get_node(n_path)
        return node

    def get_node_mtime(self, node):
        return mktime(node.last_changeset.date.timetuple())

    def add_doc(self, writer, path, repo):
        """Adding doc to writer; this function itself fetches data from
        the instance of vcs backend"""
        node = self.get_node(repo, path)

        #we just index the content of chosen files, and skip binary files
        if node.extension in INDEX_EXTENSIONS and not node.is_binary:

            u_content = node.content
            if not isinstance(u_content, unicode):
                log.warning('  >> %s Could not get this content as unicode, '
                            'replacing with empty content', path)
                u_content = u''
            else:
                log.debug('    >> %s [WITH CONTENT]' % path)

        else:
            log.debug('    >> %s' % path)
            #just index the file name without its content
            u_content = u''

        writer.add_document(owner=unicode(repo.contact),
                        repository=safe_unicode(repo.name),
                        path=safe_unicode(path),
                        content=u_content,
                        modtime=self.get_node_mtime(node),
                        extension=node.extension)

    def build_index(self):
        if os.path.exists(self.index_location):
            log.debug('removing previous index')
            rmtree(self.index_location)

        if not os.path.exists(self.index_location):
            os.mkdir(self.index_location)

        idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME)
        writer = idx.writer()

        for repo in self.repo_paths.values():
            log.debug('building index @ %s' % repo.path)

            for idx_path in self.get_paths(repo):
                self.add_doc(writer, idx_path, repo)

        log.debug('>> COMMITTING CHANGES <<')
        writer.commit(merge=True)
        log.debug('>>> FINISHED BUILDING INDEX <<<')

    def update_index(self):
        log.debug('STARTING INCREMENTAL INDEXING UPDATE')

        idx = open_dir(self.index_location, indexname=self.indexname)
        # The set of all paths in the index
        indexed_paths = set()
        # The set of all paths we need to re-index
        to_index = set()

        reader = idx.reader()
        writer = idx.writer()

        # Loop over the stored fields in the index
        for fields in reader.all_stored_fields():
            indexed_path = fields['path']
            indexed_paths.add(indexed_path)

            repo = self.repo_paths[fields['repository']]

            try:
                node = self.get_node(repo, indexed_path)
            except ChangesetError:
                # This file was deleted since it was indexed
                log.debug('removing from index %s' % indexed_path)
                writer.delete_by_term('path', indexed_path)

            else:
                # Check if this file was changed since it was indexed
                indexed_time = fields['modtime']
                mtime = self.get_node_mtime(node)
                if mtime > indexed_time:
                    # The file has changed, delete it and add it to the list of
                    # files to reindex
                    log.debug('adding to reindex list %s' % indexed_path)
                    writer.delete_by_term('path', indexed_path)
                    to_index.add(indexed_path)

        # Loop over the files gathered from the repositories and index
        # anything that is new or was changed since the last run
        for repo in self.repo_paths.values():
            for path in self.get_paths(repo):
                if path in to_index or path not in indexed_paths:
                    # This is either a file that's changed, or a new file
                    # that wasn't indexed before. So index it!
                    self.add_doc(writer, path, repo)
                    log.debug('re indexing %s' % path)

        log.debug('>> COMMITTING CHANGES <<')
        writer.commit(merge=True)
        log.debug('>>> FINISHED REBUILDING INDEX <<<')

    def run(self, full_index=False):
        """Run daemon"""
        if full_index or self.initial:
            self.build_index()
        else:
            self.update_index()
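
# A minimal usage sketch (the paths and repository names below are
# hypothetical): the `paster make-index` command wires this class up from the
# ini file, but the daemon can also be driven directly:
#
#   from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
#   WhooshIndexingDaemon(index_location='/srv/rhodecode/index',
#                        repo_location='/srv/repos',
#                        repo_list=['vcs', 'rhodecode'])\
#       .run(full_index=True)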