Changeset 1fed3c9161bb (branch: beta)
Marcin Kuzminski <marcin@python-works.com>, 2010-12-29 13:54:03
fixes #90 + docs update
3 files changed with 44 insertions and 15 deletions

docs/setup.rst
 
@@ -9,123 +9,132 @@ Setting up the application

First you'll need to create a RhodeCode config file. Run the following command
to do this

::

 paster make-config RhodeCode production.ini

- This will create a `production.ini` config file inside the current directory.
  This config contains various settings for RhodeCode, e.g. proxy port,
  email settings, usage of static files, cache, celery settings and logging.

Next we need to create the database.

::

 paster setup-app production.ini

- This command will create all needed tables and an admin account.
  When asked for a path you can either use a new location or one with
  already existing repositories; RhodeCode will simply add all newly found
  repositories to its database. Also make sure you specify the correct path
  to your repositories.
- Remember that the given path for mercurial_ repositories must be
  write-accessible to the application. This is very important: the RhodeCode
  web interface will work even without such access, but pushing will
  eventually fail with permission denied errors (see the example below).

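For example, assuming the application runs as a hypothetical `rhodecode`
system user and the repositories live under `/srv/repos` (both are just
placeholders), write access could be granted with something like::

 chown -R rhodecode /srv/repos
 chmod -R u+rwX /srv/repos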
 

	
 
You are ready to use RhodeCode; to run it simply execute

::

 paster serve production.ini

- This command runs the RhodeCode server; the app should be available at
  127.0.0.1:5000. This IP and port are configurable via the production.ini
  file created in the previous step (see the snippet below).
- Use the admin account you created to log in.
- The default permission on each repository is read and the owner is admin,
  so remember to update these if needed. In the admin panel you can toggle
  ldap, anonymous and permission settings, as well as edit more advanced
  options on users and repositories.

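For illustration, the address and port are defined in the standard Paste
Deploy server section of production.ini. The key names below follow the
usual Paste/Pylons layout, so double-check them against your generated
file::

 [server:main]
 host = 127.0.0.1
 port = 5000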
 
  
 
    
 
Setting up Whoosh full text search
----------------------------------

Starting from version 1.1 the Whoosh index can be built using a paster command.
You have to specify the config file that stores the location of the index, and
the location of repositories (`--repo-location`). Starting from version 1.2 it
is also possible to specify a comma separated list of repositories
(`--index-only`) to build the index only for the chosen repositories, skipping
any others found in the repositories location.

It is also possible to pass the `-f` flag to enable a full index rebuild.
Without it, indexing will always run in incremental mode::

 paster make-index production.ini --repo-location=<location for repos>

for a full index rebuild you can use::

 paster make-index production.ini -f --repo-location=<location for repos>

building the index just for chosen repositories is possible with such a command::

 paster make-index production.ini --repo-location=<location for repos> --index-only=vcs,rhodecode

In order to do periodical index builds and keep your index always up to date,
it's recommended to add a crontab entry for incremental indexing.
An example entry might look like this

::

 /path/to/python/bin/paster make-index /path/to/rhodecode/production.ini --repo-location=<location for repos>

When using incremental (default) mode whoosh will check the last modification
date of each file and add it for reindexing if a newer version is available.
The indexing daemon also checks for removed files and removes them from the
index.

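The example above is just the command itself; a complete crontab line also
needs a schedule. A minimal sketch, assuming the paths shown are placeholders
for your actual python and RhodeCode locations, rebuilding the index every
night at 3 AM, might look like::

 0 3 * * * /path/to/python/bin/paster make-index /path/to/rhodecode/production.ini --repo-location=<location for repos>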
 

	
 
Sometimes you might want to rebuild the index from scratch. You can do that
using the `-f` flag passed to the paster command, or in the admin panel you
can check the `build from scratch` flag.

Setting up LDAP support
-----------------------

Starting from version 1.1, RhodeCode supports LDAP authentication. In order
to use LDAP, you have to install the python-ldap package. This package is
available via pypi, so you can install it by running

::

 pip install python-ldap

.. note::
   python-ldap requires certain libraries on your system, so before installing
   it check that you have at least the `openldap` and `sasl` libraries.

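On Debian or Ubuntu, for example, those headers usually come from the
distribution's LDAP and SASL development packages; the package names below
are an assumption and may differ on other platforms::

 apt-get install libldap2-dev libsasl2-dev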
 

	
 
LDAP settings are located in the admin->ldap section.

Here's a typical LDAP setup::

 Enable ldap  = checked                 #controls if ldap access is enabled
 Host         = host.domain.org         #actual ldap server to connect to
 Port         = 389, or 636 for ldaps   #ldap server ports
 Enable LDAPS = unchecked               #enable or disable ldaps
 Account      = <account>               #access for ldap server (if required)
 Password     = <password>              #password for ldap server (if required)
 Base DN      = uid=%(user)s,CN=users,DC=host,DC=domain,DC=org

`Account` and `Password` are optional, and are used for two-phase LDAP
authentication: they are the credentials used to access your LDAP server if
it doesn't support anonymous search/user lookups.

Base DN must contain the %(user)s template; it's a placeholder where the uid
used to log in will go. This allows admins to specify a non-standard schema
for the uid variable.

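For example, with the Base DN above (the domain components are just example
values), a user logging in with the uid `jdoe` would be looked up as::

 uid=jdoe,CN=users,DC=host,DC=domain,DC=org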
 

	
rhodecode/lib/indexers/__init__.py
 
import os
 
import sys
 
import traceback
 
from os.path import dirname as dn, join as jn
 

	
 
#to get the rhodecode import
 
sys.path.append(dn(dn(dn(os.path.realpath(__file__)))))
 

	
 
from string import strip
 

	
 
from rhodecode.model import init_model
 
from rhodecode.model.scm import ScmModel
 
from rhodecode.config.environment import load_environment
 
from rhodecode.lib.utils import BasePasterCommand, Command, add_cache
 

	
 
from shutil import rmtree
 
from webhelpers.html.builder import escape
 
from vcs.utils.lazy import LazyProperty
 

	
 
from sqlalchemy import engine_from_config
 

	
 
from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter
 
from whoosh.fields import TEXT, ID, STORED, Schema, FieldType
 
from whoosh.index import create_in, open_dir
 
from whoosh.formats import Characters
 
from whoosh.highlight import highlight, SimpleFragmenter, HtmlFormatter
 

	
 

	
 
#EXTENSIONS WE WANT TO INDEX CONTENT OF
 
INDEX_EXTENSIONS = ['action', 'adp', 'ashx', 'asmx', 'aspx', 'asx', 'axd', 'c',
 
                    'cfg', 'cfm', 'cpp', 'cs', 'css', 'diff', 'do', 'el', 'erl',
 
                    'h', 'htm', 'html', 'ini', 'java', 'js', 'jsp', 'jspx', 'lisp',
 
                    'lua', 'm', 'mako', 'ml', 'pas', 'patch', 'php', 'php3',
 
                    'php4', 'phtml', 'pm', 'py', 'rb', 'rst', 's', 'sh', 'sql',
 
                    'tpl', 'txt', 'vim', 'wss', 'xhtml', 'xml', 'xsl', 'xslt',
 
                    'yaws']
 

	
 
#CUSTOM ANALYZER wordsplit + lowercase filter
 
ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter()
 

	
 

	
 
#INDEX SCHEMA DEFINITION
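# content is stored with the Characters format so term character positions
# are kept, which is what lets the highlighter build result snippets;
# modtime is stored (not indexed) and is compared against node modification
# times during incremental index updates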
 
SCHEMA = Schema(owner=TEXT(),
 
                repository=TEXT(stored=True),
 
                path=TEXT(stored=True),
 
                content=FieldType(format=Characters(ANALYZER),
 
                             scorable=True, stored=True),
 
                modtime=STORED(), extension=TEXT(stored=True))
 

	
 

	
 
IDX_NAME = 'HG_INDEX'
 
FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n')
 
FRAGMENTER = SimpleFragmenter(200)
 

	
 

	
 
class MakeIndex(BasePasterCommand):
 

	
 
    max_args = 1
 
    min_args = 1
 

	
 
    usage = "CONFIG_FILE"
 
    summary = "Creates index for full text search given configuration file"
 
    group_name = "RhodeCode"
 
    takes_config_file = -1
 
    parser = Command.standard_parser(verbose=True)
 

	
 
    def command(self):
 

	
 
        from pylons import config
 
        add_cache(config)
 
        engine = engine_from_config(config, 'sqlalchemy.db1.')
 
        init_model(engine)
 

	
 
        index_location = config['index_dir']
 
        repo_location = self.options.repo_location
 
        repo_list = map(strip, self.options.repo_list.split(',')) \
            if self.options.repo_list else None
 

	
 
        #======================================================================
 
        # WHOOSH DAEMON
 
        #======================================================================
 
        from rhodecode.lib.pidlock import LockHeld, DaemonLock
 
        from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
 
        try:
 
            l = DaemonLock()
 
            WhooshIndexingDaemon(index_location=index_location,
 
                                 repo_location=repo_location,
 
                                 repo_list=repo_list)\
 
                .run(full_index=self.options.full_index)
 
            l.release()
 
        except LockHeld:
 
            sys.exit(1)
 

	
 
    def update_parser(self):
 
        self.parser.add_option('--repo-location',
 
                          action='store',
 
                          dest='repo_location',
 
                          help="Specifies repositories location to index REQUIRED",
 
                          )
 
        self.parser.add_option('--index-only',
 
                          action='store',
 
                          dest='repo_list',
 
                          help="Specifies a comma separated list of repositores "
 
                                "to build index on OPTIONAL",
 
                          )
 
        self.parser.add_option('-f',
 
                          action='store_true',
 
                          dest='full_index',
 
                          help="Specifies that index should be made full i.e"
 
                                " destroy old and build from scratch",
 
                          default=False)
 

	
 
class ResultWrapper(object):
 
    def __init__(self, search_type, searcher, matcher, highlight_items):
 
        self.search_type = search_type
 
        self.searcher = searcher
 
        self.matcher = matcher
 
        self.highlight_items = highlight_items
 
        self.fragment_size = 200 / 2
 

	
 
    @LazyProperty
 
    def doc_ids(self):
 
        docs_id = []
 
        while self.matcher.is_active():
 
            docnum = self.matcher.id()
 
            chunks = [offsets for offsets in self.get_chunks()]
 
            docs_id.append([docnum, chunks])
 
            self.matcher.next()
 
        return docs_id
 

	
 
    def __str__(self):
 
        return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids))
 

	
 
    def __repr__(self):
 
        return self.__str__()
 

	
 
    def __len__(self):
 
        return len(self.doc_ids)
 

	
 
    def __iter__(self):
 
        """
 
        Allows iteration over results, and lazily generates content
 

	
 
        *Requires* implementation of ``__getitem__`` method.
 
        """
 
        for docid in self.doc_ids:
 
            yield self.get_full_content(docid)
 

	
 
    def __getslice__(self, i, j):
 
        """
 
        Slicing of resultWrapper
 
        """
 
        slice = []
rhodecode/lib/indexers/daemon.py
 
@@ -25,182 +25,192 @@
 
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
 
# MA  02110-1301, USA.
 

	
 
import sys
 
import os
 
import traceback
 
from os.path import dirname as dn
 
from os.path import join as jn
 

	
 
#to get the rhodecode import
 
project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
 
sys.path.append(project_path)
 

	
 

	
 
from rhodecode.model.scm import ScmModel
 
from rhodecode.lib.helpers import safe_unicode
 
from whoosh.index import create_in, open_dir
 
from shutil import rmtree
 
from rhodecode.lib.indexers import INDEX_EXTENSIONS, SCHEMA, IDX_NAME
 

	
 
from time import mktime
 
from vcs.exceptions import ChangesetError, RepositoryError
 

	
 
import logging
 

	
 
log = logging.getLogger('whooshIndexer')
 
# create logger
 
log.setLevel(logging.DEBUG)
 
log.propagate = False
 
# create console handler and set level to debug
 
ch = logging.StreamHandler()
 
ch.setLevel(logging.DEBUG)
 

	
 
# create formatter
 
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
 

	
 
# add formatter to ch
 
ch.setFormatter(formatter)
 

	
 
# add ch to logger
 
log.addHandler(ch)
 

	
 
class WhooshIndexingDaemon(object):
 
    """
 
    Daemon for atomic jobs
 
    """
 

	
 
    def __init__(self, indexname='HG_INDEX', index_location=None,
 
                 repo_location=None, sa=None, repo_list=None):
 
        self.indexname = indexname
 

	
 
        self.index_location = index_location
 
        if not index_location:
 
            raise Exception('You have to provide index location')
 

	
 
        self.repo_location = repo_location
 
        if not repo_location:
 
            raise Exception('You have to provide repositories location')
 

	
 
        self.repo_paths = ScmModel(sa).repo_scan(self.repo_location, None)
 

	
 
        if repo_list:
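            # --index-only was passed: restrict indexing to just the listed
            # repositories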
 
            filtered_repo_paths = {}
 
            for repo_name, repo in self.repo_paths.items():
 
                if repo_name in repo_list:
 
                    filtered_repo_paths[repo.name] = repo
 

	
 
            self.repo_paths = filtered_repo_paths
 

	
 

	
 
        self.initial = False
 
        if not os.path.isdir(self.index_location):
 
            os.makedirs(self.index_location)
 
            log.info('Cannot run incremental index since it does not'
 
                     ' yet exist, running full build')
 
            self.initial = True
 

	
 
    def get_paths(self, repo):
 
        """recursive walk in root dir and return a set of all path in that dir
 
        based on repository walk function
 
        """
 
        index_paths_ = set()
 
        try:
 
            for topnode, dirs, files in repo.walk('/', 'tip'):
 
                for f in files:
 
                    index_paths_.add(jn(repo.path, f.path))
 
                for dir in dirs:
 
                    for f in files:
 
                        index_paths_.add(jn(repo.path, f.path))
 

	
 
        except RepositoryError, e:
 
            log.debug(traceback.format_exc())
 
            pass
 
        return index_paths_
 

	
 
    def get_node(self, repo, path):
 
        n_path = path[len(repo.path) + 1:]
 
        node = repo.get_changeset().get_node(n_path)
 
        return node
 

	
 
    def get_node_mtime(self, node):
 
        return mktime(node.last_changeset.date.timetuple())
 

	
 
    def add_doc(self, writer, path, repo):
 
        """Adding doc to writer this function itself fetches data from
 
        the instance of vcs backend"""
 
        node = self.get_node(repo, path)
 

	
 
        #we just index the content of chosen files, and skip binary files
 
        if node.extension in INDEX_EXTENSIONS and not node.is_binary:
 

	
 
            u_content = node.content
 
            if not isinstance(u_content, unicode):
 
                log.warning('  >> %s Could not get this content as unicode '
 
                          'replacing with empty content', path)
 
                u_content = u''
 
            else:
 
                log.debug('    >> %s [WITH CONTENT]' % path)
 

	
 
        else:
 
            log.debug('    >> %s' % path)
 
            #just index file name without its content
 
            u_content = u''
 

	
 
        writer.add_document(owner=unicode(repo.contact),
 
                        repository=safe_unicode(repo.name),
 
                        path=safe_unicode(path),
 
                        content=u_content,
 
                        modtime=self.get_node_mtime(node),
 
                        extension=node.extension)
 

	
 

	
 
    def build_index(self):
 
        if os.path.exists(self.index_location):
 
            log.debug('removing previous index')
 
            rmtree(self.index_location)
 

	
 
        if not os.path.exists(self.index_location):
 
            os.mkdir(self.index_location)
 

	
 
        idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME)
 
        writer = idx.writer()
 
        for repo in self.repo_paths.values():
 
            log.debug('building index @ %s' % repo.path)
 

	
 
            for idx_path in self.get_paths(repo):
 
                self.add_doc(writer, idx_path, repo)
 

	
 
        log.debug('>> COMMITTING CHANGES <<')
 
        writer.commit(merge=True)
 
        log.debug('>>> FINISHED BUILDING INDEX <<<')
 

	
 

	
 
    def update_index(self):
 
        log.debug('STARTING INCREMENTAL INDEXING UPDATE')
 

	
 
        idx = open_dir(self.index_location, indexname=self.indexname)
 
        # The set of all paths in the index
 
        indexed_paths = set()
 
        # The set of all paths we need to re-index
 
        to_index = set()
 

	
 
        reader = idx.reader()
 
        writer = idx.writer()
 

	
 
        # Loop over the stored fields in the index
 
        for fields in reader.all_stored_fields():
 
            indexed_path = fields['path']
 
            indexed_paths.add(indexed_path)
 

	
 
            repo = self.repo_paths[fields['repository']]
 

	
 
            try:
 
                node = self.get_node(repo, indexed_path)
 
            except ChangesetError:
 
                # This file was deleted since it was indexed
 
                log.debug('removing from index %s' % indexed_path)
 
                writer.delete_by_term('path', indexed_path)
 

	
 
            else:
 
                # Check if this file was changed since it was indexed
 
                indexed_time = fields['modtime']
 
                mtime = self.get_node_mtime(node)
 
                if mtime > indexed_time:
 
                    # The file has changed, delete it and add it to the list of
 
                    # files to reindex
 
                    log.debug('adding to reindex list %s' % indexed_path)
 
                    writer.delete_by_term('path', indexed_path)
 
                    to_index.add(indexed_path)
 

	
 
        # Loop over the files in the filesystem