Changeset - aac24db58ce8
[Not reviewed]
beta
0 4 0
Marcin Kuzminski - 15 years ago 2010-11-27 01:43:04
marcin@python-works.com
fixed cache problem,
updated docs
4 files changed with 46 insertions and 4 deletions:
0 comments (0 inline, 0 general)
docs/changelog.rst
Show inline comments
 
.. _changelog:
 

	
 
Changelog
 
=========
 

	
 
1.1.0 (**2010-XX-XX**)
 
----------------------
 

	
 
:status: in-progress
 
:branch: beta
 

	
 
news
 
++++
 

	
 
- rewrite of internals for vcs >=0.1.10
 
- anonymous access, authentication via ldap
 
- performance upgrade for cached repos list - each repository has it's own 
 
  cache that's invalidated when needed.
 
- main page quick filter for filtering repositories
 
- user dashboards with ability to follow chosen repositories actions
 
- sends email to admin on new user registration
 
- added cache/statistics reset options into repository settings
 
- more detailed action logger (based on hooks) with pushed changesets lists
 
  and options to disable those hooks from admin panel
 
- introduced new enhanced changelog for merges that shows more accurate results
 
- gui optimizations, fixed application width to 1024px
 
- whoosh,celeryd,upgrade moved to paster command
 
- whoosh, celeryd, upgrade moved to paster command
 

	
 
fixes
 
+++++
 

	
 
- fixes #61 forked repo was showing only after cache expired
 
- fixes #76 no confirmation on user deletes
 
- fixes #66 Name field misspelled
 
- fixes #72 block user removal when he owns repositories
 
- fixes #69 added password confirmation fields
 
- numerous small bugfixes
 
- a lot of fixes and tweaks for file browser
 
- fixed detached session issues
 

	
 
(special thanks for TkSoh for detailed feedback)
 

	
 

	
 
1.0.2 (**2010-11-12**)
 
----------------------
 

	
 
news
 
++++
 

	
 
- tested under python2.7
 
- bumped sqlalchemy and celery versions
 

	
 
fixes
 
+++++
 

	
 
- fixed #59 missing graph.js
 
- fixed repo_size crash when repository had broken symlinks
 
- fixed python2.5 crashes.
 

	
 

	
 
1.0.1 (**2010-11-10**)
 
----------------------
 

	
 
news
 
++++
 

	
 
- small css updated
 

	
 
fixes
 
+++++
 

	
 
- fixed #53 python2.5 incompatible enumerate calls
 
- fixed #52 disable mercurial extension for web
 
- fixed #51 deleting repositories don't delete it's dependent objects
 

	
 

	
 
1.0.0 (**2010-11-02**)
 
----------------------
 

	
 
- security bugfix simplehg wasn't checking for permissions on commands
 
  other than pull or push.
 
- fixed doubled messages after push or pull in admin journal
 
- templating and css corrections, fixed repo switcher on chrome, updated titles
 
- admin menu accessible from options menu on repository view
 
- permissions cached queries
 

	
 
1.0.0rc4  (**2010-10-12**)
 
--------------------------
 

	
 
- fixed python2.5 missing simplejson imports (thanks to Jens Bäckman)
 
- removed cache_manager settings from sqlalchemy meta
 
- added sqlalchemy cache settings to ini files
 
- validated password length and added second try of failure on paster setup-app
 
- fixed setup database destroy prompt even when there was no db
 

	
 

	
 
1.0.0rc3 (**2010-10-11**)
 
-------------------------
 

	
 
- fixed i18n during installation.
 

	
 
1.0.0rc2 (**2010-10-11**)
 
-------------------------
 

	
 
- Disabled dirsize in file browser, it's causing nasty bug when dir renames 
 
  occure. After vcs is fixed it'll be put back again.
 
- templating/css rewrites, optimized css.
 

	
docs/setup.rst
Show inline comments
 
@@ -38,192 +38,208 @@ Setting up the application
 
  file  created in previous step
 
- Use admin account you created to login.
 
- Default permissions on each repository is read, and owner is admin. So 
 
  remember to update these if needed.
 
  
 
    
 
Setting up Whoosh full text search
 
----------------------------------
 

	
 
Index for whoosh can be build starting from version 1.1 using paster command
 
passing repo locations to index, as well as Your config file that stores
 
whoosh index files locations. There is possible to pass `-f` to the options
 
to enable full index rebuild. Without that indexing will run always in in
 
incremental mode.
 

	
 
::
 

	
 
 paster make-index --repo-location=<location for repos> production.ini  
 

	
 
for full index rebuild You can use
 

	
 
::
 

	
 
 paster make-index -f --repo-location=<location for repos> production.ini
 

	
 
- For full text search You can either put crontab entry for
 

	
 
This command can be run even from crontab in order to do periodical 
 
index builds and keep Your index always up to date. An example entry might 
 
look like this
 

	
 
::
 
 
 
 /path/to/python/bin/paster --repo-location=<location for repos> /path/to/rhodecode/production.ini
 
  
 
When using incremental(default) mode whoosh will check last modification date 
 
of each file and add it to reindex if newer file is available. Also indexing 
 
daemon checks for removed files and removes them from index. 
 

	
 
Sometime You might want to rebuild index from scratch. You can do that using 
 
the `-f` flag passed to paster command or, in admin panel You can check 
 
`build from scratch` flag.
 

	
 

	
 
Setting up LDAP support
 
-----------------------
 

	
 
RhodeCode starting from version 1.1 supports ldap authentication. In order
 
to use ldap, You have to install python-ldap package. This package is available
 
via pypi, so You can install it by running
 

	
 
::
 

	
 
 easy_install python-ldap
 
 
 
::
 

	
 
 pip install python-ldap
 

	
 
.. note::
 
   python-ldap requires some certain libs on Your system, so before installing 
 
   it check that You have at least `openldap`, and `sasl` libraries.
 

	
 
ldap settings are located in admin->ldap section,
 

	
 
Here's a typical ldap setup::
 

	
 
 Enable ldap  = checked                 #controls if ldap access is enabled
 
 Host         = host.domain.org         #actual ldap server to connect
 
 Port         = 389 or 689 for ldaps    #ldap server ports
 
 Enable LDAPS = unchecked               #enable disable ldaps
 
 Account      = <account>               #access for ldap server(if required)
 
 Password     = <password>              #password for ldap server(if required)
 
 Base DN      = uid=%(user)s,CN=users,DC=host,DC=domain,DC=org
 
 
 

	
 
`Account` and `Password` are optional, and used for two-phase ldap 
 
authentication so those are credentials to access Your ldap, if it doesn't 
 
support anonymous search/user lookups. 
 

	
 
Base DN must have %(user)s template inside, it's a placer where Your uid used
 
to login would go, it allows admins to specify not standard schema for uid 
 
variable
 

	
 
If all data are entered correctly, and `python-ldap` is properly installed
 
Users should be granted to access RhodeCode wit ldap accounts. When 
 
logging at the first time an special ldap account is created inside RhodeCode, 
 
so You can control over permissions even on ldap users. If such user exists 
 
already in RhodeCode database ldap user with the same username would be not 
 
able to access RhodeCode.
 

	
 
If You have problems with ldap access and believe You entered correct 
 
information check out the RhodeCode logs,any error messages sent from 
 
ldap will be saved there.
 

	
 

	
 

	
 
Setting Up Celery
 
-----------------
 

	
 
Since version 1.1 celery is configured by the rhodecode ini configuration files
 
simply set use_celery=true in the ini file then add / change the configuration 
 
variables inside the ini file.
 

	
 
Remember that the ini files uses format with '.' not with '_' like celery
 
so for example setting `BROKER_HOST` in celery means setting `broker.host` in
 
the config file.
 

	
 
In order to make start using celery run::
 
 paster celeryd <configfile.ini>
 

	
 

	
 
Nginx virtual host example
 
--------------------------
 

	
 
Sample config for nginx using proxy::
 

	
 
 server {
 
    listen          80;
 
    server_name     hg.myserver.com;
 
    access_log      /var/log/nginx/rhodecode.access.log;
 
    error_log       /var/log/nginx/rhodecode.error.log;
 
    location / {
 
            root /var/www/rhodecode/rhodecode/public/;
 
            if (!-f $request_filename){
 
                proxy_pass      http://127.0.0.1:5000;
 
            }
 
            #this is important for https !!!
 
            proxy_set_header X-Url-Scheme $scheme;
 
            include         /etc/nginx/proxy.conf;  
 
    }
 
 }  
 
  
 
Here's the proxy.conf. It's tuned so it'll not timeout on long
 
pushes and also on large pushes::
 

	
 
    proxy_redirect              off;
 
    proxy_set_header            Host $host;
 
    proxy_set_header            X-Host $http_host;
 
    proxy_set_header            X-Real-IP $remote_addr;
 
    proxy_set_header            X-Forwarded-For $proxy_add_x_forwarded_for;
 
    proxy_set_header            Proxy-host $proxy_host;
 
    client_max_body_size        400m;
 
    client_body_buffer_size     128k;
 
    proxy_buffering             off;
 
    proxy_connect_timeout       3600;
 
    proxy_send_timeout          3600;
 
    proxy_read_timeout          3600;
 
    proxy_buffer_size           8k;
 
    proxy_buffers               8 32k;
 
    proxy_busy_buffers_size     64k;
 
    proxy_temp_file_write_size  64k;
 
 
 
Also when using root path with nginx You might set the static files to false
 
in production.ini file::
 

	
 
  [app:main]
 
    use = egg:rhodecode
 
    full_stack = true
 
    static_files = false
 
    lang=en
 
    cache_dir = %(here)s/data
 

	
 
To not have the statics served by the application. And improve speed.
 

	
 
Apache reverse proxy
 
--------------------
 
Tutorial can be found here
 
http://wiki.pylonshq.com/display/pylonscookbook/Apache+as+a+reverse+proxy+for+Pylons
 

	
 

	
 
Apache's example FCGI config
 
----------------------------
 

	
 
TODO !
 

	
 
Other configuration files
 
-------------------------
 

	
 
Some extra configuration files and examples can be found here:
 
http://hg.python-works.com/rhodecode/files/tip/init.d
 

	
 
and also an celeryconfig file can be use from here:
 
http://hg.python-works.com/rhodecode/files/tip/celeryconfig.py
 

	
 
Troubleshooting
 
---------------
 

	
 
- missing static files ?
 

	
 
 - make sure either to set the `static_files = true` in the .ini file or
 
   double check the root path for Your http setup. It should point to 
 
   for example:
 
   /home/my-virtual-python/lib/python2.6/site-packages/rhodecode/public
 
   
 
- can't install celery/rabbitmq
 

	
 
 - don't worry RhodeCode works without them too. No extra setup required
 

	
 

	
 
- long lasting push timeouts ?
 

	
 
 - make sure You set a longer timeouts in Your proxy/fcgi settings, timeouts
 
   are caused by https server and not RhodeCode
 

	
 
- large pushes timeouts ?
 
 
 
 - make sure You set a proper max_body_size for the http server
rhodecode/lib/celerylib/tasks.py
Show inline comments
 
from celery.decorators import task
 

	
 
import os
 
import traceback
 
import beaker
 
from time import mktime
 
from operator import itemgetter
 

	
 
from pylons import config
 
from pylons.i18n.translation import _
 

	
 
from rhodecode.lib.celerylib import run_task, locked_task, str2bool
 
from rhodecode.lib.helpers import person
 
from rhodecode.lib.smtp_mailer import SmtpMailer
 
from rhodecode.lib.utils import OrderedDict
 
from rhodecode.model import init_model
 
from rhodecode.model import meta
 
from rhodecode.model.db import RhodeCodeUi
 

	
 
from vcs.backends import get_repo
 

	
 
from sqlalchemy import engine_from_config
 

	
 
#set cache regions for beaker so celery can utilise it
 
def add_cache(settings):
 
    cache_settings = {'regions':None}
 
    for key in settings.keys():
 
        for prefix in ['beaker.cache.', 'cache.']:
 
            if key.startswith(prefix):
 
                name = key.split(prefix)[1].strip()
 
                cache_settings[name] = settings[key].strip()
 
    if cache_settings['regions']:
 
        for region in cache_settings['regions'].split(','):
 
            region = region.strip()
 
            region_settings = {}
 
            for key, value in cache_settings.items():
 
                if key.startswith(region):
 
                    region_settings[key.split('.')[1]] = value
 
            region_settings['expire'] = int(region_settings.get('expire',
 
                                                                60))
 
            region_settings.setdefault('lock_dir',
 
                                       cache_settings.get('lock_dir'))
 
            if 'type' not in region_settings:
 
                region_settings['type'] = cache_settings.get('type',
 
                                                             'memory')
 
            beaker.cache.cache_regions[region] = region_settings
 
add_cache(config)
 

	
 
try:
 
    import json
 
except ImportError:
 
    #python 2.5 compatibility
 
    import simplejson as json
 

	
 
__all__ = ['whoosh_index', 'get_commits_stats',
 
           'reset_user_password', 'send_email']
 

	
 
CELERY_ON = str2bool(config['app_conf'].get('use_celery'))
 

	
 
def get_session():
 
    if CELERY_ON:
 
        engine = engine_from_config(config, 'sqlalchemy.db1.')
 
        init_model(engine)
 
    sa = meta.Session()
 
    return sa
 

	
 
def get_repos_path():
 
    sa = get_session()
 
    q = sa.query(RhodeCodeUi).filter(RhodeCodeUi.ui_key == '/').one()
 
    return q.ui_value
 

	
 
@task
 
@locked_task
 
def whoosh_index(repo_location, full_index):
 
    log = whoosh_index.get_logger()
 
    from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
 
    index_location = config['index_dir']
 
    WhooshIndexingDaemon(index_location=index_location,
 
                         repo_location=repo_location).run(full_index=full_index)
 
                         repo_location=repo_location, sa=get_session())\
 
                         .run(full_index=full_index)
 

	
 
@task
 
@locked_task
 
def get_commits_stats(repo_name, ts_min_y, ts_max_y):
 
    from rhodecode.model.db import Statistics, Repository
 
    log = get_commits_stats.get_logger()
 

	
 
    #for js data compatibilty
 
    author_key_cleaner = lambda k: person(k).replace('"', "")
 

	
 
    commits_by_day_author_aggregate = {}
 
    commits_by_day_aggregate = {}
 
    repos_path = get_repos_path()
 
    p = os.path.join(repos_path, repo_name)
 
    repo = get_repo(p)
 

	
 
    skip_date_limit = True
 
    parse_limit = 250 #limit for single task changeset parsing optimal for
 
    last_rev = 0
 
    last_cs = None
 
    timegetter = itemgetter('time')
 

	
 
    sa = get_session()
 

	
 
    dbrepo = sa.query(Repository)\
 
        .filter(Repository.repo_name == repo_name).scalar()
 
    cur_stats = sa.query(Statistics)\
 
        .filter(Statistics.repository == dbrepo).scalar()
 
    if cur_stats:
 
        last_rev = cur_stats.stat_on_revision
 
    if not repo.revisions:
 
        return True
 

	
 
    if last_rev == repo.revisions[-1] and len(repo.revisions) > 1:
 
        #pass silently without any work if we're not on first revision or 
 
        #current state of parsing revision(from db marker) is the last revision
 
        return True
 

	
 
    if cur_stats:
 
        commits_by_day_aggregate = OrderedDict(
 
                                       json.loads(
 
                                        cur_stats.commit_activity_combined))
 
        commits_by_day_author_aggregate = json.loads(cur_stats.commit_activity)
 

	
 
    log.debug('starting parsing %s', parse_limit)
 
    lmktime = mktime
 

	
 
    for cnt, rev in enumerate(repo.revisions[last_rev:]):
 
        last_cs = cs = repo.get_changeset(rev)
 
        k = '%s-%s-%s' % (cs.date.timetuple()[0], cs.date.timetuple()[1],
 
                          cs.date.timetuple()[2])
 
        timetupple = [int(x) for x in k.split('-')]
 
        timetupple.extend([0 for _ in xrange(6)])
 
        k = lmktime(timetupple)
 
        if commits_by_day_author_aggregate.has_key(author_key_cleaner(cs.author)):
 
            try:
 
                l = [timegetter(x) for x in commits_by_day_author_aggregate\
 
                        [author_key_cleaner(cs.author)]['data']]
 
                time_pos = l.index(k)
 
            except ValueError:
 
                time_pos = False
 

	
 
            if time_pos >= 0 and time_pos is not False:
 

	
 
                datadict = commits_by_day_author_aggregate\
 
                    [author_key_cleaner(cs.author)]['data'][time_pos]
 

	
 
                datadict["commits"] += 1
 
                datadict["added"] += len(cs.added)
 
                datadict["changed"] += len(cs.changed)
 
                datadict["removed"] += len(cs.removed)
 

	
 
            else:
 
                if k >= ts_min_y and k <= ts_max_y or skip_date_limit:
 

	
 
                    datadict = {"time":k,
 
                                "commits":1,
 
                                "added":len(cs.added),
 
                                "changed":len(cs.changed),
 
                                "removed":len(cs.removed),
 
                               }
 
                    commits_by_day_author_aggregate\
 
                        [author_key_cleaner(cs.author)]['data'].append(datadict)
 

	
 
        else:
 
            if k >= ts_min_y and k <= ts_max_y or skip_date_limit:
 
                commits_by_day_author_aggregate[author_key_cleaner(cs.author)] = {
 
                                    "label":author_key_cleaner(cs.author),
 
                                    "data":[{"time":k,
 
                                             "commits":1,
 
                                             "added":len(cs.added),
 
                                             "changed":len(cs.changed),
 
                                             "removed":len(cs.removed),
 
                                             }],
 
                                    "schema":["commits"],
 
                                    }
rhodecode/lib/indexers/daemon.py
Show inline comments
 
#!/usr/bin/env python
 
# encoding: utf-8
 
# whoosh indexer daemon for rhodecode
 
# Copyright (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
 
#
 
# This program is free software; you can redistribute it and/or
 
# modify it under the terms of the GNU General Public License
 
# as published by the Free Software Foundation; version 2
 
# of the License or (at your opinion) any later version of the license.
 
# 
 
# This program is distributed in the hope that it will be useful,
 
# but WITHOUT ANY WARRANTY; without even the implied warranty of
 
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 
# GNU General Public License for more details.
 
# 
 
# You should have received a copy of the GNU General Public License
 
# along with this program; if not, write to the Free Software
 
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
 
# MA  02110-1301, USA.
 
"""
 
Created on Jan 26, 2010
 

	
 
@author: marcink
 
A deamon will read from task table and run tasks
 
"""
 
import sys
 
import os
 
from os.path import dirname as dn
 
from os.path import join as jn
 

	
 
#to get the rhodecode import
 
project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
 
sys.path.append(project_path)
 

	
 

	
 
from rhodecode.model.scm import ScmModel
 
from rhodecode.lib.helpers import safe_unicode
 
from whoosh.index import create_in, open_dir
 
from shutil import rmtree
 
from rhodecode.lib.indexers import INDEX_EXTENSIONS, SCHEMA, IDX_NAME
 

	
 
from time import mktime
 
from vcs.exceptions import ChangesetError, RepositoryError
 

	
 
import logging
 

	
 
log = logging.getLogger('whooshIndexer')
 
# create logger
 
log.setLevel(logging.DEBUG)
 
log.propagate = False
 
# create console handler and set level to debug
 
ch = logging.StreamHandler()
 
ch.setLevel(logging.DEBUG)
 

	
 
# create formatter
 
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
 

	
 
# add formatter to ch
 
ch.setFormatter(formatter)
 

	
 
# add ch to logger
 
log.addHandler(ch)
 

	
 
class WhooshIndexingDaemon(object):
 
    """
 
    Deamon for atomic jobs
 
    """
 

	
 
    def __init__(self, indexname='HG_INDEX', index_location=None,
 
                 repo_location=None):
 
                 repo_location=None, sa=None):
 
        self.indexname = indexname
 

	
 
        self.index_location = index_location
 
        if not index_location:
 
            raise Exception('You have to provide index location')
 

	
 
        self.repo_location = repo_location
 
        if not repo_location:
 
            raise Exception('You have to provide repositories location')
 

	
 
        self.repo_paths = ScmModel().repo_scan(self.repo_location, None)
 
        self.repo_paths = ScmModel(sa).repo_scan(self.repo_location, None)
 
        self.initial = False
 
        if not os.path.isdir(self.index_location):
 
            os.makedirs(self.index_location)
 
            log.info('Cannot run incremental index since it does not'
 
                     ' yet exist running full build')
 
            self.initial = True
 

	
 
    def get_paths(self, repo):
 
        """recursive walk in root dir and return a set of all path in that dir
 
        based on repository walk function
 
        """
 
        index_paths_ = set()
 
        try:
 
            for topnode, dirs, files in repo.walk('/', 'tip'):
 
                for f in files:
 
                    index_paths_.add(jn(repo.path, f.path))
 
                for dir in dirs:
 
                    for f in files:
 
                        index_paths_.add(jn(repo.path, f.path))
 

	
 
        except RepositoryError:
 
            pass
 
        return index_paths_
 

	
 
    def get_node(self, repo, path):
 
        n_path = path[len(repo.path) + 1:]
 
        node = repo.get_changeset().get_node(n_path)
 
        return node
 

	
 
    def get_node_mtime(self, node):
 
        return mktime(node.last_changeset.date.timetuple())
 

	
 
    def add_doc(self, writer, path, repo):
 
        """Adding doc to writer this function itself fetches data from
 
        the instance of vcs backend"""
 
        node = self.get_node(repo, path)
 

	
 
        #we just index the content of chosen files
 
        if node.extension in INDEX_EXTENSIONS:
 
            log.debug('    >> %s [WITH CONTENT]' % path)
 
            u_content = node.content
 
        else:
 
            log.debug('    >> %s' % path)
 
            #just index file name without it's content
 
            u_content = u''
 

	
 
        writer.add_document(owner=unicode(repo.contact),
 
                        repository=safe_unicode(repo.name),
 
                        path=safe_unicode(path),
 
                        content=u_content,
 
                        modtime=self.get_node_mtime(node),
 
                        extension=node.extension)
 

	
 

	
 
    def build_index(self):
 
        if os.path.exists(self.index_location):
 
            log.debug('removing previous index')
 
            rmtree(self.index_location)
 

	
 
        if not os.path.exists(self.index_location):
 
            os.mkdir(self.index_location)
 

	
 
        idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME)
 
        writer = idx.writer()
 

	
 
        for cnt, repo in enumerate(self.repo_paths.values()):
 
            log.debug('building index @ %s' % repo.path)
 

	
 
            for idx_path in self.get_paths(repo):
 
                self.add_doc(writer, idx_path, repo)
 

	
 
        log.debug('>> COMMITING CHANGES <<')
 
        writer.commit(merge=True)
 
        log.debug('>>> FINISHED BUILDING INDEX <<<')
 

	
 

	
 
    def update_index(self):
 
        log.debug('STARTING INCREMENTAL INDEXING UPDATE')
 

	
 
        idx = open_dir(self.index_location, indexname=self.indexname)
 
        # The set of all paths in the index
 
        indexed_paths = set()
 
        # The set of all paths we need to re-index
 
        to_index = set()
 

	
 
        reader = idx.reader()
 
        writer = idx.writer()
 

	
 
        # Loop over the stored fields in the index
 
        for fields in reader.all_stored_fields():
 
            indexed_path = fields['path']
 
            indexed_paths.add(indexed_path)
 

	
 
            repo = self.repo_paths[fields['repository']]
 

	
 
            try:
0 comments (0 inline, 0 general)