kallithea Changeset - aac24db58ce8

[Not reviewed]

beta

Marcin Kuzminski - 15 years ago 2010-11-27 01:43:04
marcin@python-works.com

fixed cache problem,
updated docs

4 files changed with 46 insertions and 4 deletions:

docs/changelog.rst

docs/setup.rst

rhodecode/lib/celerylib/tasks.py

rhodecode/lib/indexers/daemon.py

0 comments (0 inline, 0 general)

docs/changelog.rst

 .. _changelog:
 Changelog
 =========
 .1.0 (**2010-XX-XX**)
 ----------------------
 :status: in-progress
 :branch: beta
 news
 ++++
 - rewrite of internals for vcs >=0.1.10
 - anonymous access, authentication via ldap
 - performance upgrade for cached repos list - each repository has it's own
   cache that's invalidated when needed.
 - main page quick filter for filtering repositories
 - user dashboards with ability to follow chosen repositories actions
 - sends email to admin on new user registration
 - added cache/statistics reset options into repository settings
 - more detailed action logger (based on hooks) with pushed changesets lists
   and options to disable those hooks from admin panel
 - introduced new enhanced changelog for merges that shows more accurate results
 - gui optimizations, fixed application width to 1024px
-- whoosh,celeryd,upgrade moved to paster command
+- whoosh, celeryd, upgrade moved to paster command
 fixes
 +++++
 - fixes #61 forked repo was showing only after cache expired
 - fixes #76 no confirmation on user deletes
 - fixes #66 Name field misspelled
 - fixes #72 block user removal when he owns repositories
 - fixes #69 added password confirmation fields
 - numerous small bugfixes
 - a lot of fixes and tweaks for file browser
 - fixed detached session issues
 (special thanks for TkSoh for detailed feedback)
 .0.2 (**2010-11-12**)
 ----------------------
 news
 ++++
 - tested under python2.7
 - bumped sqlalchemy and celery versions
 fixes
 +++++
 - fixed #59 missing graph.js
 - fixed repo_size crash when repository had broken symlinks
 - fixed python2.5 crashes.
 .0.1 (**2010-11-10**)
 ----------------------
 news
 ++++
 - small css updated
 fixes
 +++++
 - fixed #53 python2.5 incompatible enumerate calls
 - fixed #52 disable mercurial extension for web
 - fixed #51 deleting repositories don't delete it's dependent objects
 .0.0 (**2010-11-02**)
 ----------------------
 - security bugfix simplehg wasn't checking for permissions on commands
   other than pull or push.
 - fixed doubled messages after push or pull in admin journal
 - templating and css corrections, fixed repo switcher on chrome, updated titles
 - admin menu accessible from options menu on repository view
 - permissions cached queries
 .0.0rc4  (**2010-10-12**)
 --------------------------
 - fixed python2.5 missing simplejson imports (thanks to Jens Bäckman)
 - removed cache_manager settings from sqlalchemy meta
 - added sqlalchemy cache settings to ini files
 - validated password length and added second try of failure on paster setup-app
 - fixed setup database destroy prompt even when there was no db
 .0.0rc3 (**2010-10-11**)
 -------------------------
 - fixed i18n during installation.
 .0.0rc2 (**2010-10-11**)
 -------------------------
 - Disabled dirsize in file browser, it's causing nasty bug when dir renames
   occure. After vcs is fixed it'll be put back again.
 - templating/css rewrites, optimized css.

docs/setup.rst

@@ @@ -38,192 +38,208 @@ Setting up the application @@
   file  created in previous step
 - Use admin account you created to login.
 - Default permissions on each repository is read, and owner is admin. So
   remember to update these if needed.
 Setting up Whoosh full text search
 ----------------------------------
 Index for whoosh can be build starting from version 1.1 using paster command
 passing repo locations to index, as well as Your config file that stores
 whoosh index files locations. There is possible to pass `-f` to the options
 to enable full index rebuild. Without that indexing will run always in in
 incremental mode.
 ::
  paster make-index --repo-location=<location for repos> production.ini
 for full index rebuild You can use
 ::
  paster make-index -f --repo-location=<location for repos> production.ini
 - For full text search You can either put crontab entry for
 This command can be run even from crontab in order to do periodical
 index builds and keep Your index always up to date. An example entry might
 look like this
 ::
  /path/to/python/bin/paster make-index --repo-location=<location for repos> /path/to/rhodecode/production.ini
 When using incremental(default) mode whoosh will check last modification date
 of each file and add it to reindex if newer file is available. Also indexing
 daemon checks for removed files and removes them from index.
 Sometime You might want to rebuild index from scratch. You can do that using
 the `-f` flag passed to paster command or, in admin panel You can check
 `build from scratch` flag.
 Setting up LDAP support
 -----------------------
 RhodeCode starting from version 1.1 supports ldap authentication. In order
 to use ldap, You have to install python-ldap package. This package is available
 via pypi, so You can install it by running
 ::
  easy_install python-ldap
 ::
  pip install python-ldap
 .. note::
    python-ldap requires some certain libs on Your system, so before installing
    it check that You have at least `openldap`, and `sasl` libraries.
 ldap settings are located in admin->ldap section,
 Here's a typical ldap setup::
  Enable ldap  = checked                 #controls if ldap access is enabled
  Host         = host.domain.org         #actual ldap server to connect
  Port         = 389 or 636 for ldaps    #ldap server ports
  Enable LDAPS = unchecked               #enable disable ldaps
  Account      = <account>               #access for ldap server(if required)
  Password     = <password>              #password for ldap server(if required)
  Base DN      = uid=%(user)s,CN=users,DC=host,DC=domain,DC=org
 `Account` and `Password` are optional, and used for two-phase ldap
 authentication so those are credentials to access Your ldap, if it doesn't
 support anonymous search/user lookups.
 Base DN must have %(user)s template inside, it's a placer where Your uid used
 to login would go, it allows admins to specify not standard schema for uid
 variable
 If all data are entered correctly, and `python-ldap` is properly installed
 Users should be granted to access RhodeCode wit ldap accounts. When
 logging at the first time an special ldap account is created inside RhodeCode,
 so You can control over permissions even on ldap users. If such user exists
 already in RhodeCode database ldap user with the same username would be not
 able to access RhodeCode.
 If You have problems with ldap access and believe You entered correct
 information check out the RhodeCode logs,any error messages sent from
 ldap will be saved there.
 Setting Up Celery
 -----------------
 Since version 1.1 celery is configured by the rhodecode ini configuration files
 simply set use_celery=true in the ini file then add / change the configuration
 variables inside the ini file.
 Remember that the ini files uses format with '.' not with '_' like celery
 so for example setting `BROKER_HOST` in celery means setting `broker.host` in
 the config file.
 In order to make start using celery run::
  paster celeryd <configfile.ini>
 Nginx virtual host example
 --------------------------
 Sample config for nginx using proxy::
  server {
     listen          80;
     server_name     hg.myserver.com;
     access_log      /var/log/nginx/rhodecode.access.log;
     error_log       /var/log/nginx/rhodecode.error.log;
     location / {
             root /var/www/rhodecode/rhodecode/public/;
             if (!-f $request_filename){
                 proxy_pass      http://127.0.0.1:5000;
+            }
             #this is important for https !!!
             proxy_set_header X-Url-Scheme $scheme;
             include         /etc/nginx/proxy.conf;
+    }
+ }
 Here's the proxy.conf. It's tuned so it'll not timeout on long
 pushes and also on large pushes::
     proxy_redirect              off;
     proxy_set_header            Host $host;
     proxy_set_header            X-Host $http_host;
     proxy_set_header            X-Real-IP $remote_addr;
     proxy_set_header            X-Forwarded-For $proxy_add_x_forwarded_for;
     proxy_set_header            Proxy-host $proxy_host;
     client_max_body_size        400m;
     client_body_buffer_size     128k;
     proxy_buffering             off;
     proxy_connect_timeout       3600;
     proxy_send_timeout          3600;
     proxy_read_timeout          3600;
     proxy_buffer_size           8k;
     proxy_buffers               8 32k;
     proxy_busy_buffers_size     64k;
     proxy_temp_file_write_size  64k;
 Also when using root path with nginx You might set the static files to false
 in production.ini file::
   [app:main]
     use = egg:rhodecode
     full_stack = true
     static_files = false
     lang=en
     cache_dir = %(here)s/data
 To not have the statics served by the application. And improve speed.
 Apache reverse proxy
 --------------------
 Tutorial can be found here
 http://wiki.pylonshq.com/display/pylonscookbook/Apache+as+a+reverse+proxy+for+Pylons
 Apache's example FCGI config
 ----------------------------
 TODO !
 Other configuration files
 -------------------------
 Some extra configuration files and examples can be found here:
 http://hg.python-works.com/rhodecode/files/tip/init.d
 and also an celeryconfig file can be use from here:
 http://hg.python-works.com/rhodecode/files/tip/celeryconfig.py
 Troubleshooting
 ---------------
 - missing static files ?
  - make sure either to set the `static_files = true` in the .ini file or
    double check the root path for Your http setup. It should point to
    for example:
    /home/my-virtual-python/lib/python2.6/site-packages/rhodecode/public
 - can't install celery/rabbitmq
  - don't worry RhodeCode works without them too. No extra setup required
 - long lasting push timeouts ?
  - make sure You set a longer timeouts in Your proxy/fcgi settings, timeouts
    are caused by https server and not RhodeCode
 - large pushes timeouts ?
  - make sure You set a proper max_body_size for the http server
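
Note on the LDAP section of docs/setup.rst: the Base DN setting carries a `%(user)s` placeholder, which has the same shape as Python's mapping-style string formatting. The sketch below shows how such a template could expand into a full DN for a login name. `build_user_dn` is a hypothetical helper name and `jdoe` is a made-up username; neither comes from the RhodeCode sources shown in this changeset.

```python
def build_user_dn(base_dn_template, username):
    """Expand the %(user)s placeholder in a Base DN template.

    Hypothetical helper for illustration only, not RhodeCode's own code.
    """
    if '%(user)s' not in base_dn_template:
        raise ValueError('Base DN must contain the %(user)s placeholder')
    # Mapping-style formatting substitutes the login name for %(user)s.
    return base_dn_template % {'user': username}


template = 'uid=%(user)s,CN=users,DC=host,DC=domain,DC=org'
print(build_user_dn(template, 'jdoe'))
```

A production setup would also need to escape DN special characters in the username before substitution; the sketch leaves that out.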

rhodecode/lib/celerylib/tasks.py

 from celery.decorators import task
 import os
 import traceback
 import beaker
 from time import mktime
 from operator import itemgetter
 from pylons import config
 from pylons.i18n.translation import _
 from rhodecode.lib.celerylib import run_task, locked_task, str2bool
 from rhodecode.lib.helpers import person
 from rhodecode.lib.smtp_mailer import SmtpMailer
 from rhodecode.lib.utils import OrderedDict
 from rhodecode.model import init_model
 from rhodecode.model import meta
 from rhodecode.model.db import RhodeCodeUi
 from vcs.backends import get_repo
 from sqlalchemy import engine_from_config
 #set cache regions for beaker so celery can utilise it
 def add_cache(settings):
     cache_settings = {'regions':None}
     for key in settings.keys():
         for prefix in ['beaker.cache.', 'cache.']:
             if key.startswith(prefix):
                 name = key.split(prefix)[1].strip()
                 cache_settings[name] = settings[key].strip()
     if cache_settings['regions']:
         for region in cache_settings['regions'].split(','):
             region = region.strip()
             region_settings = {}
             for key, value in cache_settings.items():
                 if key.startswith(region):
                     region_settings[key.split('.')[1]] = value
             region_settings['expire'] = int(region_settings.get('expire',
 ))
             region_settings.setdefault('lock_dir',
                                        cache_settings.get('lock_dir'))
             if 'type' not in region_settings:
                 region_settings['type'] = cache_settings.get('type',
                                                              'memory')
             beaker.cache.cache_regions[region] = region_settings
 add_cache(config)
 try:
     import json
 except ImportError:
     #python 2.5 compatibility
     import simplejson as json
 __all__ = ['whoosh_index', 'get_commits_stats',
            'reset_user_password', 'send_email']
 CELERY_ON = str2bool(config['app_conf'].get('use_celery'))
 def get_session():
     if CELERY_ON:
         engine = engine_from_config(config, 'sqlalchemy.db1.')
         init_model(engine)
     sa = meta.Session()
     return sa
 def get_repos_path():
     sa = get_session()
     q = sa.query(RhodeCodeUi).filter(RhodeCodeUi.ui_key == '/').one()
     return q.ui_value
 @task
 @locked_task
 def whoosh_index(repo_location, full_index):
     log = whoosh_index.get_logger()
     from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
     index_location = config['index_dir']
     WhooshIndexingDaemon(index_location=index_location,
-                         repo_location=repo_location).run(full_index=full_index)
+                         repo_location=repo_location, sa=get_session())\
+                         .run(full_index=full_index)
 @task
 @locked_task
 def get_commits_stats(repo_name, ts_min_y, ts_max_y):
     from rhodecode.model.db import Statistics, Repository
     log = get_commits_stats.get_logger()
     #for js data compatibilty
     author_key_cleaner = lambda k: person(k).replace('"', "")
     commits_by_day_author_aggregate = {}
     commits_by_day_aggregate = {}
     repos_path = get_repos_path()
     p = os.path.join(repos_path, repo_name)
     repo = get_repo(p)
     skip_date_limit = True
     parse_limit = 250 #limit for single task changeset parsing optimal for
     last_rev = 0
     last_cs = None
     timegetter = itemgetter('time')
     sa = get_session()
     dbrepo = sa.query(Repository)\
         .filter(Repository.repo_name == repo_name).scalar()
     cur_stats = sa.query(Statistics)\
         .filter(Statistics.repository == dbrepo).scalar()
     if cur_stats:
         last_rev = cur_stats.stat_on_revision
     if not repo.revisions:
         return True
     if last_rev == repo.revisions[-1] and len(repo.revisions) > 1:
         #pass silently without any work if we're not on first revision or
         #current state of parsing revision(from db marker) is the last revision
         return True
     if cur_stats:
         commits_by_day_aggregate = OrderedDict(
                                        json.loads(
                                         cur_stats.commit_activity_combined))
         commits_by_day_author_aggregate = json.loads(cur_stats.commit_activity)
     log.debug('starting parsing %s', parse_limit)
     lmktime = mktime
     for cnt, rev in enumerate(repo.revisions[last_rev:]):
         last_cs = cs = repo.get_changeset(rev)
         k = '%s-%s-%s' % (cs.date.timetuple()[0], cs.date.timetuple()[1],
                           cs.date.timetuple()[2])
         timetupple = [int(x) for x in k.split('-')]
         timetupple.extend([0 for _ in xrange(6)])
         k = lmktime(timetupple)
         if commits_by_day_author_aggregate.has_key(author_key_cleaner(cs.author)):
             try:
                 l = [timegetter(x) for x in commits_by_day_author_aggregate\
                         [author_key_cleaner(cs.author)]['data']]
                 time_pos = l.index(k)
             except ValueError:
                 time_pos = False
             if time_pos >= 0 and time_pos is not False:
                 datadict = commits_by_day_author_aggregate\
                     [author_key_cleaner(cs.author)]['data'][time_pos]
                 datadict["commits"] += 1
                 datadict["added"] += len(cs.added)
                 datadict["changed"] += len(cs.changed)
                 datadict["removed"] += len(cs.removed)
             else:
                 if k >= ts_min_y and k <= ts_max_y or skip_date_limit:
                     datadict = {"time":k,
                                 "commits":1,
                                 "added":len(cs.added),
                                 "changed":len(cs.changed),
                                 "removed":len(cs.removed),
+                               }
                     commits_by_day_author_aggregate\
                         [author_key_cleaner(cs.author)]['data'].append(datadict)
         else:
             if k >= ts_min_y and k <= ts_max_y or skip_date_limit:
                 commits_by_day_author_aggregate[author_key_cleaner(cs.author)] = {
                                     "label":author_key_cleaner(cs.author),
                                     "data":[{"time":k,
                                              "commits":1,
                                              "added":len(cs.added),
                                              "changed":len(cs.changed),
                                              "removed":len(cs.removed),
                                              }],
                                     "schema":["commits"],
+                                    }
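
Note on `add_cache` in rhodecode/lib/celerylib/tasks.py: the function collects `beaker.cache.*` and `cache.*` keys from the settings and builds one settings dict per region. The sketch below restates that parsing as a pure function so it can be tried without Beaker or Pylons. `parse_cache_regions` and its `default_expire` parameter are assumptions made for illustration: the default expiry value is not visible in this view, so the caller supplies it here.

```python
def parse_cache_regions(settings, default_expire=60):
    """Build a {region: settings} dict from flat ini-style cache keys.

    Illustrative sketch only; default_expire is an assumed value.
    """
    cache_settings = {'regions': None}
    for key, value in settings.items():
        for prefix in ('beaker.cache.', 'cache.'):
            if key.startswith(prefix):
                cache_settings[key[len(prefix):].strip()] = value.strip()

    regions = {}
    if cache_settings['regions']:
        for region in cache_settings['regions'].split(','):
            region = region.strip()
            region_settings = {}
            for key, value in cache_settings.items():
                # Keys such as "short_term.expire" belong to region "short_term".
                if key.startswith(region + '.'):
                    region_settings[key.split('.', 1)[1]] = value
            region_settings['expire'] = int(
                region_settings.get('expire', default_expire))
            # Fall back to the global lock_dir and type when a region has none.
            region_settings.setdefault('lock_dir',
                                       cache_settings.get('lock_dir'))
            region_settings.setdefault('type',
                                       cache_settings.get('type', 'memory'))
            regions[region] = region_settings
    return regions


example = {
    'beaker.cache.regions': 'short_term, long_term',
    'beaker.cache.lock_dir': '/tmp/lock',
    'beaker.cache.short_term.expire': '30',
    'beaker.cache.long_term.type': 'file',
}
print(parse_cache_regions(example))
```

In the changeset itself the resulting dicts are stored in `beaker.cache.cache_regions`, which lets the celery worker use the same cache regions as the web application.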

rhodecode/lib/indexers/daemon.py

 #!/usr/bin/env python
 # encoding: utf-8
 # whoosh indexer daemon for rhodecode
 # Copyright (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
+#
 # This program is free software; you can redistribute it and/or
 # modify it under the terms of the GNU General Public License
 # as published by the Free Software Foundation; version 2
 # of the License or (at your opinion) any later version of the license.
+#
 # This program is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 # GNU General Public License for more details.
+#
 # You should have received a copy of the GNU General Public License
 # along with this program; if not, write to the Free Software
 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
 # MA  02110-1301, USA.
 """
 Created on Jan 26, 2010
 @author: marcink
 A deamon will read from task table and run tasks
 """
 import sys
 import os
 from os.path import dirname as dn
 from os.path import join as jn
 #to get the rhodecode import
 project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
 sys.path.append(project_path)
 from rhodecode.model.scm import ScmModel
 from rhodecode.lib.helpers import safe_unicode
 from whoosh.index import create_in, open_dir
 from shutil import rmtree
 from rhodecode.lib.indexers import INDEX_EXTENSIONS, SCHEMA, IDX_NAME
 from time import mktime
 from vcs.exceptions import ChangesetError, RepositoryError
 import logging
 log = logging.getLogger('whooshIndexer')
 # create logger
 log.setLevel(logging.DEBUG)
 log.propagate = False
 # create console handler and set level to debug
 ch = logging.StreamHandler()
 ch.setLevel(logging.DEBUG)
 # create formatter
 formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
 # add formatter to ch
 ch.setFormatter(formatter)
 # add ch to logger
 log.addHandler(ch)
 class WhooshIndexingDaemon(object):
     """
     Deamon for atomic jobs
     """
     def __init__(self, indexname='HG_INDEX', index_location=None,
-                 repo_location=None):
+                 repo_location=None, sa=None):
         self.indexname = indexname
         self.index_location = index_location
         if not index_location:
             raise Exception('You have to provide index location')
         self.repo_location = repo_location
         if not repo_location:
             raise Exception('You have to provide repositories location')
-        self.repo_paths = ScmModel().repo_scan(self.repo_location, None)
+        self.repo_paths = ScmModel(sa).repo_scan(self.repo_location, None)
         self.initial = False
         if not os.path.isdir(self.index_location):
             os.makedirs(self.index_location)
             log.info('Cannot run incremental index since it does not'
                      ' yet exist running full build')
             self.initial = True
     def get_paths(self, repo):
         """recursive walk in root dir and return a set of all path in that dir
         based on repository walk function
         """
         index_paths_ = set()
         try:
             for topnode, dirs, files in repo.walk('/', 'tip'):
                 for f in files:
                     index_paths_.add(jn(repo.path, f.path))
                 for dir in dirs:
                     for f in files:
                         index_paths_.add(jn(repo.path, f.path))
         except RepositoryError:
             pass
         return index_paths_
     def get_node(self, repo, path):
         n_path = path[len(repo.path) + 1:]
         node = repo.get_changeset().get_node(n_path)
         return node
     def get_node_mtime(self, node):
         return mktime(node.last_changeset.date.timetuple())
     def add_doc(self, writer, path, repo):
         """Adding doc to writer this function itself fetches data from
         the instance of vcs backend"""
         node = self.get_node(repo, path)
         #we just index the content of chosen files
         if node.extension in INDEX_EXTENSIONS:
             log.debug('    >> %s [WITH CONTENT]' % path)
             u_content = node.content
         else:
             log.debug('    >> %s' % path)
             #just index file name without it's content
             u_content = u''
         writer.add_document(owner=unicode(repo.contact),
                         repository=safe_unicode(repo.name),
                         path=safe_unicode(path),
                         content=u_content,
                         modtime=self.get_node_mtime(node),
                         extension=node.extension)
     def build_index(self):
         if os.path.exists(self.index_location):
             log.debug('removing previous index')
             rmtree(self.index_location)
         if not os.path.exists(self.index_location):
             os.mkdir(self.index_location)
         idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME)
         writer = idx.writer()
         for cnt, repo in enumerate(self.repo_paths.values()):
             log.debug('building index @ %s' % repo.path)
             for idx_path in self.get_paths(repo):
                 self.add_doc(writer, idx_path, repo)
         log.debug('>> COMMITING CHANGES <<')
         writer.commit(merge=True)
         log.debug('>>> FINISHED BUILDING INDEX <<<')
     def update_index(self):
         log.debug('STARTING INCREMENTAL INDEXING UPDATE')
         idx = open_dir(self.index_location, indexname=self.indexname)
         # The set of all paths in the index
         indexed_paths = set()
         # The set of all paths we need to re-index
         to_index = set()
         reader = idx.reader()
         writer = idx.writer()
         # Loop over the stored fields in the index
         for fields in reader.all_stored_fields():
             indexed_path = fields['path']
             indexed_paths.add(indexed_path)
             repo = self.repo_paths[fields['repository']]
             try:
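
Note on `update_index` in rhodecode/lib/indexers/daemon.py: the view of this method is cut off above, so the following is not the daemon's code. It is a minimal sketch of the incremental strategy that docs/setup.rst describes (reindex files whose modification time is newer than the indexed one, and drop files that no longer exist), written against plain dicts instead of a Whoosh index. `plan_incremental_update` and both argument names are hypothetical.

```python
def plan_incremental_update(indexed, current):
    """Return (to_delete, to_index) sets of paths.

    indexed: path -> modification time stored in the index
    current: path -> modification time reported by the repository
    Illustrative sketch only, not the WhooshIndexingDaemon implementation.
    """
    to_delete = set()
    to_index = set()
    for path, indexed_time in indexed.items():
        if path not in current:
            # File was removed from the repository: drop it from the index.
            to_delete.add(path)
        elif current[path] > indexed_time:
            # File changed since it was indexed: replace the stale document.
            to_delete.add(path)
            to_index.add(path)
    # Paths never seen before are indexed for the first time.
    to_index.update(set(current) - set(indexed))
    return to_delete, to_index


indexed = {'a.py': 100, 'b.py': 100, 'gone.py': 50}
current = {'a.py': 100, 'b.py': 200, 'new.py': 10}
to_delete, to_index = plan_incremental_update(indexed, current)
print(sorted(to_delete), sorted(to_index))
```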
