Changeset 94f7585af8a1 (branch: beta)
Marcin Kuzminski <marcin@python-works.com> - 2010-12-28 18:44:32
fixes to #92, updated changelog
2 files changed with 28 insertions and 13 deletions
docs/changelog.rst
 
.. _changelog:

Changelog
=========

1.2.0 (**2010-12-18**)
----------------------

:status: in-progress
:branch: beta

news
++++

- implemented #91 - added nicer looking archive urls
- implemented #44 into file browsing, and added follow branch option

fixes
+++++

- fixed file browser bug where the url was not changing when switching to a
  given revision from the form
- fixed #92

1.1.0 (**2010-12-18**)
----------------------

news
++++

- rewrite of internals for vcs >=0.1.10
- uses mercurial 1.7 with dotencode disabled for maintaining compatibility
  with older clients
- anonymous access, authentication via ldap
- performance upgrade for cached repos list - each repository has its own
  cache that's invalidated when needed
- performance upgrades on repositories with a large amount of commits (20K+)
- main page quick filter for filtering repositories
- user dashboards with the ability to follow actions of chosen repositories
- sends email to admin on new user registration
- added cache/statistics reset options into repository settings
- more detailed action logger (based on hooks) with lists of pushed changesets
  and options to disable those hooks from the admin panel
- introduced new enhanced changelog for merges that shows more accurate results
- new improved and faster code stats (based on pygments lexers mapping tables),
  showing up to 10 trending sources for each repository; additionally, stats
  can be disabled in repository settings
- gui optimizations, fixed application width to 1024px
- added cut-off limit (for large files/changesets) into config files
- whoosh, celeryd and upgrade moved to paster commands
- database backends other than sqlite can be used

fixes
+++++

- fixes #61 forked repo was showing only after cache expired
- fixes #76 no confirmation on user deletes
- fixes #66 Name field misspelled
- fixes #72 block removal of users that own repositories
- fixes #69 added password confirmation fields
- fixes #87 RhodeCode crashes occasionally on updating repository owner
- fixes #82 broken annotations on files with more than 1 blank line at the end
- a lot of fixes and tweaks for the file browser
- fixed detached session issues
- fixed bug where a user with no repos would see all repos listed in my account
- fixed ui() instance bug where global hgrc settings were loaded for the server
  instance and all hgrc options were merged with our db ui() object
- numerous small bugfixes

(special thanks to TkSoh for detailed feedback)

1.0.2 (**2010-11-12**)
----------------------

news
++++

- tested under python2.7
- bumped sqlalchemy and celery versions

fixes
+++++

- fixed #59 missing graph.js
- fixed repo_size crash when repository had broken symlinks
- fixed python2.5 crashes

1.0.1 (**2010-11-10**)
----------------------

news
++++

- small css updates

fixes
+++++

- fixed #53 python2.5 incompatible enumerate calls
- fixed #52 disable mercurial extension for web
- fixed #51 deleting repositories didn't delete their dependent objects

1.0.0 (**2010-11-02**)
----------------------

- security bugfix: simplehg wasn't checking for permissions on commands
  other than pull or push
- fixed doubled messages after push or pull in admin journal
- templating and css corrections, fixed repo switcher on chrome, updated titles
- admin menu accessible from options menu on repository view
- cached permission queries

1.0.0rc4 (**2010-10-12**)
--------------------------

- fixed python2.5 missing simplejson imports (thanks to Jens Bäckman)
rhodecode/lib/indexers/daemon.py
 
#!/usr/bin/env python
# encoding: utf-8
# whoosh indexer daemon for rhodecode
# Copyright (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
#
# -*- coding: utf-8 -*-
"""
    rhodecode.lib.indexers.daemon
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    A daemon that reads from the task table and runs tasks

    :created_on: Jan 26, 2010
    :author: marcink
    :copyright: (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
    :license: GPLv3, see COPYING for more details.
"""
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; version 2
# of the License or (at your option) any later version of the license.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA  02110-1301, USA.
"""
Created on Jan 26, 2010

@author: marcink
A daemon that reads from the task table and runs tasks
"""
import sys
import os
import traceback
from os.path import dirname as dn
from os.path import join as jn

#to get the rhodecode import
project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
sys.path.append(project_path)


from rhodecode.model.scm import ScmModel
from rhodecode.lib.helpers import safe_unicode
from whoosh.index import create_in, open_dir
from shutil import rmtree
from rhodecode.lib.indexers import INDEX_EXTENSIONS, SCHEMA, IDX_NAME

from time import mktime
from vcs.exceptions import ChangesetError, RepositoryError

import logging

log = logging.getLogger('whooshIndexer')
# create logger
log.setLevel(logging.DEBUG)
log.propagate = False
# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# create formatter
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

# add formatter to ch
ch.setFormatter(formatter)

# add ch to logger
log.addHandler(ch)

class WhooshIndexingDaemon(object):
    """
    Daemon for atomic jobs
    """

    def __init__(self, indexname='HG_INDEX', index_location=None,
                 repo_location=None, sa=None):
        self.indexname = indexname

        self.index_location = index_location
        if not index_location:
            raise Exception('You have to provide index location')

        self.repo_location = repo_location
        if not repo_location:
            raise Exception('You have to provide repositories location')

        self.repo_paths = ScmModel(sa).repo_scan(self.repo_location, None)
        self.initial = False
        if not os.path.isdir(self.index_location):
            os.makedirs(self.index_location)
            log.info('Cannot run incremental index since it does not'
                     ' yet exist, running a full build')
            self.initial = True

    def get_paths(self, repo):
        """recursive walk in root dir and return a set of all paths in that
        dir based on the repository walk function
        """
        index_paths_ = set()
        try:
            for topnode, dirs, files in repo.walk('/', 'tip'):
                for f in files:
                    index_paths_.add(jn(repo.path, f.path))
                for dir in dirs:
                    for f in files:
                        index_paths_.add(jn(repo.path, f.path))

        except RepositoryError, e:
            log.debug(traceback.format_exc())
            pass
        return index_paths_

    def get_node(self, repo, path):
        n_path = path[len(repo.path) + 1:]
        node = repo.get_changeset().get_node(n_path)
        return node

    def get_node_mtime(self, node):
        return mktime(node.last_changeset.date.timetuple())

    def add_doc(self, writer, path, repo):
        """Add a doc to the writer; this function itself fetches the data
        from the instance of the vcs backend"""
        node = self.get_node(repo, path)

        #we just index the content of chosen files
        if node.extension in INDEX_EXTENSIONS:
            u_content = node.content
            if not isinstance(u_content, unicode):
                log.warning('  >> %s Could not get this content as unicode '
                            'replacing with empty content', path)
                u_content = u''
            else:
                log.debug('    >> %s [WITH CONTENT]' % path)
        else:
            log.debug('    >> %s' % path)
            #just index file name without its content
            u_content = u''

        writer.add_document(owner=unicode(repo.contact),
                            repository=safe_unicode(repo.name),
                            path=safe_unicode(path),
                            content=u_content,
                            modtime=self.get_node_mtime(node),
                            extension=node.extension)

    def build_index(self):
        if os.path.exists(self.index_location):
            log.debug('removing previous index')
            rmtree(self.index_location)

        if not os.path.exists(self.index_location):
            os.mkdir(self.index_location)

        idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME)
        writer = idx.writer()

        print self.repo_paths.values()
        for cnt, repo in enumerate(self.repo_paths.values()):
            log.debug('building index @ %s' % repo.path)

            for idx_path in self.get_paths(repo):
                self.add_doc(writer, idx_path, repo)

        log.debug('>> COMMITTING CHANGES <<')
        writer.commit(merge=True)
        log.debug('>>> FINISHED BUILDING INDEX <<<')

    def update_index(self):
        log.debug('STARTING INCREMENTAL INDEXING UPDATE')

        idx = open_dir(self.index_location, indexname=self.indexname)
        # The set of all paths in the index
        indexed_paths = set()
        # The set of all paths we need to re-index
        to_index = set()

        reader = idx.reader()
        writer = idx.writer()

        # Loop over the stored fields in the index
        for fields in reader.all_stored_fields():
            indexed_path = fields['path']
            indexed_paths.add(indexed_path)

            repo = self.repo_paths[fields['repository']]

            try:
                node = self.get_node(repo, indexed_path)
            except ChangesetError:
                # This file was deleted since it was indexed
                log.debug('removing from index %s' % indexed_path)
                writer.delete_by_term('path', indexed_path)

            else:
                # Check if this file was changed since it was indexed
                indexed_time = fields['modtime']
                mtime = self.get_node_mtime(node)
                if mtime > indexed_time:
                    # The file has changed, delete it and add it to the list of
                    # files to reindex
                    log.debug('adding to reindex list %s' % indexed_path)
                    writer.delete_by_term('path', indexed_path)
                    to_index.add(indexed_path)

        # Loop over the files in the filesystem
        # Assume we have a function that gathers the filenames of the
        # documents to be indexed
        for repo in self.repo_paths.values():
            for path in self.get_paths(repo):
                if path in to_index or path not in indexed_paths:
                    # This is either a file that's changed, or a new file
                    # that wasn't indexed before. So index it!
                    self.add_doc(writer, path, repo)
                    log.debug('re-indexing %s' % path)

        log.debug('>> COMMITTING CHANGES <<')
        writer.commit(merge=True)
        log.debug('>>> FINISHED REBUILDING INDEX <<<')

    def run(self, full_index=False):
        """Run daemon"""
        if full_index or self.initial:
            self.build_index()
        else:
            self.update_index()
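
For context, a minimal usage sketch of the class above (not part of this changeset): it assumes a working RhodeCode environment, uses hypothetical example paths for the index and repository locations, and relies only on the constructor and run() shown in daemon.py.

    # Minimal sketch: drive the indexing daemon directly.
    # The two locations below are hypothetical example paths.
    from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon

    daemon = WhooshIndexingDaemon(
        index_location='/var/lib/rhodecode/index',  # hypothetical path
        repo_location='/srv/repos',                 # hypothetical path
    )

    # On the first run the index directory does not exist yet, so self.initial
    # is True and a full build is done; later runs update incrementally.
    daemon.run(full_index=False)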