Files
@ 33b71a130b16
Branch filter:
Location: kallithea/kallithea/controllers/summary.py
33b71a130b16
8.3 KiB
text/x-python
templates: properly escape inline JavaScript values
TLDR: Kallithea has issues with escaping values for use in inline JS.
Despite judicious poking of the code, no actual security vulnerabilities
have been found, just lots of corner-case bugs. This patch fixes those,
and hardens the code against actual security issues.
The long version:
To embed a Python value (typically a 'unicode' plain-text value) in a
larger file, it must be escaped in a context specific manner. Example:
>>> s = u'<script>alert("It\'s a trap!");</script>'
1) Escaped for insertion into HTML element context
>>> print cgi.escape(s)
<script>alert("It's a trap!");</script>
2) Escaped for insertion into HTML element or attribute context
>>> print h.escape(s)
<script>alert("It's a trap!");</script>
This is the default Mako escaping, as usually used by Kallithea.
3) Encoded as JSON
>>> print json.dumps(s)
"<script>alert(\"It's a trap!\");</script>"
4) Escaped for insertion into a JavaScript file
>>> print '(' + json.dumps(s) + ')'
("<script>alert(\"It's a trap!\");</script>")
The parentheses are not actually required for strings, but may be needed
to avoid syntax errors if the value is a number or dict (object).
5) Escaped for insertion into a HTML inline <script> element
>>> print h.js(s)
("\x3cscript\x3ealert(\"It's a trap!\");\x3c/script\x3e")
Here, we need to combine JS and HTML escaping, further complicated by
the fact that "<script>" tag contents can either be parsed in XHTML mode
(in which case '<', '>' and '&' must additionally be XML escaped) or
HTML mode (in which case '</script>' must be escaped, but not using HTML
escaping, which is not available in HTML "<script>" tags). Therefore,
the XML special characters (which can only occur in string literals) are
escaped using JavaScript string literal escape sequences.
(This, incidentally, is why modern web security best practices ban all
use of inline JavaScript...)
Unsurprisingly, Kallithea does not do (5) correctly. In most cases,
Kallithea might slap a pair of single quotes around the HTML escaped
Python value. A typical benign example:
$('#child_link').html('${_('No revisions')}');
This works in English, but if a localized version of the string contains
an apostrophe, the result will be broken JavaScript. In the more severe
cases, where the text is user controllable, it leaves the door open to
injections. In this example, the script inserts the string as HTML, so
Mako's implicit HTML escaping makes sense; but in many other cases, HTML
escaping is actually an error, because the value is not used by the
script in an HTML context.
The good news is that the HTML escaping thwarts attempts at XSS, since
it's impossible to inject syntactically valid JavaScript of any useful
complexity. It does allow JavaScript errors and gibberish to appear on
the page, though.
In these cases, the escaping has been fixed to use either the new 'h.js'
helper, which does JavaScript escaping (but not HTML escaping), OR the
new 'h.jshtml' helper (which does both), in those cases where it was
unclear if the value might be used (by the script) in an HTML context.
Some of these can probably be "relaxed" from h.jshtml to h.js later, but
for now, using h.jshtml fixes escaping and doesn't introduce new errors.
In a few places, Kallithea JSON encodes values in the controller, then
inserts the JSON (without any further escaping) into <script> tags. This
is also wrong, and carries actual risk of XSS vulnerabilities. However,
in all cases, security vulnerabilities were narrowly avoided due to other
filtering in Kallithea. (E.g. many special characters are banned from
appearing in usernames.) In these cases, the escaping has been fixed
and moved to the template, making it immediately visible that proper
escaping has been performed.
Mini-FAQ (frequently anticipated questions):
Q: Why do everything in one big, hard to review patch?
Q: Why add escaping in specific case FOO, it doesn't seem needed?
Because the goal here is to have "escape everywhere" as the default
policy, rather than identifying individual bugs and fixing them one
by one by adding escaping where needed. As such, this patch surely
introduces a lot of needless escaping. This is no different from
how Mako/Pylons HTML escape everything by default, even when not
needed: it's errs on the side of needless work, to prevent erring
on the side of skipping required (and security critical) work.
As for reviewability, the most important thing to notice is not where
escaping has been introduced, but any places where it might have been
missed (or where h.jshtml is needed, but h.js is used).
Q: The added escaping is kinda verbose/ugly.
That is not a question, but yes, I agree. Hopefully it'll encourage us
to move away from inline JavaScript altogether. That's a significantly
larger job, though; with luck this patch will keep us safe and secure
until such a time as we can implement the real fix.
Q: Why not use Mako filter syntax ("${val|h.js}")?
Because of long-standing Mako bug #140, preventing use of 'h' in
filters.
Q: Why not work around bug #140, or even use straight "${val|js}"?
Because Mako still applies the default h.escape filter before the
explicitly specified filters.
Q: Where do we go from here?
Longer term, we should stop doing variable expansions in script blocks,
and instead pass data to JS via e.g. data attributes, or asynchronously
using AJAX calls. Once we've done that, we can remove inline JavaScript
altogether in favor of separate script files, and set a strict Content
Security Policy explicitly blocking inline scripting, and thus also the
most common kind of cross-site scripting attack.
TLDR: Kallithea has issues with escaping values for use in inline JS.
Despite judicious poking of the code, no actual security vulnerabilities
have been found, just lots of corner-case bugs. This patch fixes those,
and hardens the code against actual security issues.
The long version:
To embed a Python value (typically a 'unicode' plain-text value) in a
larger file, it must be escaped in a context specific manner. Example:
>>> s = u'<script>alert("It\'s a trap!");</script>'
1) Escaped for insertion into HTML element context
>>> print cgi.escape(s)
<script>alert("It's a trap!");</script>
2) Escaped for insertion into HTML element or attribute context
>>> print h.escape(s)
<script>alert("It's a trap!");</script>
This is the default Mako escaping, as usually used by Kallithea.
3) Encoded as JSON
>>> print json.dumps(s)
"<script>alert(\"It's a trap!\");</script>"
4) Escaped for insertion into a JavaScript file
>>> print '(' + json.dumps(s) + ')'
("<script>alert(\"It's a trap!\");</script>")
The parentheses are not actually required for strings, but may be needed
to avoid syntax errors if the value is a number or dict (object).
5) Escaped for insertion into a HTML inline <script> element
>>> print h.js(s)
("\x3cscript\x3ealert(\"It's a trap!\");\x3c/script\x3e")
Here, we need to combine JS and HTML escaping, further complicated by
the fact that "<script>" tag contents can either be parsed in XHTML mode
(in which case '<', '>' and '&' must additionally be XML escaped) or
HTML mode (in which case '</script>' must be escaped, but not using HTML
escaping, which is not available in HTML "<script>" tags). Therefore,
the XML special characters (which can only occur in string literals) are
escaped using JavaScript string literal escape sequences.
(This, incidentally, is why modern web security best practices ban all
use of inline JavaScript...)
Unsurprisingly, Kallithea does not do (5) correctly. In most cases,
Kallithea might slap a pair of single quotes around the HTML escaped
Python value. A typical benign example:
$('#child_link').html('${_('No revisions')}');
This works in English, but if a localized version of the string contains
an apostrophe, the result will be broken JavaScript. In the more severe
cases, where the text is user controllable, it leaves the door open to
injections. In this example, the script inserts the string as HTML, so
Mako's implicit HTML escaping makes sense; but in many other cases, HTML
escaping is actually an error, because the value is not used by the
script in an HTML context.
The good news is that the HTML escaping thwarts attempts at XSS, since
it's impossible to inject syntactically valid JavaScript of any useful
complexity. It does allow JavaScript errors and gibberish to appear on
the page, though.
In these cases, the escaping has been fixed to use either the new 'h.js'
helper, which does JavaScript escaping (but not HTML escaping), OR the
new 'h.jshtml' helper (which does both), in those cases where it was
unclear if the value might be used (by the script) in an HTML context.
Some of these can probably be "relaxed" from h.jshtml to h.js later, but
for now, using h.jshtml fixes escaping and doesn't introduce new errors.
In a few places, Kallithea JSON encodes values in the controller, then
inserts the JSON (without any further escaping) into <script> tags. This
is also wrong, and carries actual risk of XSS vulnerabilities. However,
in all cases, security vulnerabilities were narrowly avoided due to other
filtering in Kallithea. (E.g. many special characters are banned from
appearing in usernames.) In these cases, the escaping has been fixed
and moved to the template, making it immediately visible that proper
escaping has been performed.
Mini-FAQ (frequently anticipated questions):
Q: Why do everything in one big, hard to review patch?
Q: Why add escaping in specific case FOO, it doesn't seem needed?
Because the goal here is to have "escape everywhere" as the default
policy, rather than identifying individual bugs and fixing them one
by one by adding escaping where needed. As such, this patch surely
introduces a lot of needless escaping. This is no different from
how Mako/Pylons HTML escape everything by default, even when not
needed: it's errs on the side of needless work, to prevent erring
on the side of skipping required (and security critical) work.
As for reviewability, the most important thing to notice is not where
escaping has been introduced, but any places where it might have been
missed (or where h.jshtml is needed, but h.js is used).
Q: The added escaping is kinda verbose/ugly.
That is not a question, but yes, I agree. Hopefully it'll encourage us
to move away from inline JavaScript altogether. That's a significantly
larger job, though; with luck this patch will keep us safe and secure
until such a time as we can implement the real fix.
Q: Why not use Mako filter syntax ("${val|h.js}")?
Because of long-standing Mako bug #140, preventing use of 'h' in
filters.
Q: Why not work around bug #140, or even use straight "${val|js}"?
Because Mako still applies the default h.escape filter before the
explicitly specified filters.
Q: Where do we go from here?
Longer term, we should stop doing variable expansions in script blocks,
and instead pass data to JS via e.g. data attributes, or asynchronously
using AJAX calls. Once we've done that, we can remove inline JavaScript
altogether in favor of separate script files, and set a strict Content
Security Policy explicitly blocking inline scripting, and thus also the
most common kind of cross-site scripting attack.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 | # -*- coding: utf-8 -*-
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
"""
kallithea.controllers.summary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Summary controller for Kallithea
This file was forked by the Kallithea project in July 2014.
Original author and date, and relevant copyright and licensing information is below:
:created_on: Apr 18, 2010
:author: marcink
:copyright: (c) 2013 RhodeCode GmbH, and others.
:license: GPLv3, see LICENSE.md for more details.
"""
import traceback
import calendar
import logging
import itertools
from time import mktime
from datetime import timedelta, date
from pylons import tmpl_context as c, request
from pylons.i18n.translation import _
from webob.exc import HTTPBadRequest
from beaker.cache import cache_region, region_invalidate
from kallithea.lib.vcs.exceptions import ChangesetError, EmptyRepositoryError, \
NodeDoesNotExistError
from kallithea.config.conf import ALL_READMES, ALL_EXTS, LANGUAGES_EXTENSIONS_MAP
from kallithea.model.db import Statistics, CacheInvalidation, User
from kallithea.lib.utils2 import safe_str
from kallithea.lib.auth import LoginRequired, HasRepoPermissionLevelDecorator, \
NotAnonymous
from kallithea.lib.base import BaseRepoController, render, jsonify
from kallithea.lib.vcs.backends.base import EmptyChangeset
from kallithea.lib.markup_renderer import MarkupRenderer
from kallithea.lib.celerylib.tasks import get_commits_stats
from kallithea.lib.compat import json
from kallithea.lib.vcs.nodes import FileNode
from kallithea.controllers.changelog import _load_changelog_summary
log = logging.getLogger(__name__)
README_FILES = [''.join([x[0][0], x[1][0]]) for x in
sorted(list(itertools.product(ALL_READMES, ALL_EXTS)),
key=lambda y:y[0][1] + y[1][1])]
class SummaryController(BaseRepoController):
def __before__(self):
super(SummaryController, self).__before__()
def __get_readme_data(self, db_repo):
repo_name = db_repo.repo_name
log.debug('Looking for README file')
@cache_region('long_term', '_get_readme_from_cache')
def _get_readme_from_cache(key, kind):
readme_data = None
readme_file = None
try:
# gets the landing revision! or tip if fails
cs = db_repo.get_landing_changeset()
if isinstance(cs, EmptyChangeset):
raise EmptyRepositoryError()
renderer = MarkupRenderer()
for f in README_FILES:
try:
readme = cs.get_node(f)
if not isinstance(readme, FileNode):
continue
readme_file = f
log.debug('Found README file `%s` rendering...',
readme_file)
readme_data = renderer.render(readme.content,
filename=f)
break
except NodeDoesNotExistError:
continue
except ChangesetError:
log.error(traceback.format_exc())
pass
except EmptyRepositoryError:
pass
return readme_data, readme_file
kind = 'README'
valid = CacheInvalidation.test_and_set_valid(repo_name, kind)
if not valid:
region_invalidate(_get_readme_from_cache, None, '_get_readme_from_cache', repo_name, kind)
return _get_readme_from_cache(repo_name, kind)
@LoginRequired()
@HasRepoPermissionLevelDecorator('read')
def index(self, repo_name):
_load_changelog_summary()
if request.authuser.is_default_user:
username = ''
else:
username = safe_str(request.authuser.username)
_def_clone_uri = _def_clone_uri_by_id = c.clone_uri_tmpl
if '{repo}' in _def_clone_uri:
_def_clone_uri_by_id = _def_clone_uri.replace('{repo}', '_{repoid}')
elif '{repoid}' in _def_clone_uri:
_def_clone_uri_by_id = _def_clone_uri.replace('_{repoid}', '{repo}')
c.clone_repo_url = c.db_repo.clone_url(user=username,
uri_tmpl=_def_clone_uri)
c.clone_repo_url_id = c.db_repo.clone_url(user=username,
uri_tmpl=_def_clone_uri_by_id)
if c.db_repo.enable_statistics:
c.show_stats = True
else:
c.show_stats = False
stats = Statistics.query() \
.filter(Statistics.repository == c.db_repo) \
.scalar()
c.stats_percentage = 0
if stats and stats.languages:
c.no_data = False is c.db_repo.enable_statistics
lang_stats_d = json.loads(stats.languages)
lang_stats = ((x, {"count": y,
"desc": LANGUAGES_EXTENSIONS_MAP.get(x)})
for x, y in lang_stats_d.items())
c.trending_languages = (
sorted(lang_stats, reverse=True, key=lambda k: k[1])[:10]
)
else:
c.no_data = True
c.trending_languages = []
c.enable_downloads = c.db_repo.enable_downloads
c.readme_data, c.readme_file = \
self.__get_readme_data(c.db_repo)
return render('summary/summary.html')
@LoginRequired()
@NotAnonymous()
@HasRepoPermissionLevelDecorator('read')
@jsonify
def repo_size(self, repo_name):
if request.is_xhr:
return c.db_repo._repo_size()
else:
raise HTTPBadRequest()
@LoginRequired()
@HasRepoPermissionLevelDecorator('read')
def statistics(self, repo_name):
if c.db_repo.enable_statistics:
c.show_stats = True
c.no_data_msg = _('No data ready yet')
else:
c.show_stats = False
c.no_data_msg = _('Statistics are disabled for this repository')
td = date.today() + timedelta(days=1)
td_1m = td - timedelta(days=calendar.mdays[td.month])
td_1y = td - timedelta(days=365)
ts_min_m = mktime(td_1m.timetuple())
ts_min_y = mktime(td_1y.timetuple())
ts_max_y = mktime(td.timetuple())
c.ts_min = ts_min_m
c.ts_max = ts_max_y
stats = Statistics.query() \
.filter(Statistics.repository == c.db_repo) \
.scalar()
c.stats_percentage = 0
if stats and stats.languages:
c.no_data = False is c.db_repo.enable_statistics
lang_stats_d = json.loads(stats.languages)
c.commit_data = stats.commit_activity
c.overview_data = stats.commit_activity_combined
lang_stats = ((x, {"count": y,
"desc": LANGUAGES_EXTENSIONS_MAP.get(x)})
for x, y in lang_stats_d.items())
c.trending_languages = (
sorted(lang_stats, reverse=True, key=lambda k: k[1])[:10]
)
last_rev = stats.stat_on_revision + 1
c.repo_last_rev = c.db_repo_scm_instance.count() \
if c.db_repo_scm_instance.revisions else 0
if last_rev == 0 or c.repo_last_rev == 0:
pass
else:
c.stats_percentage = '%.2f' % ((float((last_rev)) /
c.repo_last_rev) * 100)
else:
c.commit_data = {}
c.overview_data = ([[ts_min_y, 0], [ts_max_y, 10]])
c.trending_languages = {}
c.no_data = True
recurse_limit = 500 # don't recurse more than 500 times when parsing
get_commits_stats(c.db_repo.repo_name, ts_min_y, ts_max_y, recurse_limit)
return render('summary/statistics.html')
|