Changeset - 842107cb8260
Branko Majic (branko) - 2013-11-28 21:29:55
branko@majic.rs
PYD-2: Added privacy chapter and release notes.
3 files changed with 87 insertions and 1 deletions:
docs/index.rst
 
@@ -28,7 +28,7 @@ Contents:
 
   installation
 
   usage
 
   algorithm
 
   security
 
   privacy
 
   apireference
 
   releasenotes
 

	
docs/privacy.rst
 
new file 100644
 
Privacy
 
=======
 

	
 
It is fundamentally important to understand the privacy issues involved when using
 
Pydenticon to generate uniquely identifiable avatars for users who leave
 
comments and similar content.
 

	
 
The most common way to expose the identicons is by having a web application
 
generate them on the fly from data that is being passed to it through HTTP GET
 
requests. Those GET requests would commonly include either the raw data, or data
 
as a hex string that is then used to generate an identicon. The URLs for GET
 
requests would most commonly be made as part of image tags in an HTML page.
 

	
 
The data passed needs to be unique in order to generate distinct identicons. In
 
most cases the data used will be either the name or the e-mail address that the
 
visitor posting the comment enters in some field. That being said, e-mails usually
 
provide a much better identifier than names (especially if the website verifies
 
the comments by sending out e-mails).
 

	
 
Needless to say, in such cases, especially if the website where the comments are
 
being posted is public, using raw data can completely reveal the identity of the
 
user. If e-mails are used for generating the identicons, the situation is even
 
worse, since now those e-mails can be easily harvested for spam purposes. Using
 
the e-mails also provides data mining companies with a much more reliable user
 
identifier that can be coupled with information from other websites.
 

	
 
Therefore, it is highly recommended to pass the data to the web application that
 
generates the identicons as a **hex digest only**. I.e. **never** pass the raw
 
data.
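
As a minimal sketch of the recommendation above, a page template could compute the hex digest on the server and place only that digest in the image URL. The ``identicon_url`` helper and the ``/identicon/`` path are hypothetical, and SHA-256 is just one possible choice of digest.

```python
import hashlib


def identicon_url(raw_data, base_url="/identicon/"):
    """Build an image URL that carries only a hex digest, never the raw data.

    Both the helper name and the base URL are hypothetical.
    """
    # Normalise so the same e-mail always maps to the same identicon.
    normalised = raw_data.strip().lower()
    digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
    return base_url + digest + ".png"


url = identicon_url("John.Doe@example.com")
print('<img src="%s" alt="identicon">' % url)
```

The resulting image tag exposes only the digest, so the address itself never appears in the page source or in the GET request.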
 

	
 
Although passing a hash instead of the real data as part of the GET request is a
 
good step forward, it can still cause problems since the hashes can be collected,
 
and then used in conjunction with rainbow tables to identify the original
 
data. This is particularly problematic when using hex digests of e-mail
 
addresses as data for generating the identicon.
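
To illustrate why plain digests remain a problem, the following sketch shows how someone who has collected digests from image URLs could reverse them with a precomputed lookup table (a rainbow table is a more space-efficient variant of the same idea). The candidate addresses are made up, and MD5 is assumed only for the sake of the example; the same reasoning applies to any unsalted digest.

```python
import hashlib


def build_lookup_table(candidates):
    """Map hex digest -> original value for a list of guessed e-mails."""
    return {hashlib.md5(c.encode("utf-8")).hexdigest(): c for c in candidates}


# Digest harvested from a public page (computed here for the example).
harvested = hashlib.md5(b"john.doe@example.com").hexdigest()

# Guessed addresses, e.g. taken from leaked address lists (made-up values).
table = build_lookup_table([
    "jane.doe@example.com",
    "john.doe@example.com",
])

# A single dictionary lookup recovers the original address.
recovered = table.get(harvested)
print(recovered)
```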
 

	
 
There are two feasible approaches to resolve this:
 

	
 
* Always apply a *salt* to user-identifiable data before calculating a hex
 
  digest. This can greatly reduce the effectiveness of attacks based on
 
  rainbow tables (although it will not mitigate them completely).
 
* Instead of hashing the user-identifiable data itself, every time you need to
 
  do so, create some random data instead, hash that random data, and store it
 
  for future use (cache it), linking it to the original data that it was
 
  generated for. This way the hex digest being put as part of an image link into
 
  HTML pages is not derived in any way from the original data, and can therefore
 
  not be used to reveal what the original data was.
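
Both approaches can be sketched roughly as follows. This is an illustration under stated assumptions rather than a recommended implementation: ``SITE_SALT``, ``salted_digest`` and ``RandomDigestStore`` are hypothetical names, the salt is applied through HMAC-SHA256 (plain concatenation of salt and data before hashing would be closer to the wording above), and an in-memory dictionary stands in for whatever persistent storage the website uses.

```python
import hashlib
import hmac
import secrets

# Hypothetical site-wide secret; in practice load it from configuration.
SITE_SALT = b"replace-with-a-long-random-secret"


def salted_digest(raw_data, salt=SITE_SALT):
    """First approach: mix a secret salt into the digest (HMAC-SHA256 here)."""
    return hmac.new(salt, raw_data.encode("utf-8"), hashlib.sha256).hexdigest()


class RandomDigestStore:
    """Second approach: digest of random data, cached per original value.

    A dict stands in for the persistent storage a real site would use.
    """

    def __init__(self):
        self._digests = {}

    def digest_for(self, raw_data):
        # Generate random data only once, then reuse the stored digest.
        if raw_data not in self._digests:
            random_data = secrets.token_bytes(32)
            self._digests[raw_data] = hashlib.sha256(random_data).hexdigest()
        return self._digests[raw_data]


store = RandomDigestStore()
first = store.digest_for("john.doe@example.com")
second = store.digest_for("john.doe@example.com")
```

With the second approach the stored digest must be kept for as long as the identicon should stay stable, since it cannot be recomputed from the original data.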
 

	
 
Keep in mind that using identicons will inevitably still allow people to track
 
someone's posts across your website. Identicons will effectively automatically
 
create pseudonyms for people posting on your website. If that may pose a
 
problem, it might be better not to use identicons at all.
 

	
 
Finally, a short summary of the points explained above:
 

	
 
* Always use hex digests in order to retrieve an identicon from a server.
 
* Instead of using personally identifiable data for generating the hex digest,
 
  use randomly generated data, and associate it with the personally identifiable
 
  data. This way the hex digest cannot be traced back to the original data
 
  through brute force or rainbow tables.
 
* If unwilling to generate and store random data, at least make sure to use
 
  a salt when hashing personally identifiable data.
 

	
docs/releasenotes.rst
 
new file 100644
 
Release Notes
 
=============
 

	
 
0.1
 
---
 

	
 
Initial release of Pydenticon. Implemented features:
 

	
 
* Supported parameters for identicon generator (shared between multiple
 
  identicons):
 
  * Number of blocks in identicon (rows and columns).
 
  * Digest algorithm.
 
  * List of foreground colours to choose from.
 
  * Background colour.
 
* Supported parameters when generating individual identicons:
 
  * Data that should be used for identicon generation.
 
  * Width and height of resulting image in pixels.
 
  * Padding around identicon (top, bottom, left, right).
 
  * Output format.
 
  * Inverted identicon (swaps foreground with background).
 
* Support for PNG and ASCII output formats for the resulting identicons.
 
* Full documentation covering installation, usage, algorithm, and privacy. API
 
  reference included as well.