From 842107cb826026a358c7b31ec84716abaf7f3d18 2013-11-28 21:29:55
From: Branko Majic
Date: 2013-11-28 21:29:55
Subject: [PATCH] PYD-2: Added privacy chapter and release notes.

---

diff --git a/docs/index.rst b/docs/index.rst
index 539c3e3bb2e7b766f683866605f6079243cd7cbc..de661bc27a0077ebce23666b8eb664a10213ca9c 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -28,7 +28,7 @@ Contents:
    installation
    usage
    algorithm
-   security
+   privacy
    apireference
    releasenotes
 
diff --git a/docs/privacy.rst b/docs/privacy.rst
new file mode 100644
index 0000000000000000000000000000000000000000..7931fd0f4184f0fe4abc63adce338839313cd31e
--- /dev/null
+++ b/docs/privacy.rst
@@ -0,0 +1,63 @@
+Privacy
+=======
+
+It is important to understand the privacy issues involved when using
+Pydenticon to generate uniquely identifiable avatars for users who leave
+comments and similar content.
+
+The most common way to expose the identicons is by having a web application
+generate them on the fly from data that is being passed to it through HTTP GET
+requests. Those GET requests would commonly include either the raw data, or the
+data as a hex string that is then used to generate an identicon. The URLs for
+GET requests would most commonly be made as part of image tags in an HTML page.
+
+The data passed needs to be unique in order to generate distinct identicons. In
+most cases the data used will be either the name or the e-mail address that the
+visitor posting the comment fills in in some field. That being said, e-mails
+usually provide a much better identifier than names (especially if the website
+verifies the comments by sending out e-mails).
+
+Needless to say, in such cases, especially if the website where the comments are
+being posted is public, using raw data can completely reveal the identity of the
+user. If e-mails are used for generating the identicons, the situation is even
+worse, since now those e-mails can be easily harvested for spam purposes. Using
+the e-mails also provides data mining companies with a much more reliable user
+identifier that can be coupled with information from other websites.
+
+Therefore, it is highly recommended to pass the data to the web application
+that generates the identicons as a **hex digest only**, i.e. **never** pass the
+raw data.
+
+Although passing a hash instead of the real data as part of the GET request is
+a good step forward, it can still cause problems since the hashes can be
+collected, and then used in conjunction with rainbow tables to identify the
+original data. This is particularly problematic when using hex digests of
+e-mail addresses as data for generating the identicon.
+
+There are two feasible approaches to resolve this:
+
+* Always apply *salt* to user-identifiable data before calculating a hex
+  digest. This can greatly reduce the effectiveness of attacks based on
+  rainbow tables (although it will not rule out brute force completely).
+* Instead of hashing the user-identifiable data itself, create some random
+  data the first time an identicon is needed, hash that random data, and store
+  the result for future use (cache it), linking it to the original data that it
+  was generated for. This way the hex digest being put as part of an image link
+  into HTML pages is not derived in any way from the original data, and can
+  therefore not be used to reveal what the original data was.
+
+Keep in mind that using identicons will inevitably still allow people to track
+someone's posts across your website. Identicons effectively create pseudonyms
+automatically for people posting on your website. If that may pose a
+problem, it might be better not to use identicons at all.
+
+Finally, a short summary of the points explained above:
+
+* Always use hex digests in order to retrieve an identicon from a server.
+* Instead of using privately identifiable data for generating the hex digest,
+  use randomly generated data, and associate it with the privately identifiable
+  data. This way the hex digest cannot be traced back to the original data
+  through brute force or rainbow tables.
+* If unwilling to generate and store random data, at least make sure to use
+  salt when hashing privately identifiable data.
+
diff --git a/docs/releasenotes.rst b/docs/releasenotes.rst
new file mode 100644
index 0000000000000000000000000000000000000000..6d362b1edf4723ff2a612aa10b6057dc547a7343
--- /dev/null
+++ b/docs/releasenotes.rst
@@ -0,0 +1,23 @@
+Release Notes
+=============
+
+0.1
+---
+
+Initial release of Pydenticon. Implemented features:
+
+* Supported parameters for identicon generator (shared between multiple
+  identicons):
+  * Number of blocks in identicon (rows and columns).
+  * Digest algorithm.
+  * List of foreground colours to choose from.
+  * Background colour.
+* Supported parameters when generating individual identicons:
+  * Data that should be used for identicon generation.
+  * Width and height of resulting image in pixels.
+  * Padding around identicon (top, bottom, left, right).
+  * Output format.
+  * Inverted identicon (swaps foreground with background).
+* Support for PNG and ASCII formats of resulting identicons.
+* Full documentation covering installation, usage, algorithm, privacy. API
+  reference included as well.
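
Note on the privacy chapter added by this patch: the two approaches it recommends (salting before hashing, and handing out a digest of random data instead) could be sketched roughly as below. This is only an illustration using the Python standard library, not part of Pydenticon. The names SITE_SALT, salted_digest and RandomDigestStore are hypothetical, the choice of SHA-256 is an assumption, and the in-memory dictionary stands in for whatever persistent storage the website would actually use.

```python
import hashlib
import secrets

# Hypothetical site-wide salt (an assumption for this sketch). A real site
# would load a fixed secret from configuration so digests stay stable
# across restarts.
SITE_SALT = secrets.token_bytes(32)


def salted_digest(user_data, salt=SITE_SALT):
    """Return the SHA-256 hex digest of salt + user_data (first approach)."""
    return hashlib.sha256(salt + user_data.encode("utf-8")).hexdigest()


class RandomDigestStore:
    """Second approach: map user data to a digest of unrelated random data.

    The dictionary is an in-memory stand-in for persistent storage.
    """

    def __init__(self):
        self._digests = {}

    def digest_for(self, user_data):
        # Create the random data only once per user, then reuse the
        # cached digest so the same user keeps the same identicon.
        if user_data not in self._digests:
            random_data = secrets.token_bytes(32)
            self._digests[user_data] = hashlib.sha256(random_data).hexdigest()
        return self._digests[user_data]
```

Either hex digest can then be placed in the image URL in place of the raw name or e-mail address. Only the second one is independent of the original data, which is why the chapter prefers it.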