pydenticon File Diff

File diff 5df4ce2fcaa9 → 842107cb8260

docs/privacy.rst

➞

Show inline comments

@@ new file 100644 @@
 Privacy
 =======
 It is fundamentally important to understand the privacy issues if using
 Pydenticon in order to generate uniquelly identifiable avatars for users leaving
 the comments etc.
 The most common way to expose the identicons is by having a web application
 generate them on the fly from data that is being passed to it through HTTP GET
 requests. Those GET requests would commonly include either the raw data, or data
 as hex string that is then used to generate an identicon. The URLs for GET
 requests would most commonly be made as part of image tags in an HTML page.
 The data passed needs to be unique in order to generate distinct identicons. In
 most cases the data used will be either name or e-mail address that the visitor
 posting the comment fills-in in some field. That being said, e-mails usually
 provide a much better identifier than name (especially if the website verifies
 the comments through by sending-out e-mails).
 Needless to say, in such cases, especially if the website where the comments are
 being posted is public, using raw data can completely reveal the identity of the
 user. If e-mails are used for generating the identicons, the situation is even
 worse, since now those e-mails can be easily harvested for spam purposes. Using
 the e-mails also provides data mining companies with much more reliable user
 identifier that can be coupled with information from other websites.
 Therefore, it is highly recommended to pass the data to web application that
 generates the identicons using **hex digest only**. I.e. **never** pass the raw
 data.
 Although passing hash instead of real data as part of the GET request is a good
 step forward, it can still cause problems since the hashses can be collected,
 and then used in conjunction with rainbow tables to identify the original
 data. This is particularly problematic when using hex digests of e-mail
 addresses as data for generating the identicon.
 There's two feasible approaches to resolve this:
 * Always apply *salt* to user-identifiable data before calculating a hex
   digest. This can hugely reduce the efficiency of brute force attacks based on
   rainbow tables (althgouh it will not mitigate it completely).
 * Instead of hashing the user-identifiable data itself, every time you need to
   do so, create some random data instead, hash that random data, and store it
   for future use (cache it), linking it to the original data that it was
   generated for. This way the hex digest being put as part of an image link into
   HTML pages is not derived in any way from the original data, and can therefore
   not be used to reveal what the original data was.
 Keep in mind that using identicons will inevitably still allow people to track
 someone's posts across your website. Identicons will effectively automatically
 create pseudonyms for people posting on your website. If that may pose a
 problem, it might be better not to use identicons at all.
 Finally, small summary of the points explained above:
 * Always use hex digests in order to retrieve an identicon from a server.
 * Instead of using privately identifiable data for generating the hex digest,
   use randmoly generated data, and associate it with privately identifiable
   data. This way hex digest cannot be traced back to the original data through
   brute force or rainbow tables.
 * If unwilling to generate and store random data, at least make sure to use
   salt when hashing privately identifiable data.