Last tended on 11 December, 2021 (first created 19 May, 2018)

What this is

You are at https://www.zylstra.org/blog, which since 2002 is the personal weblog of me, Ton Zijlstra, its author. Although personal weblogs aren’t subject to the GDPR (the European personal data protection regulations), I do write about my professional interests here, and one of those is data protection. So I added a data protection policy anyway. My contact info is listed in the right hand column.

What personal data my site collects and why

When you visit this site, some technical data is automatically collected, such as your IP address. This is used for anti-spam, security and a few very basic analytical purposes. When you comment on a posting, you will be asked for a name and email address. When your own website alerts my website that you link to me (Webmention), your name and website address may appear in my comment section. Some postings may embed content from other websites (like a Slideshare presentation, a Youtube video, or an image on Flickr), and those websites track some of your data themselves.

Comments and Webmentions

When visitors leave comments on the site the data shown in the comments form is collected (name and email address), and also the visitor’s IP address and browser user agent string to help spam detection. The name you use in the comment form is shown publicly on the website once your comment is approved.

The email address you provided will not be published, but will be stored with your comment, for as long as that comment is published. An anonymized string created from your email address (also called a hash) may be provided to the Gravatar service to see if you are using it. The Gravatar service privacy policy is available here: https://automattic.com/privacy/. After approval of your comment, your Gravatar profile picture is visible to the public in the context of your comment.
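To illustrate what such a hash looks like, here is a minimal sketch in Python. It assumes the scheme Gravatar documents (trim and lowercase the address, then hash it) and uses SHA-256. Which hash function a given WordPress version actually sends is an assumption here, and the function names are hypothetical.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """Return a hex digest of the normalized email address.

    Assumption: SHA-256 over the trimmed, lowercased address.
    Older setups may use MD5 instead.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def gravatar_url(email: str) -> str:
    """Build the avatar URL that would be requested for this address."""
    return f"https://www.gravatar.com/avatar/{gravatar_hash(email)}"
```

Because the address is normalized first, " Someone@Example.org " and "someone@example.org" produce the same hash, so the same profile picture is found however you typed the address.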

Webmentions are an automatic way in which your own website alerts my website that you link to it. Metadata in your own website’s markup explicitly makes that data available to my website. I only publish metadata that you yourself submit, such as your name, URL or the profile picture of your site, underneath my own postings. I only publish a link to your own website along the lines of “this article was mentioned on [your website]”, and no excerpt or fragment of your content will be displayed unless it’s a direct reply. I also use webmention to collect and display social backfeeds, which are mentions and likes on Twitter and Mastodon that are walled gardens and do not themselves support webmention. I use the WordPress plugin Webmention for this.
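For the technically curious: the alert itself is a small HTTP POST carrying two URLs, the page that links (source) and the page being linked to (target). The sketch below only builds such a request and does not send it. The endpoint address and the function name are hypothetical, and a real sender would first discover the endpoint from the target page.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention_request(endpoint: str, source: str, target: str) -> Request:
    """Build (but do not send) a Webmention notification.

    endpoint: the receiver's Webmention endpoint (hypothetical here)
    source:   the page on your site that contains the link
    target:   the page on the receiving site that is linked to
    """
    body = urlencode({"source": source, "target": target}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Nothing beyond those two URLs is sent in the notification itself. Any name or picture shown afterwards is read from the markup of the source page.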

Subscriptions

You have the option to subscribe by e-mail to new postings. Those subscriptions are managed by WordPress.com. The e-mail addresses are not used for anything else. I do occasionally clean up the list removing e-mail addresses that are connected to spammers.

Contact forms

There is no contact form, so no data is collected there. My contact info is listed on the right hand side.

Cookies

If you leave a comment on my site you may opt in to saving your name, email address and website in cookies stored on your own computer. These are for your convenience so that you do not have to fill in your details again when you leave another comment. These cookies will last for one year. You can delete these cookies from your browser at any time.

My blog does not set any other cookies.

Embedded content from other websites

Articles on this site may include embedded content (e.g. videos from Youtube, images from Flickr, articles and slides from Slideshare, etc.). Embedded content from other websites behaves in the exact same way as if you have visited the other website.

These websites may collect data about you, use cookies, embed additional third-party tracking, and monitor your interaction with that embedded content, including tracing your interaction with the embedded content if you have an account and are logged in to that website. Youtube videos that I embed are embedded in privacy enhanced mode, meaning they use a non-tracking Youtube URL (youtube-nocookie.com). However, it is at the discretion of YT/Google whether they adhere to their own promises on this. I also strip any script from Flickr embeds, which as far as I can tell results in no cookies or tracking being set, apart from your IP address being used to load the image. I no longer embed content from Slideshare / Scribd, and am self hosting such files.
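As an illustration of the privacy enhanced mode mentioned above, the following Python sketch rewrites a regular Youtube link into an embed URL on youtube-nocookie.com. This is not how my site does it. The function name and the list of accepted hosts are assumptions for the example.

```python
from urllib.parse import parse_qs, urlparse

# Assumption: these are the only link forms the sketch needs to handle.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def privacy_enhanced_embed(url: str) -> str:
    """Return the youtube-nocookie.com embed URL for a Youtube link."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    video_id = ""
    if host == "youtu.be":
        # Short links carry the video id as the path.
        video_id = parsed.path.lstrip("/")
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            video_id = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith("/embed/"):
            video_id = parsed.path[len("/embed/"):]
    if not video_id:
        raise ValueError(f"not a recognised Youtube video URL: {url}")
    return f"https://www.youtube-nocookie.com/embed/{video_id}"
```

The rewrite only changes which domain serves the player. Whether that domain indeed refrains from tracking remains up to Youtube.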

Analytics

I don’t use specialized analytical tools. However data such as your IP address, and the pages an IP has requested are stored in the server logs of my web hosting company, Your-Webhost. Their data protection policy is at https://www.your-webhost.nl/whois/terms.html. Whenever there are server problems, I may ask my hosting provider to look into their logs to see what happened. The server logs are processed on my webserver into aggregated analytical data with a tool called Awstats, that is available by default from my hosting company. I never look at it, though that may change.

By default WordPress, the tool I use to make this site, does not collect any analytics data. Jetpack is a plugin by Automattic, the creator of WordPress, that is part of WordPress by default and does collect analytics data, but I have uninstalled it.

Who I share your data with

I don’t share your data (the little that I may have) with others, except for the plugins that I use for spam and malicious attack protection.

How I retain your data

If you leave a comment, the comment and its metadata are retained indefinitely. The same is true for Webmentions. This is so I can recognize and approve any follow-up comments automatically instead of holding them in a moderation queue.

If you subscribe to my blog by e-mail, I retain the e-mail address you used until you unsubscribe.

Aggregated statistics in Awstats are kept for 5 years maximum, although I usually delete them much earlier to free up space on my hosting account.

What rights you have over your data

If you have an account on this site (you don’t, only I do), or have left comments, you can request to receive an exported file of the personal data I hold about you, including any data you have provided to me. You can also request that I correct or erase any personal data I hold about you. This does not include erasure of any data I may be obliged to keep for administrative, legal, security or other legitimate purposes. You can also at any time request the removal of one or all webmentions originating from your website.

Where I send your data

Visitor comments and visitor’s IP addresses are checked through an automated spam and attack detection service. I use Wordfence and Akismet for this.

My contact information

You can contact me using the information on the right hand side. You can use encrypted email to do so.

How I protect your data

All interaction with this website is encrypted traffic, by using https. My webserver, on which all data for this blog is stored, is protected by my web hosting company Your-Webhost. I cannot circumvent or alter their protective measures, at least not without breaching their terms of service. My own access to this website, the back-end at my hosting company, and the front-end WordPress, is protected with strong passwords and non-standard usernames. I use two plugins, Akismet and Wordfence, to shield against spam and attacks.

What data breach procedures I have in place

If you think data on this site may have been breached please contact me. With my web-hosting provider I will look into it, and report back to you.
If I get notified about a breach by my web-hoster I will inform those that have commented, and will post an announcement in my blog itself.
If I suspect there may have been a breach I will notify my web-hosting provider and work with them to prevent future breaches, inform those who have commented on my site and post an announcement in my blog itself.

What automated decision making and/or profiling I do with user data

If you submit a comment to this site, or if you try to gain access to this website’s controls, you may be automatically classified as a spammer or a malicious attacker and automatically blocked or listed as permanently banned. If you submit a comment for the first time, or a comment that contains multiple weblinks, it will be automatically held for moderation, and will not be published until I have looked at it. If you have previously approved comments published on my blog, and I know you, you will be automatically permitted to do so again using the same credentials.
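The rules above can be summarised in a short Python sketch. It is a simplification and not the actual code of WordPress, Akismet or Wordfence. The function name, the way links are counted and the threshold of more than one link are assumptions made for the example.

```python
import re

# Assumption: a "weblink" is anything starting with http:// or https://.
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def moderation_decision(text, author_email, approved_emails,
                        flagged_as_spam, max_links=1):
    """Return "blocked", "hold" or "publish" for a submitted comment.

    approved_emails: lowercased addresses with earlier approved comments
    flagged_as_spam: verdict of a (hypothetical) spam or attack check
    max_links:       number of links allowed before holding the comment
    """
    if flagged_as_spam:
        return "blocked"
    if author_email.strip().lower() not in approved_emails:
        # First-time commenters always wait for manual approval.
        return "hold"
    if len(LINK_PATTERN.findall(text)) > max_links:
        # Multiple weblinks are held, even from known commenters.
        return "hold"
    return "publish"
```

For example, a known commenter posting plain text is published straight away, while the same person posting two links waits in the moderation queue until I have looked at it.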

4 reactions on “Personal Data Protection Policy”

  1. December always seems to be the season of increased and novel forms of email spam in my inbox. As if they’re hoping my spam filters will take time off, or something.
    This year’s personal novelty in my inbox is what seems a trolling attempt w.r.t. the EU data protection regulation (GDPR) and the similar Californian consumer privacy act (CCPA).
    Yesterday I received an email titled “Questions About GDPR Data Access Process for zylstra.org” sent from an address that has left no previous online search traces, and for which the domain name was first registered in March 2021. The sender’s domain envoiemail.fr looks set up specifically for this. The name used seems fake (no one in the world has that name if I’m to believe Google, LinkedIn et al).
    The mail reads:

    To Whom It May Concern:
    My name is … , and I am a resident of Paris, France. I have a few questions about your process for responding to General Data Protection Regulation (GDPR) data access requests:
    Do you process GDPR data access requests via email, a website, or telephone? If via a website, what is the URL I should go to?
    What personal information do I have to submit for you to verify and process a GDPR data access request?
    What information do you provide in response to a GDPR data access request?
    To be clear, I am not submitting a data access request at this time. My questions are about your process for when I do submit a request.
    Thank you in advance for your answers to these questions. If there is a better contact for processing GDPR requests regarding zylstra.org, I kindly ask that you forward my request to them.
    I look forward to your reply without undue delay and at most within one month of this email, as required by Article 12 of GDPR.
    Sincerely,

    That last bit about Article 12 and having a month to reply, seems ominous but in my reading of the GDPR only concerns actual data access requests.
    When I received that mail it appeared fake to me, mostly because it’s boilerplate text without context about me as the receiver and using the domain name as some sort of organisation name. I replied nonetheless, which I probably shouldn’t have, with a single line message that my private website doesn’t fall within scope of the GDPR. I do have a GDPR policy page out of professional interest in the subject matter.
    Then today I received another mail. This time concerning the Californian Consumer Privacy Act (CCPA), which is a data protection act modelled on the EU GDPR. The text was the same, the name used was different but also fake / trace-less online, the sender’s domain name (potomacmail.com) was registered in March 2020 and like the previous one pretends to be an e-mail service (but one whose online traces are all blogposts like mine outing it as some sort of scam attempt). The mail reads the same as the first one:

    To Whom It May Concern:
    My name is …, and I am a resident of Norfolk, Virginia. I have a few questions about your process for responding to California Consumer Privacy Act (CCPA) data access requests:
    Would you process a CCPA data access request from me even though I am not a resident of California?
    Do you process CCPA data access requests via email, a website, or telephone? If via a website, what is the URL I should go to?
    What personal information do I have to submit for you to verify and process a CCPA data access request?
    What information do you provide in response to a CCPA data access request?
    To be clear, I am not submitting a data access request at this time. My questions are about your process for when I do submit a request.
    Thank you in advance for your answers to these questions. If there is a better contact for processing CCPA requests regarding zylstra.org, I kindly ask that you forward my request to them.
    I look forward to your reply without undue delay and at most within 45 days of this email, as required by Section 1798.130 of the California Civil Code.
    Sincerely,
    ….

    Needless to say, this blog is not within scope of the CCPA.
    Both domain names used, envoiemail.fr and potomacmail.com show the same message if you visit the domains. Judging by the mail headers they use Amazon simple e-mail services.

    What would be the purpose of such spam messages? The blogpost I linked to says there was a tracking pixel in the mail they received but I don’t see that in my mail’s source. The hard thing is I now have to wait 30 and 45 days according to these mails to see if there’s a follow-up.

  2. The spam about GDPR and CCPA I received last week turns out to be part of a study by the US-based Princeton University, with one of the researchers recently having joined the Dutch Radboud University. The more recently sent out mails apparently had a link to the project page added, I assume in light of feedback received, which then was shared in my Mastodon timeline by someone who as a Mastodon moderator had received these mails.
    I sent a mail to the research team explaining my complaint about the mails I received. I also approached the Radboud University’s Digital Security (RU DiS) research group where one of the researchers works, and filed a complaint there.
    In the past few days I’ve had e-mail exchanges with the research team, as well as with the RU DiS department head. All those I approached have been very responsive and willing to provide information, which I very much appreciate.
    That doesn’t make the mails I received ok though. The research team itself may have come to the same notion, as they informed me they’ve stopped sending out new mails for now. They have also added a FAQ to the project page. [UPDATE 2021-12-19 Jonathan Mayer, the Principal Investigator in this Princeton research project has now issued an apology. These are welcome words.]
    On the research
    The research project is interested in how companies have set up their process for responding to requests for data access under the European general data protection regulation (GDPR) and the California Consumer Privacy Act (CCPA). They also intended these requests for organisations who don’t a priori fall within scope of those acts. Both acts are intended to set a norm for those not covered by it. The GDPR is written to export the EU’s norms for data protection to the rest of the world, and the CCPA is set up to encourage companies not active in California to follow its rules regardless. So far I have no issues.
    How I ended up in the list of sites approached
    My blog is a personal website, so it falls outside of the declared scope of the study (companies). It can’t fall under the CCPA, as it only applies to businesses (that do business in California, with a certain turnover, or selling data). It is less clear if it falls under the GDPR: in my reading of the GDPR it doesn’t, but at the same time I have written a personal data protection policy as if it does (out of professional interest). So how did I end up in Princeton’s list of site owners to approach? In my conversation with one of the researchers they indicated that the list of sites to approach was a selection taken out of the Tranco list. That list combines the results from various lists of the 1 million most popular websites, such as Alexa (soon to be discontinued), Cisco Umbrella, and Majestic Million. My URL is in both the Alexa and the Majestic list. Cisco’s list looks at DNS requests for domains on their hardware, and unsurprisingly I’m not in their current list as it is based on today’s web traffic. The Majestic list seems to use backlinks to a site as a ranking factor. This favors old websites, such as weblogs like mine that are some 20 years old, as they build up a sediment of such backlinks over time. Unsurprising then that blogs like Dave’s, David’s, and those of longtime blogging friends feature in the list. In the graph below you see my and their blogs as they rank in the Tranco list.
    The relative positions of the blogs of several old time blogging friends and myself in the Tranco list of over 1 million sites.
    That I might be on the long list when the Tranco list is used makes sense. However the research group says they used filtering and categorisation to then select the websites to approach. A meaningful selection seems less likely, given that they approached personal sites like mine (and judging by other sites approached as apparent from other online comments on the mails sent).
    Still it’s wrong
    The research was designed by Princeton’s computer science department, and was discussed with Princeton’s Institutional Review Board (IRB) they say. During this process the team ‘extensively discussed potential risks of our study, and took measures to minimize undue burden on websites, especially websites with less traffic and resources’.
    The IRB concluded the research doesn’t constitute human subject research. True, from a design perspective, but as shown by me as a private individual receiving their e-mails not true in practice. Better determination of which sites to approach and not to approach would have been needed for that.
    The e-mails sent out for this study are also worryingly problematic in two aspects:
    First, they pretend to be actual e-mails by individuals; nowhere is it made clear it’s research. On top of that the names used for these individuals are clearly fake, and the domains from which e-mails were sent also easily raise suspicion. Furthermore the request lacks any context: an individual with a real request would never use a generic text or use the domain name and not the actual name of a website. This makes it unclear to recipients what the very purpose of the e-mails is. This is not only true for individuals or e.g. small non-profits; it would be confusing and suspicious to every recipient even if they had limited their inquiries to major corporations. I’m sure that negatively impacts the results, and thus the validity of conclusions. It also means many recipients will have spent time evaluating, or worse bringing in advice, on how to deal with these suspicious looking requests.
    Second the wording of the e-mail makes it worse. The mails have a legalese ring to them (e.g. stating it is not a formal data access request at this time though it might still follow, another thing a real individual would not phrase like that). What is worse each mail suggests a legal threat at the end. They say that a response is required within a month based on Article 12 of the GDPR, or within 45 days based on Section 1798.130 of the California Civil Code. Both those statements are lies. Art 12 GDPR sets a response deadline for data access requests, which this mail emphasises it is not, and the same is true for the California Civil Code.
    It’s exactly this wording, with false legal threats, and lacking any context to evaluate what the purpose of the e-mails is, that makes people worry, spend time or even money figuring out what they might be exposed to. As an individual I concluded to ignore the mails, others didn’t, but would you if you are a small non-profit, or other business that does not have the inhouse legal knowledge to deal with this? Precisely those who have some knowledge about the GDPR or CCPA but not enough to be fully sure of themselves will spend unnecessary time on these requests. Princeton is thus externalising a burden and cost on website owners. Falsifying the very thing Princeton states about aiming to “minimize undue burden on websites“. Using the word websites obfuscates that every mail will have to be answered by a real person. They could have just mailed me asking me straight up for their research if I have a process for the GDPR in place. I would have replied to them and be done with it.
    Filed complaint
    Originally I had filed a complaint with the Digital Security research team at Radboud University, as they are named as partners in the study. Yesterday I withdrew my complaint with them, as they weren’t part of the study design, just have recently hired one of the researchers involved. Nevertheless they informed me they have alerted their own ethics board about this, to take lessons from it w.r.t guidelines and good practices, even as the head of department said to me it is now too late to prevent damage. At the same time he wrote, they cannot let it pass because “Even if privacy researchers do these projects with the best of intentions, it doesn’t mean they aren’t required to set them up well”.
    It also means that I will refile my complaint with Princeton’s Review Board. Meanwhile this has spilled out online (it’s what you get if you target the 1 million most popular websites…), and I am not the only one filing a complaint judging by the responses to a tone-deaf tweet by one of the researchers.
    Others blogging about this study:
    Questions About GDPR Data Access Process Spam from Virginia
    Free Radical: CCPA Scam
    What’s the deal with those weird GDPR emails?
    I Was Part of a Human Subject Research Study Without My Consent

Comments are closed.