Destroying Instrument Data Uploaded by the Data Mover

Introduction

This page is for instrument operators.

The Data Mover uploads data to your instrument project, and shareable links are then sent by email to the end user so that they can access their data (by downloading them or copying them to their own Mediaflux project). Those shareable links have a finite lifetime, which you, the platform operator, decide upon; it is typically 30 to 90 days.

The business policy for your instrument platform may be that you do not hold the data long term, because that may be the responsibility of the user rather than the platform. In this case, you will need to destroy the data after some time, typically after the download links have expired.

If your project data are also replicated to the Disaster Recovery server (Noble Park), then they need to be destroyed there also.

Destroying Primary Data

We have developed the following process, which allows you, the platform operator (the only user with direct access to the instrument project), to destroy data after the download links have expired. If you would like this process implemented for your platform, please contact RCS/DST via Service Now.

  • On request, we set meta-data on specific instrument upload projects to indicate that the project should be included in the process that assesses whether there are data ready to destroy. This meta-data includes an email address to which notifications are sent.
  • A scheduled job executes in the Mediaflux server on a regular basis and it:
    • Looks for data uploaded by the Data Mover that are older than the lifetime of the download links (e.g. 30 days)
      • The process looks everywhere in your project but can exclude specific nominated child folders (also set as meta-data on the project)
    • If it finds any, it sends an email to the notification recipient (you, the platform operator) with a URL to a web page.
      • You authenticate to Mediaflux via the web page (so you have authority to destroy the data) 
      • This presents a simple interface with which you can destroy expired data.
      • The interface shows whether the end user has retrieved their data by any of the three delivery methods: downloaded via the Data Mover, copied to their own Mediaflux project, or attempted a direct download (we can't know whether the attempt succeeded)
        • By default, data that are ready to destroy are checked in the interface if they have been retrieved by any of the three mechanisms, and left unchecked otherwise (see below).
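
The selection and default-check rules described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the scheduled job itself: the Upload record, its field names and the two helper functions are hypothetical and do not correspond to Mediaflux meta-data or services.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record of one uploaded namespace; the field names are
# illustrative only and are not Mediaflux meta-data names.
@dataclass
class Upload:
    path: str
    uploaded_at: datetime
    via_data_mover: bool = False      # downloaded via the Data Mover
    copied_to_project: bool = False   # copied to the user's own Mediaflux project
    direct_attempted: bool = False    # direct download attempted (success unknown)


def is_excluded(path, excluded_folders):
    """True if path is one of the excluded child folders or lies inside one."""
    for folder in excluded_folders:
        folder = folder.rstrip("/")
        if path == folder or path.startswith(folder + "/"):
            return True
    return False


def expired_uploads(uploads, link_lifetime_days, excluded_folders, now):
    """Return (upload, checked_by_default) pairs for uploads older than the link lifetime."""
    cutoff = now - timedelta(days=link_lifetime_days)
    result = []
    for u in uploads:
        # Skip uploads whose links have not yet expired, and excluded folders.
        if u.uploaded_at >= cutoff or is_excluded(u.path, excluded_folders):
            continue
        # Checked by default only if retrieved by any of the three mechanisms.
        checked = u.via_data_mover or u.copied_to_project or u.direct_attempted
        result.append((u, checked))
    return result
```

With a 30 day link lifetime, an upload made 60 days ago that was downloaded via the Data Mover would be listed and checked, one made 60 days ago that was never retrieved would be listed but unchecked, and anything under an excluded folder would not be listed at all.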


Here is an example with some test data. You can see that the first two namespaces have been downloaded, so they are checked. You can also sort by the columns and filter the list.


When you click the Delete button, a new dialog will appear, as below, in which you can confirm the deletion. The data are deleted asynchronously and you can receive an email notification when the deletion is complete, so you can log out straight away.




Destroying Replicated Data 

The primary purpose of the Disaster Recovery (DR) system is to hold copies of data in case they are mistakenly or maliciously destroyed on the primary. Data are replicated in near real time to the DR system, and end users have no access to the DR system. The Mediaflux support team will only destroy data on the DR server with the explicit, traceable permission of the owner.

Because end users have no access to the DR server, a separate process is needed so that when primary instrument data are destroyed (as above), the corresponding replicas are reliably destroyed on the DR system. If you would like this process implemented for your platform, please contact RCS/DST via Service Now.

The process is:

  • A scheduled job executes regularly (monthly) and it:
    • Looks for replica assets in transactional namespaces on the DR server that:
      • Belong to nominated projects (the same ones as for destroys on the primary system)
      • Were created by the Data Mover for an instrument upload
      • Are ready to be destroyed because the assets are older than the expiry of the download links (same as for primary)
      • No longer exist on the primary (indicating they have been destroyed)
    • Destroys those data in two steps:
      • First, the data are moved elsewhere
      • Second, the data are destroyed 30 days later
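
As a rough illustration of the criteria and the two-step destroy above, the following Python sketch models the decision for a single replica. It is a sketch under stated assumptions, not the job that runs on the DR server: the Replica record, its fields, and the ready_to_destroy and next_action functions are hypothetical names, and the set of asset identifiers still present on the primary is assumed to be supplied by the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

HOLD_DAYS = 30  # delay between the move and the final destroy, as stated in the text


# Hypothetical description of one replica asset; not a Mediaflux structure.
@dataclass
class Replica:
    asset_id: str
    project: str
    created_by_data_mover: bool
    created_at: datetime
    moved_at: Optional[datetime] = None  # set once the first step has run


def ready_to_destroy(r, nominated_projects, link_lifetime_days, primary_ids, now):
    """All four criteria must hold for a replica to be selected."""
    return (
        r.project in nominated_projects
        and r.created_by_data_mover
        and r.created_at < now - timedelta(days=link_lifetime_days)
        and r.asset_id not in primary_ids  # already destroyed on the primary
    )


def next_action(r, selected, now):
    """Return 'move', 'destroy' or 'none' for one replica."""
    if r.moved_at is not None:
        # Step two: destroy only once the holding period has elapsed.
        if now - r.moved_at >= timedelta(days=HOLD_DAYS):
            return "destroy"
        return "none"
    # Step one: move selected replicas aside rather than destroying them.
    return "move" if selected else "none"
```

Separating the move from the final destroy leaves a window in which a replica that was selected by mistake still exists and has not yet been destroyed.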