Primary Data Workflow
This page describes the primary workflow, which utilises the Data Mover tool to upload data from an instrument to Mediaflux and then dispatches the data to the User.
There are two distinct components to the workflow:
1. Upload data to a Mediaflux Instrument Project from an instrument acquisition computer and notify platform instrument staff.
   a. This is achieved via the Data Mover GUI, which is operated by either platform staff or trusted power users. The Data Mover is pre-configured with the Mediaflux Instrument Project destination.
   b. The person using the GUI selects the data to upload and enters an end-user email address (for dispatch of the data to that User).
   c. The person using the GUI activates the upload, which sends the selected data to the Mediaflux Instrument Project.
   d. Platform staff are notified by email of the successful upload.
2. Provide further dispatch of that data (and only that data) to an end user by allowing them to use one of the following methods:
   a. Download it from the Instrument Project to their local storage via a Download Shareable (received at the email address provided in step 1b) and the Data Mover. This method is best for big data (more than tens of GB).
   b. Download it from the Instrument Project to their local storage via a direct shareable link (received at the email address provided in step 1b). This link does not require the Data Mover and simply downloads the data as a zip file. This method is best for small data (less than tens of GB).
   c. Copy it from the Instrument Project to their own Mediaflux Project via a web-based GUI (URL received at the email address provided in step 1b). This method requires the User to log in to Mediaflux.
   The end user and platform staff can be notified when these transactions complete.
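The choice between the three dispatch methods above comes down mainly to data size and whether the recipient has a Mediaflux login. The sketch below encodes that rule of thumb only; it does not talk to Mediaflux. The function name, the return strings and the 20 GB cutoff are all hypothetical assumptions (the page itself only says "tens of GB").

```python
# Hypothetical helper: names, return values and the 20 GB cutoff are
# assumptions; the page itself only says "tens of GB".
GIB = 1024 ** 3
SMALL_DATA_LIMIT_BYTES = 20 * GIB  # assumed boundary between "small" and "big"


def suggest_dispatch_method(total_bytes: int,
                            wants_copy_in_mediaflux: bool = False,
                            has_mediaflux_account: bool = False) -> str:
    """Suggest one of the three end-user dispatch methods."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if wants_copy_in_mediaflux:
        # Copy into the user's own Mediaflux Project: needs a Mediaflux login.
        if not has_mediaflux_account:
            raise ValueError("copying to a Mediaflux Project requires a Mediaflux login")
        return "copy-to-project"
    if total_bytes >= SMALL_DATA_LIMIT_BYTES:
        # Download Shareable plus the Data Mover: suited to big data.
        return "download-shareable"
    # Direct shareable link that downloads a zip file, no Data Mover needed.
    return "direct-link"
```

For example, `suggest_dispatch_method(500 * GIB)` points a recipient at the Download Shareable, while a few GB would go through the direct link.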
With this process, platform instrument staff:
- can be confident that the data have securely and robustly reached their Mediaflux Instrument Project and the end user
- do not need to play any time-consuming role in managing the dispatch of data to the end user
Figure 1 - A schematic of the key data flow components
The user at the instrument utilises the GUI of the pre-configured Data Mover to upload data to the Mediaflux Instrument Project. With the Download Shareable received by email, the recipient User can download the data (and only the data just uploaded) from the Instrument Project to their local workstation. Only a Mediaflux User will receive the web URL in the email, and can therefore also copy the data to their own Mediaflux Project. The notification email (containing the Download Shareable and, optionally, the web URL) that the User receives from Mediaflux is not shown in this diagram.
Transactions
When data are uploaded to the Mediaflux Instrument Project, an additional namespace (folder/directory) is added at the top of the upload. It is named with the date and time of the oldest file uploaded (the date and time at which the upload was done is less useful). This approach, in which this extra layer is inserted, is referred to as 'Transactional'. Every upload is a transaction: if you upload the same Instrument data twice, you will get two transactions and two distinct uploads of the same data.
This approach lets platform staff know precisely what has been uploaded (if uploads could be updated, platform staff would have no clarity about what has been stored in the Instrument Project). When the upload to the Instrument Project is complete, a Manifest Asset is also created in the top-level parent transactional namespace in the Mediaflux Instrument Project. That asset contains a range of information about the upload, and it can be viewed and searched for.
For reference, the complete form of the transaction parent namespace into which data are uploaded is described here.
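To illustrate why the namespace is named after the oldest file rather than the upload time, here is a minimal sketch that derives a timestamp name from file modification times. The function names and the `%Y%m%d-%H%M%S` UTC format are hypothetical placeholders; the real naming scheme is the complete form documented separately. Note that uploading the same data twice gives the same oldest timestamp, so the real namespace name must carry more than this timestamp to keep the two transactions distinct.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable


def collect_mtimes(root: Path) -> list[float]:
    """Modification times of every regular file under root (recursively)."""
    return [p.stat().st_mtime for p in root.rglob("*") if p.is_file()]


def transaction_folder_name(mtimes: Iterable[float],
                            fmt: str = "%Y%m%d-%H%M%S") -> str:
    """Name the transactional folder after the OLDEST file, not the upload time.

    The UTC format string is a hypothetical placeholder; the real Mediaflux
    naming scheme is documented separately and may differ.
    """
    mtimes = list(mtimes)
    if not mtimes:
        raise ValueError("an upload must contain at least one file")
    oldest = min(mtimes)
    return datetime.fromtimestamp(oldest, tz=timezone.utc).strftime(fmt)
```

Usage would look like `transaction_folder_name(collect_mtimes(Path("/data/uom/neil/mydata")))`. Keeping the directory walk separate from the naming rule makes the rule easy to check with fixed timestamps.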
Transiently Held Data
It is quite common for instrument platforms to hold the data in Mediaflux only for a short period of time (the lifetime of the download links) and then destroy it. In that time, the researcher can download or copy the data, as it is ultimately their responsibility to manage it, not the platform's. For maximum data security, the platform may still wish to replicate the data to our Disaster Recovery (DR) Mediaflux system in the Noble Park data centre. We have developed a process so that:
- platform operators can easily destroy expired data (i.e. data older than the lifetime of the download links) via a web-based dashboard
- the replicas on the DR system are also destroyed some time (currently 30 days) after the operator destroys the primary copies
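The expiry rule above can be sketched as two small date calculations. This is an illustration under stated assumptions, not the dashboard's logic: it assumes a transaction's age is measured from its upload time, takes the link lifetime as a parameter (the page does not give a value), and hard-codes the "currently 30 days" DR delay, which may change. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: DR replicas go roughly 30 days after the primary copy is
# destroyed (the page says "currently 30 days").
DR_DELAY = timedelta(days=30)


def is_expired(uploaded_at: datetime, link_lifetime: timedelta,
               now: datetime) -> bool:
    """Data count as expired once they are older than the download-link lifetime."""
    return now - uploaded_at > link_lifetime


def expected_dr_removal(primary_destroyed_at: datetime) -> datetime:
    """Approximate date on which the DR replica is also destroyed."""
    return primary_destroyed_at + DR_DELAY


if __name__ == "__main__":
    uploaded = datetime(2024, 1, 1, tzinfo=timezone.utc)
    now = datetime(2024, 1, 20, tzinfo=timezone.utc)
    print(is_expired(uploaded, timedelta(days=14), now))  # True
```

Using timezone-aware datetimes throughout avoids mixing naive and aware values, which Python refuses to subtract.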
If you are interested in this capability, please contact the Data Solutions team.
Mapping Source Paths to Mediaflux Paths
By default, the unique transactional folder (see above) created for instrument uploads is located under a specific parent folder in the instrument project. That parent can be anywhere in the project, but it is fixed by the shareable. Sometimes, however, users prefer a structure that has some relationship to the source file-system structure.
Let's illustrate with an example. Suppose your data are being uploaded to the parent folder /projects/proj-neil-1128.4.1000/DM-uploads.
By default, the upload process creates a transactional folder and places the data in it. For example, if you upload /data/uom/neil/mydata from the instrument, you will get
/projects/proj-neil-1128.4.1000/DM-uploads/<transactional folder>/mydata
You can see that only the final part ("mydata") of the uploaded folder path is preserved in Mediaflux. It is not uncommon, though, for instrument operators to want to preserve some of the source structure (it often reflects organisations, departments and users). This is possible with the Data Mover via a special mapping capability that moves the data to another location after they are first uploaded.
With this approach, it is possible to preserve any part of the source folder structure. For example, we can arrange for the upload to go to any of:
/projects/proj-neil-1128.4.1000/DM-uploads/neil/<transactional folder>/mydata
/projects/proj-neil-1128.4.1000/DM-uploads/uom/neil/<transactional folder>/mydata
/projects/proj-neil-1128.4.1000/DM-uploads/data/uom/neil/<transactional folder>/mydata
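The destinations listed above follow one pattern: keep the last N components of the source folder's parent directory, insert the transactional folder, then append the uploaded folder's own name. The sketch below only computes that destination path; the real mapping is configured by the Data Solutions team and applied in Mediaflux after upload. The function name and its `keep_levels` parameter are hypothetical.

```python
from pathlib import PurePosixPath


def mapped_destination(parent: str, source: str, keep_levels: int,
                       transaction: str) -> str:
    """Destination path that preserves the last `keep_levels` components of
    the source folder's parent directory above the transactional folder."""
    src = PurePosixPath(source)
    # Components of the source's parent, without the leading "/".
    parent_parts = [p for p in src.parent.parts if p != "/"]
    if not 0 <= keep_levels <= len(parent_parts):
        raise ValueError(f"keep_levels must be between 0 and {len(parent_parts)}")
    kept = parent_parts[len(parent_parts) - keep_levels:]
    return str(PurePosixPath(parent, *kept, transaction, src.name))
```

With `keep_levels=0` this reproduces the default layout; values 1, 2 and 3 give the three alternatives above for /data/uom/neil/mydata.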
If you are interested in this capability, please contact the Data Solutions team.
Storing Meta-data in the Download
The standard process delivers just the uploaded data to the user. It is now also possible to configure your Instrument upload shareable so that, when the download links are made, the download includes a file called _metadata.xml along with the data.
This file contains meta-data such as the keywords that were set during the upload (e.g. via the Data Mover GUI).
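Because the schema of _metadata.xml is not described on this page, a recipient who wants to inspect it programmatically can walk the whole XML tree rather than assume particular element names. The sketch below uses only Python's standard `xml.etree.ElementTree`; the function name is hypothetical, and the `<metadata>`/`<keyword>` elements used when trying it out are invented for illustration, not the real format.

```python
import xml.etree.ElementTree as ET


def summarise_metadata(root: ET.Element) -> list[tuple[str, str]]:
    """Return (element path, text) pairs for every element that has text.

    Makes no assumption about the _metadata.xml schema: it visits every element.
    """
    found: list[tuple[str, str]] = []

    def walk(elem: ET.Element, path: str) -> None:
        here = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            found.append((here, text))
        for child in elem:
            walk(child, here)

    walk(root, "")
    return found
```

For a downloaded file, `summarise_metadata(ET.parse("_metadata.xml").getroot())` lists everything it contains, including any keywords set at upload time.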
If you are interested in this capability, please contact the Data Solutions team.