Data Mover Reference

Introduction

Please see the Terminology section first.


The Mediaflux Data Mover tool is used to move data to and from Mediaflux. It has two primary capabilities:

  1. Download from Mediaflux - via a download link, which can also be emailed to someone to use.

  2. Upload to Mediaflux - via an upload link, which can also be emailed to someone to use.

These two basic capabilities are further combined at the University of Melbourne into two additional specialised capabilities:


The Data Mover also has the following attributes:

  • Secure - it uses the HTTPS protocol so that data are encrypted in transit.
  • Efficient - it moves data in parallel to maximise throughput.
  • Robust - it has built-in integrity checking to ensure that source and destination data are identical, including the number of files uploaded.
  • Restartable - it can restart where it left off if a transfer fails for some reason (e.g. a network error).
  • Automatic - it will self-install and self-update as needed. The Data Mover tool is a fully self-contained Java application.

The Data Mover is formally supported for

  • macOS
  • Windows 10
    • It will likely work on other older Windows platforms
    • Windows 7 has been demonstrated to work
  • Linux 
    • It is provided as a Java bundle (Java included) with no dedicated installer, so it is portable to most flavours of Linux so long as OpenJDK 17 is supported (see below)

The Data Mover tool (client application) is free to use for all users. You do not need your own Mediaflux system to use it, just as you do not need your own Zoom system to use the Zoom video conferencing client.



Installing and Configuring the Data Mover

You can fetch the Data Mover manually and install it, or you can wait until you receive a Shareable (for upload or download). Clicking on the Shareable link will initiate the download and installation. Currently, on Linux it is only possible to install manually.

The Data Mover is formally supported for macOS, Windows 10 and Linux. The Data Mover is built using Java, so see the system requirements for OpenJDK 17.0.2 for a list of supported platforms. In addition to those listed, we know that Windows 7 currently works.


  1. Download the Data Mover manually and Install 
    1. Preparation
      1. When doing a new install, remove any .jar files that you may have downloaded as part of the auto-update process (see section 3) from the .Arcitecta/DataMover/updates folder (under your home directory).
      2. If you don't remove these, they will conflict with the newly installed version.
      3. Uninstall the old version of the app (or just install over the top of it).
      4. Do not rename the old application to something else as this will cause conflicts. If you want to keep an earlier version, then zip up the old application first.
    2. Download 
      1. The URL is https://mediaflux.researchsoftware.unimelb.edu.au/mflux/data/mover/index.html
      2. From there you can download the version for your operating system (macOS, Windows 10 or Linux)
      3. If you'd like to download the installer for a specific platform with a tool like curl or wget, the URLs are:
        1. macOS : https://mediaflux.researchsoftware.unimelb.edu.au/mflux/data/mover/installers/mac/Mediaflux%20Data%20Mover.dmg
        2. Windows : https://mediaflux.researchsoftware.unimelb.edu.au/mflux/data/mover/installers/windows/Mediaflux%20Data%20Mover.msi
        3. Linux : https://mediaflux.researchsoftware.unimelb.edu.au/mflux/data/mover/installers/linux/mediaflux-data-mover.zip
    3. Install
      1. macOS
        1. Double click on the Mediaflux Data Mover.dmg file to install
          1. If macOS complains that the .dmg file is damaged, you can resolve this by removing the file's extended attributes. Start the Terminal (command line) application, change to the directory where you downloaded the .dmg (in the example it's the Downloads directory) and issue a command like this:

            cd ~/Downloads
            xattr -cr Mediaflux\ Data\ Mover.dmg
        2. Using the GUI that is presented, drag the Data Mover to /Applications
      2. Windows 
        1. Double click on the file Mediaflux Data Mover.msi to install.
      3. Linux
        1. Unpack the mediaflux-data-mover.zip file.  Note that the resulting directory mediaflux-data-mover must not be stored in a directory that contains bin as one of its elements due to a Linux Java bug (avoid /usr/local/bin or ~/bin for example).
        2. Set the binary to be executable with a command like:

          chmod +x mediaflux-data-mover/bin/mediaflux-data-mover
        3. Optionally, add <path>/mediaflux-data-mover/bin to your PATH variable
        4. Optionally, read the README.txt for instructions on setting up your web browser to automatically open arcio links with the Data Mover
      4. Check
        1. After you install, start up the Data Mover and make sure the version running (see bottom left of GUI) is the version you expect.
  2. Use a Shareable link to download and install
    1. Paste a Shareable (upload or download) into a browser
    2. It will be detected whether you already have the Data Mover. If not, the installer for your operating system (Windows 7/10, macOS, Linux) will be downloaded
    3. Execute the installer (for whatever platform you are on as in Section 1 above)
  3. An optional XML configuration file called settings.xml can be created in the .Arcitecta/DataMover folder (beneath the home directory). This file controls various Data Mover behaviours. Details are found on this page.
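
For scripted installs, the per-platform installer URLs listed under Download above can be selected automatically. The following is a minimal Python sketch, not part of the Data Mover itself: the names INSTALLERS, installer_url and download_installer are hypothetical, and only the URLs come from this page.

```python
import platform
import urllib.parse
import urllib.request
from pathlib import Path

BASE = "https://mediaflux.researchsoftware.unimelb.edu.au/mflux/data/mover/installers"

# Installer URLs copied from the Download step above. The keys are the
# values returned by platform.system() on each operating system.
INSTALLERS = {
    "Darwin": BASE + "/mac/Mediaflux%20Data%20Mover.dmg",
    "Windows": BASE + "/windows/Mediaflux%20Data%20Mover.msi",
    "Linux": BASE + "/linux/mediaflux-data-mover.zip",
}


def installer_url(system=None):
    """Return the installer URL for the given (or current) operating system."""
    system = system or platform.system()
    if system not in INSTALLERS:
        raise ValueError(f"no Data Mover installer listed for {system!r}")
    return INSTALLERS[system]


def download_installer(dest_dir=".", system=None):
    """Download the installer into dest_dir and return the local path."""
    url = installer_url(system)
    # Turn the URL-encoded last path element back into a plain file name.
    filename = urllib.parse.unquote(url.rsplit("/", 1)[1])
    target = Path(dest_dir) / filename
    urllib.request.urlretrieve(url, str(target))
    return target
```

Calling download_installer() with no arguments fetches the installer for the machine the script runs on. The install steps for each platform described above still apply afterwards.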



Updating the Data Mover via the GUI

You will download the initial install of the Data Mover from our Mediaflux server (see section 2 above). Thereafter, the Data Mover GUI will offer you updates (from Arcitecta, the vendor) when they are available. When you start it, there will be a prompt in the bottom right of the main screen where you can update and relaunch the Data Mover.

Please note that the Data Mover configuration file (settings.xml) is not affected by update processes.

In general, the update process is light-weight: all the update does is download a new Java .jar file and place it in the .Arcitecta/DataMover/updates folder (under your home directory). Occasionally, however, the full application needs to be updated in a heavy-weight process, for example when the application must be repackaged with a new version of Java. This multi-step heavy-weight process is really targeted at external end-users who may have installed the Data Mover to receive data from a University of Melbourne instrument (we don't know who they are, so we cannot write to them).

Otherwise, it is easier to

  • uninstall the Data Mover and remove any files in the .Arcitecta/DataMover/updates folder
  • download and install the current release from our server (see section 2).
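
Clearing the updates folder before a fresh install can be scripted. The sketch below is a hedged Python example rather than an official tool: the function name clear_updates and its dry_run parameter are hypothetical, and only the folder location is taken from this page.

```python
from pathlib import Path


def clear_updates(updates_dir=None, dry_run=True):
    """Remove the files in the Data Mover updates folder.

    Returns the names of the files that were (or, with dry_run=True,
    would be) removed. Sub-directories are left untouched.
    """
    if updates_dir is None:
        # Location described on this page: under the user's home directory.
        updates_dir = Path.home() / ".Arcitecta" / "DataMover" / "updates"
    updates_dir = Path(updates_dir)
    if not updates_dir.is_dir():
        return []  # nothing was ever downloaded by the auto-updater
    removed = []
    for entry in sorted(updates_dir.iterdir()):
        if entry.is_file():
            if not dry_run:
                entry.unlink()
            removed.append(entry.name)
    return removed
```

Running clear_updates() first shows what would be deleted. Calling clear_updates(dry_run=False) then removes the files. Quit the Data Mover before doing this.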

If you do follow the heavy-weight update process for whatever reason, carefully follow the on-screen instructions. This process is needed between versions 1.0.11 (the initial deployment) and 1.1.15 (the next release). The steps to go from v 1.0.11 to v 1.1.15 are outlined below:

  • Light-weight update to v 1.0.13
  • Manually quit and restart the Data Mover, now running v 1.0.13 (the relaunch button does not work in this release)
  • Heavy-weight update to v 1.1.10
    • downloads the full package
    • quits the Data Mover
    • prompts you to install it
  • Manually start the newly installed v 1.1.10
  • Light-weight update to v 1.1.13
  • Relaunch (now running v 1.1.13)
  • Light-weight update to v 1.1.15
  • Relaunch v 1.1.15
  • Check the right version is running


Starting the Data Mover GUI / Consuming Upload and Download Shareables

There are two ways to use a Shareable with the Data Mover.

  1. When you click on a Shareable URL (e.g. received by email), it will automatically start the Data Mover if it's installed, or take you through the process to install it if not. 
  2. You can also start the Data Mover manually
    1. The Data Mover is just an application so start it how you would start any other application on your platform (e.g. double click or select from menu).
    2. Click 'Add New'
    3. Paste in the Shareable (copy it from wherever you got it, usually an email)


The Data Mover will know what kind of Shareable it is (upload or download), and you will then be presented with a GUI for uploading or downloading. In that GUI you can select the source (for Upload) or destination (for Download) for the data, and click Upload or Download to activate the data transfer task. After the task is completed, it moves to the Completed section where you can download the activity log.

If you are downloading, the Download button will not be active if the download is larger than the size of the file system for your default download path.  You can change the default download path in the settings.xml file (see above).
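
The size check described above can be approximated outside the Data Mover. This is an illustrative Python sketch only: the function name is hypothetical, and it compares against the total size of the file system as the text states, whereas in practice the free space (usage.free) may be the more useful figure.

```python
import shutil


def fits_on_file_system(download_bytes, download_path):
    """Return True if a download of download_bytes is no larger than the
    file system that holds download_path."""
    usage = shutil.disk_usage(download_path)  # fields: total, used, free
    return download_bytes <= usage.total
```

For example, fits_on_file_system(5 * 1024**4, "/data") checks whether a 5 TiB download could fit on the file system mounted at /data (a hypothetical path).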

The following images show some representative examples of how the GUIs look. In this example, the Shareables have been directly pasted into the GUI. If you had just clicked on the Shareable in an email, the process would enter at the second row of images (Figure 2 for Download and Figure 3 for Upload).

Download and Upload


Figure 1a (left): The Data Mover GUI ready after starting it.

Figure 1b (right): After clicking 'Add new' the secondary GUI is seen

Download

Figure 2a (left): After pasting the Download Shareable into the secondary GUI and selecting the destination parent folder. Because the Rename option was selected and the destination folder already exists, the output has been renamed automatically with a ".1" appended.

Figure 2b (middle): After clicking Download, the download is progressing.

Figure 2c (right): After the download completes, the secondary GUI disappears and Completed has been selected on the main GUI.

Upload

Figure 3a (left): After pasting the Upload Shareable into the secondary GUI and selecting the source folder.

Figure 3b (middle): After clicking Upload, the upload is progressing.

Figure 3c (right): After the upload completes, the secondary GUI disappears and Completed has been selected on the main GUI.

Watch Folders

Data Mover has the ability to watch a folder for data as they are created. This enables you to, for example, upload files as they are created by an instrument. To enable this feature, check the Enable Watching checkbox when selecting the location to upload. If you wish to start with an empty folder, also check the Allow empty uploads checkbox. The Data Mover then follows this process:

  • Every 5 seconds, check the folder for the presence of new files
  • If a new file is detected, wait 60 seconds and check the file again.  If the file hasn't changed in that time, queue it for upload
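
The polling process above can be sketched in Python as follows. This is not the Data Mover's implementation: the class FolderWatcher and the function watch are hypothetical, "changed" is assumed to mean a change in file size or modification time, and only the top level of the folder is examined. The sketch also re-checks a file on every poll and restarts the 60 second wait whenever it changes, which is a slight simplification of the two-step check described.

```python
import os
import time

POLL_INTERVAL_S = 5   # how often the folder is checked (from the text above)
SETTLE_S = 60         # how long a file must stay unchanged (from the text above)


class FolderWatcher:
    def __init__(self, folder, settle_seconds=SETTLE_S):
        self.folder = folder
        self.settle_seconds = settle_seconds
        self.pending = {}    # path -> (signature, time the signature was first seen)
        self.queued = set()  # paths already handed over for upload

    @staticmethod
    def _signature(path):
        st = os.stat(path)
        return (st.st_size, st.st_mtime_ns)  # assumed definition of "unchanged"

    def poll(self, now):
        """Check the folder once; return files that are ready to upload."""
        ready = []
        for name in sorted(os.listdir(self.folder)):
            path = os.path.join(self.folder, name)
            if not os.path.isfile(path) or path in self.queued:
                continue
            sig = self._signature(path)
            seen = self.pending.get(path)
            if seen is None or seen[0] != sig:
                self.pending[path] = (sig, now)  # new or changed: (re)start the wait
            elif now - seen[1] >= self.settle_seconds:
                del self.pending[path]
                self.queued.add(path)
                ready.append(path)
        return ready


def watch(folder, upload):
    """Poll forever, calling upload(path) for each settled file."""
    watcher = FolderWatcher(folder)
    while True:
        for path in watcher.poll(time.monotonic()):
            upload(path)
        time.sleep(POLL_INTERVAL_S)
```

Passing the current time into poll keeps the waiting logic separate from the clock, so the 60 second rule can be exercised without actually sleeping.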

The progress bar will oscillate to indicate that the Data Mover is waiting for files to be created. Watching will continue until it is disabled or the Data Mover is closed. To disable watching and complete the upload, click the "eye" icon next to the indicator of files and bytes uploaded.

By default, symlinks in the source upload will be uploaded as symlinks to the Mediaflux system.  If you would like the file referenced by the symlink to be uploaded instead, effectively "dereferencing" the symlinks, you can select the Follow Symlinks checkbox when selecting the location to upload.

Log file

After Data Mover tasks are completed, a log file of each task is available for download: click on the Completed tab and then, for the task of interest, click on the CSV icon (second from right, next to Bin). The log records the transaction details of all files that should have been uploaded or downloaded. The downloaded CSV file contains a column called state, which may have one of the following values:

  • download
    • in-progress
    • complete
    • skipped
    • failed
    • content-missing
    • asset-not-found
    • no-content
    • unknown
  • upload
    • in-progress
    • complete
    • skipped
    • failed
    • unchanged
    • unknown
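
A downloaded log can be summarised by counting the values in the state column. The Python sketch below assumes the CSV has a header row containing a column named state, as described above. The function names, the path column used in the usage note, and the choice of which states count as acceptable are assumptions made for illustration.

```python
import csv
from collections import Counter


def summarise_states(csv_file):
    """Count how many log rows are in each state.

    csv_file is any open text file (or file-like object) holding the log.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row["state"] for row in reader)


def needs_attention(counts, ok_states=("complete", "skipped", "unchanged")):
    """Return only the states that are not in ok_states (an assumed list)."""
    return {state: n for state, n in counts.items() if state not in ok_states}
```

For a log whose rows are in the states complete, failed and complete, summarise_states returns counts of 2 for complete and 1 for failed, and needs_attention then reports only the failed entry. To read a real log, open it with open(path, newline="") and pass the file object in.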


Behaviour when data pre-exist

When data pre-exist (upload or download), it is important to know how the Data Mover handles this.

Download

You are presented with multiple choices (with tool tips that you can review) that direct the Data Mover's behaviour if data pre-exist on download.

  • Rename - if the destination directory already exists, creates a new copy of the directory as directoryname.1 (or directoryname.2 etc.).
  • Update - inspects any files already in the destination directory and, if they differ from the server (detected using file path, size and checksum), overwrites them with the version on the server. Any changes you have made on your client machine will be overwritten.
  • Overwrite - if a file to be downloaded already exists in the destination directory (detected using file path only), always overwrites with the version on the server.
  • Skip - if a file to be downloaded already exists in the destination directory (detected using file path only), it will be skipped.
  • Fail - if a file to be downloaded already exists in the destination directory (detected using file path only), that file will be noted as failed.
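
The five options can be summarised as a small decision table. The following Python sketch is an illustration of the rules as described here, not the Data Mover's code: both function names are hypothetical, and the "keep" result for an unchanged file under Update is an assumption, since the text only says what happens to changed files.

```python
import os


def rename_target(dest_dir):
    """Rename option: pick dest_dir, or dest_dir.1, dest_dir.2, ... if taken."""
    if not os.path.exists(dest_dir):
        return dest_dir
    n = 1
    while os.path.exists(f"{dest_dir}.{n}"):
        n += 1
    return f"{dest_dir}.{n}"


def file_action(option, exists, differs=False):
    """Decide what happens to one file for the per-file options.

    option  - "update", "overwrite", "skip" or "fail"
    exists  - a file with the same path is already in the destination
    differs - size or checksum differs from the server (used by "update" only)
    """
    if not exists:
        return "download"
    if option == "overwrite":
        return "download"
    if option == "update":
        # Assumption: an unchanged local file is simply left in place.
        return "download" if differs else "keep"
    if option == "skip":
        return "skip"
    if option == "fail":
        return "fail"
    raise ValueError(f"unknown option {option!r}")
```

Note that Rename works on the destination directory as a whole, while the other four options are applied file by file.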

Upload

Upload is a little more complex. This is because the Data Mover can recover from failures (such as a network failure) which may leave a partially uploaded file fragment in the asset in Mediaflux.

  • If the last version of the target asset (determined by path) is not a partial fragment (determined by path, size and checksum if needed) of the source file being uploaded, you get a new asset version, with the source file transmitted from the beginning. This is the case where the source file has changed (but has the same path).
  • If the target asset is a partial file fragment (determined by path, size and checksum if needed) of the source file being uploaded, you get a new asset version that copies from the previous version (the part that was previously uploaded) and then transmits the rest of the source file. This is the case where the source file has not changed, but a previous upload failed and is being restarted.
  • If the last version of the target asset (determined by path) is the same (determined by size and checksum) as the source, the upload is skipped. This is the case of uploading the same file twice.
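
The three upload cases can be expressed as a single decision function. This Python sketch works on in-memory bytes purely for illustration: plan_upload is a hypothetical name, SHA-256 stands in for whatever checksum Mediaflux actually uses, and the case of no existing asset (target is None) is added for completeness.

```python
import hashlib


def checksum(data):
    # Stand-in checksum; the algorithm used by Mediaflux is not stated here.
    return hashlib.sha256(data).hexdigest()


def plan_upload(source, target):
    """Return (action, offset) for uploading source over an existing target.

    source - bytes of the local file
    target - bytes of the last asset version at the same path, or None
    """
    if target is None:
        return ("new-version", 0)       # nothing there yet: send everything
    if len(target) == len(source) and checksum(target) == checksum(source):
        return ("skip", None)           # same file uploaded twice
    prefix = source[:len(target)]
    if len(target) < len(source) and checksum(prefix) == checksum(target):
        return ("resume", len(target))  # fragment of source: send the rest
    return ("new-version", 0)           # source changed: send from the start
```

For example, plan_upload(b"abcdef", b"abc") returns ("resume", 3), meaning a new asset version is built from the first three bytes already on the server plus the remainder of the source.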

Utilising the Command-Line Interface of the Data Mover

Not all environments offer a graphical (windowing) environment. This is most common in Unix high-performance computing environments, although it is slowly becoming a thing of the past. For this reason, the Data Mover also has a Command-Line Interface (CLI) as described in this section.


Creating and Working With Upload and Download Shareables

Shareables can be created by Mediaflux users with the Mediaflux Explorer (v1.5.1 and later) and are consumed by the Data Mover. Download shareables allow the consumer to recursively download data from a namespace (folder) when they otherwise have no access. Upload shareables allow the consumer to upload data to a specified Mediaflux namespace (folder) when they otherwise have no access. The consumer has no visibility of the data as it arrives in Mediaflux - it is an opaque or anonymous upload process.

Download

A Download Share (link) can be created using Mediaflux Explorer to share your data (especially big data sets) with external collaborators. A Download Share allows the consumer to recursively download data from a namespace (folder) when they otherwise have no access.


Note:

  • Creating a download share is only available in Mediaflux Explorer v1.5.1 or later. Please make sure you have the latest Mediaflux Explorer installed.
  • The assets available for download from a Download Share are pre-computed into a manifest at the time the Download Share is created. This means, for example, that if you created a Download Share for a specific Mediaflux namespace (folder), and then subsequently added some new assets, the person consuming the Download Share would not get the new assets.


To create a download share with the Mediaflux Explorer, simply right click on a namespace (folder) in the navigator pane (on the left) and select Create Download Share.

  • A new GUI will pop up - simply fill in the details (only the Name is mandatory).  
  • We generally recommend you set the Valid To field so that the share expires.
  • If you want to email the download share link to someone (possibly yourself) enter the address in the Invitation Email entry box.
  • If you enter a Password, you will need to communicate that yourself to the recipient of the download share.  If you set a complex password, then it is suggested that you copy/paste it in from some other source. For example, if you are going to send the password to someone in a separate email, enter it into that email, then copy and paste it into the GUI (because you can neither see it nor copy it from the GUI once it is set).
  • After you click the Create Download Share button, the GUI will be replaced by another with the opportunity to copy the download share link to the Clipboard (so that you can paste it somewhere).
  • All the recipient of the download share (link) needs to do is click on it in an email or paste it into their browser. This will start the Mediaflux Data Mover app (download and install or just execute if already installed) and the data in and under the selected namespace (folder) will be downloaded.

Upload

An Upload Share can be created using Mediaflux Explorer for external collaborators to upload data to your Mediaflux project.

Note:

Creating an upload share is only available in Mediaflux Explorer v1.5.1 or later. Please make sure you have the latest Mediaflux Explorer installed.


To create an upload shareable with the Mediaflux Explorer, simply right click on a namespace (folder) in the navigator pane (on the left) and select Create Upload Share.

  • A new GUI will pop up - simply fill in the details (only the Name is mandatory).  
  • We generally recommend you set the Valid To field so that the share expires.
  • If you want to email the upload share link to someone (possibly yourself) enter the address in the Invitation Email entry box.
  • If you enter a Password, you will need to communicate that yourself to the recipient of the shareable. If you set a complex password, then it is suggested that you copy/paste it in from some other source. For example, if you are going to send the password to someone in a separate email, enter it into that email, then copy and paste it into the GUI (because you can neither see it nor copy it from the GUI once it is set).
  • There is an option to limit the amount of data that can be uploaded into your project using this share.
  • There is an option called 'Add Unique Collection'.  If you check this, then when the data are uploaded, a parent namespace (folder) with a unique name is created and the data are then created beneath it.  Otherwise, the data the consumer uploads go directly into the specified namespace (folder).
  • After you click the Create Upload Share button, the GUI will be replaced by another with the opportunity to copy the upload share link to the Clipboard (so that you can paste it somewhere).