Table of Contents

...

Introduction

Please see the Terminology section first.

...

Excerpt

The Mediaflux Data Mover tool is used to move data to and from Mediaflux.   It has two primary capabilities: 

  1. Download from Mediaflux - via a download link, which can also be emailed to someone to use.

  2. Upload to Mediaflux - via an upload link, which can also be emailed to someone to use.

These two basic capabilities are further combined at the University of Melbourne into two additional specialised capabilities:


The Data Mover also has the following attributes:

  • Secure - it utilises the HTTPS protocol so that data are encrypted in transport.  
  • Efficient - it moves data in parallel to maximise throughput
  • Robust - it has built-in integrity checking, ensuring that source and destination data are identical, including the number of files uploaded.
  • Restartable - it can restart where it left off if a transfer fails for some reason (e.g. a network error).
  • Automatic - it self-installs and self-updates as needed; the Data Mover tool is a fully self-contained Java application.

The Data Mover is formally supported for

  • macOS
  • Windows 10
    • It will likely work on other older Windows platforms
    • Windows 7 has been demonstrated to work
  • Linux 
    • It is provided as a Java bundle (Java included) with no particular installer, so it is portable to most flavours of Linux, so long as OpenJDK 17 is supported (see below)

The Data Mover tool (client application) is  free to use for all users.   The user does not need a Mediaflux system to use it - it's just like the Zoom video conferencing tool (you don't need a Zoom system to use the Zoom client tool).

...


...

Installing and Configuring the Data Mover

You can fetch the Data Mover manually and install it, or you can wait until you receive a Shareable (for upload or download). Clicking on the Shareable link will initiate the download and installation. Currently, on Linux, only manual installation is possible.

The Data Mover is formally supported for macOS, Windows 10 and Linux. The Data Mover is built using Java, so the system requirements for OpenJDK 17.0.2 give a list of supported platforms. In addition to those listed, we know that Windows 7 does currently work.

...

  1. Download the Data Mover manually and Install 
    1. Preparation
      1. When doing a new install, remove any .jar files that you may have downloaded (part of the auto-update process; see section 3) from your .Arcitecta/DataMover/updates folder (under your home directory).
      2. If you don't remove these, they will conflict with the newly installed version.
      3. Uninstall the old version of the app (or just install over the top of it).
      4. Do not rename the old application to something else as this will cause conflicts. If you want to keep an earlier version, then zip up the old application first.
    2. Download 
      1. The URL is https://mediaflux.researchsoftware.unimelb.edu.au/mflux/data/mover/index.html
      2. From there you can download the version for your operating system (macOS, Windows 10 or Linux)
      3. If you'd like to download the installer for a specific platform with a tool like curl or wget, the URLs are:
        1. macOS : https://mediaflux.researchsoftware.unimelb.edu.au/mflux/data/mover/installers/mac/Mediaflux%20Data%20Mover.dmg
        2. Windows : https://mediaflux.researchsoftware.unimelb.edu.au/mflux/data/mover/installers/windows/Mediaflux%20Data%20Mover.msi
        3. Linux : https://mediaflux.researchsoftware.unimelb.edu.au/mflux/data/mover/installers/linux/mediaflux-data-mover.zip
    3. Install
      1. macOS
        1. Double click on the Mediaflux Data Mover.dmg file to install
          1. If you encounter an issue where macOS complains that the .dmg file is damaged, you can resolve it by removing the extended attributes. Start the Terminal (command line) application, change to the directory where you downloaded the .dmg (in the example it's the Downloads directory) and issue a command like this:

            Code Block
            cd ~/Downloads
            xattr -cr Mediaflux\ Data\ Mover.dmg


        2. Using the GUI that is presented, drag the Data Mover to /Applications
      2. Windows 
        1. Double click on the file Mediaflux Data Mover.msi to install.
      3. Linux
        1. Unpack the mediaflux-data-mover.zip file.  Note that the resulting directory mediaflux-data-mover must not be stored in a directory that contains bin as one of its elements due to a Linux Java bug (avoid /usr/local/bin or ~/bin for example).
        2. Set the binary to be executable with a command like:

          Code Block
          chmod +x mediaflux-data-mover/bin/mediaflux-data-mover


        3. Optionally, add <path>/mediaflux-data-mover/bin to your PATH variable
        4. Optionally, read the README.txt for instructions on setting up your web browser to automatically open arcio links with the Data Mover
      4. Check
        1. After you install, start up the Data Mover and make sure the version running (see bottom left of GUI) is the version you expect.
  2. Use a Shareable link to download and install
    1. Paste a Shareable (upload or download) into a browser
    2. Your system is checked for an existing Data Mover installation. If none is found, the installer for your operating system (Windows 7/10, macOS, Linux) is downloaded
    3. Execute the installer (for whatever platform you are on as in Section 1 above)
  3. An optional XML configuration file called settings.xml can be created in the .Arcitecta/DataMover folder (beneath the home directory). This file controls various aspects of Data Mover behaviour. Details are found on this page.
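
For a scripted manual install on Linux, the preparation, download and install steps above can be sketched in shell roughly as follows. The URLs, the updates folder and the chmod target are taken from the steps above. The function names (installer_url, clean_updates, path_has_bin_element, install_linux) and the default install location are hypothetical, chosen only for this illustration, and the use of curl and unzip is an assumption about the tools available on your machine.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; URLs and paths come from the steps above.
BASE_URL="https://mediaflux.researchsoftware.unimelb.edu.au/mflux/data/mover/installers"

# Print the installer URL for a platform (mac, windows or linux).
installer_url() {
    case "$1" in
        mac)     printf '%s\n' "$BASE_URL/mac/Mediaflux%20Data%20Mover.dmg" ;;
        windows) printf '%s\n' "$BASE_URL/windows/Mediaflux%20Data%20Mover.msi" ;;
        linux)   printf '%s\n' "$BASE_URL/linux/mediaflux-data-mover.zip" ;;
        *)       echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# Preparation: remove stale auto-update .jar files from the updates folder.
clean_updates() {
    updates_dir="${1:-$HOME/.Arcitecta/DataMover/updates}"
    [ -d "$updates_dir" ] || return 0
    find "$updates_dir" -maxdepth 1 -type f -name '*.jar' -exec rm -f {} +
}

# Succeeds if the path has "bin" as one of its elements (e.g. /usr/local/bin).
path_has_bin_element() {
    case "/$1/" in
        */bin/*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Download, unpack and mark executable. Runs in a subshell so the caller's
# working directory is left unchanged. The default location is an assumption.
install_linux() (
    dest="${1:-$HOME/apps}"
    if path_has_bin_element "$dest"; then
        echo "refusing to install under a path containing 'bin': $dest" >&2
        exit 1
    fi
    clean_updates
    mkdir -p "$dest" && cd "$dest" || exit 1
    curl -fL -O "$(installer_url linux)" || exit 1
    unzip -o mediaflux-data-mover.zip || exit 1
    chmod +x mediaflux-data-mover/bin/mediaflux-data-mover
)
```

Nothing is downloaded until install_linux is called, for example as install_linux "$HOME/apps". Afterwards, start the Data Mover and check the version shown at the bottom left of the GUI, as described in the Check step.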

...


...

Updating the Data Mover via the GUI

You download the initial install of the Data Mover from our Mediaflux server (see section 2 above). Thereafter, the Data Mover GUI will offer you updates when they are available (from Arcitecta, the vendor). When you start it, a prompt appears in the bottom right of the main screen where you can update and relaunch the Data Mover.

...


...

Starting the Data Mover GUI / Consuming Upload and Download Shareables

There are two ways to use a Shareable with the Data Mover.

  1. When you click on a Shareable URL (e.g. received by email), it will automatically start the Data Mover if it's installed, or take you through the process to install it if not. 
  2. You can also start the Data Mover manually
    1. The Data Mover is just an application so start it how you would start any other application on your platform (e.g. double click or select from menu).
    2. Click 'Add New'
    3. Paste in the Shareable (copy it from wherever you got it, usually an email)


The  Data Mover will know what kind (upload or download) of Shareable it is, and you will then be presented with a GUI for uploading or downloading. In that GUI you can select the source (for Upload) or destination (for Download) for the data, and click Upload or Download to activate the data transfer task.   After the task is completed, it moves to the Completed section where you can download the activity log.

...

Figure 2b (middle): After clicking Download, the download is progressing

...

Figure 3c (right): After the upload completes, the secondary GUI disappears and Completed has been selected on the main GUI.

Watch Folders


The Data Mover can watch a folder for data as they are created. This enables you to, for example, upload files as they are created by an instrument. To enable this feature, check the Enable Watching checkbox when selecting the location to upload. If you wish to start with an empty folder, also check the Allow empty uploads checkbox. The Data Mover then follows this process:

  • Every 5 seconds, check the folder for the presence of new files
  • If a new file is detected, wait 60 seconds and check the file again.  If the file hasn't changed in that time, queue it for upload
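
The polling process above can be sketched in shell as follows. This only illustrates the described timing (a 5-second poll and a 60-second settle check); it is not the Data Mover's implementation. The function names are hypothetical, comparing cksum output is an assumed way of deciding that a file "hasn't changed", and the sketch settles files one at a time and merely echoes where a real tool would queue an upload.

```shell
#!/bin/sh
# Sketch only: names and the change-detection method are assumptions.

# A cheap fingerprint of a file: CRC and byte count, as printed by cksum.
file_signature() {
    cksum < "$1"
}

# Succeeds if the file's fingerprint is identical before and after
# waiting "settle" seconds (60 by default).
is_settled() {
    file="$1"
    settle="${2:-60}"
    before=$(file_signature "$file") || return 1
    sleep "$settle"
    after=$(file_signature "$file") || return 1
    [ "$before" = "$after" ]
}

# Poll a folder every "poll" seconds (5 by default) and report each new
# file once it has settled. Loops until interrupted.
watch_folder() {
    dir="$1"
    poll="${2:-5}"
    settle="${3:-60}"
    seen=""
    while :; do
        for f in "$dir"/*; do
            [ -f "$f" ] || continue
            case "$seen" in
                *"|$f|"*) continue ;;   # already queued
            esac
            if is_settled "$f" "$settle"; then
                echo "queue for upload: $f"
                seen="$seen|$f|"
            fi
        done
        sleep "$poll"
    done
}
```

A file that is still being written fails the settle check and is simply examined again on a later pass, which is why a slow instrument write does not produce a partial upload in this sketch.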

The progress bar will oscillate, indicating that the Data Mover is waiting for files to be created. Watching will continue until it is disabled or the Data Mover is closed. To disable watching and complete the upload, click the "eye" icon next to the indicator of files and bytes uploaded.

By default, symlinks in the source upload will be uploaded as symlinks to the Mediaflux system.  If you would like the file referenced by the symlink to be uploaded instead, effectively "dereferencing" the symlinks, you can select the Follow Symlinks checkbox when selecting the location to upload.

Log file

After Data Mover tasks are completed, a log file of the task is available for download: click on the Completed tab and then, for the task of interest, click on the CSV icon (second from right, next to Bin). The log records the transaction details of all files that should have been uploaded or downloaded. The downloaded CSV file has a column called state. It may have one of the following values:

...


...

Behaviour when data pre-exist

When the data pre-exist (upload or download), it is important for you to know how the Data Mover handles this.

...

Download

You are presented with multiple choices (there are also tool tips on these selections that you can review) that determine how the Data Mover behaves if data pre-exist on download.

  • Rename - if the destination directory already exists, creates a new copy of the directory as directoryname.1 (or directoryname.2 etc.).
  • Update - inspects any files already in the destination directory and if changed (detected using file path, size and checksum), overwrites with the version on the server.  If you have made any changes on your client machine they will be overwritten.
  • Overwrite - if a file to be downloaded already exists in the destination directory (detected using file path only), always overwrites with the version on the server.
  • Skip - if a file to be downloaded already exists in the destination directory (detected using file path only), it will be skipped.
  • Fail - if a file to be downloaded already exists in the destination directory (detected using file path only), that file will be noted as failed.
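
The five choices above can be summarised as a small decision sketch in shell. It is an illustration of the rules as described, not the Data Mover's code: the function names and the write/skip/fail result words are hypothetical, and cksum (which reports both a CRC and a size) stands in for whatever size and checksum comparison the tool really performs.

```shell
#!/bin/sh
# Sketch only: names, result words and the checksum tool are assumptions.

# Rename policy: print the directory to use. If "dir" is free it is used
# as-is; otherwise the first unused "dir.N" (N = 1, 2, ...) is chosen.
renamed_dir() {
    dir="$1"
    n=1
    if [ ! -e "$dir" ]; then
        printf '%s\n' "$dir"
        return 0
    fi
    while [ -e "$dir.$n" ]; do
        n=$((n + 1))
    done
    printf '%s\n' "$dir.$n"
}

# Per-file policies: print "write", "skip" or "fail" for one destination file.
# server_sum is the cksum-style fingerprint of the server copy (hypothetical input).
file_action() {
    policy="$1"
    dest="$2"
    server_sum="$3"
    if [ ! -e "$dest" ]; then
        echo write          # nothing pre-exists, so just download
        return 0
    fi
    case "$policy" in
        overwrite) echo write ;;   # path match only, always replace
        skip)      echo skip ;;    # path match only, leave local copy
        fail)      echo fail ;;    # path match only, note as failed
        update)
            # replace only when size or checksum differs from the server copy
            if [ "$(cksum < "$dest")" = "$server_sum" ]; then
                echo skip
            else
                echo write
            fi
            ;;
        *) echo "unknown policy: $policy" >&2; return 2 ;;
    esac
}
```

Note that only Update reads the local file's content; Overwrite, Skip and Fail decide on the path alone, which is why Update is the one choice that can silently discard local edits to a file that still has the same name.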

...

Upload

Upload is a little more complex. This is because the Data Mover can recover from failures (such as a network failure) which may leave a partially uploaded file fragment in the asset in Mediaflux.

  • If the last version of the target asset (determined by path) is not a partial fragment (determined by path, size and checksum if needed) of the source file being uploaded, you get a new asset version transmitting the source file from the beginning. This is the use case where the source file has changed (but has the same path).
  • If the target asset is a partial file fragment (determined by path, size and checksum if needed) of the source file being uploaded, you get a new asset version copying from the previous version (that was previously uploaded) and then transmitting the rest of the source file. This is the use case where the source file has not changed, but a previous upload failed and is being restarted.
  • If the last version of the target asset (determined by path) is the same (determined by size and checksum) as the source the upload is skipped. This is the use case of uploading the same file twice.
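
The three upload cases above can be sketched in shell as follows. This is a local illustration under stated assumptions, not how Mediaflux or the Data Mover is implemented: the "target" here is an ordinary local file standing in for the last asset version on the server, the function name and result words (new-version, resume, skip) are hypothetical, and cksum stands in for the real checksum.

```shell
#!/bin/sh
# Sketch only: a local file stands in for the last asset version in Mediaflux.

# Print "skip", "resume" or "new-version" for uploading src over target.
upload_action() {
    src="$1"
    target="$2"
    if [ ! -e "$target" ]; then
        echo new-version            # nothing uploaded yet (assumed behaviour)
        return 0
    fi
    src_size=$(wc -c < "$src" | tr -d ' ')
    tgt_size=$(wc -c < "$target" | tr -d ' ')
    if [ "$tgt_size" -eq "$src_size" ] &&
       [ "$(cksum < "$src")" = "$(cksum < "$target")" ]; then
        # Same size and checksum: the same file is being uploaded twice.
        echo skip
    elif [ "$tgt_size" -lt "$src_size" ] &&
         [ "$(head -c "$tgt_size" "$src" | cksum)" = "$(cksum < "$target")" ]; then
        # Target is a leading fragment of the source: a failed upload restarting.
        echo resume
    else
        # Source has changed (same path): transmit from the beginning.
        echo new-version
    fi
}
```

For example, if a previous attempt left the first 5 bytes of an 11-byte file on the server, the sketch reports resume, and only the remaining 6 bytes would need to be transmitted.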

...

Utilising the Command-Line Interface of the Data Mover

Not all environments offer a graphical (windowing) environment. This is most common in Unix high-performance computing environments, although it is slowly becoming a thing of the past. For this reason, the Data Mover also has a Command-Line Interface (CLI), as described in this section.

...


...

Creating and Working With Upload and Download Shareables

Shareables can be created by Mediaflux users with the Mediaflux Explorer (V 1.5.1 and later) and are consumed by the Data Mover. Download Shareables allow the consumer to recursively download data from a namespace (folder) to which they otherwise have no access. Upload Shareables allow the consumer to upload data to a specified Mediaflux namespace (folder) to which they otherwise have no access. The consumer has no visibility of the data as it arrives in Mediaflux - it is an opaque or anonymous upload process.

...

Download

Include Page
KB:Sharing Big Data with Download Share

...

Upload

Include Page
KB:Collecting Data with Upload Share