Downloading

Mediaflux supports many protocols, so there are many possible ways to access your data. Users fall into two groups:

  1. Authenticated: Those who have their own accounts (local or via the Australian Access Federation) and can directly log in to Mediaflux.

  2. Non-authenticated: Those who don't have accounts. The user gains access via (usually temporary) secure tokens provisioned by somebody else. This mechanism also works for users who have their own accounts; they simply don't need those accounts in this instance.

Authenticated Users

This means you have an account and can log in directly to Mediaflux.  Generally speaking, you can either use a dedicated Mediaflux client (these provide the best data integrity guarantees) or a more generic protocol, such as SFTP with a third-party client, or SMB to mount Mediaflux as a network drive.

For downloading data, we recommend:

It is also possible to mount Mediaflux as a network drive, which is convenient for modifying data in place.  We don't recommend this for bulk downloads as it lacks the data-integrity guarantees that the HTTPS protocol provides.

For a full list of available clients for Mediaflux, see the List of all access methods.

Non-Authenticated Users

With the following mechanisms you don't need an account and don't need to log in directly to Mediaflux.  You will be provided with a shareable link which you can use to download the data that have been shared with you.

Shareable links all authenticate to Mediaflux with a secure identity token. The token is granted access to the data that it needs and usually expires after a fixed time (the person who provisions your data will tell you when).

Direct Shareable Links

  • Direct shareable links download data to a container (e.g. a tar or zip file)

  • Direct links are not good for big data because

    • If the download fails (e.g. network interruption) you have to start again from the beginning

    • The downloaded container has to be extracted after download, requiring extra storage space

  • Direct shareable links can be provisioned by users via Mediaflux Explorer (beginner friendly) (see the Sharable links video on the Explorer how to videos page).

More information is available on the Direct shareable links page.

Indirect Shareable Links

  • An indirect shareable link downloads a downloader (e.g. a script or application) which itself downloads the data

  • These are best for big data because

    • They can be restarted

    • They may be able to download data in parallel

    • They don't pack the data into a container (so you don't need extra storage to extract)

  • Currently, indirect shareable links can only be provisioned by specific users via scripts provided by Research Computing Services (RCS)

More information is available on the Indirect shareable links page.


Direct shareable links

General

Direct shareable links allow the sharing of data through a URL.

When you download data via a direct shareable link, it will download the data into a container (zip, tar or aar (Arcitecta archive)). After downloading the container you will have to unpack it.

Direct shareable links are a poor way to distribute big data because:

  • The data are packed in a container and so you need double the storage

  • If the process fails (increasingly likely for long downloads) you have to start over

We do not recommend direct shareable links for anything more than a few tens of GB as they are not robust.

Downloading

There are a variety of methods to download the data via the URL, but the simplest is to paste it into your browser and press return; this will activate a download process managed by the browser.

More advanced users can use Unix command-line tools such as curl or wget (for example, from within a script). Here are some examples.

curl "https://mediaflux.researchsoftware.unimelb.edu.au/mflux/share.mfjp?_token=pkPfWMlGGoHZvroPbDNP112871145" -d browser=false -o DesktopTraining.zip
  • You must use the argument -d browser=false for this to function.

  • Put the URL in double quotes, as it may contain characters that must be protected from the shell.

wget -O DeskTopTraining.zip "https://mediaflux.vicnode.org.au/mflux/share.mfjp?_token=pkPfWMlGGoHZvroPbDNP112871145&browser=false"
  • If the embedded URL has a &filename attribute, wget won't use it, so you have to specify the output file with -O <output file>.
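
Because a failed direct download has to start again from the beginning, a script may want to retry automatically. The following is only a sketch under stated assumptions: download_direct_link is a hypothetical helper (not part of any Mediaflux tooling), the retry count and delay are arbitrary, and it assumes the server reports failures with an HTTP error status so that curl's --fail option can detect them.

```shell
#!/bin/sh
# download_direct_link URL OUTPUT [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: retries a direct shareable link download from the start.
download_direct_link() {
    url=$1
    out=$2
    attempts=${3:-3}
    delay=${4:-30}
    n=1
    while [ "$n" -le "$attempts" ]; do
        # --fail makes curl return non-zero on an HTTP error status.
        # -d browser=false is required, as in the curl example above.
        if curl --fail --silent --show-error "$url" -d browser=false -o "$out.part"; then
            # Give the file its final name only once the transfer completed.
            mv "$out.part" "$out"
            return 0
        fi
        echo "attempt $n of $attempts failed" >&2
        rm -f "$out.part"
        n=$((n + 1))
        [ "$n" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Example (replace the placeholder with the link you were sent):
#   download_direct_link "<your shareable link>" DesktopTraining.zip
```

Writing to a temporary .part name means an interrupted transfer never leaves a file that looks like a complete container.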

Unpacking

You need to unpack the container to access the data

  • You can get the tool to unpack Arcitecta aar containers from our downloads page

  • Almost all operating systems support unpacking zip or tar files (either with double-click via graphical interfaces or command-line tools)

  • The container will be unpacked into a structure reflecting the original data.
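
For zip and tar containers, the unpacking step can be sketched on the command line as follows. This is an illustration only: unpack_container is a hypothetical helper, it assumes the standard unzip and tar tools are installed, and it does not handle aar containers (those need the Arcitecta tool mentioned above). It tests the container first, so a damaged download is reported before anything is extracted.

```shell
#!/bin/sh
# unpack_container ARCHIVE [DESTINATION]
# Hypothetical helper: tests a downloaded zip or tar container, then
# unpacks it into DESTINATION (default: the current directory).
unpack_container() {
    archive=$1
    dest=${2:-.}
    mkdir -p "$dest" || return 1
    case "$archive" in
        *.zip)
            # unzip -t tests the archive without extracting anything.
            unzip -tq "$archive" >/dev/null || return 1
            unzip -q "$archive" -d "$dest"
            ;;
        *.tar)
            # Listing the contents usually fails on a damaged archive.
            tar -tf "$archive" >/dev/null || return 1
            tar -xf "$archive" -C "$dest"
            ;;
        *)
            echo "unsupported container: $archive" >&2
            return 2
            ;;
    esac
}

# Example:
#   unpack_container DesktopTraining.zip ./DesktopTraining
```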

 


Indirect shareable links

This means somebody else wants to share data with you and they have sent you a URL to paste into your browser (or to use as an argument to Unix tools like wget or curl).  When you paste it into your browser and press return, a download process starts.  The difference from direct shareable links is that, rather than downloading the data directly, it downloads a zip file holding scripts/applications (perhaps a download manager) that will in turn download the data for you.

With this approach, the download can be more efficient and robust:

  • It can be restarted

  • It can run in parallel in some cases

After download, unpack the container (it will be a standard zip file) to access the scripts.  We offer two types of data download scripts at present (and the person provisioning your link will have discussed with you which is more suitable). We also offer scripts for Unix and Windows (and both flavours will be in the downloaded zip file).

ATERM wrapper

  • These wrappers need Java 8 installed on the computer you are running them on (contact your local IT if you need help)

  • They utilise the ATERM download command

  • All you need to do is execute the script (on Unix systems you can make it executable with chmod +x <my script> or use the command source to execute it).

  • The script will then fetch the ATERM Jar file and use it to download your data. The Jar file is held in a temporary directory that is deleted after the download finishes.

  • By default the data go into the current working directory, but you can optionally specify the output directory as the first argument. The person that provisioned the shareable link will have decided how many parallel threads to use (typically at least 2 and not more than 4). You can change that by editing the script if you want.

  • If the download fails, e.g. your network drops out, then you can restart. The application will skip files it has already downloaded.
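
On a Unix system, the steps above might look like the following sketch. The helper name run_aterm_wrapper and the script name in the example are hypothetical; use the name of the script from your downloaded zip file. The Java check only confirms that a java command is on the PATH, not that it is version 8.

```shell
#!/bin/sh
# run_aterm_wrapper SCRIPT [OUTPUT_DIRECTORY]
# Hypothetical helper: checks for Java, makes the wrapper executable and
# runs it with the output directory as its first argument.
run_aterm_wrapper() {
    script=$1
    outdir=${2:-.}
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH; the ATERM wrapper needs Java 8" >&2
        return 1
    fi
    mkdir -p "$outdir" || return 1
    chmod +x "$script" || return 1
    "$script" "$outdir"
}

# Example (the script name is a placeholder):
#   run_aterm_wrapper ./my-download-script.sh /data/downloads
# If the download is interrupted, run the same command again to restart it.
```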

Shell Wrapper

  • In these scripts, each asset is downloaded from Mediaflux with one line of the script per asset.

  • They use pre-installed tools like curl and wget on Unix systems and PowerShell on Windows systems, so Java is not required.

  • These kinds of scripts cannot download data in parallel.

  • If the process fails, e.g. your network drops out, then you can restart. The script will skip files it has already downloaded.
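
The generated scripts are not reproduced here. As a purely illustrative sketch of the skip-on-restart idea, the hypothetical function below downloads one asset with curl only if a non-empty copy is not already present. The function name, URLs and file names are assumptions and will differ from what the provisioned scripts actually contain.

```shell
#!/bin/sh
# fetch_if_missing URL OUTPUT
# Hypothetical helper: downloads one asset unless it is already present.
fetch_if_missing() {
    url=$1
    out=$2
    if [ -s "$out" ]; then
        # A non-empty file with this name exists, so skip it on restart.
        echo "skip $out"
        return 0
    fi
    # Download to a temporary name so an interrupted transfer is never
    # mistaken for a finished file on the next run.
    curl --fail --silent --show-error -o "$out.part" "$url" && mv "$out.part" "$out"
}

# One line per asset (placeholders, not real links):
#   fetch_if_missing "<URL of asset 1>" file1.dat
#   fetch_if_missing "<URL of asset 2>" file2.dat
```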

Running downloads in the background

nohup

Scripts (be they shell or ATERM wrappers) may need to run for long periods of time when downloading a lot of data. On Unix systems, we recommend that you prefix the command with nohup (which ignores terminal hangups) and run it in the background, so that you can log out if you want. The output will be stored in a file called nohup.out

nohup myscript &
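
If you would rather choose the log file name and keep the process ID for later checks, a variation can be sketched as below. start_background is a hypothetical helper and download.log is an arbitrary name; because the output is redirected, nohup does not write to nohup.out in this case.

```shell
#!/bin/sh
# start_background SCRIPT [LOGFILE]
# Hypothetical helper: runs SCRIPT under nohup in the background, sends its
# output to LOGFILE instead of nohup.out, and prints the process ID.
start_background() {
    script=$1
    log=${2:-download.log}
    nohup "$script" >"$log" 2>&1 &
    echo "$!"
}

# Example:
#   pid=$(start_background ./myscript download.log)
#   tail -f download.log                      # watch progress
#   kill -0 "$pid" && echo "still running"    # check it is still alive
```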

screen/tmux

Screen and tmux are utilities that allow you to run interactive terminals within a wrapper that can be detached and reattached.  This allows you to start the download, detach the screen/tmux session, disconnect from the host, then come back at a later time, reconnect to the host, and reattach the interactive download session.
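
A minimal sketch of that workflow with tmux is shown below, with the screen equivalents in comments. It assumes tmux (or screen) is installed on the host; the helper name start_download_session, the session name download and the script name are placeholders.

```shell
#!/bin/sh
# start_download_session SCRIPT [SESSION_NAME]
# Hypothetical helper: starts SCRIPT inside a detached tmux session.
start_download_session() {
    session=${2:-download}
    tmux new-session -d -s "$session" "$1"
}

# Example:
#   start_download_session ./myscript
#
# Later, after reconnecting to the same host:
#   tmux ls                    # list the sessions that are still running
#   tmux attach -t download    # reattach; press Ctrl-b then d to detach again
#
# Roughly equivalent commands with screen:
#   screen -dmS download ./myscript
#   screen -r download         # reattach; press Ctrl-a then d to detach again
```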

Provisioning Indirect Shareable Links

Research Computing Services will have assisted you to deploy a script which, when executed, generates an indirect shareable link and emails it to a user. To use the script, you will need to access your computer's command-line interface. If you have been provided with such a script, here is how to get started generating links:

Windows

  • Start a Command Prompt terminal window by pressing the Start button and entering cmd

  • In the Command Prompt window, navigate to the directory where your script was installed, e.g.:

    cd C:\Mediaflux

    or

    cd C:\Users\nkilleen\Mediaflux
  • You can check you are in the right place with the command dir which will list the contents of that directory

  • Execute the script and supply the correct arguments (the names and values of the parameters that control what the script does). The script name may be specialised for you when we deploy it: the first part changes from facility to something specific to you (like mcfp, or x617lab in the example).

  • To list all available arguments:

    facility-download-shell-script-url-create.cmd --help
  • To create a shareable link:

    facility-download-shell-script-url-create.cmd --expire-days <number of days before link expires>  --email <email address of link recipient> <namespace to create link for>
  • Example where facility has been named x617lab on deployment and the child namespace neils_data in the project proj-test-1128.2.3 is provisioned with a link:

    x617lab-download-shell-script-url-create.cmd --expire-days 30 --email nkilleen@unimelb.edu.au proj-test-1128.2.3/neils_data