Getting started with Mediaflux

How to access Mediaflux

You will not be able to access Mediaflux until resources have been provisioned for you and your account has been enabled to access those resources. UoM researchers can find details on this page: https://gateway.research.unimelb.edu.au/platforms-data-and-reporting/data-and-computation/research-computing-services-rcs/our-services/data-storage-and-management

Data stored on Mediaflux can be accessed in many ways. We recommend the Mediaflux Explorer as the best starting place.

Ways in which you can access Mediaflux include:

The Mediaflux server name is mediaflux.researchsoftware.unimelb.edu.au. Additional information, such as the port for each protocol, is available in the list of all access methods supported by the University.

Multi-Factor Authentication (MFA) for Mediaflux is now available. To learn more and enrol, visit: Mediaflux MFA

How to work with your Project Data

Because Mediaflux is a very diverse system, it is not possible to describe everything you can do with it. Instead, we focus on some basic tasks that are relevant to managing research data. What you can do depends to some extent on the interface you are using, so our approach is to describe scenarios and then discuss which tools are most relevant to each.

Spartan HPC Service Users at the University of Melbourne

Some of the data movement tools described in the upload/download links above are conveniently provided to Spartan users via the Spartan module system.

Software Downloads

Please see this link for downloads including various Mediaflux client applications.

Managing user access to your project

Please see this link for information on giving others access to your Mediaflux project data.


How does Mediaflux protect your research data?

Mediaflux is a database-backed asset management system. At the University of Melbourne, it is primarily used as a file store for research data. When a file is stored in Mediaflux, it is stored as an asset, which consists of a database entry containing the file’s metadata, and an associated content object containing the file’s data.

Files held in Mediaflux are versioned. This means that if a file is updated, both the older version and the new version will be stored and can be retrieved if necessary.
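
To make the asset and version model above more concrete, here is a minimal Python sketch. It is a toy illustration only, not the Mediaflux API: the class names `Asset` and `AssetVersion`, their fields, and the numbering of versions from 1 are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AssetVersion:
    """One stored version: metadata (the database entry) plus content."""
    version: int
    metadata: dict
    content: bytes


@dataclass
class Asset:
    """Hypothetical asset that keeps every version instead of overwriting."""
    asset_id: int
    versions: list = field(default_factory=list)

    def update(self, metadata: dict, content: bytes) -> int:
        """Store a new version and return its version number (1-based)."""
        number = len(self.versions) + 1
        self.versions.append(AssetVersion(number, metadata, content))
        return number

    def get(self, version=None) -> AssetVersion:
        """Return the requested version, or the latest if none is given."""
        if not self.versions:
            raise LookupError("asset has no versions")
        if version is None:
            return self.versions[-1]
        if not 1 <= version <= len(self.versions):
            raise LookupError(f"no such version: {version}")
        return self.versions[version - 1]


# Updating a file adds a version; the older one stays retrievable.
asset = Asset(asset_id=1)
asset.update({"name": "results.csv"}, b"first draft")
asset.update({"name": "results.csv"}, b"corrected data")
```

In this sketch, `asset.get()` returns the latest version and `asset.get(1)` returns the original, which mirrors the idea that an update never discards the earlier content.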

In order to avoid data loss, Mediaflux is backed by two forms of replication: database replication and asset replication.

Database replication

Mediaflux is configured to replicate its database in real-time to a second server. That is, every transaction is committed to both the local database and a remote database copy. This configuration provides redundancy in the event of a local storage system failure.

Replication is the equivalent of a continuous database backup. However, it also replicates object deletions, so in addition to database replication, we back up the database to tape several times a day.
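
The reason replication alone is not enough can be shown with a small Python sketch. This is a simplified model under stated assumptions, not how Mediaflux is implemented: the dictionaries `primary` and `replica`, the `backups` list, and the functions `commit`, `delete` and `take_backup` are hypothetical names used only for illustration.

```python
import copy

primary = {}   # stand-in for the local database
replica = {}   # stand-in for the remote database copy
backups = []   # stand-in for periodic tape backups


def commit(key, value):
    """Every transaction is applied to both the primary and the replica."""
    primary[key] = value
    replica[key] = value


def delete(key):
    """Deletions are transactions too, so the replica loses the entry as well."""
    primary.pop(key, None)
    replica.pop(key, None)


def take_backup():
    """A backup is a point-in-time copy that later deletions do not touch."""
    backups.append(copy.deepcopy(primary))


commit("asset-1", {"name": "results.csv"})
take_backup()
delete("asset-1")
```

After the deletion, `asset-1` is missing from both `primary` and `replica`, but it is still present in the most recent entry of `backups`.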

Asset content replication

When an asset is created or modified on the primary Mediaflux storage cluster, the asset/version pair is added to an asset processing queue. After a delay (currently 10 seconds), that asset version is copied to the Disaster Recovery (DR) server.

Note that asset deletions are not currently replicated. If a file is deleted and then a new file with the same name is created, the DR will keep both files (renaming them to avoid name collisions).

This differs from a traditional backup, which provides point-in-time snapshots across an entire dataset. Instead, we work at the asset level; you can think of this as a backup for each asset independently. This is mostly relevant when recovering entire directories (see Recovering files).

Currently, content replicas are kept indefinitely, though this may change in the future.
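
The effect of not replicating deletions can be sketched in Python as follows. This is an illustrative model only: the `DRStore` class and its `replicate` method are hypothetical, and the renaming rule (appending the asset id to the file name) is an assumption, since the actual scheme used on the DR server is not described here.

```python
class DRStore:
    """Toy DR store: asset versions are only ever added, never deleted."""

    def __init__(self):
        # stored name -> (asset id, {version number: content})
        self.files = {}

    def replicate(self, asset_id, version, name, content):
        """Copy one asset version to the store and return its stored name."""
        stored_name = name
        existing = self.files.get(name)
        # Hypothetical collision rule: if a different asset already holds
        # this name, store the newcomer under "<name>.<asset id>".
        if existing is not None and existing[0] != asset_id:
            stored_name = f"{name}.{asset_id}"
        _, versions = self.files.setdefault(stored_name, (asset_id, {}))
        versions[version] = content
        return stored_name


dr = DRStore()
primary = {}

# A file is created on the primary and replicated.
primary["data.csv"] = (101, b"old contents")
dr.replicate(101, 1, "data.csv", b"old contents")

# The file is deleted on the primary; nothing is sent to the DR store.
del primary["data.csv"]

# A new file with the same name is created and replicated.
primary["data.csv"] = (102, b"new contents")
dr.replicate(102, 1, "data.csv", b"new contents")
```

At the end of this sequence the primary holds one `data.csv`, while the toy DR store holds both the deleted file and its replacement under distinct names.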

Recovering files

If a file is overwritten by a new version, the older version can be restored from the primary server alone, as Mediaflux keeps all versions of a file.

If a file is deleted, it can be recovered from the DR server by an administrator.

If an entire directory is deleted, it can also be recovered from the DR server by an administrator. Please note that:

  • Files that have been deleted on the primary cluster will still exist in the replica copy

  • Files that have been deleted and re-created will have multiple copies in the replica copy

Glossary

Production cluster

The Production Mediaflux cluster (referred to elsewhere on this page as the primary cluster) is the main system that users interact with. It consists of:

  • one controller node (containing the database)

  • one database replica node

  • two cluster nodes (which assist with moving data to/from users' machines)

Disaster Recovery (DR) cluster

The Disaster Recovery cluster receives asset content from the primary Mediaflux cluster. It is accessible only to administrative staff. Users can request recovery of asset content from the DR cluster.

Asset processing queue

An asset processing queue is a first-in, first-out (FIFO) data structure that can be used to perform operations on a list of asset id/version pairs. For asset replication, any asset version that is added to the queue is copied to the DR server.
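
As a rough illustration of the queue behaviour, the following Python sketch models a FIFO queue of asset id/version pairs with a fixed delay before each copy. It is a simplified model, not Mediaflux code: the names `enqueue`, `process` and `dr_copies` are hypothetical, and time is passed in explicitly as a number of seconds so that the example is deterministic.

```python
from collections import deque

DELAY_SECONDS = 10   # delay before an asset version is copied

queue = deque()      # FIFO queue of (ready_at, asset_id, version)
dr_copies = []       # stand-in for the DR server


def enqueue(asset_id, version, now):
    """Add an asset id/version pair; it becomes ready after the delay."""
    queue.append((now + DELAY_SECONDS, asset_id, version))


def process(now):
    """Copy every entry whose delay has elapsed, oldest first."""
    while queue and queue[0][0] <= now:
        _, asset_id, version = queue.popleft()
        dr_copies.append((asset_id, version))


enqueue(7, 1, now=0)    # asset 7 created
enqueue(8, 1, now=5)    # asset 8 created
enqueue(7, 2, now=6)    # asset 7 modified, giving version 2

process(now=10)         # only the first entry is ready
process(now=16)         # the remaining entries are now ready
```

Entries leave the queue in the order they were added, so each asset version is copied in the order the changes were made.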