Getting started with Mediaflux
How to access Mediaflux
You will not be able to access Mediaflux until resources have been provisioned for you and your account has been enabled to access those resources. UoM researchers can find details on this page: https://gateway.research.unimelb.edu.au/platforms-data-and-reporting/data-and-computation/research-computing-services-rcs/our-services/data-storage-and-management
Data stored on Mediaflux can be accessed in many ways. We recommend the Mediaflux Explorer as the best starting place.
Ways in which you can access Mediaflux include:
Through dedicated clients with strong data integrity protections (over the HTTPS protocol), such as:
- Mediaflux Explorer (a Java-based GUI application, recommended as the best starting place for working with your data), or
- The unimelb command-line clients
Using a third-party sftp client
Through the web-based interface (Mediaflux Desktop)
By mounting as a network drive (within the University network or via the University VPN) using the SMB protocol
By using the rsync command line tool for syncing data
The Mediaflux server name is mediaflux.researchsoftware.unimelb.edu.au. Additional information, such as the port for each protocol, is available in the list of all access methods supported by the University.
Multi-Factor Authentication (MFA) for Mediaflux is now available. To learn more and enrol, visit: Mediaflux MFA
How to work with your Project Data
Because Mediaflux is a very diverse system, it is not possible to describe everything you can do with it. Instead, we focus on some basic tasks that are relevant to managing research data. What you can do depends a little on the interface you are using, so our approach is to describe scenarios and then discuss which tools are most relevant to each.
Processes relating to accounts, storage provisioning, and setting which users can access each project are currently handled by the Research Computing Services data support team.
Feel free to contact us for more information.
Spartan HPC Service Users at the University of Melbourne
Some of the data movement tools described in the upload/download links above are conveniently provided to Spartan users via the Spartan module system.
Software Downloads
Please see this link for downloads including various Mediaflux client applications.
Managing user access to your project
Please see this link for information on giving others access to your Mediaflux project data.
How does Mediaflux protect your research data?
Mediaflux is a database-backed asset management system. At the University of Melbourne, it is primarily used as a file store for research data. When a file is stored in Mediaflux, it is stored as an asset, which consists of a database entry containing the file’s metadata, and an associated content object containing the file’s data.
Files held in Mediaflux are versioned. This means that if a file is updated, both the older version and the new version will be stored and can be retrieved if necessary.
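As an illustration of what versioning means in practice, the following is a minimal Python sketch of an asset that keeps every version of its content. It is a toy model only: the `Asset` class, its methods, and the 1-based version numbering are assumptions made for this example and are not part of the Mediaflux API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical stand-in for a versioned asset (not the Mediaflux API)."""

    name: str
    versions: list[bytes] = field(default_factory=list)

    def update(self, content: bytes) -> int:
        """Store content as a new version and return its 1-based number."""
        self.versions.append(content)
        return len(self.versions)

    def get(self, version: int | None = None) -> bytes:
        """Return the requested version, or the latest if none is given."""
        if not self.versions:
            raise LookupError(f"{self.name} has no content")
        if version is None:
            return self.versions[-1]
        if not 1 <= version <= len(self.versions):
            raise LookupError(f"{self.name} has no version {version}")
        return self.versions[version - 1]


asset = Asset("results.csv")
asset.update(b"first draft")
asset.update(b"corrected data")

latest = asset.get()      # the newest content
original = asset.get(1)   # the older version is still retrievable
```

Updating the file never discards earlier content, so an accidental overwrite can be undone by asking for the earlier version.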
In order to avoid data loss, Mediaflux is backed by two forms of replication: database replication and asset replication.
Database replication
Mediaflux is configured to replicate its database in real-time to a second server. That is, every transaction is committed to both the local database and a remote database copy. This configuration provides redundancy in the event of a local storage system failure.
Replication is the equivalent of a continuous database backup. However, it also replicates object deletions, so in addition to database replication, we back up the database to tape several times a day.
Asset content replication
When an asset is created or modified on the primary Mediaflux storage cluster, this asset/version pair is added to an asset processing queue. After a delay (currently 10 seconds), this asset version will be copied to the DR server.
Note that asset deletions are not currently replicated. If a file is deleted and then a new file with the same name is created, the DR will keep both files (renaming them to avoid name collisions).
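The delete-and-recreate behaviour can be sketched with a small Python model. This is a simplified illustration, not how Mediaflux is implemented: the `PrimaryStore` and `ReplicaStore` classes are hypothetical, the model covers file creation and deletion only (not versions or the replication delay), and the numeric-suffix renaming scheme is an assumption, since the actual naming used on the DR server is not described here.

```python
class ReplicaStore:
    """Toy DR replica: receives copies and never processes deletions."""

    def __init__(self):
        self.files = {}  # stored name -> content

    def receive(self, name, content):
        # Assumed renaming scheme: append a numeric suffix on a name clash.
        stored_name = name
        suffix = 1
        while stored_name in self.files:
            stored_name = f"{name}.{suffix}"
            suffix += 1
        self.files[stored_name] = content
        return stored_name


class PrimaryStore:
    """Toy primary store that copies each new file to the replica."""

    def __init__(self, replica):
        self.replica = replica
        self.files = {}  # name -> content

    def create(self, name, content):
        self.files[name] = content
        self.replica.receive(name, content)

    def delete(self, name):
        # The deletion is deliberately NOT forwarded to the replica.
        del self.files[name]


replica = ReplicaStore()
primary = PrimaryStore(replica)

primary.create("notes.txt", b"old")
primary.delete("notes.txt")
primary.create("notes.txt", b"new")

replica_names = sorted(replica.files)  # both copies survive on the replica
```

After these three operations the primary holds one `notes.txt`, while the replica holds both the deleted original and the re-created file under distinct names.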
This differs from a traditional backup, which provides point-in-time snapshots across an entire dataset. Instead, we work at the asset level; you can think of this as a backup for each asset independently. This is mostly relevant when recovering entire directories; see Recovering files.
Currently, content replicas are kept indefinitely, though this may change in the future.
Recovering files
If a file is overwritten by a new version, the older version can be restored from the primary server alone, as Mediaflux keeps all versions of a file.
If a file or directory is deleted, it can be recovered from the DR server by an administrator.
If an entire directory is deleted, it will be recovered from the DR server. Please note that:
Files that have been deleted on the primary cluster will still exist in the replica copy
Files that have been deleted and re-created will have multiple copies in the replica copy
Glossary
Production cluster
The Production Mediaflux cluster is the main system that users interact with. It consists of:
one controller node (containing the database)
one database replica node
two cluster nodes (which assist with moving data to/from users' machines)
Disaster Recovery (DR) cluster
The Disaster Recovery cluster receives asset content from the Production (primary) Mediaflux cluster. It is accessible only to administrative staff. Users can request recovery of asset content from the DR cluster.
Asset processing queue
An asset processing queue is a FIFO data structure that can be used to perform operations on a list of asset id/version pairs. For asset replication, any asset version that is added to the queue is copied to the DR server.
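A FIFO queue of asset id/version pairs with a processing delay can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the Mediaflux implementation: the `AssetProcessingQueue` class, its method names, and the `copy_to_dr` callback are hypothetical, and time is passed in explicitly so the example runs without waiting.

```python
from collections import deque


class AssetProcessingQueue:
    """Toy FIFO queue of (asset id, version) pairs with a fixed delay."""

    def __init__(self, delay_seconds=10.0):
        self.delay = delay_seconds
        self._pending = deque()  # entries: (ready_time, asset_id, version)

    def enqueue(self, asset_id, version, now):
        """Add an asset version; it becomes ready after the delay."""
        self._pending.append((now + self.delay, asset_id, version))

    def drain(self, now, copy_to_dr):
        """Process, in arrival order, every entry whose delay has elapsed."""
        processed = []
        while self._pending and self._pending[0][0] <= now:
            _, asset_id, version = self._pending.popleft()
            copy_to_dr(asset_id, version)
            processed.append((asset_id, version))
        return processed


copied = []  # stands in for the DR server in this sketch


def copy_to_dr(asset_id, version):
    copied.append((asset_id, version))


queue = AssetProcessingQueue(delay_seconds=10.0)
queue.enqueue(101, 1, now=0.0)
queue.enqueue(102, 1, now=4.0)
queue.enqueue(101, 2, now=5.0)

first_pass = queue.drain(now=12.0, copy_to_dr=copy_to_dr)
second_pass = queue.drain(now=20.0, copy_to_dr=copy_to_dr)
```

At 12 seconds only the first entry has waited the full delay, so it alone is copied; by 20 seconds the remaining two are copied in the order they were queued.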