This is a command-line Java application that you can use to upload your data to Mediaflux efficiently and to check its integrity. Installation instructions are available on the parent page.
This client can:
- upload files in parallel (--nb-workers). There is no magic in this: it only goes faster if there is spare network capacity. Therefore, please don't use more than 4 upload threads. If the network is heavily congested, you may even find that 4 threads are no faster than 1, so you may have to experiment a little to find the optimum.
- compute checksums for additional validation (see below)
- write a log file of the upload
- generate and email a summary of the upload (including successful and failed uploads, and the number of zero-sized files it encountered)
- run in daemon mode (in the background) so it keeps uploading new data to Mediaflux as it arrives in your local file system
To see all command-line arguments for this client, with details of each, run it with the --help switch.
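The --nb-workers option can be pictured as a fixed pool of worker threads taking files from a shared queue. The sketch below is a minimal Java illustration under stated assumptions, not the client's actual code: the `Uploader` interface is hypothetical and stands in for the real transfer. It shows why extra workers help only while the network has spare capacity: the pool merely caps how many transfers run at once, and each transfer still moves at whatever rate the link allows.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelUploadSketch {
    // Hypothetical stand-in for transferring one file; not part of the real client.
    interface Uploader {
        void upload(Path file) throws Exception;
    }

    /** Uploads every file using at most nbWorkers concurrent threads; returns the success count. */
    static int uploadAll(List<Path> files, int nbWorkers, Uploader uploader) {
        ExecutorService pool = Executors.newFixedThreadPool(nbWorkers);
        AtomicInteger succeeded = new AtomicInteger();
        for (Path f : files) {
            // Tasks wait in the pool's queue until one of the nbWorkers threads is free.
            pool.submit(() -> {
                try {
                    uploader.upload(f);
                    succeeded.incrementAndGet();
                } catch (Exception e) {
                    System.err.println("Failed: " + f + " (" + e + ")");
                }
            });
        }
        pool.shutdown(); // accept no new tasks, but finish the queued ones
        try {
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return succeeded.get();
    }

    public static void main(String[] args) {
        List<Path> files = Arrays.asList(Paths.get("a.dat"), Paths.get("b.dat"), Paths.get("c.dat"));
        // Fake uploader that only sleeps, standing in for network transfer time.
        int ok = uploadAll(files, 2, file -> Thread.sleep(100));
        System.out.println(ok + " of " + files.size() + " files uploaded");
    }
}
```

Failed files are counted separately from successful ones, which is the kind of information the upload summary reports.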
Examples
You will need to know where your data should go in Mediaflux (the destination namespace, given with the --namespace argument) and where to upload from (the last positional argument).
Example 1
Upload data with four worker threads and turn on checksums for upload integrity checking (recommended). As the location of the configuration file is not specified, the client looks for it in the .Arcitecta directory in your home directory.
unimelb-mf-upload --csum-check --nb-workers 4 --namespace /projects/proj-myproject-1128.1.59/12Jan2018 /data/projects/punim0058
Example 2
Upload data with one worker thread (no --nb-workers argument) and without checksums, and specify explicitly where the configuration file is.
unimelb-mf-upload --mf.config /Users/nebk/.Arcitecta/mflux.cfg --namespace /projects/proj-myproject-1128.1.59/12Jan2018 /data/projects/punim0058
Checksums
Checksums (a number computed from the contents of a file, which changes if the contents change) are an important data-integrity mechanism. The Mediaflux server computes a checksum for each file it receives. The upload client can compute checksums from the source data on the client side and compare them with the checksums computed by the server when it receives the files. If the checksums match, we can be very confident that the file was uploaded correctly. Many clients for other protocols (e.g. SFTP and SMB) do not do this.
By default, checksums are not enabled, because computing them slows down the upload. However, we strongly recommend that you either enable them during the upload or run the checker client unimelb-mf-check with checksums afterwards to check the upload.
Case 1 - Files DO NOT pre-exist on Mediaflux
When you enable checksums and the data DO NOT already exist on the server, the client computes the checksum as part of the upload process. When Mediaflux creates the asset, it also computes the checksum, and the two checksums are compared.
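Computing the checksum "as part of the upload process" can be done by wrapping the file's input stream in `java.util.zip.CheckedInputStream`, so the checksum is updated as the bytes are sent and the file is read only once. This is a minimal Java sketch, not the client's implementation, under two assumptions: the algorithm is CRC32 (our understanding of what Mediaflux records; treat it as an assumption), and a `ByteArrayOutputStream` stands in for the network connection, with the hypothetical `serverSideChecksum` playing the role of the server computing its own checksum on arrival.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

class UploadWithChecksum {
    /** Streams the file to sink and returns the CRC32 of exactly the bytes that were sent. */
    static long uploadAndChecksum(Path file, OutputStream sink) throws IOException {
        CRC32 crc = new CRC32();
        // Every read through CheckedInputStream also updates crc, so no second pass is needed.
        try (CheckedInputStream in = new CheckedInputStream(Files.newInputStream(file), crc)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                sink.write(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    /** Stand-in for the server (assumption): checksum of the bytes that actually arrived. */
    static long serverSideChecksum(byte[] received) {
        CRC32 crc = new CRC32();
        crc.update(received, 0, received.length);
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("upload-demo", ".txt");
        Files.write(tmp, "123456789".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream fakeServer = new ByteArrayOutputStream();
        long client = uploadAndChecksum(tmp, fakeServer);
        long server = serverSideChecksum(fakeServer.toByteArray());
        // Prints: client=cbf43926 server=cbf43926 match=true
        System.out.printf("client=%08x server=%08x match=%b%n", client, server, client == server);
        Files.delete(tmp);
    }
}
```

If a byte were corrupted in transit, the two values would differ and the upload would be reported as failed rather than silently accepted.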
Case 2 - Files DO pre-exist on Mediaflux
When you enable checksums and the data DO already exist on the server (matched by path/name and size), the client first computes the checksum of the local file and compares it with the checksum already stored in Mediaflux.
If the checksums match, the upload is skipped. If they differ, the local file has changed, so the client re-uploads it (following the process in Case 1 above) and a new asset version is created. Thus, in this case, two checksums are computed by the client and one by the server.
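The first step of Case 2, checksumming the local file and comparing it with the stored value, is sketched below in Java for illustration only. It assumes CRC32 as the algorithm and that the checksum stored in Mediaflux is available to the client as a hexadecimal string (`storedHex`); how the real client retrieves that value is not shown, and the method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

class PreExistingChecksum {
    /** CRC32 of a local file, read in chunks so large files never sit in memory. */
    static long crc32Of(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    /**
     * True if the local file no longer matches the stored checksum and should be
     * re-uploaded as a new asset version; false means the upload can be skipped.
     * storedHex is assumed to be a hexadecimal CRC32 value such as "cbf43926".
     */
    static boolean needsReupload(Path local, String storedHex) throws IOException {
        long stored = Long.parseLong(storedHex.trim(), 16);
        return crc32Of(local) != stored;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("case2-demo", ".txt");
        Files.write(tmp, "123456789".getBytes(StandardCharsets.US_ASCII));
        System.out.println(needsReupload(tmp, "cbf43926")); // false: unchanged, skip
        // Same size, one byte changed: only the checksum can detect this.
        Files.write(tmp, "123456780".getBytes(StandardCharsets.US_ASCII));
        System.out.println(needsReupload(tmp, "cbf43926")); // true: re-upload as new version
        Files.delete(tmp);
    }
}
```

The demo deliberately keeps the file size constant, which is the situation where the size check alone would wrongly conclude the file is unchanged.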
Pre-existing files
The client checks whether files already exist in Mediaflux. If they do, it skips the upload. The checks it uses are:
- File path/name exists and is the same
- File size is the same
- If checksums are enabled, the checksum is the same
If any of these checks fails, the file is treated as not pre-existing and is uploaded. If the path/name is the same but the content of the source file has changed, it is uploaded to the pre-existing asset in Mediaflux as a new version.
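The pre-existing checks above can be read as a small decision procedure. The Java sketch below is one interpretation of the text, not the client's source: the `Action` names and parameters are hypothetical, and treating a missing stored checksum as a mismatch is an assumption of the sketch. The local checksum is passed as a `LongSupplier` so that the full read of the local file happens only after the cheaper path/name and size checks have passed.

```java
import java.util.function.LongSupplier;

class SkipDecision {
    // Hypothetical outcome names for this sketch.
    enum Action { UPLOAD_NEW, UPLOAD_NEW_VERSION, SKIP }

    /**
     * remoteSize is null when no asset exists at the same path/name.
     * remoteCrc is the stored checksum; null is treated as a mismatch (sketch assumption).
     */
    static Action decide(long localSize, Long remoteSize, boolean csumCheck,
                         LongSupplier localCrc, Long remoteCrc) {
        if (remoteSize == null) {
            return Action.UPLOAD_NEW;             // path/name not found: plain upload
        }
        if (localSize != remoteSize) {
            return Action.UPLOAD_NEW_VERSION;     // same path/name, different size
        }
        // localCrc is only evaluated here, after path/name and size have matched.
        if (csumCheck && (remoteCrc == null || localCrc.getAsLong() != remoteCrc)) {
            return Action.UPLOAD_NEW_VERSION;     // same size, different content
        }
        return Action.SKIP;                       // every enabled check passed
    }

    public static void main(String[] args) {
        LongSupplier crc = () -> 0xCBF43926L;     // stand-in for checksumming the local file
        System.out.println(decide(9, null, true, crc, null));       // UPLOAD_NEW
        System.out.println(decide(9, 9L, true, crc, 0xCBF43926L));  // SKIP
        System.out.println(decide(9, 9L, true, crc, 0L));           // UPLOAD_NEW_VERSION
        System.out.println(decide(9, 9L, false, crc, 0L));          // SKIP
    }
}
```

By this reading, with checksums disabled a file whose content changed but whose size did not would be skipped, which is one more reason to enable them.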