PetaLibrary S3 Glacier¶
Documentation for transferring PetaLibrary data to an S3 bucket with Globus Online.
Configuration¶
Discussed with Research Computing (Jason Armbruster) how to set up a Globus Online endpoint (cubl-petalibrary-archive).
RC set up a Globus endpoint named “S3 prototype CU Boulder Libraries”.
When you open that collection, you are prompted to authenticate with your Boulder Identikey first. Once that succeeds, you must also authenticate with an AWS key/secret pair for a user who has access to the S3 bucket.
The user needs a CU Boulder Research Computing account with membership in the “dulockgrp” group. The group gives access to “libdigicoll”.
Use the path /pl/archive/libdigicoll/ to access CU Boulder Libraries data.
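The access requirements above can be checked from a Research Computing login before starting a transfer. The following is a minimal sketch, assuming it runs on a Unix system where the archive path is mounted; the function names `current_group_names` and `check_access` are hypothetical helpers, not part of any RC tooling.

```python
import grp
import os

REQUIRED_GROUP = "dulockgrp"               # group named in the notes above
ARCHIVE_PATH = "/pl/archive/libdigicoll/"  # path named in the notes above


def current_group_names():
    """Return the names of the groups the current process belongs to."""
    names = set()
    for gid in os.getgroups():
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # This group ID has no name entry on the system; skip it.
            continue
    return names


def check_access(group_names, path=ARCHIVE_PATH, required_group=REQUIRED_GROUP):
    """Return a list of problems found; an empty list means access looks fine."""
    problems = []
    if required_group not in group_names:
        problems.append(f"not a member of {required_group}")
    # Listing a directory requires both read and execute permission.
    if not os.access(path, os.R_OK | os.X_OK):
        problems.append(f"cannot read {path}")
    return problems


if __name__ == "__main__":
    issues = check_access(current_group_names())
    print("OK" if not issues else "; ".join(issues))
```

An empty result only shows that the local account can list the directory. It does not test the Globus or AWS authentication steps.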
Trial S3 Data Transfer¶
CTA (Vida) is currently awaiting a CU Boulder Research Computing account with access to Library data.
Data Trial: Contact Michael Dulock
/pl/archive/libdigicoll/libimage-bulkMove/ (2TB)
/pl/archive/libdigicoll/libstore-bulkMove/ (4.4TB)
/pl/archive/libdigicoll/libberet-bulkMove/RFS/DigitalImages/ (Important)
/pl/archive/libdigicoll/libberet-bulkMove/RFS/ (23TB)
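For planning the trial, the sizes listed above can be totaled and turned into a rough transfer-time estimate. This is a sketch under two assumptions: the DigitalImages directory is already counted inside the 23TB RFS figure (the path nesting suggests so), and the throughput of 100 MB/s is a made-up placeholder, not a measured rate for this endpoint.

```python
# Sizes in terabytes, copied from the trial data list above.
# DigitalImages is omitted because it sits inside the RFS directory.
TRIAL_SIZES_TB = {
    "/pl/archive/libdigicoll/libimage-bulkMove/": 2.0,
    "/pl/archive/libdigicoll/libstore-bulkMove/": 4.4,
    "/pl/archive/libdigicoll/libberet-bulkMove/RFS/": 23.0,
}

SECONDS_PER_DAY = 86_400


def total_tb(sizes):
    """Sum the sizes (in TB) of all listed directories."""
    return sum(sizes.values())


def transfer_days(size_tb, rate_mb_per_s):
    """Estimate transfer time in days, using decimal units (1 TB = 10**12 bytes)."""
    size_bytes = size_tb * 10**12
    rate_bytes_per_s = rate_mb_per_s * 10**6
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    total = total_tb(TRIAL_SIZES_TB)
    assumed_rate = 100  # MB/s, hypothetical sustained throughput
    print(f"Total: {total:.1f} TB")
    print(f"Estimated days at {assumed_rate} MB/s: {transfer_days(total, assumed_rate):.1f}")
```

The total comes to 29.4 TB, which at the assumed rate is roughly 3.4 days of continuous transfer. Replace the rate with an observed figure from a small test transfer.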
TODO¶
If the trial is successful, add new Globus endpoints.
The CU Scholar archive is currently being manually moved. This would cut out the middle step and provide direct access from S3 to PetaLibrary.
Transfer S3 bucket (cubl-ir-fcrepo) ==> /pl/archive/libdigicoll/dataSets/cu_scholar/cubl-ir-fcrepo
The above actions will provide three copies, with one copy in a different geographic location.
This is part of the CoreTrustSeal actions needed for CU Scholar.
AWS Lambda to move IR files on demand
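The bucket-to-directory mapping in the list above can be expressed as a small helper, which could be useful if an on-demand mover is built later. This is only a sketch of the path mapping: the function name `destination_path` and the example key are hypothetical, and no transfer is performed.

```python
import posixpath

BUCKET = "cubl-ir-fcrepo"
# Destination directory taken from the TODO item above.
DEST_ROOT = "/pl/archive/libdigicoll/dataSets/cu_scholar/cubl-ir-fcrepo"


def destination_path(key, dest_root=DEST_ROOT):
    """Map an S3 object key to its path under the PetaLibrary destination.

    Note: normpath collapses repeated slashes, so keys that differ only
    by "//" versus "/" would map to the same path.
    """
    normalized = posixpath.normpath(key.lstrip("/"))
    # Reject empty keys and keys that would escape the destination root.
    if normalized in (".", "..") or normalized.startswith("../"):
        raise ValueError(f"unsafe or empty key: {key!r}")
    return posixpath.join(dest_root, normalized)


if __name__ == "__main__":
    # "example/object.bin" is a made-up key for illustration.
    print(destination_path("example/object.bin"))
```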
Manual backup to PetaLibrary¶
Sync S3 Bucket to local drive
cd { data download directory }
aws s3 sync s3://cubl-ir-fcrepo .
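The sync step above can be wrapped so that it is previewed before any data is downloaded. This is a minimal sketch, assuming the AWS CLI is installed and credentials for the bucket are configured; `build_sync_command` and `run_sync` are hypothetical helper names. The `--dryrun` flag of `aws s3 sync` lists the operations without performing them.

```python
import shlex
import subprocess


def build_sync_command(bucket, target_dir=".", dry_run=False):
    """Build the argument list for `aws s3 sync` from a bucket to a local directory."""
    cmd = ["aws", "s3", "sync", f"s3://{bucket}", target_dir]
    if dry_run:
        # Show what would be copied without downloading anything.
        cmd.append("--dryrun")
    return cmd


def run_sync(bucket, target_dir=".", dry_run=True):
    """Print and run the sync command; raises CalledProcessError on failure."""
    cmd = build_sync_command(bucket, target_dir, dry_run)
    print(shlex.join(cmd))
    return subprocess.run(cmd, check=True)
```

Calling `run_sync("cubl-ir-fcrepo")` from the download directory previews the sync. Passing `dry_run=False` performs it.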
Install Globus Connect Personal
Create Endpoint on System
Use the Globus web interface to start a transfer from the laptop endpoint to PetaLibrary:
Source: the laptop endpoint where the AWS sync happened
Destination: the PetaLibrary endpoint, path /pl/archive/libdigicoll/dataSets/cu_scholar/
Select cubl-ir-fcrepo
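Because this backup is a manual, two-hop copy, it may help to record checksums of the synced directory before the transfer and compare them against the PetaLibrary copy afterwards. The following is an optional sketch using only the Python standard library; it is not part of the procedure above, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, with forward slashes) to its digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifests(source, copy):
    """Return the relative paths that are missing or differ in the copy."""
    return sorted(name for name, digest in source.items() if copy.get(name) != digest)
```

One possible use: run `build_manifest` on the laptop's cubl-ir-fcrepo directory before the transfer, run it again on the PetaLibrary copy, and pass both results to `diff_manifests`. An empty list means every source file arrived with matching content.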