PetaLibrary S3 Glacier

Documentation for transferring PetaLibrary data to an S3 bucket with Globus Online.

Configuration

  1. Discuss with Research Computing (Jason Armbruster) to set up a Globus Online endpoint (cubl-petalibrary-archive).

  2. RC set up a Globus endpoint, “S3 prototype CU Boulder Libraries”.

  3. Opening that collection prompts you to authenticate with your CU Boulder IdentiKey first. Once that succeeds, you must also authenticate with an AWS key/secret pair for a user who has access to the S3 bucket.

  4. The user needs a CU Boulder Research Computing account with membership in the “dulockgrp” group. The group gives access to “libdigicoll”.

  5. Use the path /pl/archive/libdigicoll/ to access CU Boulder Libraries data.
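
Steps 4 and 5 can be checked from a Research Computing login node before a transfer is attempted. The following is a minimal sketch, assuming a POSIX shell and that the archive path is mounted on the host where it runs. The group name and path come from the steps above; the function names has_group and check_access are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check (a sketch, not an RC-provided tool).
# GROUP and ARCHIVE_PATH are taken from the configuration steps above.

GROUP="dulockgrp"
ARCHIVE_PATH="/pl/archive/libdigicoll/"

# has_group NAME [GROUP_LIST]
# Succeeds if NAME appears as a whole word in GROUP_LIST.
# GROUP_LIST defaults to the current user's groups from `id -nG`.
has_group() {
    name=$1
    list=${2-$(id -nG)}
    for g in $list; do
        [ "$g" = "$name" ] && return 0
    done
    return 1
}

# check_access: reports group membership and whether the path is readable.
check_access() {
    if has_group "$GROUP"; then
        echo "member of $GROUP"
    else
        echo "not a member of $GROUP"
    fi
    if [ -r "$ARCHIVE_PATH" ]; then
        echo "$ARCHIVE_PATH is readable"
    else
        echo "$ARCHIVE_PATH is not readable from this host"
    fi
}

check_access
```

If either line reports a problem, the account or group request in step 4 is probably still pending.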

Trial S3 Data Transfer

  1. CTA (Vida) is currently awaiting a CU Boulder Research Computing account with access to the Library data.

  2. Trial data (contact: Michael Dulock):

    /pl/archive/libdigicoll/libimage-bulkMove/
    (2TB)
    
    /pl/archive/libdigicoll/libstore-bulkMove/ 
    (4.4TB)
    
    /pl/archive/libdigicoll/libberet-bulkMove/RFS/DigitalImages/
    (Important)
    
    /pl/archive/libdigicoll/libberet-bulkMove/RFS/
    (23TB)
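
Once access is in place, the trial could also be driven from the Globus CLI instead of the web interface. The following is a minimal sketch that only prints the commands. Both collection IDs and the destination prefix are placeholders (assumptions), and the function name transfer_cmd is hypothetical. The `globus transfer` options used here are `--recursive` and `--label`.

```shell
#!/bin/sh
# Sketch only: prints Globus CLI commands for the trial directories
# instead of running them. The two collection IDs and S3_PREFIX are
# placeholders, not real values.

PL_COLLECTION="${PL_COLLECTION:-PETALIBRARY-COLLECTION-UUID}"
S3_COLLECTION="${S3_COLLECTION:-S3-COLLECTION-UUID}"
S3_PREFIX="${S3_PREFIX:-/trial}"   # hypothetical destination prefix

# transfer_cmd SOURCE_DIR
# Prints one `globus transfer` command that copies SOURCE_DIR recursively
# into a same-named folder under S3_PREFIX on the S3 collection.
transfer_cmd() {
    src=${1%/}          # drop a trailing slash, if any
    name=${src##*/}     # last path component
    printf 'globus transfer --recursive --label %s %s:%s/ %s:%s/%s/\n' \
        "trial-$name" "$PL_COLLECTION" "$src" "$S3_COLLECTION" "$S3_PREFIX" "$name"
}

# RFS/DigitalImages/ sits under RFS/, so the RFS transfer covers it.
for dir in \
    /pl/archive/libdigicoll/libimage-bulkMove/ \
    /pl/archive/libdigicoll/libstore-bulkMove/ \
    /pl/archive/libdigicoll/libberet-bulkMove/RFS/
do
    transfer_cmd "$dir"
done
```

Review the printed commands, substitute the real collection IDs, and run them after `globus login`. Starting with the smallest directory keeps a failed trial cheap.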
    

TODO

  1. If the trial is successful, add new Globus endpoints.

  2. The CU Scholar archive is currently moved manually. A Globus S3 endpoint would cut out the middle step (the local copy) and allow direct transfer from S3 to PetaLibrary.

  3. Transfer S3 bucket (cubl-ir-fcrepo) ==> /pl/archive/libdigicoll/dataSets/cu_scholar/cubl-ir-fcrepo

  4. The above actions will provide three copies, one of them in a different geographic location.

  5. This is part of the CoreTrustSeal actions needed for CU Scholar.

  6. Use AWS Lambda to move IR files on demand.

Manual backup to PetaLibrary

  1. Sync S3 Bucket to local drive

    cd /path/to/data-download-directory   # placeholder for your data download directory
    aws s3 sync s3://cubl-ir-fcrepo .
    
  2. Install Globus Connect Personal

  3. Create an endpoint on the local system

  4. Use the Globus web interface to start a transfer from the laptop endpoint to PetaLibrary:

    • Source: the laptop endpoint where the AWS sync ran

    • Destination: the PetaLibrary endpoint, /pl/archive/libdigicoll/dataSets/cu_scholar/

    • Select cubl-ir-fcrepo as the folder to transfer
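
Before the sync in step 1 and the transfer in step 4, it may help to preview the download and compare object counts. The following is a minimal sketch, assuming the AWS CLI is configured for a user with access to the bucket. The bucket name comes from step 1; DEST and the function names sync_cmd and total_objects are hypothetical. The AWS CLI flags used are `--dryrun` on `aws s3 sync` and `--recursive --summarize` on `aws s3 ls`.

```shell
#!/bin/sh
# Sketch of step 1 with a preview pass. BUCKET comes from the steps
# above; DEST is a hypothetical local directory (default: current one).

BUCKET="cubl-ir-fcrepo"
DEST="${DEST:-.}"

# sync_cmd [--apply]
# Prints the aws command to run. Without --apply it adds --dryrun,
# which lists what would be copied without downloading anything.
sync_cmd() {
    if [ "${1-}" = "--apply" ]; then
        printf 'aws s3 sync s3://%s %s\n' "$BUCKET" "$DEST"
    else
        printf 'aws s3 sync s3://%s %s --dryrun\n' "$BUCKET" "$DEST"
    fi
}

# total_objects
# Reads the output of `aws s3 ls s3://BUCKET --recursive --summarize`
# on stdin and prints the number from its "Total Objects:" line.
total_objects() {
    awk '/Total Objects:/ { print $3 }'
}

sync_cmd
```

After the real sync, `aws s3 ls s3://cubl-ir-fcrepo --recursive --summarize | total_objects` can be compared with `find . -type f | wc -l` in the download directory. This is only a rough check: the two numbers can differ, for example if the bucket holds zero-byte folder marker objects.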