ETD Loader Process
The ETD loader code is stored in the IR project on GitHub.
Folder Structure
All of the scripts are stored inside the IR container. The ETD files, however, are stored at /efs/prod/proquest/ (production) or /efs/test/proquest (test).
Open a shell in the scholar-worker container with kubectl, then go to /efs/prod/proquest or /efs/test/proquest to see those files.
.zip: new zip files that have not been processed yet.
.zip.proccessed: files that have been unzipped and processed.
logs/: folder that stores the log files.
proccessing_folder/: folder that holds the contents of the .zip files after they are unzipped for processing.
rejected/: folder that stores rejected .zip files. If an error occurs during processing, the script moves the failing .zip file to this folder.
unaccepted/: folder that stores unaccepted .zip files. If an ETD item is not allowed to be loaded into the IR, its .zip file is moved to this folder.
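The folder layout above can be inspected with a small script before or after a run. The following is a minimal sketch, not part of the loader: the function name summarize_etd_folder is hypothetical, and it assumes that rejected/ and unaccepted/ hold files ending in .zip and that processed files keep the .zip.proccessed suffix exactly as spelled in the list above.

```python
from pathlib import Path


def summarize_etd_folder(root):
    """Count ETD archives in each state under a proquest root folder."""
    root = Path(root)

    def count(folder, pattern):
        # A missing folder is treated as empty rather than as an error.
        return len(list(folder.glob(pattern))) if folder.is_dir() else 0

    return {
        # New archives waiting in the root folder.
        "pending": count(root, "*.zip"),
        # Suffix spelled exactly as in the folder listing (assumption).
        "processed": count(root, "*.zip.proccessed"),
        "rejected": count(root / "rejected", "*.zip"),
        "unaccepted": count(root / "unaccepted", "*.zip"),
    }
```

For example, summarize_etd_folder("/efs/test/proquest") returns a dictionary with the four counts for the test environment.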
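Execute Script
Open a shell in the scholar-worker pod. The pod name below is an example; its generated suffix changes whenever the pod is recreated, so use the current name (kubectl get pods -n scholar lists the running pods):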
kubectl exec -it scholar-worker-78f7c8646-mztqv -n scholar -- bash
At /app, run one of the commands below:
In TEST:
python3 etd-loader/main.py /efs/test/proquest/ /efs/test/proquest/processing_folder/ number_item
In PRODUCTION:
python3 etd-loader/main.py /efs/prod/proquest/ /efs/prod/proquest/processing_folder/ number_item
number_item: the number of .zip files to process in this run. Any number works as long as the run stays easy to keep track of; fewer than 20 is suggested.
etd-loader/main.py: path to the loader script.
/efs/test/proquest/: where the ETD .zip files are stored (/efs/prod/proquest/ in production).
/efs/prod/proquest/processing_folder/: where the ETD .zip files are extracted (/efs/test/proquest/processing_folder/ in test).
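To avoid mixing test and production paths when typing the command, the three arguments can be assembled programmatically. This is only an illustrative sketch: build_loader_command is a hypothetical helper, not something shipped with etd-loader, and treating the "fewer than 20" suggestion as a hard limit is a choice made here, not documented behaviour of main.py.

```python
# Paths copied from the commands above; the helper itself is hypothetical.
ROOTS = {
    "test": "/efs/test/proquest/",
    "prod": "/efs/prod/proquest/",
}
SUGGESTED_MAX_ITEMS = 20  # the text suggests fewer than 20 files per run


def build_loader_command(env, number_item):
    """Return the argument list for one etd-loader run in the given environment."""
    if env not in ROOTS:
        raise ValueError(f"unknown environment: {env!r}")
    if number_item < 1:
        raise ValueError("number_item must be at least 1")
    if number_item >= SUGGESTED_MAX_ITEMS:
        # Assumption: enforce the suggestion strictly to keep runs small.
        raise ValueError("number_item should be less than 20")
    root = ROOTS[env]
    return [
        "python3",
        "etd-loader/main.py",
        root,                          # where the ETD .zip files are stored
        root + "processing_folder/",   # where the .zip files are extracted
        str(number_item),
    ]
```

Joining the result of build_loader_command("test", 5) with spaces gives the TEST command shown above with number_item set to 5.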
Go to the scholar website to make sure the items were loaded, including any uploaded files. Then check the logs and folders for rejected files, unaccepted files, or any other errors that may have occurred.
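One way to make the folder check less manual is to record the contents of rejected/ and unaccepted/ before the run and compare afterwards. The sketch below assumes only the folder names given in the folder structure; snapshot and new_since are hypothetical helper names, and the sketch does not read the log files because their format is not described here.

```python
from pathlib import Path

WATCHED = ("rejected", "unaccepted")  # folder names from the folder structure


def snapshot(root):
    """Record the file names currently in each watched folder."""
    root = Path(root)
    return {
        name: {p.name for p in (root / name).glob("*")}
        if (root / name).is_dir()
        else set()  # a missing folder is treated as empty
        for name in WATCHED
    }


def new_since(before, after):
    """Return, per folder, the sorted names present in `after` but not in `before`."""
    return {name: sorted(after[name] - before[name]) for name in WATCHED}
```

Take one snapshot before running main.py and one after; any names reported by new_since are files that the run moved into rejected/ or unaccepted/ and that need a closer look.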