This is the start of a training document describing the process from curating data from the NovaSeq through bcl2fastq conversion, indexing for metadata, archival to tape, validation, labelling and retrieval.
Data from a NovaSeq run is controlled from the NovaSeq machine. novaseqdata has a Samba share that the machine can see from its front-end program. The share is selected at the start of a run and the run directory is created there, which gives the run its name. Data is written to the share as sbsuser, which also maps to sbsuser on novaseqdata.
After the run is complete, the data is run through a program to extract the fastq data.
bcl2fastq
ssh into novaseqdata as sbsuser.
Run the bcl2fastq command from within the /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX run directory.
Note: a \ at the end of a line in the following listings only marks a line break. Leave it out and type the whole command on one line.
cd /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX
nohup /usr/local/bin/bcl2fastq -R /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX \
-o /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX \
--sample-sheet /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX/SampleSheet.csv
The process can take a couple of hours from start to finish depending on the run type – S2 or S4
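Because the conversion runs for hours, it can help to wrap the command so the paths are typed once and the job keeps running after logout. The following is a minimal sketch only: the BCL2FASTQ variable, the function names bcl2fastq_cmd and start_bcl2fastq, and the bcl2fastq.log file name are assumptions made for illustration, not part of the existing procedure.

```shell
# Path to the bcl2fastq binary (same location as in the listing above).
BCL2FASTQ="${BCL2FASTQ:-/usr/local/bin/bcl2fastq}"

# Print the bcl2fastq command line for a run directory (hypothetical helper).
bcl2fastq_cmd() (
    run_dir="${1%/}"    # drop one trailing slash if present
    printf '%s -R %s -o %s --sample-sheet %s\n' \
        "$BCL2FASTQ" "$run_dir" "$run_dir" "$run_dir/SampleSheet.csv"
)

# Start the conversion in the background so it survives logout.
# Output goes to bcl2fastq.log (assumed name) inside the run directory.
start_bcl2fastq() (
    run_dir="${1%/}"
    [ -f "$run_dir/SampleSheet.csv" ] || {
        echo "no SampleSheet.csv in $run_dir" >&2
        exit 1
    }
    nohup "$BCL2FASTQ" -R "$run_dir" -o "$run_dir" \
        --sample-sheet "$run_dir/SampleSheet.csv" \
        > "$run_dir/bcl2fastq.log" 2>&1 &
    echo "started bcl2fastq, pid $!"
)
```

For example, start_bcl2fastq /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX refuses to start if the sample sheet is missing, and otherwise reports the process ID so progress can be followed in the log.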
CellRanger
Note: a \ at the end of a line in the following listings only marks a line break. Leave it out and type the whole command on one line.
cd /mnt/novaseqdata/training/190823_A00471_0062_BHF7JJDRXX/
/opt/cellranger-3.0.2/cellranger mkfastq \
--run=/mnt/novaseqdata/training/190823_A00471_0062_BHF7JJDRXX/ \
--samplesheet=/mnt/novaseqdata/training/190823_A00471_0062_BHF7JJDRXX/sample_sheet_dan_williamson.csv
The process can take an hour or so to run. It creates HF7JJDRXX as a subdirectory, and the fastq files are written within it.
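A quick check after the run is to work out the expected subdirectory name and count the fastq files beneath it. This is a sketch under assumptions: it takes the subdirectory name to be the last field of the run directory name with its leading letter removed (as in BHF7JJDRXX giving HF7JJDRXX above), it assumes the files end in .fastq.gz, and the function names flowcell_id and count_fastq are made up for illustration.

```shell
# Derive the flow cell ID from a run directory name, e.g.
# 190823_A00471_0062_BHF7JJDRXX -> HF7JJDRXX
# (assumption: last underscore-separated field minus its first letter).
flowcell_id() (
    name="$(basename "$1")"
    last="${name##*_}"            # text after the final underscore
    printf '%s\n' "${last#?}"     # drop the leading letter
)

# Count the fastq files written anywhere below a directory.
count_fastq() (
    find "$1" -type f -name '*.fastq.gz' | wc -l | tr -d ' '
)
```

For example, count_fastq "$(flowcell_id /mnt/novaseqdata/training/190823_A00471_0062_BHF7JJDRXX/)" run from inside the run directory prints the number of fastq files; a count of 0 suggests the run did not finish.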
index the directory data with tree
From within the /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX run directory issue this command
tree -a -T '180525_A00471_0028_AHCKH2DMXX' \
-H /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX \
-o index.html \
/mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX
tape backup and restore (see later entry for tape manipulation in the library)
From within the /mnt/novaseqdata/training directory issue this command
cd /mnt/novaseqdata/training/
tar -cMvf /dev/st0 180525_A00471_0028_AHCKH2DMXX/*
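To restore the data from tape:
cd /tmp
tar -xMvf /dev/st0 --wildcards '180525_A00471_0028_AHCKH2DMXX/*'
or, to restore everything on the tape:
tar -xMvf /dev/st0
(The pattern is quoted and --wildcards is given so that tar, rather than the shell, matches it against the names in the archive.)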
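For the validation step, one option is to record checksums before writing to tape and check them against the restored copy. This is only a sketch of that idea, not the established procedure: it assumes GNU sha256sum is available, and the function names make_manifest and verify_manifest and the manifest file name are made up for illustration.

```shell
# Write a SHA-256 line for every file under a directory to standard output.
# Paths are relative to the directory, so the manifest can be checked
# against a copy restored somewhere else (for example under /tmp).
make_manifest() (
    cd "$1" || exit 1
    find . -type f -exec sha256sum {} +
)

# Check a directory against a manifest read from standard input.
# Returns non-zero if any file is missing or differs.
verify_manifest() (
    cd "$1" || exit 1
    sha256sum --check --quiet
)
```

Before archiving, run make_manifest /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX > /tmp/180525.sha256 (hypothetical file name). After restoring, run verify_manifest /tmp/180525_A00471_0028_AHCKH2DMXX < /tmp/180525.sha256; no output and a zero exit status mean every file matched.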
rsync from novaseqdata to rocket
rsync -av /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX \
nbh23@rocket.hpc.ncl.ac.uk:/nobackup/proj/scbsu/
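Trailing slashes are important. Because the source path has no trailing slash, rsync creates the directory 180525_A00471_0028_AHCKH2DMXX under the destination if it does not already exist and writes the data into it. (NB: I discovered that rsync will only create one missing directory level on the destination; anything above that must already exist.)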
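The trailing-slash rule can be sketched as a small helper that predicts where the data will land before anything is copied. The function name rsync_target is made up for illustration, and it only models the rule described here; running rsync itself with -n (dry run) shows what it would actually do.

```shell
# Predict where rsync will place a source directory on the destination.
# Source without a trailing slash: the directory itself is created under dest.
# Source with a trailing slash: only its contents are copied into dest.
rsync_target() (
    src="$1"
    dest="${2%/}"    # normalise the destination to no trailing slash
    case "$src" in
        */) printf '%s/\n' "$dest" ;;
        *)  printf '%s/%s/\n' "$dest" "$(basename "$src")" ;;
    esac
)
```

For example, rsync_target /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX /nobackup/proj/scbsu/ prints /nobackup/proj/scbsu/180525_A00471_0028_AHCKH2DMXX/, whereas adding a trailing slash to the source prints /nobackup/proj/scbsu/, meaning the run's files would be spilled directly into the project directory.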
Once the copy is done, the directory on rocket needs to be chmod'd recursively to 775 so that other people in the group can access it:
chmod -R 775 /nobackup/proj/scbsu/180525_A00471_0028_AHCKH2DMXX/
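To confirm the permissions took effect, something like the following can list anything the group still cannot reach. It is a sketch only; the function name group_blocked is made up for illustration, and it checks just the group read bit on everything and the group execute bit on directories.

```shell
# List entries under a directory that group members cannot read,
# plus directories they cannot enter. No output means access looks fine.
group_blocked() (
    find "$1" \( ! -perm -040 -o \( -type d ! -perm -010 \) \) -print
)
```

For example, group_blocked /nobackup/proj/scbsu/180525_A00471_0028_AHCKH2DMXX should print nothing after the chmod above has been run.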