Aug 28 2019
 

This is the start of a training document describing how data from the novaseq is handled: data curation, bcl2fastq conversion, indexing for metadata, archival to tape, validation, labeling and retrieval.

Data from a novaseq run is controlled via the novaseq machine – novaseqdata has a samba share which the machine can see from the front end program. The share is selected at the start of a run and the run directory is created on it, which gives the run its name. Data is written here as sbsuser, which also translates into sbsuser on novaseqdata.

After the run is complete the data is run through a program to extract the fastq data

bcl2fastq

ssh into novaseqdata as sbsuser

run the bcl2fastq command – run from within the /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX directory

Note: ignore \ in the following listings – type it all in on one line.

cd /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX
nohup /usr/local/bin/bcl2fastq -R \
/mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX -o \
/mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX --sample-sheet \
/mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX/SampleSheet.csv
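
The invocation above can be wrapped so that the run directory is typed only once. This is a minimal sketch and not the procedure actually used on novaseqdata: the function names bcl2fastq_cmd and run_bcl2fastq and the log file name bcl2fastq.log are assumptions made for illustration. It uses only the options shown in the listing, and adds a trailing & so that nohup leaves the job running in the background after logout.

```shell
# Sketch only: bcl2fastq_cmd, run_bcl2fastq and bcl2fastq.log are illustrative names.

# Print the bcl2fastq command line for a run directory (same options as the listing above).
bcl2fastq_cmd() {
    b2f_run_dir="${1%/}"    # drop a trailing slash if present
    printf '%s -R %s -o %s --sample-sheet %s\n' \
        /usr/local/bin/bcl2fastq "$b2f_run_dir" "$b2f_run_dir" "$b2f_run_dir/SampleSheet.csv"
}

# Start the conversion in the background; all output goes to bcl2fastq.log in the run directory.
run_bcl2fastq() {
    b2f_run_dir="${1%/}"
    cd "$b2f_run_dir" || return 1
    nohup /usr/local/bin/bcl2fastq -R "$b2f_run_dir" -o "$b2f_run_dir" \
        --sample-sheet "$b2f_run_dir/SampleSheet.csv" > bcl2fastq.log 2>&1 &
}
```

It would be called as run_bcl2fastq /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX, after which tail -f bcl2fastq.log shows progress. bcl2fastq_cmd only prints the command, which is handy for checking the paths before starting a job that takes hours.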

The process can take a couple of hours from start to finish depending on the run type – S2 or S4

CellRanger

Note: ignore \ in the following listings – type it all in on one line.

cd /mnt/novaseqdata/training/190823_A00471_0062_BHF7JJDRXX/
/opt/cellranger-3.0.2/cellranger mkfastq --run=/mnt/novaseqdata/training/ \
190823_A00471_0062_BHF7JJDRXX/ \
--samplesheet=/mnt/novaseqdata/training/ \
190823_A00471_0062_BHF7JJDRXX/sample_sheet_dan_williamson.csv

The process can take an hour or so to run – it creates HF7JJDRXX as a subdirectory, which has the fastq files written within it
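
The subdirectory name appears to be the flow cell ID, that is, the last field of the run directory name with its leading letter removed (BHF7JJDRXX becomes HF7JJDRXX). A small sketch for working out where to look for the fastq files is given here; the helper name flowcell_from_run is made up for illustration, and it assumes the run directory follows the date_instrument_runnumber_sideflowcell naming seen in the examples.

```shell
# Sketch only: flowcell_from_run is an illustrative helper name.
# Assumes run directories are named <date>_<instrument>_<run number>_<side letter><flow cell ID>.
flowcell_from_run() {
    fc_name="${1%/}"              # drop a trailing slash if present
    fc_name="${fc_name##*/}"      # keep only the last path component
    fc_last="${fc_name##*_}"      # last underscore-separated field, e.g. BHF7JJDRXX
    printf '%s\n' "${fc_last#?}"  # strip the single leading side letter
}

flowcell_from_run /mnt/novaseqdata/training/190823_A00471_0062_BHF7JJDRXX/   # prints HF7JJDRXX
```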

index the directory data with tree

From within the /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX run directory issue this command

tree -a -T '180525_A00471_0028_AHCKH2DMXX' -H \
/mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX -o index.html \
/mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX
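
The same indexing step can be written once and reused for any run. This is only a sketch: the function name index_run is an assumption, and it passes tree the same options as the listing above (-a for all files, -T for the page title, -H for the base location used in the links, -o for the output file).

```shell
# Sketch only: index_run is an illustrative name.
# Writes index.html into the run directory, titled with the run directory name.
index_run() {
    idx_dir="${1%/}"    # drop a trailing slash if present
    tree -a -T "${idx_dir##*/}" -H "$idx_dir" -o "$idx_dir/index.html" "$idx_dir"
}
```

It would be called as index_run /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX.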

tape backup and restore * see later entry for tape manipulation in the library

From within the /mnt/novaseqdata/training directory issue this command

cd /mnt/novaseqdata/training/
tar -cMvf /dev/st0 180525_A00471_0028_AHCKH2DMXX/*
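
One thing worth knowing about the * in the command above: the shell expands it before tar runs, and it skips names beginning with a dot, so hidden files at the top level of the run directory would not be written to tape. Whether a run directory ever contains such files is not something this document establishes. The sketch below shows the difference using an ordinary file in place of /dev/st0; demo_run, .hidden and visible.txt are made-up names.

```shell
# Sketch only: a regular file stands in for the tape device /dev/st0,
# and demo_run, .hidden and visible.txt are made-up names.
work=$(mktemp -d)
mkdir "$work/demo_run"
touch "$work/demo_run/visible.txt" "$work/demo_run/.hidden"
cd "$work" || exit 1

tar -cf glob.tar demo_run/*   # the shell expands * and skips names starting with a dot
tar -cf dir.tar demo_run      # tar walks the directory itself, dotfiles included

tar -tf glob.tar              # lists demo_run/visible.txt only
tar -tf dir.tar               # also lists demo_run/ and demo_run/.hidden
```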

To restore the data from tape: –

cd /tmp
tar -xMvf /dev/st0 --wildcards '180525_A00471_0028_AHCKH2DMXX/*'
or
tar -xMvf /dev/st0
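
For the validation step, one possible approach is to record checksums before the data goes to tape and check the restored copy against them. This is a hedged sketch and not an established part of this workflow: the function names make_manifest and verify_manifest and the manifest location are assumptions, and it relies on the GNU versions of find, sort, xargs and sha256sum.

```shell
# Sketch only: make_manifest and verify_manifest are illustrative names.

# Write a SHA-256 checksum for every file under directory $1 to manifest file $2.
# Paths are stored relative to the directory, so the manifest works wherever the data is restored.
make_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# Check the files under directory $1 against manifest file $2.
# Returns non-zero if any file is missing or differs.
verify_manifest() {
    ( cd "$1" && sha256sum --quiet -c - ) < "$2"
}
```

For example, make_manifest /mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX /tmp/180525.sha256 before the backup, then verify_manifest /tmp/180525_A00471_0028_AHCKH2DMXX /tmp/180525.sha256 after restoring into /tmp. The manifest is kept outside the run directory so that it does not end up listing itself.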

rsync from novaseqdata to rocket

rsync -av \
/mnt/novaseqdata/training/180525_A00471_0028_AHCKH2DMXX \
nbh23@rocket.hpc.ncl.ac.uk:/nobackup/proj/scbsu/

Trailing slashes are important – with no trailing slash on the source, as here, rsync creates the directory 180525_A00471_0028_AHCKH2DMXX at the destination if it does not already exist and writes the data into it. (NB rsync will not create more than one missing level of directory at the destination, I discovered)
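
The effect of the trailing slash is easy to try out safely before touching real data. The sketch below uses local temporary directories in place of novaseqdata and rocket; demo_run and reads.fastq.gz are made-up names.

```shell
# Sketch only: local temporary directories stand in for novaseqdata and rocket,
# and demo_run is a made-up run directory name.
work=$(mktemp -d)
mkdir -p "$work/src/demo_run" "$work/dest_a" "$work/dest_b"
touch "$work/src/demo_run/reads.fastq.gz"

rsync -a "$work/src/demo_run"  "$work/dest_a/"   # no trailing slash: creates dest_a/demo_run/
rsync -a "$work/src/demo_run/" "$work/dest_b/"   # trailing slash: copies the contents straight into dest_b/

ls "$work/dest_a"    # demo_run
ls "$work/dest_b"    # reads.fastq.gz
```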

Once done it needs to be chmod’d recursively to 775 for other people from the group to access it: –

chmod -R 775 /nobackup/proj/scbsu/180525_A00471_0028_AHCKH2DMXX/
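
Note that 775 also marks every data file as executable. If that is not wanted, a possible alternative (a sketch, not what was done here) is chmod's symbolic mode with a capital X, which adds execute permission only to directories and to files that already have an execute bit set. The function name open_to_group is made up for illustration.

```shell
# Sketch only: open_to_group is an illustrative name; the target path is supplied by the caller.
# Directories end up as 775; ordinary data files end up as 664.
open_to_group() {
    chmod -R u=rwX,g=rwX,o=rX "$1"
}
```

It would be called as open_to_group /nobackup/proj/scbsu/180525_A00471_0028_AHCKH2DMXX/.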
Posted at 8:37 am