Converting Datasets with file_tree

We often deal with structured imaging datasets, and understanding an unfamiliar structure, or converting between structures for data sharing or analysis, can be tricky. I recently did this for a consolidated analysis and found that Michiel Cottaar’s file_tree module took much of the hassle out of it.

The file_tree.convert function can also make symlinks instead of copies, which saves space and keeps source data folders clean, and it can overwrite the results of previous runs. I found this approach clear, flexible and powerful.
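To make the symlink and overwrite behaviour concrete, here is a minimal standard-library sketch of the idea. It is not file_tree's implementation; the function name place_file and its arguments are hypothetical and only illustrate the two options described above.

```python
import os
import shutil
from pathlib import Path


def place_file(src, dst, symlink=True, overwrite=False):
    """Link or copy one file from src to dst; return True if dst was written.

    Hypothetical helper for illustration only (regular files, not directories).
    """
    src, dst = Path(src), Path(dst)
    if dst.exists() or dst.is_symlink():
        if not overwrite:
            return False      # leave the result of a previous run in place
        dst.unlink()          # remove the old file or link before rewriting
    dst.parent.mkdir(parents=True, exist_ok=True)
    if symlink:
        # Absolute link, so it still resolves if the target tree is moved
        os.symlink(src.resolve(), dst)
    else:
        shutil.copy2(src, dst)
    return True
```

With symlink=True the target tree holds only links, so the source folders stay untouched and no imaging data is duplicated on disk.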

The convert function has been part of file_tree since version 0.7.2, and a minimal working example is here:

https://gitlab.com/evan.edmond/filetree-convert-demo
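Since convert needs file_tree 0.7.2 or later, it can be worth checking the installed version first. The sketch below uses only the standard library; it assumes the package is distributed under the name "file-tree", and the helper names version_tuple and has_convert are hypothetical. The version parsing is deliberately simple and ignores suffixes such as "rc1".

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '0.7.2' into (0, 7, 2), stopping at any non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_convert(installed, minimum="0.7.2"):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = version("file-tree")  # assumed distribution name
except PackageNotFoundError:
    installed = None

if installed is None:
    print("file-tree is not installed")
elif has_convert(installed):
    print(f"file-tree {installed} should include convert")
else:
    print(f"file-tree {installed} is older than 0.7.2")
```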

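First, we write a text file (a tree file) describing the structure of each source dataset and of the desired target. Each line is a file or directory template, indentation gives the nesting, placeholders such as {participant} appear in braces, and the name in parentheses is the short key used to refer to that template.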

dataset 1:

HV_{participant} (subj_dir)
  F3T_NNNN_NNN_{scancode} (scan_dir)
    images_{image_n}_boldmbep2d2mmMB6v2RS.nii.gz (func_data_unclassified)
    images_{image_n}_t1mprax1mmisowithNose32ch1001.nii.gz (T1w_unclassified)
    sbref.nii.gz (func_sbref)
    funcdata.nii.gz (func_data)
    T1w.nii.gz (T1w)
  twix_symlinks
    slsr_csi_mcycle_dw_{block_n}.dat (mrsi_data)

dataset 2:

raw_data
  C{participant} (subj_dir)
    T1.nii.gz (T1w)
    CSI
      slsr_csi_mcycle_dw_{block_n}.dat (mrsi_data)
    T1_dcm (T1w_dicom_dir)
    rest
      images_{image_n}_MB8FMRIfov21024mmresting.nii.gz (func_data_unclassified)
      sbref.nii.gz (func_sbref)
      funcdata.nii.gz (func_data)
      fmap_rest
        images_{image_n}_fieldmapgre2mmFoV216mm1001.nii.gz (fmap_mag)
        images_{image_n}_fieldmapgre2mmFoV216mm2001.nii.gz (fmap_phase)

dataset 3:

raw
  sub-{participant} (subj_dir)
    ses-placebo (ses_dir)
      anat
        sub-{participant}_ses-p_T1w.nii.gz (T1w)
      fmap
        sub-{participant}_ses-p_magnitude1.nii.gz (fmap_mag)
        sub-{participant}_ses-p_phasediff1.nii.gz (fmap_phase)
      func
        sub-{participant}_ses-p_task-{task}_bold.nii.gz (func_data)
        sub-{participant}_ses-p_task-{task}_sbref.nii.gz (func_sbref)
      mrsi
        sub-{participant}_ses-p_slsr_csi_mcycle_pre_{block_n}.dat (mrsi_data)

Then the desired target dataset structure, here following BIDS:

rawdata
  sub-{participant} (input_subj_dir)
    anat (input_anat_dir)
      sub-{participant}_T1w.nii.gz (T1w)
      sub-{participant}_T1w_mask.nii.gz (hand_area)
      sub-{participant}_T1w_dicom (T1w_dicom_dir)
    fmap (input_fmap_dir)
      sub-{participant}_magnitude1.nii.gz (fmap_mag)
      sub-{participant}_phasediff1.nii.gz (fmap_phase)
    func (input_func_dir)
      sub-{participant}_task-{task}_bold.nii.gz (func_data)
      sub-{participant}_task-{task}_sbref.nii.gz (func_sbref)
    mrsi (input_mrsi_dir)
      sub-{participant}_vox-m1_csi_slaser_{block_n}.dat (mrsi_data)
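The link between a source template and a target template is the shared placeholder. As a rough illustration of that idea, the sketch below extracts {participant} from a path matching one template and fills it into another, using only the standard library. This is not how file_tree works internally; template_to_regex and convert_path are hypothetical helpers, and the two templates are written out as full paths from the trees above.

```python
import re


def template_to_regex(template):
    """Compile a template such as 'C{participant}/T1.nii.gz' into a regex."""
    seen = set()
    pattern = ""
    # Splitting on a capture group alternates literal text and placeholder names
    for i, piece in enumerate(re.split(r"\{(\w+)\}", template)):
        if i % 2 == 0:
            pattern += re.escape(piece)         # literal text
        elif piece in seen:
            pattern += f"(?P={piece})"          # repeat must match the same value
        else:
            seen.add(piece)
            pattern += f"(?P<{piece}>[^/]+?)"   # placeholder stays within one path part
    return re.compile(pattern)


def convert_path(path, src_template, target_template):
    """Map a concrete source path to its target path, or None if it does not match."""
    match = template_to_regex(src_template).fullmatch(path)
    if match is None:
        return None
    return target_template.format(**match.groupdict())


src = "raw_data/C{participant}/T1.nii.gz"
target = "rawdata/sub-{participant}/anat/sub-{participant}_T1w.nii.gz"
print(convert_path("raw_data/C012/T1.nii.gz", src, target))
# prints: rawdata/sub-012/anat/sub-012_T1w.nii.gz
```

A placeholder that exists only in the target, such as {task} for a source dataset that never names the task, would need a value supplied separately before the target path can be filled in.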

Conversion can then be done as follows:

from file_tree import FileTree, convert

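# Keys to convert: the short names in parentheses in the tree files.
# Each key must be present in both the source and the target tree.
keys = ["T1w", "func_data", "func_sbref", "fmap_mag", "fmap_phase"]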

src_data = "src_data"       # Path to source data dir
target_data = "target_data" # Path to target data dir
stree = "src.tree"          # Path to source tree file
ttree = "target.tree"       # Path to target tree file


src_read = FileTree.read(stree, src_data).update_glob(keys)
target_read = FileTree.read(ttree, target_data)

convert(src_read, target_read, keys, symlink=True)
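After a conversion with symlink=True, the target tree depends on the source files staying where they are. A quick sanity check, sketched here with the standard library only, is to walk the target directory and list any links that no longer resolve. The function name broken_links is hypothetical, and "target_data" is the placeholder path used in the snippet above.

```python
import os


def broken_links(root):
    """Return a sorted list of symlinks under root whose targets are missing."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # exists() follows the link, so it is False for a dangling link
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)


for path in broken_links("target_data"):
    print("broken link:", path)
```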
Evan Edmond
Neurology registrar, clinical researcher

I am a neurology registrar with a special interest in dementia and neurodegeneration. During my DPhil, I worked with people affected by amyotrophic lateral sclerosis. I hope to contribute towards the global challenge in dementia care. With new targeted therapies emerging, a global infrastructure for diagnosis, research, and targeted treatment is required. I am also a keen advocate for free software.