Package : processing

Module : DatasetsMerger

class src.processing.DatasetsMerger.DatasetsMerger(folder=None, combineDatasets=None)

Bases: src.processing.MSGFplusMerger.MSGFplusMerger

  1. Runs for UserInput, which can be:

    a data package, a set of datasets, or a set of MSGF job numbers (MSGFJobNums).

  2. Creates a crossTab object.
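The contents of the crossTab object are not documented here. A minimal sketch, assuming it is a peptide × dataset matrix built from a merged table, could use pandas.pivot_table. The column names Peptide, Dataset and Intensity and the function build_crosstab are hypothetical.

```python
import pandas as pd


def build_crosstab(merged: pd.DataFrame,
                   row_key: str = "Peptide",
                   value_col: str = "Intensity") -> pd.DataFrame:
    """Pivot a merged table into a row_key x Dataset matrix.

    Hypothetical: 'Peptide', 'Dataset' and 'Intensity' are assumed
    column names, not taken from the package.
    """
    # Sum values per (row_key, Dataset); missing combinations become NaN.
    return merged.pivot_table(index=row_key, columns="Dataset",
                              values=value_col, aggfunc="sum")
```

Summing is one choice; spectral counts (pd.crosstab) or mean intensities would be equally plausible readings of "crossTab".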

merge_all_jobs_in_UserInput()
  1. Runs for each dataset.

  2. Merges all MSGFjobs_MASIC_resultant objects.

Returns
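The two steps above amount to "process each dataset, then stack the results". A minimal sketch, assuming each per-dataset result is a pandas DataFrame; process_dataset is a hypothetical stand-in for whatever builds one MSGFjobs_MASIC_resultant table.

```python
from typing import Callable, Iterable

import pandas as pd


def merge_all_datasets(datasets: Iterable[str],
                       process_dataset: Callable[[str], pd.DataFrame]) -> pd.DataFrame:
    """Run a per-dataset merge for each dataset and stack the results.

    Hypothetical: `process_dataset` is not part of the package.
    """
    frames = []
    for name in datasets:
        # Step 1: per-dataset work; tag each row with the dataset it came from.
        frames.append(process_dataset(name).assign(Dataset=name))
    # Step 2: stack everything into one table (empty input gives an empty table).
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```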

Module : MASICmerger

class src.processing.MASICmerger.MASICmerger(folder)

Bases: src.processing.MSGFplusMerger.MSGFplusMerger

Runs for each dataset.

merge_msgfplus_msaic(**kw)
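How merge_msgfplus_msaic joins the two sources is not documented. One plausible approach, sketched here under stated assumptions, is a left join on dataset and MS/MS scan number. The column names Scan (MSGF+ side) and FragScanNumber (MASIC side) and the function merge_msgf_with_masic are assumptions; adjust them to the real files.

```python
import pandas as pd


def merge_msgf_with_masic(msgf: pd.DataFrame, masic: pd.DataFrame) -> pd.DataFrame:
    """Attach MASIC quantitation to MSGF+ identifications by scan number.

    Assumption: MSGF+ rows carry 'Dataset' and 'Scan'; MASIC rows carry
    'Dataset' and 'FragScanNumber'.
    """
    masic = masic.rename(columns={"FragScanNumber": "Scan"})
    # Left join: keep every identification even if MASIC has no entry for it.
    return msgf.merge(masic, on=["Dataset", "Scan"], how="left")
```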

Module : MSGFplusMerger

class src.processing.MSGFplusMerger.MSGFplusMerger(dataset_loc=None)

Bases: object

Merges all MSGF jobs per dataset.

  1. Runs for each dataset.

  2. Collates the “*msgfplus_syn.txt” files → consolidate_syn object.

  3. Recomputes QValue and PepQValue → recomputed_consolidate_syn object.

  4. Looks up protein information in:

    • *msgfplus_syn_SeqToProteinMap.txt: protein information.

    • *msgfplus_syn_ResultToSeqMap.txt: result-to-sequence mapper.

    → MSGFjobs_Merged object.
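Step 4 is a two-hop lookup: result → unique sequence → protein(s). A minimal sketch with pandas merges, assuming the column names ResultID (synopsis file), Result_ID/Unique_Seq_ID (ResultToSeqMap) and Unique_Seq_ID/Protein_Name (SeqToProteinMap); these names and attach_protein_info are assumptions for illustration.

```python
import pandas as pd


def attach_protein_info(syn: pd.DataFrame,
                        result_to_seq: pd.DataFrame,
                        seq_to_protein: pd.DataFrame) -> pd.DataFrame:
    """Map each synopsis row to its protein(s) via the two map files.

    Assumed columns: syn['ResultID'], result_to_seq['Result_ID', 'Unique_Seq_ID'],
    seq_to_protein['Unique_Seq_ID', 'Protein_Name'].
    """
    # Hop 1: result -> unique sequence ID.
    mapped = syn.merge(result_to_seq, left_on="ResultID",
                       right_on="Result_ID", how="left")
    # Hop 2: sequence -> protein. One sequence can map to several proteins,
    # so this join may produce more rows than the input.
    return mapped.merge(seq_to_protein, on="Unique_Seq_ID", how="left")
```

The row expansion in the second join is worth keeping in mind: counts taken after this step are per (result, protein) pair, not per result.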

write_to_disk(df, folder, file)
Parameters
  • df

  • folder

  • file

Returns
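The Returns field is empty, so the output format is not stated. A standalone sketch of what a function with this signature might do, assuming tab-separated output without the index (matching the tab-separated *_syn.txt inputs) and returning the written path; both choices are assumptions.

```python
import os

import pandas as pd


def write_to_disk(df: pd.DataFrame, folder: str, file: str) -> str:
    """Sketch: write df as a tab-separated file and return its path.

    Assumption: TSV without the index; the real method may differ.
    """
    os.makedirs(folder, exist_ok=True)  # create the output folder if missing
    path = os.path.join(folder, file)
    df.to_csv(path, sep="\t", index=False)
    return path
```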

fill_holes()
Returns

tackle_Unique_Seq_ID_holes_(df)
Parameters
  • df

Returns

get_protein_info(**kw)
improve_FDR()

Recomputes QValue and PepQValue.

  1. Uses consolidate_syn_DF.
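Recomputing QValue and PepQValue is typically done with target-decoy counting. The sketch below swaps in that standard technique; it is not confirmed as this package's method. It assumes a lower MSGFDB_SpecEValue is better, that decoy proteins start with "XXX_" (MSGF+'s usual decoy prefix), and that the columns are named Protein and Peptide. recompute_q_values and target_decoy_qvalues are hypothetical names.

```python
import numpy as np
import pandas as pd


def target_decoy_qvalues(scores: pd.Series, is_decoy: pd.Series) -> pd.Series:
    """q-values by target-decoy counting; lower score = better.

    Ties are not grouped: rows with equal scores may get different q-values.
    """
    order = np.argsort(scores.to_numpy(), kind="mergesort")  # best first, stable
    decoy = is_decoy.to_numpy()[order].astype(int)
    n_decoy = np.cumsum(decoy)
    n_target = np.cumsum(1 - decoy)
    fdr = n_decoy / np.maximum(n_target, 1)            # FDR if we cut after this row
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]  # min FDR over this and looser cutoffs
    q = np.empty_like(q_sorted)
    q[order] = q_sorted                                 # back to the input order
    return pd.Series(q, index=scores.index)


def recompute_q_values(df: pd.DataFrame,
                       score: str = "MSGFDB_SpecEValue",
                       protein: str = "Protein",
                       peptide: str = "Peptide",
                       decoy_prefix: str = "XXX_") -> pd.DataFrame:
    """Add PSM-level QValue and peptide-level PepQValue columns (hypothetical)."""
    out = df.copy()
    is_decoy = out[protein].str.startswith(decoy_prefix)
    out["QValue"] = target_decoy_qvalues(out[score], is_decoy)
    # Peptide level: keep each peptide's best PSM, then repeat the calculation.
    best = out.loc[out.groupby(peptide)[score].idxmin()]
    pep_q = target_decoy_qvalues(best[score], is_decoy.loc[best.index])
    out["PepQValue"] = out[peptide].map(
        pd.Series(pep_q.to_numpy(), index=best[peptide].to_numpy()))
    return out
```

Stacking several jobs before this step matters: the decoy counts are pooled over all jobs, so the recomputed q-values reflect the combined search rather than each job separately.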

keep_best_scoring_peptide(**kw)
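keep_best_scoring_peptide has no description. Since the stacked synopsis table can hold several rows per scan, one plausible reading is "keep the best-scoring row per scan". A minimal sketch, assuming lower MSGFDB_SpecEValue is better and that (Dataset, Scan) identifies a spectrum; keep_best_per_scan is a hypothetical name.

```python
from typing import Sequence

import pandas as pd


def keep_best_per_scan(df: pd.DataFrame,
                       keys: Sequence[str] = ("Dataset", "Scan"),
                       score: str = "MSGFDB_SpecEValue") -> pd.DataFrame:
    """Keep one row per spectrum: the one with the lowest spectral E-value."""
    best_idx = df.groupby(list(keys))[score].idxmin()
    return df.loc[best_idx].reset_index(drop=True)
```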
stack_files(grouped_files, file_pattern)
Parameters
  • grouped_files

  • file_pattern

Returns
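A standalone sketch of what stack_files might do, assuming grouped_files maps a job number to a list of file paths, file_pattern is a shell-style glob such as "*msgfplus_syn.txt", and the files are tab-separated. Tagging rows with JobNum is also an assumption.

```python
import fnmatch
import os

import pandas as pd


def stack_files(grouped_files: dict, file_pattern: str) -> pd.DataFrame:
    """Sketch: read every file matching file_pattern and stack the tables."""
    frames = []
    for job, paths in grouped_files.items():
        for path in paths:
            # Match on the file name only, using shell-style wildcards.
            if fnmatch.fnmatch(os.path.basename(path), file_pattern):
                frames.append(pd.read_csv(path, sep="\t").assign(JobNum=job))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Note that "*msgfplus_syn.txt" does not match "*msgfplus_syn_ResultToSeqMap.txt", so the map files in the same folder are skipped.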

group_files(folder)
Parameters
  • folder

Returns
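The grouping rule used by group_files is not documented. A standalone sketch, assuming folder contains one subfolder per MSGF+ job (named by job number) and that the result maps each job to the sorted file paths inside it.

```python
import os


def group_files(folder: str) -> dict:
    """Sketch: map each job subfolder name to the sorted file paths it contains."""
    groups = {}
    with os.scandir(folder) as jobs:
        for job in jobs:
            if not job.is_dir():
                continue  # loose files at the top level are ignored
            with os.scandir(job.path) as files:
                paths = sorted(f.path for f in files if f.is_file())
            if paths:
                groups[job.name] = paths
    return groups
```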

consolidate_syn_files()
  1. For all jobs, reads in (stacks) the “*msgfplus_syn.txt” files into _syn_DF, adding JobNum and dataset columns.

Note: _syn_DF has duplicate rows for each Scan with MSGFDB_SpecEValue.

Returns
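A minimal sketch of the consolidation step, assuming one synopsis file per job and that the dataset name is the file name minus the "_msgfplus_syn.txt" suffix. job_to_syn_file, consolidate_syn and scans_with_duplicates are hypothetical names. The second helper lists the repeated-scan rows mentioned in the note above.

```python
import os
from typing import Sequence

import pandas as pd

SYN_SUFFIX = "_msgfplus_syn.txt"


def consolidate_syn(job_to_syn_file: dict) -> pd.DataFrame:
    """Stack one synopsis file per job, tagging rows with JobNum and Dataset."""
    frames = []
    for job, path in job_to_syn_file.items():
        name = os.path.basename(path)
        # Assumption: '<Dataset>_msgfplus_syn.txt' naming.
        dataset = name[: -len(SYN_SUFFIX)] if name.endswith(SYN_SUFFIX) else name
        frames.append(pd.read_csv(path, sep="\t").assign(JobNum=job, Dataset=dataset))
    return pd.concat(frames, ignore_index=True)


def scans_with_duplicates(syn_df: pd.DataFrame,
                          keys: Sequence[str] = ("Dataset", "Scan")) -> pd.DataFrame:
    """Return every row whose scan appears more than once."""
    return syn_df[syn_df.duplicated(list(keys), keep=False)]
```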