mpa.SimulateSort

Overview

SimulateSort is a class in the mpathic package that simulates a Sort-Seq experiment.

Usage

>>> import mpathic
>>> loader = mpathic.io
>>> mp_df = loader.load_model('./mpathic/examples/true_model.txt')
>>> dataset_df = loader.load_dataset('./mpathic/data/sortseq/full-0/library.txt')
>>> sim = mpathic.SimulateSort(df=dataset_df, mp=mp_df)
>>> sim.output_df

Example Input and Output

The input table must contain sequence (seq), count (ct), and energy (val) columns.

Example Input Table:

seq    ct    val
AGGTA  5     -.4
AGTTA  1     -.2
...

Example Output Table:

seq    ct    val    ct_1     ct_2     ct_3 ...
AGGTA  5     -.4    1        2        1
AGTTA  1     -.2    0        1        0
...

The output table contains all the original columns, along with one count column per sorting bin (ct_1, ct_2, …).
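To make the relationship between the input and output tables concrete, here is a minimal toy sketch that splits each sequence's counts across expression bins. It is not mpathic's algorithm: the function name toy_sort, the noise_width and seed arguments, the Gaussian noise, and the quantile-based bin edges are all assumptions chosen for illustration. Only the column names (seq, ct, val, ct_1, ct_2, …) come from the tables above.

```python
import numpy as np
import pandas as pd


def toy_sort(df, nbins=3, noise_width=0.1, seed=0):
    """Toy model (hypothetical): split each sequence's counts across bins."""
    rng = np.random.default_rng(seed)

    # One simulated cell per count; each cell inherits its sequence's energy.
    row_index = np.repeat(np.arange(len(df)), df["ct"].to_numpy())
    energy = df["val"].to_numpy()[row_index]

    # Assumed noise model: Gaussian noise with the given width.
    expression = energy + rng.normal(0.0, noise_width, size=energy.size)

    # Assumed gating: bin edges at equally spaced quantiles of expression.
    edges = np.quantile(expression, np.linspace(0, 1, nbins + 1)[1:-1])
    bin_index = np.searchsorted(edges, expression)  # values 0 .. nbins - 1

    # Keep the original columns and append one count column per bin.
    out = df.copy()
    for b in range(nbins):
        counts = np.bincount(row_index[bin_index == b], minlength=len(df))
        out["ct_%d" % (b + 1)] = counts
    return out


library = pd.DataFrame(
    {"seq": ["AGGTA", "AGTTA"], "ct": [5, 1], "val": [-0.4, -0.2]}
)
sorted_df = toy_sort(library, nbins=3)
print(sorted_df)
```

In this sketch every simulated cell lands in exactly one bin, so the ct_1 … ct_3 columns of a row sum to that row's ct. Whether SimulateSort preserves that sum is not stated in this page.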

Class Details

class simulate_sort.SimulateSort(**kwargs)

Simulate cell sorting based on expression.

Parameters:
df: (pandas dataframe)

Input data frame.

mp: (pandas dataframe)

Model data frame.

noisetype: (string, None)

Noise parameter string indicating what type of noise to include. Valid choices include None, ‘Normal’, ‘LogNormal’, ‘Plasmid’.

npar: (list)

Parameters to go with noisetype. E.g., for noisetype ‘Normal’, npar must contain the width of the normal distribution.

nbins: (int)

Number of bins that the different variants will get sorted into.

sequence_library: (bool)

If True, sequencing of the library is also simulated and reported as bin zero.

start: (int)

Start position of the analyzed region.

end: (int)

End position of the analyzed region.

chunksize: (int)

Size of the chunks in which the data frame df is traversed.
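As an illustration of what traversing a data frame in chunks can look like, the sketch below walks a small table in consecutive row slices. The helper iter_chunks and the three-row toy table are hypothetical and are not part of mpathic; how SimulateSort uses chunksize internally is not documented here.

```python
import pandas as pd


def iter_chunks(df, chunksize):
    """Yield consecutive row slices of df with at most chunksize rows each."""
    for first in range(0, len(df), chunksize):
        yield df.iloc[first:first + chunksize]


# Toy table for illustration only.
library = pd.DataFrame(
    {
        "seq": ["AGGTA", "AGTTA", "AGGTT"],
        "ct": [5, 1, 2],
        "val": [-0.4, -0.2, -0.3],
    }
)

chunk_sizes = [len(chunk) for chunk in iter_chunks(library, chunksize=2)]
print(chunk_sizes)
```

Processing in chunks keeps only a slice of the table in play at any one time, which matters mainly for large libraries.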

Attributes:
output_df: (pandas data frame)

Contains the output of the SimulateSort constructor.