A collection of made-up sequences in the style of long reads over a repeat region (e.g. NOTCH2NLC), with metadata describing the participant each read is from and the family each participant is from. Can be used with visualise_many_sequences(), visualise_methylation(), and helper functions to visualise these sequences.
Generation code is available at data-raw/example_many_sequences.R
Format
example_many_sequences
A dataframe with 23 rows and 10 columns:
- family
Participant family
- individual
Participant ID
- read
Unique read ID
- sequence
DNA sequence of the read
- sequence_length
Length (nucleotides) of the read
- quality
FASTQ quality scores for the read. Each character represents a score from 0 to 40; see fastq_quality_scores.
These values are made up via pmin(pmax(round(rnorm(n, mean = 20, sd = 10)), 0), 40), i.e. sampled from a normal distribution with mean 20 and standard deviation 10, rounded to the nearest integer, then clamped to the range 0 to 40 (inclusive); see example_many_sequences.R.
- methylation_locations
Indices along the read (starting at 1) at which methylation probability was assessed, i.e. CpG sites. Stored as a single character value per read, condensed from a numeric vector via vector_to_string().
- methylation_probabilities
Probability of methylation for each assessed base, as an 8-bit integer (0-255, where 255 corresponds to 100%). Stored as a single character value per read, condensed from a numeric vector via vector_to_string().
These values are made up via round(runif(n, min = 0, max = 255)); see example_many_sequences.R.
- hydroxymethylation_locations
Indices along the read (starting at 1) at which hydroxymethylation probability was assessed, i.e. CpG sites. Stored as a single character value per read, condensed from a numeric vector via vector_to_string().
- hydroxymethylation_probabilities
Probability of hydroxymethylation for each assessed base, as an 8-bit integer (0-255, where 255 corresponds to 100%). Stored as a single character value per read, condensed from a numeric vector via vector_to_string().
These values are made up via round(runif(n, min = 0, max = 255 - this_base_methylation_probability)), so that the summed methylation and hydroxymethylation probability at each base never exceeds 255 (100%); see example_many_sequences.R.
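The quality column's recipe can be sketched in base R as follows. This is a minimal illustration, not the code in example_many_sequences.R: it assumes the scores are written as Phred+33 characters ("!" for 0 up to "I" for 40), which is what fastq_quality_scores is assumed to hold here. The helper names simulate_quality_string() and decode_quality_string() are hypothetical.

```r
# Hypothetical helper: simulate one FASTQ quality string of length n.
# Assumption: scores are encoded as Phred+33 ASCII characters ("!" = 0 ... "I" = 40).
simulate_quality_string <- function(n, mean = 20, sd = 10) {
  # Sample from a normal distribution, round, then clamp into [0, 40]
  scores <- pmin(pmax(round(rnorm(n, mean = mean, sd = sd)), 0), 40)
  # Shift each score by 33 and join the resulting characters into one string
  intToUtf8(as.integer(scores + 33))
}

# Hypothetical inverse: turn a quality string back into integer scores
decode_quality_string <- function(quality) {
  utf8ToInt(quality) - 33L
}

set.seed(1)
q <- simulate_quality_string(50)
scores <- decode_quality_string(q)
```

Clamping with pmin()/pmax() means out-of-range draws pile up at exactly 0 and 40 rather than being resampled, so those two scores are somewhat over-represented compared with a truncated normal distribution.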
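The *_locations columns pair CpG indices with a condensed string form. Below is a hedged sketch of that idea. It assumes that the recorded index is the C of each "CG" dinucleotide and that the condensed form is comma-separated. Neither assumption is confirmed on this page. The helpers condense() and expand() are hypothetical stand-ins and do not reproduce vector_to_string() itself.

```r
# Hypothetical helper: 1-based positions of the C in every "CG" dinucleotide.
find_cpg_locations <- function(sequence) {
  hits <- gregexpr("CG", sequence, fixed = TRUE)[[1]]
  # gregexpr() returns -1 when there is no match
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

# Hypothetical stand-ins for condensing a vector to one string and back.
# Assumption: values are separated by commas.
condense <- function(values, sep = ",") paste(values, collapse = sep)
expand <- function(string, sep = ",") {
  if (string == "") integer(0) else as.integer(strsplit(string, sep, fixed = TRUE)[[1]])
}

read_seq <- "GGCGGCGGCGCCTAACG"
locations <- find_cpg_locations(read_seq)   # c(3, 6, 9, 16)
locations_string <- condense(locations)     # "3,6,9,16"
```

Storing one string per read keeps each column atomic, so the data frame stays one row per read even though reads carry different numbers of CpG sites.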
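The two probability formulas can be combined into one small simulation. This sketch checks the guarantee that methylation plus hydroxymethylation never exceeds 255. It assumes n is the number of assessed CpG sites in one read. The function name simulate_modification_probabilities() and its data-frame output are hypothetical and do not come from example_many_sequences.R.

```r
# Hypothetical helper: per-site probabilities (0-255) for one read with n CpG sites.
simulate_modification_probabilities <- function(n) {
  # Methylation: uniform on [0, 255], rounded to integers
  methylation <- round(runif(n, min = 0, max = 255))
  # Hydroxymethylation: per-site upper bound shrinks by that site's methylation,
  # so the two probabilities together never exceed 255 (100%)
  hydroxymethylation <- round(runif(n, min = 0, max = 255 - methylation))
  data.frame(methylation = methylation, hydroxymethylation = hydroxymethylation)
}

set.seed(42)
probs <- simulate_modification_probabilities(1000)
# Convert the 8-bit values to fractions of 1 (255 corresponds to 100%)
methylation_fraction <- probs$methylation / 255
```

Rounding cannot break the bound. runif() draws below the integer limit 255 - methylation, so rounding can reach that limit but never pass it. When methylation is 255, min and max are both 0 and runif() returns 0.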