ld

Compute the pair-wise LD (Pearson’s correlation coefficient) between haplotypes (or genotypes) and a single TARGET haplotype (or variant).
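
As a rough illustration of the statistic itself (not of how haptools implements it), Pearson's correlation between two per-sample vectors can be sketched as follows. The function name pearson_r and the dosage values are hypothetical; the sketch assumes each sample is summarized as the number of copies (0, 1, or 2) it carries.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the centered values
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either vector is constant
    return cov / math.sqrt(var_x * var_y)

# Hypothetical dosages for five samples (illustrative values only)
target = [0, 1, 2, 1, 0]
other = [0, 1, 2, 2, 0]
print(pearson_r(target, other))
```

A value near 1 or -1 indicates strong LD between the two features, and a value near 0 indicates little linear association.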

The ld command takes as input a set of genotypes in VCF and a list of haplotypes (specified as a .hap file) and outputs a new .hap file with the computed LD values in an extra field.

By default, LD is computed with each haplotype in the .hap file. To compute LD with the variants in the genotypes file instead, use the --from-gts switch. When this mode is enabled, the output is an .ld file rather than a .hap file.

Note

Repeats are not currently supported by the ld command. Any repeats in your .hap file will be ignored.

You may also specify genotypes in PLINK2 PGEN format instead of VCF format. See the genotypes section of the format docs for more information.

Usage

haptools ld \
--region TEXT \
--sample SAMPLE \
--samples-file FILENAME \
--id ID \
--ids-file FILENAME \
--chunk-size INT \
--discard-missing \
--from-gts \
--output PATH \
--verbosity [CRITICAL|ERROR|WARNING|INFO|DEBUG|NOTSET] \
TARGET GENOTYPES HAPLOTYPES

Examples

TARGET can either be a haplotype or a variant.

For example, let’s compute LD with the haplotype ‘chr21.q.3365*1’.

haptools ld 'chr21.q.3365*1' tests/data/example.vcf.gz tests/data/basic.hap.gz | less

Or, let’s compute LD with the variant ‘rs429358’.

haptools ld -o apoe4_ld.hap rs429358 tests/data/apoe.vcf.gz tests/data/apoe4.hap

Alternatively, we can compute LD between the APOe4 haplotype and all genotypes in the VCF by using the --from-gts switch. Note that we should use a different extension for the output file now.

haptools ld --from-gts -o apoe4.ld APOe4 tests/data/apoe.vcf.gz tests/data/apoe4.hap
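
To make the idea behind --from-gts more concrete, here is a minimal sketch of correlating one haplotype with every variant. It is not the haptools implementation: it assumes phased genotypes, assumes a haplotype counts as present on a chromosome copy only when all of its alleles appear on that copy, and uses made-up variant IDs (v1, v2, v3) and genotype values.

```python
import math

def pearson_r(x, y):
    """Pearson's r between two equal-length vectors (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx and vy else float("nan")

# Hypothetical phased genotypes: variant ID -> one (copy0, copy1) allele pair per sample
genotypes = {
    "v1": [(1, 0), (1, 1), (0, 0), (1, 0)],
    "v2": [(1, 0), (1, 0), (0, 0), (0, 1)],
    "v3": [(0, 1), (0, 0), (1, 1), (0, 1)],
}
# Hypothetical haplotype: variant ID -> allele that must be present
haplotype = {"v1": 1, "v2": 1}

def haplotype_dosage(hap, gts):
    """Count, per sample, the chromosome copies that carry every allele of hap."""
    n_samples = len(next(iter(gts.values())))
    dosages = []
    for s in range(n_samples):
        copies = sum(
            all(gts[vid][s][copy] == allele for vid, allele in hap.items())
            for copy in (0, 1)
        )
        dosages.append(copies)
    return dosages

def variant_dosage(calls):
    """Sum the two alleles of each sample to get a 0/1/2 dosage."""
    return [a + b for a, b in calls]

hap_dosage = haplotype_dosage(haplotype, genotypes)
# One LD value per variant, analogous to one row per variant in the output
ld = {vid: pearson_r(hap_dosage, variant_dosage(calls)) for vid, calls in genotypes.items()}
for vid, r in ld.items():
    print(vid, round(r, 4))
```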

You can select a subset of variants (or haplotypes) using the --id parameter multiple times (or the --ids-file parameter).

haptools ld --from-gts -i rs543363163 -i rs7412 APOe4 tests/data/apoe.vcf.gz tests/data/apoe4.hap

All files used in these examples are described here.

Detailed Usage

haptools

haptools: A toolkit for simulating and analyzing genotypes and phenotypes while taking into account haplotype information

haptools [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

ld

Compute the pair-wise LD (Pearson’s correlation) between haplotypes (or variants) and a single TARGET haplotype (or variant).

GENOTYPES must be formatted as a VCF or PGEN and HAPLOTYPES must be formatted according to the .hap format spec.

TARGET refers to the ID of a variant or haplotype. LD is computed pair-wise between TARGET and every other haplotype in the .hap file (or, with --from-gts, every variant in the genotypes file).

If TARGET is a variant ID, the ID must appear in GENOTYPES. Otherwise, it must be present in the .hap file.

haptools ld [OPTIONS] TARGET GENOTYPES HAPLOTYPES

Options

--region <region>

The region from which to extract haplotypes; ex: ‘chr1:1234-34566’ or ‘chr7’. For this to work, the VCF and .hap file must be indexed and the seqname provided must correspond with one in the files

Default:

all haplotypes
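
For readers unfamiliar with the region syntax, the two forms shown above can be split apart as in the sketch below. The helper parse_region is hypothetical and only illustrates the string format; haptools may parse and validate regions differently.

```python
def parse_region(region):
    """Split 'seqname:start-end' into its parts; a bare seqname has no coordinates."""
    seqname, sep, span = region.partition(":")
    if not sep:
        # No colon: the whole sequence is requested
        return seqname, None, None
    start, _, end = span.partition("-")
    return seqname, int(start), int(end)

print(parse_region("chr1:1234-34566"))
print(parse_region("chr7"))
```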

-s, --sample <samples>

A list of the samples to subset from the genotypes file (ex: ‘-s sample1 -s sample2’)

Default:

all samples

-S, --samples-file <samples_file>

A single column txt file containing a list of the samples (one per line) to subset from the genotypes file

Default:

all samples

-i, --id <ids>

A list of the haplotype IDs to use from the .hap file (ex: ‘-i H1 -i H2’). Or, if --from-gts is used, a list of the variant IDs to use from the genotypes file. For this to work, the .hap file must be indexed

Default:

all haplotypes

-I, --ids-file <ids_file>

A single column txt file containing a list of the haplotype (or variant) IDs (one per line) to subset from the .hap (or genotype) file

Default:

all haplotypes
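
The file expected by --ids-file (and, analogously, --samples-file) is plain text with one ID per line. The sketch below writes and reads such a file; the helper read_ids is hypothetical, and skipping blank lines is an assumption rather than documented haptools behavior.

```python
import os
import tempfile

def read_ids(path):
    """Return the IDs in a single-column text file, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]

# Write a small IDs file using the variant IDs from the examples above
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("rs543363163\nrs7412\n")
    ids_path = tmp.name

ids = read_ids(ids_path)
print(ids)
os.remove(ids_path)
```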

-c, --chunk-size <chunk_size>

If using a PGEN file, read genotypes in chunks of X variants to reduce memory usage

Default:

all variants
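
Chunked reading trades a single large read for several smaller ones, so only one chunk of variants needs to be held in memory at a time. The generator below sketches how variant indices might be partitioned; chunk_bounds is a hypothetical name, and the actual chunking logic in haptools may differ.

```python
def chunk_bounds(n_variants, chunk_size):
    """Yield (start, end) index pairs that cover n_variants in steps of chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, n_variants, chunk_size):
        # The final chunk may be shorter than chunk_size
        yield start, min(start + chunk_size, n_variants)

for start, end in chunk_bounds(10, 4):
    print(f"variants [{start}, {end})")
```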

--discard-missing

Ignore any samples that are missing genotypes for the required variants

Default:

False
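
Conceptually, discarding missing data means dropping every sample that lacks a genotype call at any variant needed for the computation. The sketch below assumes missing calls are represented as None and uses hypothetical variant IDs and a hypothetical helper, discard_missing; it is not the haptools implementation.

```python
def discard_missing(dosages, required):
    """Keep only the samples with a non-missing call at every required variant."""
    n_samples = len(dosages[required[0]])
    keep = [
        s for s in range(n_samples)
        if all(dosages[vid][s] is not None for vid in required)
    ]
    filtered = {vid: [dosages[vid][s] for s in keep] for vid in required}
    return keep, filtered

# Hypothetical per-sample dosages; None marks a missing genotype
dosages = {
    "v1": [0, 1, None, 2],
    "v2": [1, None, 0, 2],
}
kept, filtered = discard_missing(dosages, ["v1", "v2"])
print(kept)
print(filtered)
```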

--from-gts

By default, LD is computed with the haplotypes in the .hap file. Use this switch to compute LD with the genotypes in the genotypes file, instead.

Default:

False

-o, --output <output>

A .hap file containing haplotypes and their LD with TARGET (or an .ld file if --from-gts is used)

Default:

stdout

-v, --verbosity <verbosity>

The level of verbosity desired

Default:

INFO

Options:

CRITICAL | ERROR | WARNING | INFO | DEBUG | NOTSET

Arguments

TARGET

Required argument

GENOTYPES

Required argument

HAPLOTYPES

Required argument