ld
Compute the pair-wise LD (Pearson’s correlation coefficient) between haplotypes (or genotypes) and a single TARGET haplotype (or variant).
The ld command takes as input a set of genotypes in VCF and a list of haplotypes (specified as a .hap file) and outputs a new .hap file with the computed LD values in an extra field.
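To make the statistic concrete, here is a minimal sketch of Pearson's correlation coefficient between two per-sample vectors. It is not haptools' implementation: the function name `pearson_r`, the 0/1/2 dosage encoding, and the sample values are assumptions chosen for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the mean (unnormalized covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # The correlation is undefined when either vector is constant
        return float("nan")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-sample dosages (0, 1, or 2 copies); not real data
target = [0, 1, 2, 1, 0, 2]
other = [0, 1, 2, 0, 0, 2]
r = pearson_r(target, other)
print(f"r = {r:.3f}")
```

Values near 1 or -1 indicate that the two vectors rise and fall together (or in opposition) across samples; values near 0 indicate little linear association.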
By default, LD is computed with each haplotype in the .hap file. To compute LD with the variants in the genotypes file instead, you should use the --from-gts switch. When this mode is enabled, the .hap output will be replaced by a tab-delimited text file similar to PLINK 1.9’s .ld file format. It will have a header denoting the following columns:
CHR - Chromosome code for the variant
BP - Base-pair coordinate of the variant
SNP - ID of the variant
R - Pearson’s correlation coefficient between the variant and the TARGET
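A table with these four columns can be read with Python's standard library. The sketch below assumes the header line contains exactly the names CHR, BP, SNP, and R separated by tabs; the helper name `read_ld_table` and the rows are made up for illustration and are not real haptools output.

```python
import csv
import io


def read_ld_table(handle):
    """Yield one dict per row of a tab-delimited table with CHR, BP, SNP, R columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield {
            "CHR": row["CHR"],      # chromosome code, kept as text
            "BP": int(row["BP"]),   # base-pair coordinate
            "SNP": row["SNP"],      # variant ID
            "R": float(row["R"]),   # correlation with the TARGET
        }


# Made-up rows standing in for a file written with --from-gts
example = io.StringIO(
    "CHR\tBP\tSNP\tR\n"
    "19\t100\tvariantA\t0.25\n"
    "19\t200\tvariantB\t-0.75\n"
)
rows = list(read_ld_table(example))

# Pick the variant with the largest absolute correlation
strongest = max(rows, key=lambda row: abs(row["R"]))
print(strongest["SNP"], strongest["R"])
```

To read a real file, pass an open file handle (for example, `open("apoe4.ld", newline="")`) in place of the `io.StringIO` object.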
You may also specify genotypes in PLINK2 PGEN format instead of VCF format. See the genotypes section of the format docs for more information.
Usage
haptools ld \
--region TEXT \
--sample SAMPLE \
--samples-file FILENAME \
--id ID \
--ids-file FILENAME \
--chunk-size INT \
--discard-missing \
--from-gts \
--output PATH \
--verbosity [CRITICAL|ERROR|WARNING|INFO|DEBUG|NOTSET] \
TARGET GENOTYPES HAPLOTYPES
Examples
TARGET can either be a haplotype or a variant.
For example, let’s compute LD with the haplotype ‘chr21.q.3365*1’.
haptools ld 'chr21.q.3365*1' tests/data/example.vcf.gz tests/data/basic.hap.gz | less
Or, let’s compute LD with the variant ‘rs429358’.
haptools ld -o apoe4_ld.hap rs429358 tests/data/apoe.vcf.gz tests/data/apoe4.hap
Alternatively, we can compute LD between the APOe4 haplotype and all genotypes in the VCF by using the --from-gts switch. Note that we should use a different extension for the output file now.
haptools ld --from-gts -o apoe4.ld APOe4 tests/data/apoe.vcf.gz tests/data/apoe4.hap
You can select a subset of variants (or haplotypes) using the --id parameter multiple times (or the --ids-file parameter).
haptools ld --from-gts -i rs543363163 -i rs7412 APOe4 tests/data/apoe.vcf.gz tests/data/apoe4.hap
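When the list of IDs is long, the --ids-file parameter takes a single-column text file with one ID per line. A minimal sketch of writing such a file follows; the helper name `write_ids_file` and the file name `ids.txt` are hypothetical, and a temporary directory is used only to keep the example self-contained.

```python
import tempfile
from pathlib import Path


def write_ids_file(path, ids):
    """Write one ID per line to a single-column text file."""
    Path(path).write_text("".join(f"{variant_id}\n" for variant_id in ids))


# The same two variant IDs used in the command above
ids_path = Path(tempfile.mkdtemp()) / "ids.txt"
write_ids_file(ids_path, ["rs543363163", "rs7412"])
print(ids_path.read_text(), end="")
```

The resulting path could then be passed to --ids-file in place of the repeated -i flags.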
All files used in these examples are described here.
Detailed Usage
haptools
haptools: A toolkit for simulating and analyzing genotypes and phenotypes while taking into account haplotype information
haptools [OPTIONS] COMMAND [ARGS]...
Options
- --version
Show the version and exit.
ld
Compute the pair-wise LD (Pearson’s correlation) between haplotypes (or variants) and a single TARGET haplotype (or variant)
GENOTYPES must be formatted as a VCF or PGEN and HAPLOTYPES must be formatted according to the .hap format spec
TARGET refers to the ID of a variant or haplotype. LD is computed pair-wise between TARGET and all of the other haplotypes in the .hap (or genotype) file
If TARGET is a variant ID, the ID must appear in GENOTYPES. Otherwise, it must be present in the .hap file
haptools ld [OPTIONS] TARGET GENOTYPES HAPLOTYPES
Options
- --region <region>
The region from which to extract haplotypes; ex: ‘chr1:1234-34566’ or ‘chr7’. For this to work, the VCF and .hap file must be indexed and the seqname provided must correspond with one in the files
- Default
all haplotypes
- -s, --sample <samples>
A list of the samples to subset from the genotypes file (ex: ‘-s sample1 -s sample2’)
- Default
all samples
- -S, --samples-file <samples_file>
A single column txt file containing a list of the samples (one per line) to subset from the genotypes file
- Default
all samples
- -i, --id <ids>
A list of the haplotype IDs to use from the .hap file (ex: ‘-i H1 -i H2’). Or, if --from-gts, a list of the variant IDs to use from the genotypes file. For this to work, the .hap file must be indexed
- Default
all haplotypes
- -I, --ids-file <ids_file>
A single column txt file containing a list of the haplotype (or variant) IDs (one per line) to subset from the .hap (or genotype) file
- Default
all haplotypes
- -c, --chunk-size <chunk_size>
If using a PGEN file, read genotypes in chunks of X variants; reduces memory
- Default
all variants
- --discard-missing
Ignore any samples that are missing genotypes for the required variants
- Default
False
- --from-gts
By default, LD is computed with the haplotypes in the .hap file. Use this switch to compute LD with the genotypes in the genotypes file, instead.
- Default
False
- -o, --output <output>
A .hap file containing haplotypes and their LD with TARGET (or, with --from-gts, a tab-delimited text file of variants and their LD with TARGET)
- Default
stdout
- -v, --verbosity <verbosity>
The level of verbosity desired
- Default
INFO
- Options
CRITICAL | ERROR | WARNING | INFO | DEBUG | NOTSET
Arguments
- TARGET
Required argument
- GENOTYPES
Required argument
- HAPLOTYPES
Required argument