Sample Info

A samples info file maps each sample in a reference to its population, as listed in a model file. Each line holds a sample ID and a population code, separated by a tab. This file is used by the simgenotype command.

1000 Genomes sample_info file format

To download a samples info file compatible with the 1000 Genomes (1000G) reference, run:

wget https://raw.githubusercontent.com/CAST-genomics/haptools/main/example-files/1000genomes_sampleinfo.tsv

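If you’d like to compute this mapping file yourself, run the following on the IGSR sample table for the 1000 Genomes GRCh38 collection (saved here as "igsr-1000 genomes on grch38.tsv"). It keeps the sample and population columns (1 and 4), drops the header line, and replaces any spaces with tabs: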

cut -f 1,4 "igsr-1000 genomes on grch38.tsv" | \
sed '1d' | \
sed -e 's/ /\t/g' > 1000genomes_sampleinfo.tsv
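
After generating the file, it can be worth checking that it has the expected shape. The following is a minimal sketch, assuming the format is exactly two non-empty, tab-separated columns per line (as the .tsv extension and the sed step suggest). The function name check_sampleinfo is hypothetical and not part of haptools.

```shell
# Hypothetical helper (not part of haptools): succeed only if every line
# of the file given as $1 has exactly two non-empty, tab-separated fields.
check_sampleinfo() {
    awk -F '\t' '
        # Report any line that does not look like "<sample>\t<population>".
        # Blank lines are reported too, since they have zero fields.
        NF != 2 || $1 == "" || $2 == "" {
            printf "line %d is malformed: %s\n", NR, $0
            bad = 1
        }
        # Exit status 1 if at least one malformed line was seen, else 0.
        END { exit bad + 0 }
    ' "$1"
}
```

For example, `check_sampleinfo 1000genomes_sampleinfo.tsv && echo "looks fine"` would print the confirmation only when no malformed line is found.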

Examples

HG00372 FIN
HG00132 GBR
HG00237 GBR
HG00404 CHS

See example-files/1000genomes_sampleinfo.tsv for the 1000 Genomes GRCh38 samples mapped to their subpopulations.

HG00358 FIN
HG00360 FIN
HG00365 FIN
HG00372 FIN
HG00132 GBR
HG00137 GBR
HG00149 GBR
HG00151 GBR
HG00182 FIN
HG00187 FIN
HG00136 GBR
HG00233 GBR
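
To see which populations a samples info file covers, and how many samples each one has, the second column can be tallied. This is a minimal sketch, again assuming two tab-separated columns. The function name count_populations is hypothetical.

```shell
# Hypothetical helper (not part of haptools): print each population code
# in the file given as $1 with its sample count, tab-separated and
# sorted by population code.
count_populations() {
    awk -F '\t' '
        { n[$2]++ }    # column 2 holds the population code
        END { for (p in n) printf "%s\t%d\n", p, n[p] }
    ' "$1" | sort
}
```

Run on a file containing only the twelve lines listed above, `count_populations` would report 6 samples for FIN and 6 for GBR.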