Sample Info
A sample info file maps each sample in a reference panel to its population, using the population labels listed in a model file. The simgenotype command uses this file.
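To show how the mapping relates to a model file, the sketch below inverts a sample-to-population mapping and reports any model population that has no reference samples. It is a hypothetical helper, not haptools' implementation. The function names `samples_by_population` and `missing_populations` and the population list `model_pops` are made up for illustration, and simgenotype's own checks may differ.

```python
from collections import defaultdict


def samples_by_population(sample_info: dict[str, str]) -> dict[str, list[str]]:
    """Invert a sample -> population mapping into population -> [samples]."""
    groups: dict[str, list[str]] = defaultdict(list)
    for sample, pop in sample_info.items():
        groups[pop].append(sample)
    return dict(groups)


def missing_populations(sample_info: dict[str, str], model_pops: list[str]) -> list[str]:
    """Return model populations that no reference sample is assigned to."""
    groups = samples_by_population(sample_info)
    return [pop for pop in model_pops if pop not in groups]


# Pairs taken from the Examples section; model_pops is a hypothetical model population list.
sample_info = {"HG00372": "FIN", "HG00132": "GBR", "HG00237": "GBR", "HG00404": "CHS"}
model_pops = ["GBR", "FIN", "YRI"]

print(samples_by_population(sample_info)["GBR"])  # ['HG00132', 'HG00237']
print(missing_populations(sample_info, model_pops))  # ['YRI']
```

An empty result from `missing_populations` would mean every population named in the model has at least one reference sample to draw from.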
1000 Genomes sample_info file format
You can download a sample info file compatible with the 1000 Genomes reference by running the following command:
wget https://raw.githubusercontent.com/CAST-genomics/haptools/main/example-files/1000genomes_sampleinfo.tsv
If you’d like to compute this mapping file yourself, execute the following:
cut -f 1,4 "igsr-1000 genomes on grch38.tsv" | \
sed '1d' | \
sed -e 's/ /\t/g' > 1000genomes_sampleinfo.tsv
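If `cut` and `sed` are not available, the same transformation can be sketched in Python. This is a minimal sketch that mirrors the pipeline above under one assumption: the IGSR export is tab-separated with the population code in the fourth column, as `cut -f 1,4` implies. The names `convert_lines` and `igsr_to_sampleinfo` are hypothetical. Unlike `cut`, it skips rows with fewer than four fields instead of echoing them.

```python
from typing import Iterable, Iterator


def convert_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'sample<TAB>population' lines from a tab-separated IGSR export."""
    it = iter(lines)
    next(it, None)  # drop the header line, like sed '1d'
    for line in it:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed rows (cut would pass them through)
        # Keep columns 1 and 4 (like cut -f 1,4), then turn spaces into tabs
        # (like sed -e 's/ /\t/g').
        yield f"{fields[0]}\t{fields[3]}".replace(" ", "\t")


def igsr_to_sampleinfo(src_path: str, dst_path: str) -> None:
    """Write the converted lines of src_path to dst_path, one per line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for out in convert_lines(src):
            dst.write(out + "\n")
```

For example, `igsr_to_sampleinfo("igsr-1000 genomes on grch38.tsv", "1000genomes_sampleinfo.tsv")` would write the same two-column file as the shell pipeline for well-formed input. Keeping the line conversion separate from file I/O makes it easy to check on a few sample rows.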
Examples
HG00372 FIN
HG00132 GBR
HG00237 GBR
HG00404 CHS
See example-files/1000genomes_sampleinfo.tsv for an example of the 1000 Genomes GRCh38 samples mapped to their subpopulations.
HG00358 FIN
HG00360 FIN
HG00365 FIN
HG00372 FIN
HG00132 GBR
HG00137 GBR
HG00149 GBR
HG00151 GBR
HG00182 FIN
HG00187 FIN
HG00136 GBR
HG00233 GBR
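Because each line holds one sample ID and one population label, a quick sanity check before running simgenotype is to parse the file and count samples per population. The sketch below does that for the 12 pairs listed above. It assumes whitespace-separated columns (tabs or spaces) and treats any line without exactly two fields as an error. `parse_sample_info` is a hypothetical helper, not part of haptools.

```python
from collections import Counter


def parse_sample_info(text: str) -> dict[str, str]:
    """Parse 'sample population' lines into a sample -> population dict."""
    mapping: dict[str, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()  # splits on any run of tabs or spaces
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        sample, pop = fields
        mapping[sample] = pop
    return mapping


example = """\
HG00358\tFIN
HG00360\tFIN
HG00365\tFIN
HG00372\tFIN
HG00132\tGBR
HG00137\tGBR
HG00149\tGBR
HG00151\tGBR
HG00182\tFIN
HG00187\tFIN
HG00136\tGBR
HG00233\tGBR
"""

mapping = parse_sample_info(example)
counts = dict(Counter(mapping.values()))
print(counts)  # {'FIN': 6, 'GBR': 6}
```

For the downloaded file, you would pass the contents of `1000genomes_sampleinfo.tsv` (e.g. read with `open(...).read()`) instead of `example`.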