Making alignments with Cactus
To run Cactus you need your genomes and an input file. The first line of the input file is a Newick-style tree of the names of the genomes you want to align, followed by one line per genome with a tab-delimited name and path. Each name must be unique; it is what that genome will be called in the alignment. The path is where that genome's FASTA file is located. Cactus expects soft-masked genomes, so soft-mask them beforehand if you can.
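As a quick sanity check on soft-masking, the sketch below reports the fraction of lowercase bases in a FASTA file. The function name `softmask_fraction` is hypothetical and not part of Cactus; it assumes soft-masked bases are lowercase a, c, g, t, or n. A value near zero suggests the genome has not been soft-masked.

```shell
# Hypothetical helper (not part of Cactus): print the fraction of
# soft-masked (lowercase) bases in a FASTA file, to four decimal places.
softmask_fraction() {
    awk '!/^>/ {
             # Count all sequence characters before gsub modifies the line.
             total += length($0)
             # gsub returns the number of lowercase bases it removed.
             masked += gsub(/[acgtn]/, "")
         }
         END {
             if (total == 0) { print "0.0000"; exit }
             printf "%.4f\n", masked / total
         }' "$1"
}
```

For example, `softmask_fraction xbir-COAC-16-VIII-22-M_v2023.1.fa` prints a single number between 0 and 1.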
Here's an example input file for aligning the Xbir_pacbio_v2023.1 genome with the X. maculatus genome:
(xbir-COAC-16-VIII-22-M_v2023.1,GCA_002775205.2_X_maculatus-5.0-male_genomic);
xbir-COAC-16-VIII-22-M_v2023.1	xbir-COAC-16-VIII-22-M_v2023.1.fa
GCA_002775205.2_X_maculatus-5.0-male_genomic	GCA_002775205.2_X_maculatus-5.0-male_genomic.fna
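Because a run takes days, it can be worth checking the input file before submitting. The following is a minimal sketch, not part of Cactus: `check_cactus_input` is a hypothetical helper that checks only the rules described above (tree on the first line, tab-delimited name and path, unique names) and that each path can be read from the current directory. It does not check that the names in the tree match the names in the list.

```shell
# Hypothetical helper (not part of Cactus): basic checks on a Cactus input file.
# Returns 0 if the file looks fine, 1 otherwise, printing one message per problem.
check_cactus_input() {
    awk -F '\t' '
        NR == 1 {
            # The first line should be the Newick tree, which ends in a semicolon.
            if ($0 !~ /;[ \t]*$/) { print "line 1: expected a Newick tree ending in ;"; bad = 1 }
            next
        }
        NF == 0 { next }    # ignore blank lines
        NF != 2 { print "line " NR ": expected name<TAB>path"; bad = 1; next }
        seen[$1]++ { print "line " NR ": duplicate name " $1; bad = 1 }
        # getline returns a negative value when the file cannot be opened.
        (getline first < ($2)) < 0 { print "line " NR ": cannot read " $2; bad = 1 }
        { close($2) }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Run it from the directory the job will run in, for example `check_cactus_input ./xbirXmac_cactusInput.txt && echo "input looks OK"`.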
Cactus and the HAL tools need to be run in a Python virtual environment, so these commands must be run first.
ml python/3.9.0
virtualenv -p python3.9 /home/groups/schumer/shared_bin/cactus/cactus-bin-v2.2.3/cactus_env
echo "export PATH=/home/groups/schumer/shared_bin/cactus/cactus-bin-v2.2.3/bin:\$PATH" >> /home/groups/schumer/shared_bin/cactus/cactus-bin-v2.2.3/cactus_env/bin/activate
echo "export PYTHONPATH=/home/groups/schumer/shared_bin/cactus/cactus-bin-v2.2.3/lib:\$PYTHONPATH" >> /home/groups/schumer/shared_bin/cactus/cactus-bin-v2.2.3/cactus_env/bin/activate
source /home/groups/schumer/shared_bin/cactus/cactus-bin-v2.2.3/cactus_env/bin/activate
Even aligning two genomes takes more than two days, so here's an example script allowing seven days for it to complete.
#!/bin/bash
#SBATCH --job-name=cactus
#SBATCH --time=168:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=64000
#SBATCH -p schumer,hns

echo "xbir_xmac"

ml python/3.9.0
virtualenv -p python3.9 /home/users/qlangdon/cactus-bin-v2.2.3/cactus_env
echo "export PATH=/home/users/qlangdon/cactus-bin-v2.2.3/bin:\$PATH" >> /home/users/qlangdon/cactus-bin-v2.2.3/cactus_env/bin/activate
echo "export PYTHONPATH=/home/users/qlangdon/cactus-bin-v2.2.3/lib:\$PYTHONPATH" >> /home/users/qlangdon/cactus-bin-v2.2.3/cactus_env/bin/activate
source /home/users/qlangdon/cactus-bin-v2.2.3/cactus_env/bin/activate

cactus ./jobstore ./xbirXmac_cactusInput.txt ./xbir-COAC-16-VIII-22-M_v2023.1_GCA_002775205.2_X_maculatus-5.0-male_genomic.hal --realTimeLogging

halValidate xbir-COAC-16-VIII-22-M_v2023.1_GCA_002775205.2_X_maculatus-5.0-male_genomic.hal
halStats xbir-COAC-16-VIII-22-M_v2023.1_GCA_002775205.2_X_maculatus-5.0-male_genomic.hal > xbir-COAC-16-VIII-22-M_v2023.1_GCA_002775205.2_X_maculatus-5.0-male_genomic.hal_stats
halSummarizeMutations xbir-COAC-16-VIII-22-M_v2023.1_GCA_002775205.2_X_maculatus-5.0-male_genomic.hal > xbir-COAC-16-VIII-22-M_v2023.1_GCA_002775205.2_X_maculatus-5.0-male_genomic.hal_sumMut
The final steps (halValidate, halStats, and halSummarizeMutations) are just useful for checking that the alignment completed and for getting some summary info. The intermediate files are put into the jobstore folder, which should disappear if the job completes. If the job crashes for ambiguous reasons, try removing the jobstore folder and starting the run again (this worked for me, Quinn, a surprising number of times). You may struggle to get a lot of genomes aligned this way, so look into running them step by step or progressively:
https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/updating-alignments.md
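The "remove the jobstore folder and start again" step can be sketched as below. `reset_jobstore` is a hypothetical helper name, and `run_cactus.sh` in the comment is an assumed name for the submission script shown earlier; adjust both to your setup. Note that this throws away any intermediate files from the crashed run.

```shell
# Hypothetical helper: delete a leftover job store so Cactus can start fresh.
# Defaults to ./jobstore, the folder used in the example script.
reset_jobstore() {
    jobstore="${1:-./jobstore}"
    if [ -d "$jobstore" ]; then
        echo "Removing leftover job store: $jobstore"
        rm -rf -- "$jobstore"
    fi
}

# Example (run_cactus.sh is an assumed script name):
#   reset_jobstore ./jobstore && sbatch run_cactus.sh
```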
From here you can do liftovers or other downstream analyses.