This file describes how we made the browser database on the mouse
genome,November 2001 build.


PROCESSING MOUSE MRNA FROM GENBANK INTO DATABASE. (done)

o - Create the mouseRNA and mouseEST fasta files.
    ssh kkstore
    cd /cluster/store1/genbank.127
    gunzip -c gbrod*.gz | gbToFaRa /projects/compbio/experiments/hg/h/mouseRna.fil ../mrna.127/mouseRna.fa ../mrna.127/mouseRna.ra ../mrna.127/mouseRna.ta stdin
    gunzip -c gbest*.gz | gbToFaRa /projects/compbio/experiments/hg/h/mouseRna.fil ../mrna.127/mouseEst.fa ../mrna.127/mouseEst.ra ../mrna.127/mouseEst.ta stdin


CREATING DATABASE AND STORING mRNA/EST SEQUENCE AND AUXILIARY INFO (done)

o - Create the database.
     - ssh hgwdev
     - Enter mysql via:
           mysql -u hgcat -pbigsecret
     - At mysql prompt type:
	create database mm1;
	quit
     - make a semi-permanent read-only alias:
        alias mm1 "mysql -u hguser -phguserstuff -A mm1"
o - Use df to ake sure there is at least 5 gig free on hgwdev:/usr/local/mysql 
o - Store the mRNA (non-alignment) info in database.
     hgLoadRna new mm1
     hgLoadRna add mm1 /cluster/store1/mrna.127/mouseRna.fa /cluster/store1/mrna.127/mouseRna.ra
     hgLoadRna add mm1 /cluster/store1/mrna.127/mouseEst.fa /cluster/store1/mrna.127/mouseEst.ra
    The last line will take quite some time to complete.  It will count up to
    about 3,800,000 before it is done.
    NOTE: I had to patch the extFile table a little after this.  Time to add a type option
    to hgLoadRna perhaps. -jk


BREAK UP THE MOUSE SEQUENCE INTO 1 MB CHUNKS AT NON_BRIDGED CONTIGS (done)

o - This version of the mouse sequence data is in /cluster/store2/mm.2001.11/mm1/assembly
  - The files are mouse.oo.03.agp and mouse.oo.03.agp.fasta
o - cd into your CVS source tree under kent/src/hg/splitFaIntoContigs
    - Type make
    - Run splitFaIntoContigs /cluster/store2/mm.2001.11/mm1/assembly/mouse.oo.03.agp /cluster/store2/mm.2001.11/mm1/assembly/mouse.oo.03.agp.fasta /cluster/store2/mm.2001.11/mm1 1000000
    - This will split the mouse sequence into approx. 1Mbase supercontigs between non-bridged clone contigs and drop the resulting dir structure in /cluster/store2/mm.2001.11/mm1.
    - The resulting dir structure will include 1 dir for each chromosome, each of which has a set of subdirectories, one subdir per supercontig.    


COPY THE MOUSE SEQUENCE DATA TO THE CLUSTER (done)
o - ssh kkstore
o - Copy the mouseRNA and mouseEst data to the cluster
    - cp /cluster/store1/mrna.127/mouseRna.fa /scratch/hg/mrna.127
    - cp /cluster/store1/mrna.127/mouseEst.fa /scratch/hg/mrna.127
o - Copy the mouse sequence supercontigs to the cluster
    - mkdir /scratch/hg/mm1/
    - mkdir /scratch/hg/mm1/contigs
    - cp /cluster/store2/mm.2001.11/mm1/*/chr*/chr*.fa /scratch/hg/mm1/contigs

REPEAT MASKING (done)

Repeat mask on sensitive settings by doing
    ssh kk
    cd ~/mm/bed
    mkdir rmskS
    cd rmskS
    cp ~/lastMm/bed/rmskS/rmskS.csh .
edit rmskS.csh to update paths to current mouse assembly
    cp ~/lastMm/bed/rmskS/template
    ls -1 /scratch/hg/mm1/contigs/*.fa > genome.lst
    gensub2 genome.lst single template spec
    mkdir out
    para create spec
then do
    para try
and check things out.  If it looks good then para push or para shove
until finished.

Next copy these into the chromosome/contig directories by updating
the paths in jkStuff/cpRmskS.sh, and then sourcing it.

Then lift these up to chromosome coordinates with
    cd ~/mm
    tcsh jkStuff/liftOut2.sh
Then load the .out files into the database with:
    ssh hgwdev
    cd ~/mm
    hgLoadOut mm1 ?/*.fa.out ??/*.fa.out

STORING O+O SEQUENCE AND ASSEMBLY INFORMATION  (done)

Create packed chromosome sequence files 
     ssh kkstore
     cd ~/oo
     tcsh jkStuff/makeNib.sh

Load chromosome sequence info into database 
     ssh hgwdev
     mysql -A -u hgcat -pbigsecret mm1  < ~/src/hg/lib/chromInfo.sql
     cd ~/mm
     hgNibSeq -preMadeNib mm1 /cluster/store2/mm.2001.11/mm1/nib ?/chr*.fa ??/chr*.fa NA*/chr*.fa

Store o+o info in database.
     cd /cluster/store2/mm.2001.11/mm1
     hgGoldGapGl mm1 /cluster/store2/mm.2001.11 mm1 -noGl

Make and load GC percent table
     ssh hgwdev
     cd /cluster/store2/mm.2001.11/mm1/bed
     mkdir gcPercent
     cd gcPercent
     mysql -A -u hgcat -pbigsecret mm1  < ~/src/hg/lib/gcPercent.sql
     hgGcPercent mm1 ../../nib



MAKING AND STORING mRNA AND EST ALIGNMENTS  (done)

o - Load up the local disks of the cluster with refSeq.fa, mrna.fa and est.fa
    from /cluster/store1/mrna.127  into /var/tmp/hg/h/mrna

o - Use BLAT to generate refSeq, mRNA and EST alignments as so:
      Make sure that /scratch/hg/gs.11/build28/contigs is loaded
      with NT_*.fa and pushed to the cluster nodes.  The following
      cshell script needs updating.

	  cd ~/mm/bed
	  foreach i (refSeq mrna est)
	      mkdir $i
	      cd $i
	      echo /scratch/hg/gs.11/build28/contigs | wordLine stdin > genome.lst
	      ls -1 /scratch/hg/mrna.127/$i.fa > mrna.lst
	      mkdir psl
	      gensub2 genome.lst mrna.lst gsub spec
	      jabba make hut spec
	      jabba push hut
	  end 

    check on progress with jabba check hut in mrna, est, and refSeq
    directories.

      
o - Process refSeq mRNA and EST alignments into near best in genome.
      cd ~/mm/bed

      cd refSeq
      pslSort dirs raw.psl /cluster/fast1/temp psl
      pslReps -minCover=0.2 -sizeMatters -minAli=0.98 -nearTop=0.002 raw.psl contig.psl /dev/null
      liftUp -nohead all_refSeq.psl ../../jkStuff/liftAll.lft carry contig.psl
      pslSortAcc nohead chrom /cluster/fast1/temp all_refSeq.psl
      cd ..

      cd mrna
      pslSort dirs raw.psl /cluster/fast1/temp psl
      pslReps -minAli=0.98 -sizeMatters -nearTop=0.005 raw.psl contig.psl /dev/null
      liftUp -nohead all_mrna.psl ../../jkStuff/liftAll.lft carry contig.psl
      pslSortAcc nohead chrom /cluster/fast1/temp all_mrna.psl
      cd ..

      cd est
      pslSort dirs raw.psl /cluster/fast1/temp psl
      pslReps -minAli=0.98 -sizeMatters -nearTop=0.005 raw.psl contig.psl /dev/null
      liftUp -nohead all_est.psl ../../jkStuff/liftAll.lft carry contig.psl
      pslSortAcc nohead chrom /cluster/fast1/temp all_est.psl
      cd ..

o - Load mRNA alignments into database.
      ssh hgwdev
      cd /cluster/store2/mm.2001.11/mm1/bed/mrna/chrom
      foreach i (*.psl)
          mv $i $i:r_mrna.psl
      end
      hgLoadPsl mm1 *.psl
      cd ..
      hgLoadPsl mm1 all_mrna.psl -nobin

o - Load EST alignments into database.
      ssh hgwdev
      cd /cluster/store2/mm.2001.11/mm1/bed/est/chrom
      foreach i (*.psl)
            mv $i $i:r_est.psl
      end
      hgLoadPsl mm1 *.psl
      cd ..
      hgLoadPsl mm1 all_est.psl -nobin

o - Create subset of ESTs with introns and load into database.
      - ssh kkstore
      cd ~/mm
      tcsh jkStuff/makeIntronEst.sh
      - ssh hgwdev
      cd ~/mm/bed/est/intronEst
      hgLoadPsl mm1 *.psl

PRODUCING KNOWN GENES (done)

o - Download everything from ftp://ncbi.nlm.nih.gov/refseq/H_sapiens/mRNA_Prot/
    into /cluster/store1/mrna.127/refSeq.
o - Unpack this into fa files and get extra info with:
       cd /cluster/store1/mrna.127/refSeq
       gunzip mouse.gbff.gz
       gunzip mouse.faa.gz
o - Get extra info from NCBI and produce refGene table as so:
       ssh hgwdev
       cd ~/mm/bed/refSeq
       wget ftp://ncbi.nlm.nih.gov/refseq/LocusLink/loc2ref 
       wget ftp://ncbi.nlm.nih.gov/refseq/LocusLink/mim2loc
o - Similarly download refSeq proteins in fasta format to refSeq.pep
o - Align these by processes described under mRNA/EST alignments above.
o - Produce refGene, refPep, refMrna, and refLink tables as so:
       ssh hgwdev
       cd ~/mm/bed/refSeq
       ln -s /cluster/store1/mrna.127 mrna
       hgRefSeqMrna mm1 mrna/mouseRefSeq.fa mrna/mouseRefSeq.ra all_refSeq.psl loc2ref mrna/refSeq/mouse.faa mim2loc
o - Add RefSeq status info (done 6/19/02)
    hgRefSeqStatus mm1 loc2ref


SIMPLE REPEAT TRACK (done)

o - Create cluster parasol job like so:
        ssh kk
	cd ~/mm/bed
	mkdir simpleRepeat
	cd simpleRepeat
	cp ~/lastOo/bed/simpleRepeat/gsub
	mkdir trf
	echo single > single.lst
        echo /scratch/hg/gs.11/build28/contigs | wordLine stdin > genome.lst
	gensub2 genome.lst single.lst gsub spec
	para make spec
        para push 
     This job had some problems this time combined with the
     cluster going bizarre.  We ended up running it serially
     which only takes about 8 hours anyway.
        liftUp simpleRepeat.bed ~/mm/jkStuff/liftAll.lft warn trf/*.bed

o - Load this into the database as so
        ssh hgwdev
	cd ~/mm/bed/simpleRepeat
	hgLoadBed mm1 simpleRepeat simpleRepeat.bed -sqlTable=$HOME/src/hg/lib/simpleRepeat.sql


PRODUCING GENSCAN PREDICTIONS (done)
    
o - Produce contig genscan.gtf genscan.pep and genscanExtra.bed files like so:

	First make sure you have appropriate set up, permissions, etc. and you have 
     	tried using Parasol to submit and finished a set of jobs successfully.
     	
	Log into kk
	     	cd ~/mm
     		cd bed/genscan
	Make 3 subdirectories for genscan to put their output files in
		mkdir gtf pep subopt
	Generate a list file, genome.list, of all the contigs
		ls /scratch/hg/mm1/contigs/*.fa >genome.list	
	Create a dummy file, single, containing just 1 single line of any text.	
	Create template file, gsub, for gensub2.  For example (3 lines file):
		#LOOP
		/cluster/home/fanhsu/bin/i386/gsBig {check in line+ $(path1)} {check out line gtf/$(root1).gtf} -trans={check out line pep/$(root1).pep} -subopt={check out line subopt/$(root1).bed} -exe=/cluster/home/fanhsu/projects/compbio/bin/genscan-linux/genscan -par=/cluster/home/fanhsu/projects/compbio/bin/genscan-linux/HumanIso.smat -tmp=/tmp -window=2400000
		#ENDLOOP
	Generate job list file, jobList, for Parasol
		gensub2 genome.list single gsub jobList
	First issue the following Parasol command:
		para create jobList
	Run the following command, which will try first 10 jobs from jobList
		para try
	Check if these 10 jobs run OK by
		para check
	If they have problems, debug and fix your program, template file, 
	commands, etc. and try again.  If they are OK, then issue the following 
	command, which will ask Parasol to start all the jobs (around 3000 jobs).
		para push
	Issue either one of the following two commands to check the 
	status of the cluster and your jobs, until they are done.
		parasol status
		para check
	If any job fails to complete, study the problem and ask Jim to help
	if necessary.

o - Convert these to chromosome level files as so:
     cd ~/mm
     cd bed/genscan
     liftUp genscan.gtf ../../jkStuff/liftAll.lft warn gtf/*.gtf
     liftUp genscanSubopt.bed ../../jkStuff/liftAll.lft warn subopt/*.bed
     cat pep/*.pep > genscan.pep

o - Load into the database as so:
     ssh hgwdev
     cd ~/mm/bed/genscan
     ldHgGene mm1 genscan genscan.gtf
     hgPepPred mm1 generic genscanPep genscan.pep
     hgLoadBed mm1 genscanSubopt genscanSubopt.bed


PREPARING SEQUENCE FOR CROSS SPECIES ALIGNMENTS (done)

Make sure that the NT*.fa files are lower-case repeat masked, and that
the simpleRepeat track is made.  Then put doubly (simple & interspersed)
repeat mask files onto /cluster local drive as so.
    ssh kkstore
    mkdir /scratch/hg/mm1/trfFa
    cd ~/mm
    foreach i (? ?? NA_*)
	cd $i
        foreach j (chr${i}_*)
	    if (-d $j) then
		maskOutFa $j/$j.fa ../bed/simpleRepeat/trf/$j.bed -softAdd /scratch/hg/mm1/trfFa/$j.fa.trf
		echo done $i/$j
	    endif
	end
	cd ..
    end
Then ask admins to do a binrsync.


DOING HUMAN/MOUSE ALIGMENTS (todo)

o - Download the lower-case-masked assembly and put it in
    kkstore:/cluster/store1/a2ms.
   
o - Mask simple repeats in addition to normal repeats with:
        mkdir ~/mm/jkStuff/trfCon
	cd ~/mm/allctgs
	/bin/ls -1 | grep -v '\.' > ~/mm/jkStuff/trfCon/ctg.lst
        cd ~/mm/jkStuff/trfCon
	mkdir err log out
    edit ~/mm/jkStuff/trf.gsub to update gs and mm version
	gensub ctg.lst ~/mm/jkStuff/trf.gsub
	mv gensub.out trf.con
	condor_submit trf.con
    wait for this to finish.  Check as so
        cd ~/mm
	source jkStuff/checkTrf.sh
    there should be no output.
o - Copy the RepeatMasked and trf masked genome to
    kkstore:/scratch/hg/gs.11/build28/contigTrf, and ask
    Jorge and all to binrsync to propagate the data
    across the new cluster.
o - Download the assembled mouse genome in lower-case
    masked form to /cluster/store1/arachne.3/whole.  
    Execute the script splitAndCopy.csh to chop it
    into roughly 50M pieces in arachne.3/parts
o - Set up the jabba job to do the alignment as so:
       ssh kkstore
       cd /cluster/store2/mm.2001.11/mm1
       mkdir blatMouse.phusion
       cd blatMouse.phusion
       ls -1S /scratch/hg/gs.3/build28/contigTrf/* > human.lst
       ls -1 /cluster/store1/arachne.3/parts/* > mouse.lst
    Make a file 'gsub' with the following three lines in it
#LOOP
/cluster/home/kent/bin/i386/blat -q=dnax -t=dnax {check in line+ $(path2)} {check in line+ $(path1)}  {check out line+ psl/$(root2)_$(root1).psl} -minScore=20 -minIdentity=20 -tileSize=4 -minMatch=2 -oneOff=0 -ooc={check in exists /scratch/hg/h/4.pooc} -qMask=lower -mask=lower
#ENDLOOP
    Process this into a jabba file and launch the first set
    of jobs (10,000 out of 70,000) as so:
        gensub2 mouse.lst human.lst gsub spec
	jabba make hut spec
	jabba push hut
    Do a 'jabba check hut' after about 20 minutes and make sure
    everything is right.  After that make a little script that
    does a "jabba push hut" followed by a "sleep 30" about 50
    times.  Interrupt script when you see jabba push say it's
    not pushing anything.

o - Sort alignments as so 
       ssh kkstore
       cd /cluster/store2/mm.2001.11/mm1/blatMouse
       pslCat -dir -check psl | liftUp -type=.psl stdout ../liftAll.lft warn stdin | pslSortAcc nohead chrom /cluster/store2/temp stdin
o - Get rid of big pile-ups due to contamination as so:
       cd chrom
       foreach i (*.psl)
           echo $i
           mv $i xxx
           pslUnpile -maxPile=600 xxx $i
       rm xxx
       end
o - Remove long redundant bits from read names by making a file
    called subs.in with the following line:
        gnl|ti^ti
        contig_^tig_
    and running the commands
        cd ~/mouse/vsOo33/blatMouse.phusion/chrom
	subs -e -c ^ *.psl > /dev/null
o - Copy over to network where database is:
        ssh kks00
	cd ~/mm/bed
	mkdir blatMouse
	mkdir blatMouse/ph.chrom600
	cd !$
        cp /cluster/store2/mm.2001.11/mm1/blatMouse.phusion/chrom/*.psl .
o - Rename to correspond with tables as so and load into database:
       ssh hgwdev
       cd ~/mm/bed/blatMouse/ph.chrom600
       foreach i (*.psl)
	   set r = $i:r
           mv $i ${r}_blatMouse.psl
       end
       hgLoadPsl mm1 *.psl
o - load sequence into database as so:
	ssh kks00
	faSplit about /projects/hg3/mouse/arachne.3/whole/Unplaced.mfa 1200000000 /projects/hg3/mouse/arachne.3/whole/unplaced
	ssh hgwdev
	hgLoadRna addSeq '-abbr=gnl|' mm1 /projects/hg3/mouse/arachne.3/whole/unpla*.fa
	hgLoadRna addSeq '-abbr=con' mm1 /projects/hg3/mouse/arachne.3/whole/SET*.mfa
    This will take quite some time.  Perhaps an hour .

o - Produce 'best in genome' filtered version:
        ssh kks00
	cd ~/mouse/vsOo33
	pslSort dirs blatMouseAll.psl temp blatMouse
	pslReps blatMouseAll.psl bestMouseAll.psl /dev/null -singleHit -minCover=0.3 -minIdentity=0.1
	pslSortAcc nohead bestMouse temp bestMouseAll.psl
	cd bestMouse
        foreach i (*.psl)
	   set r = $i:r
           mv $i ${r}_bestMouse.psl
        end
o - Load best in genome into database as so:
	ssh hgwdev
	cd ~/mouse/vsOo33/bestMouse
        hgLoadPsl mm1 *.psl

PRODUCING CROSS_SPECIES mRNA ALIGMENTS (todo)

Here you align vertebrate mRNAs against the masked genome on the
cluster you set up during the previous step.

o - Make sure that gbpri, gbmam, gbrod, and gbvert are downloaded from Genbank into
    /cluster/store1/genbank.127

o - Process these out of genbank flat files as so:
       ssh kkstore
       cd /cluster/store1/genbank.127
       mkdir ../mrna.127
       gunzip -c gbpri*.gz gbmam*.gz gbrod*.gz gbvrt*.gz gbinv*.gz | gbToFaRa ~/hg/h/xenoRna.fil ../mrna.127/xenoRna.fa ../mrna.127/xenoRna.ra ../mrna.127/xenoRna.ta stdin
       cd ../mrna.127
       faSplit sequence xenoRna.fa 2 xenoRna
       ssh kks00
       cd /projects/hg2
       mkdir mrna.127
       cp /cluster/store1/mrna.127/xenoRna.* mrna.127

Set up cluster run.  First make sure genome is in kks00:/scratch/hg/gs.11/build28/contigTrf
in RepeatMasked + trf form.  (This is probably done already in mouse alignment
stage).  Also make sure /scratch/hg/mrna.127 is loaded with xenoRna.fa Then do:
       ssh kkstore
       cd /cluster/store2/mm.2001.11/mm1/mm/bed
       mkdir xenoMrna
       cd xenoMrna
       mkdir psl
       ls -1S /scratch/hg/gs.11/build28/trfFa > human.lst
       ls -1S /scratch/hg/mrna.127/xenoRna?*.fa > mrna.lst
       cp ~/lastOo/bed/xenoMrna/gsub .
       gensub2 human.lst mrna.lst gsub spec
       jabba make hut spec
       jabba push hut
Do jabba check hut until the run is done, doing jabba push hut if
necessary on occassion.

Sort xeno mRNA alignments as so:
       ssh kkstore
       cd ~/mm/bed/xenoMrna
       pslSort dirs raw.psl /cluster/store2/temp psl
       pslReps raw.psl cooked.psl /dev/null -minAli=0.25
       liftUp chrom.psl ../../jkStuff/liftAll.lft warn cooked.psl
       pslSortAcc nohead chrom /cluster/store2/temp chrom.psl
       pslCat -dir chrom > xenoMrna.psl
       rm -r chrom raw.psl cooked.psl chrom.psl

Load into database as so:
       ssh hgwdev
       cd ~/mm/bed/xenoMrna
       hgLoadPsl mm1 xenoMrna.psl -tNameIx
       cd /cluster/store1/mrna.127
       hgLoadRna add mm1 /cluster/store1/mrna.127/xenoRna.fa xenoRna.ra

Similarly do xenoEst aligments:
       ssh kkstore
       cd /cluster/store2/mm.2001.11/mm1/mm/bed
       mkdir xenoEst
       cd xenoEst
       mkdir psl
       ls -1S /scratch/hg/gs.11/build28/trfFa/*.fa > human.lst
       ls -1S /scratch/hg/mrna.127/xenoEst?*.fa > mrna.lst
       cp ~/lastOo/bed/xenoEst/gsub .
       gensub2 human.lst mrna.lst gsub spec
       jabba make hut spec
       jabba shove hut

Sort xenoEst alignments:
       ssh kkstore
       cd ~/mm/bed/xenoEst
       pslSort dirs raw.psl /cluster/store2/temp psl
       pslReps raw.psl cooked.psl /dev/null -minAli=0.10
       liftUp chrom.psl ../../jkStuff/liftAll.lft warn cooked.psl
       pslSortAcc nohead chrom /cluster/store2/temp chrom.psl
       pslCat -dir chrom > xenoEst.psl
       rm -r chrom raw.psl cooked.psl chrom.psl

Load into database as so:
       ssh hgwdev
       cd ~/mm/bed/xenoEst
       hgLoadPsl mm1 xenoEst.psl -tNameIx
       cd /cluster/store1/mrna.127
       foreach i (xenoEst*.fa)
	   echo processing $i
	   hgLoadRna add mm1 /cluster/store1/mrna.127/$i $i:r.ra
       end


PRODUCING FISH ALIGNMENTS (in progress)

o - Download sequence from ... and put it on the cluster local disk
    at
       /scratch/hg/fish
o - Do fish/mouse alignments.
       ssh kk
       cd ~/mm/bed
       mkdir blatFish
       cd blatFish
       mkdir psl
       ls -1S /scratch/hg/fish > fish.lst
       ls -1S /scratch/hg/mm1/trfFa > mouse.lst
     Copy over gsub from previous version and edit paths to
     point to current assembly.
       gensub2 mouse.lst fish.lst gsub spec
       para create spec
       para try
     Make sure jobs are going ok with para check.  Then
       para push
     wait about 2 hours and do another
       para push
     do para checks and if necessary para pushes until done
     or use para shove.
o - Sort alignments as so 
       pslCat -dir psl | liftUp -type=.psl stdout ~/mm/jkStuff/liftAll.lft warn stdin | pslSortAcc nohead chrom temp stdin
o - Copy to hgwdev:/scratch.  Rename to correspond with tables as so and 
    load into database:
       ssh hgwdev
       cd ~/mm/bed/blatFish/chrom
       foreach i (*.psl)
	   set r = $i:r
           mv $i ${r}_blatFish.psl
       end
       hgLoadPsl mm1 *.psl



TIGR GENE INDEX (todo)

  o - Download ftp://ftp.tigr.org/private/HGI_ren/*aug.tgz into
      ~/oo.29/bed/tgi and then execute the following commands:
          cd ~/oo.29/bed/tgi
	  mv cattleTCs_aug.tgz cowTCs_aug.tgz
	  foreach i (mouse cow human pig rat)
	      mkdir $i
	      cd $i
	      gtar -zxf ../${i}*.tgz
	      gawk -v animal=$i -f ../filter.awk * > ../$i.gff
	      cd ..
	  end
	  mv human.gff human.bak
	  sed s/THCs/TCs/ human.bak > human.gff
	  ldHgGene -exon=TCs hg7 tigrGeneIndex *.gff


      
LOAD STS MAP (todo)
     - login to hgwdev
      cd ~/mm/bed
      mm1 < ~/src/hg/lib/stsMap.sql
      mkdir stsMap
      cd stsMap
      bedSort /projects/cc/hg/mapplots/data/tracks/build28/stsMap.bed stsMap.bed
      - Enter database with "mm1" command.
      - At mysql> prompt type in:
          load data local infile 'stsMap.bed' into table stsMap;
      - At mysql> prompt type
          quit


LOAD CHROMOSOME BANDS (todo)
      - login to hgwdev
      cd /cluster/store2/mm.2001.11/mm1/bed
      mkdir cytoBands
      cp /projects/cc/hg/mapplots/data/tracks/build28/cytobands.bed cytoBands
      mm1 < ~/src/hg/lib/cytoBand.sql
      Enter database with "mm1" command.
      - At mysql> prompt type in:
          load data local infile 'cytobands.bed' into table cytoBand;
      - At mysql> prompt type
          quit

LOAD MOUSEREF TRACK (todo)
    First copy in data from kkstore to ~/mm/bed/mouseRef.  
    Then substitute 'genome' for the appropriate chromosome 
    in each of the alignment files.  Finally do:
       hgRefAlign webb mm1 mouseRef *.alignments

LOAD AVID MOUSE TRACK (todo)
      ssh cc98
      cd ~/mm/bed
      mkdir avidMouse
      cd avidMouse
      wget http://pipeline.lbl.gov/tableCS-LBNL.txt
      hgAvidShortBed *.txt avidRepeat.bed avidUnique.bed
      hgLoadBed avidRepeat avidRepeat.bed
      hgLoadBed avidUnique avidUnique.bed


LOAD SNPS (todo)
      - ssh hgwdev
      - cd ~/mm/bed
      - mkdir snp
      - cd snp
      - Download SNPs from ftp://ftp.ncbi.nlm.nih.gov/pub/sherry/gp.oo33.gz
      - Unpack.
        grep RANDOM gp.oo33 > snpTsc.txt
        grep MIXED  gp.oo33 >> snpTsc.txt
        grep BAC_OVERLAP  gp.oo33 > snpNih.txt
        grep OTHER  gp.oo33 >> snpNih.txt
        awk -f filter.awk snpTsc.txt > snpTsc.contig.bed
        awk -f filter.awk snpNih.txt > snpNih.contig.bed
        liftUp snpTsc.bed ../../jkStuff/liftAll.lft warn snpTsc.contig.bed
        liftUp snpNih.bed ../../jkStuff/liftAll.lft warn snpNih.contig.bed
	hgLoadBed mm1 snpTsc snpTsc.bed
	hgLoadBed mm1 snpNih snpNih.bed

LOAD CPGISSLANDS (todo)
     - login to hgwdev
     - cd /cluster/store2/mm.2001.11/mm1/bed
     - mkdir cpgIsland
     - cd cpgIsland
     - mm1 < ~kent/src/hg/lib/cpgIsland.sql
     - wget http://genome.wustl.edu:8021/pub/gsc1/achinwal/MapAccessions/cpg_aug.oo33.tar
     - tar -xf cpg*.tar
     - awk -f filter.awk */ctg*/*.cpg > contig.bed
     - liftUp cpgIsland.bed ../../jkStuff/liftAll.lft warn contig.bed
     - Enter database with "mm1" command.
     - At mysql> prompt type in:
          load data local infile 'cpgIsland.bed' into table cpgIsland

LOAD ENSEMBLE GENES (todo)
     cd ~/mm/bed
     mkdir ensembl
     cd ensembl
     wget http://www.sanger.ac.uk/~birney/all_april_ctg.gtf.gz
     gunzip all*.gz
     liftUp ensembl110.gtf ~/mm/jkStuff/liftAll.lft warn all*.gtf
     ldHgGene mm1 ensGene en*.gtf
o - Load Ensembl peptides:
     (poke around ensembl to get their peptide files as ensembl.pep)
     (substitute ENST for ENSP in ensemble.pep with subs)
     wget ftp://ftp.ensembl.org/pub/current/data/fasta/pep/ensembl.pep.gz
     gunzip ensembl.pep.gz
   edit subs.in to read: ENSP|ENST
     subs -e ensembl.pep
     hgPepPred mm1 generic ensPep ensembl.pep

LOAD SANGER22 GENES (todo)
      - cd ~/mm/bed
      - mkdir sanger22
      - cd sanger22
      - not sure where these files were downloaded from
      - grep -v Pseudogene Chr22*.genes.gff | hgSanger22 mm1 stdin Chr22*.cds.gff *.genes.dna *.cds.pep 0
          | ldHgGene mm1 sanger22pseudo stdin
         - Note: this creates sanger22extras, but doesn't currently create
           a correct sanger22 table, which are replaced in the next steps
      - sanger22-gff-doctor Chr22.3.1x.cds.gff Chr22.3.1x.genes.gff \
          | ldHgGene mm1 sanger22 stdin
      - sanger22-gff-doctor -pseudogenes Chr22.3.1x.cds.gff Chr22.3.1x.genes.gff \
          | ldHgGene mm1 sanger22pseudo stdin

      - hgPepPred mm1 generic sanger22pep *.pep

LOAD SANGER 20 GENES (todo)
     First download files from James Gilbert's email to ~/mm/bed/sanger20 and
     go to that directory while logged onto hgwdev.  Then:
        grep -v Pseudogene chr_20*.gtf | ldHgGene mm1 sanger20 stdin
	hgSanger20 mm1 *.gtf *.info


LOAD RNAGENES (todo)
      - login to hgwdev
      - cd ~kent/src/hg/lib
      - mm1 < rnaGene.sql
      - cd /cluster/store2/mm.2001.11/mm1/bed
      - mkdir rnaGene
      - cd rnaGene
      - download data from ftp.genetics.wustl.edu/pub/eddy/pickup/ncrna-oo27.gff.gz
      - gunzip *.gz
      - liftUp chrom.gff ../../jkStuff/liftAll.lft carry ncrna-oo27.gff
      - hgRnaGenes mm1 chrom.gff

LOAD EXOFISH (todo)
     - login to hgwdev
     - cd /cluster/store2/mm.2001.11/mm1/bed
     - mkdir exoFish
     - cd exoFish
     - mm1 < ~kent/src/hg/lib/exoFish.sql
     - Put email attatchment from Olivier Jaillon (ojaaillon@genoscope.cns.fr)
       into /cluster/store2/mm.2001.11/mm1/bed/exoFish/all_maping_ecore
     - awk -f filter.awk all_maping_ecore > exoFish.bed
     - hgLoadBed mm1 exoFish exoFish.bed

LOAD MOUSE SYNTENY (todo)
     - login to hgwdev.
     - cd ~/kent/src/hg/lib
     - mm1 < mouseSyn.sql
     - mkdir ~/mm/bed/mouseSyn
     - cd ~/oo/bed/mouseSyn
     - Put Dianna Church's (church@ncbi.nlm.nih.gov) email attatchment as
       mouseSyn.txt
     - awk -f format.awk *.txt > mouseSyn.bed
     - delete first line of mouseSyn.bed
     - Enter database with "mm1" command.
     - At mysql> prompt type in:
          load data local infile 'mouseSyn.bed' into table mouseSyn


LOAD GENIE (todo)
     - cat */ctg*/ctg*.affymetrix.gtf > predContigs.gtf
     - liftUp predChrom.gtf ../../jkStuff/liftAll.lft warn predContigs.gtf
     - ldHgGene mm1 genieAlt predChrom.gtf

     - cat */ctg*/ctg*.affymetrix.aa > pred.aa
     - hgPepPred mm1 genie pred.aa 

     - mm1
         mysql> delete * from genieAlt where name like 'RS.%';
         mysql> delete * from genieAlt where name like 'C.%';

LOAD SOFTBERRY GENES (done)
     - ln -s /cluster/store2/mm.2001.11/mm1 ~/mm
     - cd ~/mm/bed
     - mkdir softberry
     - cd softberry
     - get ftp://www.softberry.com/pub/SC_MOU_NOV01/softb_mou_genes_nov01.tar.gz
     - mkdir output
     - Run the fixFormat.pl routine in
        ~/mm/bed/softberry like so:
        ./fixFormat.pl output
     - This will stick all the properly converted .gff and .pro files in
     the directory named "output".
     - cd output
     ldHgGene mm1 softberryGene chr*.gff
     hgPepPred mm1 softberry *.pro
     hgSoftberryHom mm1 *.pro

LOAD ACEMBLY (todo)
    - Get acembly*gene.gff from Jean and Michelle Thierry-Miegs and
      place in ~/mm/bed/acembly
    - Replace c_chr with chr in acembly*.gff
    - Replace G_t1_chr with chr and likewise
      G_t2_chr with chr, etc.
    - cd ~/mm/bed/acembly
    - # The step directly below is not necessary since the files were already lifted
      #  liftUp ./aceChrom.gff /cluster/store2/mm.2001.11/mm1/jkStuff/liftHs.lft warn acemblygenes*.gff
    - Use /cluster/store2/mm.2001.11/mm1/mattStuff/filterFiles.pl to prepend "chr" to the
    start of every line in the gene.gff files and to concat them into the aceChrom.gff
    gile. Read the script to see what it does. It's tiny and simple.
    - Concatenate all the protein.fasta files into a single acembly.pep file
    - Load into database as so:
        ldHgGene mm1 acembly aceChrom.gff
        hgPepPred mm1 generic acemblyPep acembly.pep

LOAD GENOMIC DUPES (todo)
o - Load genomic dupes
    ssh hgwdev
    cd ~/mm/bed
    mkdir genomicDups
    cd genomicDups
    wget http://codon/jab/web/takeoff/oo33_dups_for_kent.zip
    unzip *.zip
    awk -f filter.awk oo33_dups_for_kent > genomicDups.bed
    mysql -u hgcat -pbigSECRET mm1 < ~/src/hg/lib/genomicDups.sql
    hgLoadBed mm1 -oldTable genomicDups genomicDupes.bed

FAKING DATA FROM PREVIOUS VERSION
(This is just for until proper track arrives.  Rescues about
97% of data  Just an experiment, not really followed through on).

o - Rescuing STS track:
     - log onto hgwdev
     - mkdir ~/mm/rescue
     - cd !$
     - mkdir sts
     - cd sts
     - bedDown hg3 mapGenethon sts.fa sts.tab
     - echo ~/mm/sts.fa > fa.lst
     - pslOoJobs ~/mm ~/mm/rescue/sts/fa.lst ~/mm/rescue/sts g2g
     - log onto cc01
     - cc ~/mm/rescue/sts
     - split all.con into 3 parts and condor_submit each part
     - wait for assembly to finish
     - cd psl
     - mkdir all
     - ln ?/*.psl ??/*.psl *.psl all
     - pslSort dirs raw.psl temp all
     - pslReps raw.psl contig.psl /dev/null
     - rm raw.psl
     - liftUp chrom.psl ../../../jkStuff/liftAll.lft carry contig.psl
     - rm contig.psl
     - mv chrom.psl ../convert.psl


