New York, New York, United States


I don't know why my Linux box suddenly stopped accepting X Window connections from UMDNJ.
Fixed it with this command:
/usr/X11R6/bin/xhost remote_hostname_or_ip_address

Also, http://wiredx.net doesn't work, and neither does Exceed;
I haven't fixed those yet.

Maybe it's because my disk quota was exceeded.


The Photoshop effects I applied to my BMW photo:
Stylize > Find Edges + Artistic > Cutout + ...


Net-Msmgr, an MSN Perl module

This page has valuable info about MSN, including libraries, bots, clients, the protocol...

Try talking to bot2k3@hotmail.com using MSN Messenger!
It has several functions.

To use SYBYL to do 3D QSAR on NCI data:
put the compounds in a database,
change the compounds' names to something that doesn't contain special characters,
use "Align Database" to align the compounds;
leaving a core structure in the molecule area will help you do the alignment.
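The renaming step can be scripted. A minimal Python sketch, assuming "special characters" means anything other than ASCII letters, digits, and underscores (SYBYL's exact naming rules aren't stated here); names that collide after cleaning get a numeric suffix so they stay unique:

```python
import re

def sanitize_names(names):
    """Replace every character outside A-Z, a-z, 0-9 and '_' with '_'.

    Assumption: those are the 'safe' characters; widen the pattern if
    SYBYL accepts more. Names that collide after cleaning get a suffix.
    """
    seen = {}
    result = []
    for name in names:
        clean = re.sub(r"[^A-Za-z0-9_]", "_", name).strip("_") or "cmpd"
        count = seen.get(clean, 0)
        seen[clean] = count + 1
        result.append(clean if count == 0 else f"{clean}_{count}")
    return result

print(sanitize_names(["NSC-94600", "NSC 94600", "(+)-camptothecin"]))
# ['NSC_94600', 'NSC_94600_1', 'camptothecin']
```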

Search the NCI data and display it in MDL mol format; the core structure info can then be taken from the HTML source code, where it's written as an MDL script.

You can use this command to get the mol files:
wget -r --accept=mol
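`wget -r --accept=mol` follows links recursively but only keeps files whose names end in `.mol`. A rough Python sketch of that filtering step for a single page, standard library only; the base URL and HTML below are hypothetical, and actually downloading each link (e.g. with `urllib.request.urlretrieve`) is left out:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class MolLinkParser(HTMLParser):
    """Collect <a href> targets ending in '.mol', resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for key, value in attrs:
            # value can be None for attributes written without "=..."
            if key == "href" and value and value.lower().endswith(".mol"):
                self.links.append(urljoin(self.base_url, value))

# Hypothetical page content, for illustration only.
html = '<a href="nsc94600.mol">mol</a> <a href="info.html">info</a>'
parser = MolLinkParser("http://example.org/results/")
parser.feed(html)
print(parser.links)  # ['http://example.org/results/nsc94600.mol']
```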

You can use a link like this to get the GI50 data for NSC94600.

SYBYL servers at UMDNJ:
achilles.umdnj.edu is pretty fast because users usually use apollo and titan instead.

The MDL mol file format works fine on NCI, so I planned to download MDL mol files and then convert them to SYBYL mol2 format (maybe using Open Babel).
Tested converting with babel: it works this way.
I don't know why files generated by NCI can't be read by SYBYL (e.g. pdb doesn't work, SYBYL mol2 doesn't work, mol doesn't work); they have to be converted with babel first.
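When SYBYL refuses a file, a quick first check is whether the molfile is structurally sane before converting it. A minimal Python sketch, assuming the common MDL V2000 layout (counts line on line 4, atom and bond counts in its first two three-character fields, atom and bond blocks right after it, ending with `M  END`); it only diagnoses, it doesn't replace converting with babel:

```python
def check_molfile(text):
    """Basic sanity check of an MDL V2000 molfile.

    Returns (n_atoms, n_bonds); raises ValueError if the counts line is
    malformed or the file is shorter than the counts line claims.
    """
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("missing header/counts line")
    counts = lines[3]
    if not counts.rstrip().endswith("V2000"):
        raise ValueError("not a V2000 counts line: %r" % counts)
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])
    # The atom block and bond block follow the counts line directly.
    needed = 4 + n_atoms + n_bonds
    if len(lines) < needed:
        raise ValueError("file has %d lines, expected at least %d" % (len(lines), needed))
    if not any(line.startswith("M  END") for line in lines[needed:]):
        raise ValueError("missing 'M  END' line")
    return n_atoms, n_bonds

# Trimmed-down hypothetical molfile (water), for illustration only.
sample = """water
  hypothetical

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0
    0.9572    0.0000    0.0000 H   0  0
   -0.2400    0.9266    0.0000 H   0  0
  1  2  1  0
  1  3  1  0
M  END
"""
print(check_molfile(sample))  # (3, 2)
```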

When babel doesn't work, put the files in a directory with a short DOS-style (8.3) path and no spaces, e.g. use d:\temp instead of d:\program files\...

Now the only thing left is figuring out how to align those compounds by the core substructure automatically.


It's been a long time since I touched Maya (3D modeling software for computer graphics);
it now has amazing features.
Here are some online tutorials via streaming video.


About MSN Messenger:
Java MSN real status viewer (I can't get it to work).
Triople can detect whether an MSN user is online or not (web interface),
and you can even send messages to a user who has already blocked you by changing your nickname to a magic string
(special software is required to change the nickname).

This "Online Status Indicator server" lets users know whether someone is online or not,
but it can't detect the real status.


3D Minder clusters the anti-cancer compounds for you (requires a specific browser and Java version, plus Java3D).
A Self-Organizing Map clusters the anti-HIV compounds for you.

I think I should use -log(GI50) as the anti-cancer activity value for QSAR.

I think I should use log(1/EC50) = -log(EC50) as the anti-HIV activity value for QSAR.
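Both conversions are the same transform, pX = -log10(X) with X in mol/L, so a larger value means a more potent compound. A small Python sketch; the assumption is that the input is a molar concentration. If a table already reports log10 of the molar value, only the sign needs flipping:

```python
import math

def p_activity(conc_molar):
    """Return -log10(concentration in mol/L), e.g. pGI50 or pEC50.

    -log10(X) equals log10(1/X), so this covers both -log(GI50)
    and log(1/EC50).
    """
    if conc_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(conc_molar)

def p_from_log(log10_value):
    """For values already given as log10(molar concentration): just negate."""
    return -log10_value

print(round(p_activity(2.5e-6), 2))  # 5.6
print(p_from_log(-5.6))              # 5.6
```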


3D Mind will cluster your compounds for you,
and it can also show you what else is in that cluster.

Go into the 3D Mind tools and use nodes 23+18 (clusters 16.8 + 17.9)
or nodes 13+11+7+3+...

Or you can search by SMILES string: "C1CC3=C(CO1)CN2CC5=C(C2=C3)N=C4C=CC=CC4=C5" here.

Here's a paper that uses NCI's anticancer database to do QSAR.

Found a pretty good entry point for finding cancer screen data from NCI.
For AIDS screen data, get the compounds here, then get the activity here, or see the structures here.

What GI50, TGI, and LC50 mean in the NCI database
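As I understand NCI's DTP screening methodology, all three values are read off a percentage-growth curve: PG = (Ti - Tz)/(C - Tz) x 100 when Ti >= Tz, and PG = (Ti - Tz)/Tz x 100 when Ti < Tz (Tz = time zero, C = control, Ti = with drug). GI50 is where PG = 50, TGI where PG = 0, and LC50 where PG = -50. A Python sketch with made-up optical densities; interpolating linearly in log10 concentration between tested doses is my assumption about how the crossing point is estimated:

```python
import math

def percent_growth(ti, tz, c):
    """NCI-60 style percentage growth (tz: time zero, c: control, ti: test)."""
    if ti >= tz:
        return (ti - tz) / (c - tz) * 100.0
    return (ti - tz) / tz * 100.0

def conc_at_level(concs, growths, level):
    """Concentration where growth crosses `level`, interpolated in log10(conc).

    concs must be increasing; growths are assumed to fall as dose rises.
    Returns None if the curve never crosses the level in the tested range.
    """
    points = list(zip(concs, growths))
    for (c1, g1), (c2, g2) in zip(points, points[1:]):
        if g1 >= level >= g2 and g1 != g2:
            frac = (g1 - level) / (g1 - g2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None

# Hypothetical optical densities at five doses (mol/L).
tz, c = 0.20, 1.00
concs = [1e-8, 1e-7, 1e-6, 1e-5, 1e-4]
ods = [0.98, 0.80, 0.40, 0.20, 0.10]
growths = [percent_growth(ti, tz, c) for ti in ods]
print([round(g, 1) for g in growths])  # [97.5, 75.0, 25.0, 0.0, -50.0]
for name, level in [("GI50", 50), ("TGI", 0), ("LC50", -50)]:
    print(name, conc_at_level(concs, growths, level))
```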

You can use the online tools NCI 3DMiner or the Enhanced NCI Database Browser to search the compound list here; then you can easily compare those compounds.

After you've spotted a substructure shared by all the structures above
and found more structures by substructure search,
you can use this page to see their activities.

Ward in JKlustor can work with GenerateMD, so you can do clustering using pharmacophore fingerprints.
Ref: example section of this page
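JKlustor's own options aren't shown here, so this is only a generic illustration of Ward's method: repeatedly merge the two clusters whose union increases the within-cluster sum of squares the least, which for clusters A and B is |A||B|/(|A|+|B|) x ||mean(A) - mean(B)||^2. A naive pure-Python sketch that treats each compound's fingerprint (e.g. from GenerateMD) as a numeric vector; the fingerprints below are hypothetical:

```python
def ward_clusters(vectors, n_clusters):
    """Naive agglomerative clustering with Ward's merge criterion.

    Clusters are lists of indices into `vectors`; merging stops when
    n_clusters remain.
    """
    def mean(idx):
        dim = len(vectors[0])
        return [sum(vectors[i][d] for i in idx) / len(idx) for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ma, mb = mean(a), mean(b)
        dist2 = sum((x - y) ** 2 for x, y in zip(ma, mb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical 4-bit fingerprints for five compounds.
fps = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 0, 1]]
print(ward_clusters(fps, 2))  # [[0, 1, 4], [2, 3]]
```

The O(n^3) search is fine for a handful of compounds; a real tool would update merge costs incrementally.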


You can use Jarp and Ward in JKlustor
to cluster the compounds based on NCI's data.
ref: http://www.chemaxon.com/conf/Eurocombi_poster_Ltr.pdf
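As a generic illustration of the Jarvis-Patrick idea (not JKlustor's code or options): two compounds join the same cluster when each is among the other's k nearest neighbours and their neighbour lists share at least `kmin` members, with similarity measured by the Tanimoto coefficient on bit-set fingerprints. The fingerprints and the values of `k` and `kmin` below are hypothetical:

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def jarvis_patrick(fps, k, kmin):
    """Jarvis-Patrick clustering; clusters are connected components
    of the 'mutual neighbours sharing >= kmin neighbours' relation."""
    n = len(fps)
    neighbours = []
    for i in range(n):
        # Sort others by decreasing similarity, ties broken by index.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (-tanimoto(fps[i], fps[j]), j))
        neighbours.append(set(others[:k]))

    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if (j in neighbours[i] and i in neighbours[j]
                    and len(neighbours[i] & neighbours[j]) >= kmin):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical fingerprints: two families of three compounds each.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5},
       {10, 11, 12}, {10, 11, 13}, {11, 12, 13}]
print(jarvis_patrick(fps, k=2, kmin=1))  # [[0, 1, 2], [3, 4, 5]]
```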

There's a freeware tool like Ghost
called g4u (ghost for unix).
It runs from a floppy disk, can upload a hard disk image to an FTP server, and can dump it onto another computer's hard disk.

I think I can find drugs for AIDS here (example).

It gives me several similar compounds and their CAS numbers.

Then I can use those CAS numbers to search for activity data here.
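Before searching, it can help to catch mistyped CAS numbers, since CAS Registry Numbers carry a check digit: the other digits, each multiplied by its position counted from the right (1, 2, 3, ...), summed, modulo 10. A Python sketch of that rule (the examples are water, 7732-18-5, a deliberately wrong variant, and formaldehyde, 50-00-0):

```python
import re

def is_valid_cas(cas):
    """Validate the format (NNNNNNN-NN-N) and check digit of a CAS number."""
    m = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas.strip())
    if not m:
        return False
    body = m.group(1) + m.group(2)
    total = sum(int(d) * pos for pos, d in enumerate(reversed(body), start=1))
    return total % 10 == int(m.group(3))

print(is_valid_cas("7732-18-5"))  # True  (water)
print(is_valid_cas("7732-18-4"))  # False (wrong check digit)
print(is_valid_cas("50-00-0"))    # True  (formaldehyde)
```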

For cancer, use these: Drug List 1, Drug List 2, and Activity.

PS: if you're interested in anti-cancer plants, you can find them and the compounds they produce here or here.


Oops, I WAS planning to write an automated QSAR program for binf7592,
but after I read Ch. 5.3.3 in the paper "Structure Database" the teacher gave us on 10/20,
I noticed that several such tools already exist: "Catalyst" and "APEX".
Details about Catalyst and APEX

Drug activity data for the binf7592 project can be obtained here.

Other activity databases for future work can be found here.

We know SMILES is not unique: the same molecule can be written as many different SMILES strings,
but it's very easy to translate them into unique (canonical) SMILES.


Grail has a socket interface
ref: original
details (official)

Use MEME to find motifs:
./meme ../../markfilter/shrtincor.fa -protein -mod zoops -nmotifs 20 -minsites 2 -maxsites 5 -minw 3 -maxw 50 -evt 10000 -time 7200 -maxsize 60000 -nostatus -maxiter 25 > output.html

Use Meta-MEME to build an HMM:
./mhmm -meme ../../../meme.3.0.4/bin/output.html > test.mhmm

Test sequences using that HMM:
./mhmmscan -hmm test.mhmm -seq ../../../markfilter/shrtcor.fa 2>/dev/null | more

Meta-MEME seems good for my HMM engine,
but after some research, maybe MEME will suit me better:
it can find motifs in the sequences for me,
and of course, if you want an HMM, you're one button away!
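MEME reports each motif as a position-specific probability matrix, and scanning a sequence with one motif comes down to sliding it along the sequence and summing log-odds scores against a background. This is a plain log-odds sketch, not what mhmmscan does internally (it scores with a full HMM); the small DNA matrix, uniform background, and sequence are made up for brevity, while the real run above used protein motifs:

```python
import math

def scan_pwm(seq, pwm, background, pseudo=1e-3):
    """Slide a probability matrix along seq; return (best_start, best_score).

    pwm: list of dicts, one per motif column, letter -> probability.
    background: letter -> background probability (must cover every letter).
    Score = sum of log2((p + pseudo) / background) over the window.
    """
    w = len(pwm)
    best = (None, float("-inf"))
    for start in range(len(seq) - w + 1):
        score = 0.0
        for col, letter in zip(pwm, seq[start:start + w]):
            p = col.get(letter, 0.0) + pseudo
            score += math.log2(p / background[letter])
        if score > best[1]:
            best = (start, score)
    return best

# Hypothetical 3-column motif resembling "GTA".
pwm = [{"G": 0.9, "A": 0.05, "C": 0.025, "T": 0.025},
       {"T": 0.9, "A": 0.05, "C": 0.025, "G": 0.025},
       {"A": 0.7, "G": 0.2, "C": 0.05, "T": 0.05}]
bg = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(scan_pwm("CCGTACC", pwm, bg))  # best window starts at index 2
```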

It seems I can run it on the supercomputer, but it's still slow...


I think I might use HMMER, SAM, HTK, or GHMM as my HMM engine,
and train the model using both a positive exon-intron group and a negative exon-intron group.
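Whichever engine I end up with, labeling a sequence as exon/intron with a trained HMM is typically done with the Viterbi algorithm. A self-contained Python sketch with a hypothetical two-state model (GC-rich "exon" E, AT-rich "intron" I); all probabilities are made up, and estimating them from the positive and negative groups is the part the real engine would do:

```python
import math

def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most likely state path for seq under an HMM (log-space Viterbi)."""
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(seq)):
        v.append({})
        back.append({})
        for s in states:
            # Best previous state to have come from, and its score.
            prev, score = max(
                ((p, v[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            v[t][s] = score + math.log(emit_p[s][seq[t]])
            back[t][s] = prev
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Hypothetical toy model.
states = ("E", "I")
start_p = {"E": 0.5, "I": 0.5}
trans_p = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
emit_p = {"E": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
          "I": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}}
print("".join(viterbi("GCGCGCATATATAT", states, start_p, trans_p, emit_p)))
# EEEEEEIIIIIIII
```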

Web interface for:
converting between HMMER and SAM formats.

Using netcat to scan your computer's ports:
nc -v -z <host> 1-9999
scans ports 1 through 9999 for you without delay.
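On a machine without netcat, a similar TCP connect scan is easy to sketch with Python's standard `socket` module. Like `nc -z`, it only checks whether each port accepts a connection and sends no data; the 0.5-second timeout is an arbitrary choice of mine:

```python
import socket

def scan_ports(host, ports, timeout=0.5):
    """Return the ports on host that accept a TCP connection.

    connect_ex() returns 0 on success and an error code otherwise
    (refused, timed out, ...), so closed ports don't raise.
    """
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

Usage, roughly matching the nc command: `scan_ports("127.0.0.1", range(1, 10000))`. Unlike nc, it checks ports one at a time, so filtered ports each cost up to the timeout.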


Cambridge is pretty easy to use.
Short tips:
on octane-2,
just cd to "/scratch/molmod/cambridge/cambridge/bin"
and type "cq",
and you've got it!

Pretty easy to use, but there aren't many records in the database yet.

The quota on /research is only 5MB.

You can use this program to see whether the association comes from HSSEX or really from HFE7*HSSEX:

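proc glm data=adult;
where HAD1<3;
class HFE7 HSSEX;
model HAC1E = HFE7 HSSEX HFE7*HSSEX; /* HAC1E assumed as the outcome, following the HFE7/HAC1E example below */
lsmeans HFE7 HSSEX HFE7*HSSEX/pdiff;
run;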

At the end of the following page, you'll see how to use SAS to analyze multiple variables.

Ex: I know HFE7 is associated with HAC1E, but is the association different between men and women (HSSEX)?

After 4 days of trying, I finally figured out how to use SAS to find which variables are associated with a specific variable.

You can use the following SAS program to figure out which of the variables HAC1A~HAC1O
are associated with HFE7:
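proc glm data=adult; class HFE7;
model HAC1A HAC1B HAC1C HAC1D HAC1E HAC1F HAC1G HAC1H HAC1I HAC1J HAC1K HAC1L HAC1M HAC1N HAC1O = HFE7; run;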

After you find something with P<0.05 (ex: HAC1E),

you can use the following SAS program to see
what the difference is between groups 1 and 2 of HFE7:
proc glm data=adult;
class HFE7;
model HAC1E=HFE7;
lsmeans HFE7;
run;
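For checking results outside SAS (e.g. with free tools at home), the model `model HAC1E=HFE7;` with one class variable is a one-way ANOVA, and in a one-way model the LS-means are just the group means. A pure-Python sketch of the F statistic; the HAC1E values below are hypothetical, and SAS additionally converts F into a p-value with the F(df_between, df_within) distribution, which is left out here:

```python
def one_way_anova(groups):
    """One-way ANOVA for a dict {group_label: [values]}.

    Returns (group_means, F, df_between, df_within).
    """
    all_values = [x for vals in groups.values() for x in vals]
    grand_mean = sum(all_values) / len(all_values)
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return means, f_stat, df_between, df_within

# Hypothetical HAC1E values for HFE7 groups 1 and 2.
data = {1: [3.0, 4.0, 5.0], 2: [6.0, 7.0, 8.0]}
print(one_way_anova(data))  # ({1: 4.0, 2: 7.0}, 13.5, 1, 4)
```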

this helps


To scan your network for holes:

Leak Testers:

LeakTest - http://grc.com/lt/leaktest.htm
TooLeaky - http://tooleaky.zensoft.com
Firehole - http://keir.net/firehole.html
Yalta - http://www.soft4ever.com/security_test/En/index.htm

Online Port Scanners:

Sygate - option in the software menu.
Symantec - http://www.symantec.com/cgi-bin/securitycheck.cgi
PC Flank - http://www.pcflank.com/index.htm


When searching on PubMed, don't use ":" in the query.
If the title of the paper has a ":" in it, remove it when searching,
or you'll never get a result.

Augustus handles start_codon and stop_codon well,
can detect short exons,
and can be downloaded and run locally.


I previously misunderstood this site (http://www.fruitfly.org/seq_tools/splice.html):
it shows the donor site's start point and end point, but not the cut position.
With this understanding, it can be used in binf7600's project.

I think I can get alternative splicing sequences and data from ProSplicer
to validate my model,
and then train my model.

NetGene2 isn't working today, so I'm trying to find others.

Maybe try whether the "exonic splicing enhancers predictor" can help improve ours.
ref: http://nar.oupjournals.org/cgi/reprint/31/13/3568.pdf

This site does the same thing as me: it connects to other sites to obtain interstat values.
ref: http://www.inra.fr/bia/T/schiex/Export/LNCS-EuGene.pdf

This software is "A Generic Framework for the Integration of Gene-Prediction Data".
ref: http://www.genome.org/cgi/reprint/12/9/1418.pdf


There's even a database for my whole theory here.
This one looks better.

You can see what happens when it cuts in different places and doesn't get it right.

Figure 4 in this paper proves my third theory (10/8/03).

I found a scary thing on my special Linux CD.
Although I knew that CD was very powerful,
I didn't know it was that scary...

It not only has Ethereal and tcpdump,
it also has "ettercap"!
A tool that can hack even in a switched environment, attack SSH, fingerprint operating systems, kill connections, and type characters into other people's connections!

Here's a very, very good introduction to this kind of technology and tools, and to ways to prevent them!


I got material that totally supports my theory (10/8/03), and they also have an interesting paper about the differences in RNA splicing between normal cells and aged cells.

I think this group is heading in the same direction as me.

You can use the Blogger API to extract your blog from blogger.com in XML format.
Tested using a WSDL and SoapClient web interface.


I found part of the answer to why hnRNP is needed in RNA splicing (to cut out introns):
because hnRNP A2 on RNA uses kinesin on microtubules to move itself (supported by this paper),
and this paper also supports my theory,
it can move two spliceosomes together (my theory).

This paper supports another of my theories: "a novel class of genes which, although they encode polyadenylated RNA, might not make a translated protein."

This abstract also supports my third theory: when the enzyme sees the sequence, it doesn't know whether it's cutting in the right place; it just cuts. Something elsewhere checks whether the cut was right.

And I'll use these three theories in my binf7600 project.

And this one should help me detect hnRNP binding sites.

(I found this on 10/10/03; it draws exactly what I was thinking, and supports all my theories!)

Bioinformatics tutorial in Flash animation (Chinese)

You can search for similar structures or substructures using
and also AIDS and cancer activity (not working on 10/8/03).

Since I can't use SAS for analysis in the PC Lab because of Go Back and the hard disk limit there, I decided to do it at home.
Since I don't have SAS, and the student version costs $50 and only works for 3 months, I decided to use something free.
So I found OpenStat, a freeware program that can do lots of statistical analyses.
It has both Windows and Linux versions.
For now I use SAS at school to translate the data to CSV format, but I think I should write a translator to do that quickly.
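A translator like that could start out as small as this Python sketch. It assumes the exported data is whitespace-separated columns with a header line and no spaces inside fields (the real export format isn't stated here), and writes SAS's numeric missing-value marker "." as an empty CSV field; the sample rows are hypothetical:

```python
import csv
import io

def whitespace_to_csv(text):
    """Convert whitespace-separated columns (first line = header) to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = ["" if f == "." else f for f in line.split()]
        writer.writerow(fields)
    return out.getvalue()

sample = "ID HFE7 HSSEX HAC1E\n1 1 2 3.5\n2 2 1 .\n"
print(whitespace_to_csv(sample), end="")
# ID,HFE7,HSSEX,HAC1E
# 1,1,2,3.5
# 2,2,1,
```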


For binf7592:
I like ChemSketch, because it can check the structure for me easily, in both 2D and 3D.
But if I need to create something like a Markush structure, I think I'll use ChemSketch first and then use ISIS Draw to turn it into a Markush structure.

After one day of testing and researching,
I found the answers to my problems:

VMware's NIC doesn't support PXE.
The ClusterKnoppix 9/5/03 version has a "terminalserver + etherboot" bug,
which should be fixed in the 9/24/03 version.

Windows on CD, Linux on CD (intro in Chinese)
Knoppix PXE, how to configure Debian manually, how to make a boot disk (in Chinese)
Diskless Remote Boot in Linux (DRBL) for Redhat 8.0 (tutorial in Chinese)
Knoppix PXE FAQ

Bochs: virtual PC software like VMware, but open source.


Bot for MSN chat rooms
How to use response.txt?
Remember to restart ViperBot after making the changes described above.
Tested on the English version of W2K; works fine.
But it failed on the Chinese version.

What is a Markush structure? (Chinese)
How to search for pharmaceutical compounds in patent databases using Markush structure search.

More info


How to run NCBI BLAST at home on clusters with OpenMosix support?

Booting Windows From CD-ROM (windows on cd)


Today I saw something I had wanted to create in the past:
a cluster system that requires no installation and no hard disk.
You just pop a CD-ROM into the PCs, and they work as one clustered system,
so it can easily work in school PC labs. :p

It's called ClusterKnoppix.

And it's even better!
Client nodes don't even need a CD-ROM (when booting from PXE chips [ROM])
or a floppy disk.


I was thinking about a tool that can save my information in a nested tree structure.
I think this is the answer:
it uses XML and Flash, makes it easy to add and delete nodes with a title and link, and lets you change the style easily.
It can also save and open XML files.
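The underlying idea (a tree of nodes, each with a title and an optional link, stored as XML) is easy to sketch with Python's standard `xml.etree.ElementTree`. The element and attribute names (`node`, `title`, `link`) and the example URL are my own hypothetical choices, not that tool's file format:

```python
import xml.etree.ElementTree as ET

def add_node(parent, title, link=None):
    """Add a child <node> with a title and optional link; return it."""
    node = ET.SubElement(parent, "node", title=title)
    if link:
        node.set("link", link)
    return node

def to_dict(node):
    """Convert a <node> element and its descendants to nested dicts."""
    return {
        "title": node.get("title"),
        "link": node.get("link"),
        "children": [to_dict(child) for child in node.findall("node")],
    }

root = ET.Element("node", title="Notes")
bio = add_node(root, "Bioinformatics")
add_node(bio, "NetGene2", "http://example.org/netgene2")  # hypothetical URL
scratch = add_node(root, "Scratch")
root.remove(scratch)  # delete a node

xml_text = ET.tostring(root, encoding="unicode")  # "save" as an XML string
loaded = ET.fromstring(xml_text)                  # "open" it again
print(to_dict(loaded)["children"][0]["children"][0]["title"])  # NetGene2
```

Writing to a file instead is `ET.ElementTree(root).write("notes.xml")`, and `ET.parse("notes.xml").getroot()` reads it back.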


I'm testing exon/intron prediction software.
When I tested GeneMark, I got a bad result.
Then I found NetGene2 on Google,
which gives a very good result and useful information (e.g. calculations),
and also has a paper describing its algorithm.

Maybe I can get values from NetGene2 and make some improvements.
They also have a tool, GenePublisher, that lets you input microarray data and get a microarray analysis report, just like what I did for my uncle.
In BINF5230, Fall 2002, we used Grail to predict exons & introns.
Now there's Grail-1.3.
In "Grail 2 Exons", you can check "Clusters", and you'll get almost all the possible positions where the correct exons and introns could be.

Maybe I can get all the possible positions from Grail and then check each one using my own algorithm.


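NCBI now has a database called OMIM (it gives information on disease-related genes and proteins).
After you look up a gene at NCBI, click the links on its right; if OMIM is among them, it's a known disease-related gene.
After clicking through, if GeneTests appears on the right, there may be laboratories that offer clinical testing for this disorder.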



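How to find out internal information at big companies like Microsoft & Yahoo,
e.g. their likely server architecture and the likely algorithms behind their programs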





how to get biomedical datasets


Q: How do I use a script to do telnet stuff?
A: Two ways:
1. Use the "expect" command (example)
2. Use Perl's Net::Telnet (example, additional info)


  • vmware GSX
  • vmware ESX