From: Randolph Wang <rywang@CS.Princeton.EDU>
Date: Mon, 19 Apr 2004 07:43:07 -0400 (EDT)
To: rgottron@princeton.edu
Cc: randy_class@CS.Princeton.EDU
Subject: program question

>  Hi, 
>    
>  I've attached my EditDistance program.  It works for the example10.txt 
>  (so I know that the code is debugged and everything), but I was 
>  wondering if you would take a look at it and perhaps just give me a hint 
>  to how I could significantly cut down the running time.  It takes about 
>  10 seconds for ecoli2500.txt, 50 seconds for ecoli3000.txt, and at least 
>  7 minutes for 3coli5000.txt (I haven't tried any more).  My computer has 
>  128 MB main memory.  Is this program supposed to be shorter?  I feel 
>  like there aren't that many extra steps, so I don't know what's going so 
>  slowly (unless the program is supposed to go this slowly).  Thanks! 
>    
>  Rachel 

This is due to something that you're not expected to know, but let me
explain quickly anyhow.

If you declare a 2-d array (or matrix), 

     int [][] a;

     ...

there could be two possible ways of dealing with it:

     for (int i = ...) {
         for (int j = ...) {
             process a[i][j]
         }
     }

or:

     for (int j = ...) {
         for (int i = ...) {
             process a[i][j]
         }
     }

In the first way, you deal with the thing one row at a time.  In the
second way, you deal with the thing one column at a time.  It just so
happens that the way Java lays out the data in memory makes it much
faster to deal with the matrix using the first way: namely, dealing
with the matrix one row at a time.  In CS jargon, this is called
row-major processing.

If you could rewrite your code slightly to do row-major processing,
you'll find that your code would run faster.

And of course, don't forget to use java flags to tell it how much
memory to use as instructed by the checklist:

    java -Xmx100m EditDistance < input.txt

Randy


program question / Randolph Wang