--- /dev/null
+++ b/README
@@ -0,0 +1,148 @@
+KlustaKwik version 2.01
+-----------------------
+
+KlustaKwik is a program for unsupervised classification of multidimensional
+continuous data. It arose from a specific need - automatic sorting of neuronal
+action potential waveforms (see KD Harris et al., Journal of Neurophysiology
+84:401-414, 2000), but works for any type of data.  We needed a program that
+would:
+
+1) Fit a mixture of Gaussians with unconstrained covariance matrices
+2) Automatically choose the number of mixture components
+3) Be robust against noise
+4) Reduce the problem of local minima
+5) Run fast on large data sets (up to 100000 points, 48 dimensions)
+
+Speed in particular was essential.  KlustaKwik is based on the CEM algorithm of
+Celeux and Govaert (which is faster than the standard EM algorithm), and also
+uses several tricks to improve execution speed while maintaining good
+performance.  On our data, it runs at least 10 times faster than Autoclass.
+
+Cluster splitting and deletion
+------------------------------
+
+The main improvement in version 1.5 is a cluster splitting feature.  KlustaKwik
+allows for a variable number of clusters to be fit, penalized by AIC. The
+program periodically checks if splitting any cluster would improve the overall
+score.  It also checks to see if deleting any cluster and reallocating its
+points would improve overall score.  The splitting and deletion features allow
+the program to often escape from local minima, reducing sensitivity to the
+initial number of clusters, and reducing the total number of starts needed for
+a data set.
+
+
+Compilation
+-----------
+
+The program is written in C++.  To compile under Unix, extract all files to a
+single directory and type "make".  That should be all you need to do.  If it
+doesn't work, edit the makefile to replace g++ with the name of your C++
+compiler.
+
+To check that it compiled properly, type "KlustaKwik test 1 -MinClusters 2" to
+run the program on the supplied test file.
+
+Usage
+-----
+
+The program takes a "feature file" as input, and produces two output files: a
+"cluster file" and a log file.  The file formats and conventions may seem
+slightly strange.  This is for historical reasons.  If you want to change the
+code, go ahead; this is open source software.
+
+The feature file should have a name like FILE.fet.n, where FILE is any string,
+and n is a number.  The program is invoked by running "KlustaKwik FILE n", and
+will create a cluster file FILE.clu.n and a log file FILE.klg.n.  The number n
+doesn't serve any purpose other than to let you have several files with the same
+file base.
+
+The first line of the feature file should be the number of input dimensions. 
+The following lines are the data, with each line being one data instance,
+consisting of a list of numbers separated by spaces.  An example file test.fet.1
+is provided.
+
+The first line of the cluster file will be the number of classes that the
+program chose.  The following lines will be the classes assigned to the data
+points.  Class 1 is a "noise cluster" modelled by a uniform distribution, which
+should contain outliers, if there are any.
+
+
+Parameters
+----------
+
+It is possible to pass the program parameters by running "KlustaKwik FILE n
+params" etc.  All parameters have default values. Here are the parameters you can
+use:
+
+-help
+Prints a short message and then the default parameter values.
+
+-MinClusters n   (default 20)
+The random initial assignment will have no fewer than n clusters.  The final
+number may be different, since clusters can be split or deleted during the
+course of the algorithm.
+
+-MaxClusters n   (default 30)
+The random initial assignment will have no more than n clusters.
+
+-nStarts n       (default 1)
+The algorithm will be started n times for each initial cluster count between
+MinClusters and MaxClusters.
+
+-SplitEvery n    (default 50)
+Test to see if any clusters should be split every n steps. 0 means don't split.
+
+-MaxPossibleClusters n   (default 100)
+Cluster splitting can produce no more than n clusters.
+
+-RandomSeed n    (default 1)
+Specifies a seed for the random number generator.
+
+-UseFeatures STRING   (default 11111111111100001)
+Specifies a subset of the input features to use.  STRING should consist of 1s
+and 0s, with a 1 indicating to use the feature and a 0 to leave it out.  NB: the
+default value for this parameter is 11111111111100001 (because this is what we
+use in the lab), so if you have more than 12 dimensions you will need to change
+it.
+
+-StartCluFile STRING   (default "")
+Treats the specified cluster file as a "gold standard".  If it can't find a
+better cluster assignment, it will output this.
+
+-DistThresh d    (default 6.907755)
+Time-saving parameter.  If a point has log likelihood more than d worse for a
+given class than for the best class, the log likelihood for that class is not
+recalculated.  This saves an awful lot of time.
+
+-FullStepEvery n (default 10)
+All log-likelihoods are recalculated every n steps (see DistThresh).
+
+-ChangedThresh f (default 0.05)
+All log-likelihoods are recalculated if the fraction of instances changing class
+exceeds f (see DistThresh).
+
+-MaxIter n       (default 500)
+Don't try more than n iterations from any starting point.
+
+-Log             (default 1)
+
+Produces the .klg log file.  It is on by default; use -Log 0 to switch it off.
+
+-Screen          (default 1)
+
+Prints parameters and progress information to the console.  Set to 0 to
+suppress this output, for example in batch runs.
+
+-Debug           (default 0)
+Miscellaneous debugging information (not recommended).
+
+-DistDump        (default 0)
+Outputs a ridiculous amount of debugging information (definitely not recommended).
+
+
+Contact Information
+-------------------
+
+This program is copyright Ken Harris (harris@axon.rutgers.edu), 2000-2002. It
+is distributed under the GNU General Public License (www.gnu.org).  If you make
+any changes or improvements, please let me know.
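
Note on the README above: the file conventions in its Usage section can be
exercised from a short script.  The following is a minimal Python sketch, not
part of KlustaKwik itself.  The helper names (write_feature_file,
read_cluster_file, build_command) are hypothetical, and the sketch assumes only
what the README states: the first line of FILE.fet.n is the number of
dimensions, FILE.clu.n starts with the number of classes, and parameters are
passed as "-Name value" after "KlustaKwik FILE n".

```python
from pathlib import Path


def write_feature_file(base, n, points):
    """Write points (equal-length rows of numbers) to <base>.fet.<n>."""
    dims = len(points[0])
    if any(len(row) != dims for row in points):
        raise ValueError("every data instance needs the same number of dimensions")
    lines = [str(dims)]  # first line: number of input dimensions
    # one data instance per line, values separated by spaces
    lines += [" ".join(str(v) for v in row) for row in points]
    path = Path(f"{base}.fet.{n}")
    path.write_text("\n".join(lines) + "\n")
    return path


def read_cluster_file(base, n):
    """Return (number_of_classes, labels) parsed from <base>.clu.<n>."""
    tokens = Path(f"{base}.clu.{n}").read_text().split()
    return int(tokens[0]), [int(t) for t in tokens[1:]]


def build_command(base, n, **params):
    """Build an argument list such as KlustaKwik FILE n -MinClusters 2."""
    cmd = ["KlustaKwik", str(base), str(n)]
    for name, value in params.items():
        cmd += [f"-{name}", str(value)]
    return cmd
```

For instance, build_command("test", 1, MinClusters=2) yields the argument list
for the check command given in the Compilation section, which could then be
handed to subprocess.run if the KlustaKwik binary is on the PATH.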
--- /dev/null
+++ b/test.fet.1
@@ -0,0 +1,202 @@
+2
+-4326	-1834
+-2437	-3718
+-3642	-2409
+-2392	-3417
+-2483	-3470
+-1751	-4523
+-4094	-1892
+-3774	-2010
+-2635	-3306
+-4117	-1770
+-3669	-2095
+-3085	-2993
+-3290	-2744
+-2238	-3799
+-3704	-2294
+-2491	-3533
+-3597	-2386
+-3966	-1797
+-1339	-4910
+-3095	-3061
+-3162	-2953
+-3620	-2456
+-3407	-2760
+-1948	-4340
+-3721	-2314
+-3898	-2204
+-3407	-2588
+-4588	-1501
+-2688	-3253
+-3666	-2507
+-761	-5480
+-2184	-3818
+-4029	-1732
+-995	-5200
+-2979	-3036
+-3643	-2197
+-3755	-2309
+-2870	-2956
+-3072	-2963
+-2109	-3610
+-2920	-3521
+-2860	-3409
+-4234	-1824
+-3813	-2090
+-3447	-2357
+-1362	-4430
+-4773	-973
+-4041	-1688
+-3409	-2426
+-3256	-2679
+-3367	-2793
+-4368	-1488
+-503	-5354
+-1968	-4362
+-4979	-1032
+-3115	-2816
+-1196	-4717
+-2486	-3729
+-2642	-3450
+-2460	-3424
+-3120	-2823
+-3965	-2088
+-2232	-3793
+-665	-5335
+-4442	-1923
+-2697	-3232
+-2417	-3317
+-1995	-4416
+-2891	-3090
+-2306	-3696
+-890	-4959
+-2857	-3257
+-4396	-1656
+-4724	-1194
+-3795	-2216
+-2349	-3429
+-2352	-3380
+-2216	-3863
+-3392	-2511
+-4628	-1471
+-1961	-3789
+-2783	-3583
+-3486	-2652
+-2084	-3307
+-2361	-3520
+-3568	-2269
+-2428	-3390
+-2731	-3322
+-3008	-3067
+-5142	-808
+-4021	-1913
+-3600	-2423
+-1879	-4507
+-2902	-2907
+-2790	-3325
+-3749	-2061
+-4278	-1728
+-2407	-3597
+-2347	-3766
+-3671	-2647
+2918	4029
+3185	4445
+2637	3126
+3162	3983
+3033	4518
+2898	4001
+3118	3818
+2854	1385
+2818	4377
+2957	2300
+3094	3119
+3040	2649
+3079	2914
+3347	1310
+3232	2064
+3383	2916
+3527	3780
+2667	4199
+2860	2151
+2995	3161
+3298	4057
+3163	2902
+3318	3694
+3107	1996
+2853	2448
+2927	4169
+2954	1052
+2893	2598
+2939	3064
+2993	1013
+2792	3996
+3211	3226
+3076	3885
+2943	2748
+2928	3930
+2953	3012
+3039	1962
+3140	2110
+2991	3878
+2930	3650
+2873	3107
+2897	2983
+3107	2813
+3223	2366
+3246	4391
+2869	3684
+2706	1623
+3263	1425
+3007	3931
+3244	3060
+3142	2632
+3218	3530
+3058	1120
+2879	2784
+2285	3624
+2871	921
+2981	3129
+2725	2852
+2884	1657
+2891	1722
+3089	4797
+2984	1936
+3443	3679
+3165	3726
+2875	4545
+2865	2137
+3115	2169
+3012	1031
+3148	1722
+3142	2500
+2830	3383
+3084	3545
+3120	2423
+2765	2456
+2984	1631
+2981	3797
+2407	3704
+2885	3240
+3189	4081
+2653	3172
+2993	5084
+2940	3365
+2891	4528
+2677	4228
+3044	5899
+3124	426
+3213	1975
+2929	4583
+3164	4701
+3100	2755
+2951	-356
+3174	2629
+3129	4391
+2965	2793
+2527	4671
+3327	4180
+3187	2113
+3142	1422
+2904	3945
+2909	4102
+0	0
--- /dev/null
+++ b/test_res.clu.1
@@ -0,0 +1,202 @@
+3
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+2
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+3
+1
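
Note on the expected result above: a cluster file like this one can be
summarized with a few lines of Python.  This is only an illustrative sketch.
The function name summarize_labels is hypothetical, and the only assumption
taken from the README is that class 1 is the noise cluster.

```python
from collections import Counter

NOISE_CLASS = 1  # the README describes class 1 as the "noise cluster"


def summarize_labels(n_classes, labels):
    """Count points per class and list the (0-based) indices of noise points."""
    counts = Counter(labels)
    return {
        "n_classes": n_classes,
        "sizes": dict(sorted(counts.items())),
        "noise_points": [i for i, c in enumerate(labels) if c == NOISE_CLASS],
    }
```

Applied to the labels in test_res.clu.1 (100 points in class 2, 100 in class 3,
and one in class 1), this reports a single noise point at index 200, which
corresponds to the final "0 0" row of test.fet.1.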
