Please see the GitHub page.

Recent notes:

Java GUI applications can render fonts too small, e.g. on high-resolution displays. For example, the following command upscales the font sizes in the Weka Java application:

GDK_SCALE=2 java -Dswing.aatext=true -Dswing.plaf.metal.controlFont=Tahoma-plain-22 -Dswing.plaf.metal.userFont=Tahoma-plain-22 -jar /path/to/weka.jar

Here GDK_SCALE=2 asks for 2x scaling on Linux desktops, and the two Metal look-and-feel properties set the control and user fonts to 22-point plain Tahoma.

Here is a note on modeling the spread of the virus.
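As a generic illustration only (the linked note may use a different model), here is a minimal Python sketch of the classic SIR compartmental model (susceptible, infected, recovered), integrated with a simple forward-Euler step. The transmission rate beta, the recovery rate gamma, and the numbers in the usage note are hypothetical, not fitted to any data.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of the SIR model.

    beta: transmission rate per day, gamma: recovery rate per day
    (both hypothetical inputs). dt should divide one day evenly.
    Returns one (day, S, I, R) tuple per whole day, starting at day 0.
    """
    n = s0 + i0 + r0                      # total population, constant in SIR
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    history = [(0, s, i, r)]
    for step in range(1, days * steps_per_day + 1):
        new_infections = beta * s * i / n * dt   # flow S -> I: beta * S * I / N
        new_recoveries = gamma * i * dt          # flow I -> R: gamma * I
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        if step % steps_per_day == 0:
            history.append((step // steps_per_day, s, i, r))
    return history
```

For example, simulate_sir(990, 10, 0, beta=0.3, gamma=0.1, days=160) starts with 10 infected people in a population of 1000; since beta/gamma = 3 > 1, the infected count first grows and then falls as the susceptible pool is depleted, while S + I + R stays at 1000.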

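An implementation of a counting sort for integer arrays that runs in O(n + k) time, where k is the range of the values (so linear when k is O(n)), and returns both the sorted result and the permutation (re-indexing) information.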
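A minimal Python sketch of the idea, not the linked implementation: a stable counting sort that returns the sorted values together with perm, where sorted_values[j] == values[perm[j]]. The function name is hypothetical; the cost is O(n + k) with k = max - min + 1.

```python
def counting_sort_with_perm(values):
    """Stable counting sort of a list of ints.

    Returns (sorted_values, perm) with sorted_values[j] == values[perm[j]].
    """
    if not values:
        return [], []
    lo, hi = min(values), max(values)
    counts = [0] * (hi - lo + 1)          # one bucket per possible value
    for v in values:
        counts[v - lo] += 1
    # Exclusive prefix sums: first output position for each value.
    starts = [0] * len(counts)
    total = 0
    for k, c in enumerate(counts):
        starts[k] = total
        total += c
    # Scanning the input left to right keeps equal values in input order (stable).
    perm = [0] * len(values)
    for idx, v in enumerate(values):
        perm[starts[v - lo]] = idx
        starts[v - lo] += 1
    sorted_values = [values[idx] for idx in perm]
    return sorted_values, perm
```

For example, counting_sort_with_perm([3, 1, 2, 1]) returns ([1, 1, 2, 3], [1, 3, 2, 0]). Offsetting by the minimum lets negative integers work too; the price is memory proportional to the value range, which is why counting sort only pays off when that range is small.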

Fast parallel sorting is available in Java 8 and later (java.util.Arrays.parallelSort). Here is an example.

Automated algorithm and hyperparameter selection for ML models is available via Auto-WEKA in Java and auto-sklearn in Python, which rely on methods such as parameter sweeps, coordinate descent, and Bayesian optimization. The drawback is, of course, the increased runtime, but an upper bound on it can be passed as a parameter (e.g. time_left_for_this_task, in seconds, in auto-sklearn). Here is an example call sequence.
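The runtime trade-off can be illustrated with a plain parameter sweep that stops once a wall-clock budget is used up. This Python sketch is not how Auto-WEKA or auto-sklearn search (they use more sample-efficient strategies such as Bayesian optimization); the function name is hypothetical, and the objective and grid below stand in for training and scoring a model.

```python
import itertools
import time


def budgeted_grid_search(objective, grid, time_budget_s):
    """Sweep all parameter combinations in `grid` until done or out of time.

    objective: callable taking the parameters as keyword arguments and
               returning a score (lower is better), e.g. validation error.
    grid: dict mapping parameter name -> list of candidate values.
    Returns (best_params, best_score, n_evaluated).
    """
    deadline = time.monotonic() + time_budget_s
    names = list(grid)
    best_params, best_score, n_evaluated = None, float("inf"), 0
    for combo in itertools.product(*(grid[name] for name in names)):
        if time.monotonic() >= deadline:  # the runtime upper bound
            break
        params = dict(zip(names, combo))
        score = objective(**params)
        n_evaluated += 1
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated
```

With a toy objective such as lambda x, y: (x - 2) ** 2 + (y + 1) ** 2 and the grid {"x": [0, 1, 2, 3], "y": [-2, -1, 0]}, a generous budget evaluates all 12 combinations and finds {"x": 2, "y": -1}; a budget of zero evaluates nothing. A smaller budget simply trades search coverage for runtime.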

Here is how to plot a basic confusion matrix in R, where predicted and actuals are factors with the same levels: library('caret'); confmat = confusionMatrix(predicted, actuals); library(vcd); mosaic(confmat$table). And here is a code snippet in Python that shows how to get individual, per-class classification measures using scikit-learn functions.
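A dependency-free Python sketch of such per-class measures (precision, recall, F1 and support) follows; in scikit-learn, sklearn.metrics.precision_recall_fscore_support with average=None and sklearn.metrics.classification_report report the same quantities. The function name and the choice to return 0.0 when a denominator is zero are assumptions of this sketch.

```python
from collections import Counter


def per_class_metrics(actual, predicted):
    """Per-class precision, recall, F1 and support from two label sequences."""
    labels = sorted(set(actual) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for a, p in zip(actual, predicted):
        if a == p:
            tp[a] += 1          # correct prediction for class a
        else:
            fp[p] += 1          # predicted p, but it was not p
            fn[a] += 1          # missed an instance of a
    result = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        result[c] = {"precision": prec, "recall": rec, "f1": f1,
                     "support": tp[c] + fn[c]}   # support = true count of c
    return result
```

For actual = ["a", "a", "b", "b", "c"] and predicted = ["a", "b", "b", "b", "a"], class "b" gets precision 2/3, recall 1.0 and F1 0.8, while class "c" is never predicted and scores 0.0 across the board.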

To get the column count of every row in a CSV file (useful for checking that the file was written correctly), you can use Perl: perl -nle 's/".*?"//g;print s/,//g+1' fname.csv. The one-liner first deletes quoted strings, so commas inside quotes are not counted, then prints the number of remaining commas plus one. To remove the last few columns of a CSV file, you can use awk; for example, awk 'BEGIN{FS=OFS=","}{for(i=0; i<6; i++) NF--; print}' sample.csv > sample2.csv drops the last 6 columns (change the loop bound for a different number). Since both tools process the file line by line, this is handy for large files.

Two other useful commands for processing text files are awk and sed. To extract a given row, use awk 'FNR==2' file.txt (prints line 2). To display a given column of a CSV file, use awk -F "\"*,\"*" '{print $2}' file.csv (prints column 2; the separator pattern also strips quotes next to the commas, but commas inside quoted fields are not handled). It is often useful to remove a subset of the rows of a file (or set of files), which sed does efficiently: sed --in-place '2,40d' *.txt removes lines 2 to 40 from each .txt file. Here is an example script to remove random rows with sed.
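For comparison, a minimal Python sketch of the column-count check and of random row removal. It relies on the standard csv module to parse quotes instead of a regex, and the random-row function is a stand-in for the sed script (which is not reproduced here); both function names and the keep_header and seed parameters are illustrative.

```python
import csv
import random


def field_counts(lines):
    """Number of fields in each CSV row; the csv module handles quoted commas."""
    return [len(row) for row in csv.reader(lines)]


def drop_random_rows(lines, n_drop, keep_header=True, seed=None):
    """Return `lines` with n_drop randomly chosen rows removed, order preserved.

    With keep_header=True the first line is never dropped.
    """
    rng = random.Random(seed)             # seeded generator for reproducibility
    first = 1 if keep_header else 0
    dropped = set(rng.sample(range(first, len(lines)), n_drop))
    return [line for idx, line in enumerate(lines) if idx not in dropped]
```

On a real file, pass the open file object, e.g. field_counts(open('fname.csv', newline='')), and look for rows whose count differs from the header's. Unlike sed, drop_random_rows holds the whole file in memory, so for very large files the streaming tools above remain the better fit.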

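Simple lossless color image compression: first convert RGB to YCbCr or YUV space (e.g. with Octave). For the whole scheme to stay lossless, this conversion must be exactly invertible on integers, e.g. the reversible color transform (RCT) of JPEG 2000; a floating-point conversion followed by rounding generally is not. Then save all integer values <= 255 with 1 byte and the rest with 2 bytes, swapped in order, as in this code. Finally, apply lossless compressors (e.g. bzip2) to the result.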
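A minimal Python sketch of the pipeline, with two assumptions that are not from the original code: the color transform is the JPEG 2000 reversible color transform (integer in, integer out), and values are packed with a hypothetical escape-byte scheme (signed values zigzag-mapped to non-negative ones, values below 255 stored in 1 byte, larger ones as the byte 255 followed by one more byte). Python's standard bz2 module plays the role of the final lossless compressor.

```python
import bz2

ESCAPE = 255  # hypothetical marker: "a second byte follows"


def rct_forward(r, g, b):
    """JPEG 2000 reversible color transform; exact on integers."""
    return (r + 2 * g + b) // 4, b - g, r - g      # Y, U (= Cb), V (= Cr)


def rct_inverse(y, u, v):
    g = y - (u + v) // 4                            # floor division undoes Y exactly
    return v + g, g, u + g                          # R, G, B


def zigzag(n):
    return 2 * n if n >= 0 else -2 * n - 1          # 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...


def unzigzag(z):
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def pack(values):
    """1 byte per small value, 2 bytes otherwise (zigzag values are <= 510 here)."""
    out = bytearray()
    for v in values:
        z = zigzag(v)
        if z < ESCAPE:
            out.append(z)
        else:
            out += bytes([ESCAPE, z - ESCAPE])      # z - 255 fits in one byte
    return bytes(out)


def unpack(data):
    values, i = [], 0
    while i < len(data):
        if data[i] == ESCAPE:
            values.append(unzigzag(ESCAPE + data[i + 1]))
            i += 2
        else:
            values.append(unzigzag(data[i]))
            i += 1
    return values


def compress_pixels(pixels):
    """pixels: non-empty list of (r, g, b) tuples with components in 0..255."""
    ys, us, vs = zip(*(rct_forward(*p) for p in pixels))
    return bz2.compress(pack(list(ys) + list(us) + list(vs)))  # planar Y, U, V


def decompress_pixels(blob):
    values = unpack(bz2.decompress(blob))
    n = len(values) // 3
    ys, us, vs = values[:n], values[n:2 * n], values[2 * n:]
    return [rct_inverse(y, u, v) for y, u, v in zip(ys, us, vs)]
```

Y stays in 0..255 while U and V range over -255..255, so a 1-byte/2-byte split keeps most values (smooth images have small chroma differences) at 1 byte. Storing the planes one after another, rather than interleaving pixels, tends to give bzip2 longer runs of similar values.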