Please see GitHub page.

Recent notes:

Here is an example code for implementing parallelized dense matrix inversion on the GPU with CUDA and cuBLAS (using a two step solver).
Here is an example of accelerating 2-D kernel evaluations with CPU and GPU C code using R interface.
Example of producer/consumer code with struct option pass to thread functions.
Parameter tuning for machine learning can be done with grid search or auto-ml tools such as auto-sklearn.

Replace a column newline separated data (as from e.g. cat output on multiple files) to CSV row with sed: sed -i ':a;N;\$!ba;s/\n/, /g' file
With Java GUI applications, there can be an issue with small font size. For example, the following command sequence can be used to upscale the font sizes in the Java Weka application: GDK_SCALE=2 java -Dswing.aatext=true -Dswing.plaf.metal.controlFont=Tahoma-plain-22 -Dswing.plaf.metal.userFont=Tahoma-plain-22 -jar /path/to/weka.jar.

Here is a note on modeling the spread of the coronavirus (originally written 04.2020).

Implementation of barrier synchronization with mutexes.
An implementation of an O(n) integer array counting sort which returns sorted result and permutation re-indexing information.

Fast parallel sorting is availabe in Java (8 and up). Here is an example. Disk f/s error check (e.g 5yr old drive) on appearance of input/output errors when syncing files: unmount partition and run, e2fsck -c -v -y /dev/sdaX, e2fsck -f -p -C 0 /dev/sdaX, then fsck -yvfM /dev/sdaX. Some of these are duplicates but the combination may help as temporary fix to enable copying files off disk before further degradation.

Automated algorithm and parameter selection in ML models is available via auto weka in Java and autosklearn in Python, relying on parameter sweep, coordinate descent, and Bayesian opt methods. The drawback is of course the increased runtime, but the upper bound can be passed as a parameter. Here is an example call sequence.

Here is how to plot a basic confusion matrix with R: library('caret'); confmat = confusionMatrix(predicted, actuals); library(vcd); mosaic(confmat$table);. And here is a code snippet in Python which shows how to get individual, per class classification measures using scikit functions.

To get column count in a csv file for every row (useful to check if csv was created correctly), you can use Perl as follows: perl -nle 's/".*?"//g;print s/,//g+1' fname.csv. To remove the last few columns of a csv file, you can use awk like this: awk 'BEGIN{FS=OFS=","}{for(i=0; i<6; i++) NF--; print}' sample.csv > sample2.csv. This is handy for large files. When processing text files, two useful commands are awk and sed. To extract a given row, one can use (awk 'FNR==2' file.txt). To display a given column of a csv file, one can use (awk -F "\"*,\"*" '{print $2}' file.csv). It's often useful to remove a subset of the rows of a file (or set of files). This can be efficiently accomplished with the sed command (e.g. sed --in-place '2,40d;' *txt will remove lines 2 to 40). Here is an example script to remove random rows with sed.

Simple lossless color image compression. First convert RGB to YCbCr or YUV space (e.g. with Octave). Then, save all integer values <=255 with 1 bit and the rest with 2, swapped in order, as in this code. Then, apply lossless methods (e.g. bzip2, etc) to the results.