Juby Rajan: Programming


Monday, September 5, 2016

Matching test data factor levels for random forest models in R

When using the random forest learning algorithm in R, the following errors are frequently encountered while predicting against validation or test data:

New factor levels not present in the training data
Type of predictors in new data do not match that of the training data

Both occur because the factor levels or column types of the test data do not match those of the training data. As mentioned in many forums and blogs, this can be resolved by matching the levels of the test data to those of the training data as follows:

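for(colName in names(testData)) {
  # Re-encode factor columns against the training levels; assigning levels() directly would relabel values
  if(is.factor(trainingData[[colName]])) {
    testData[[colName]] = factor(testData[[colName]], levels = levels(trainingData[[colName]]))
  }
}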

But very often the training data is used to create a model which is persisted as an RDS file. During evaluation, the model is loaded and used for prediction on the test data. In this case the training data won't be available during the prediction.

There is not much information out there on how to match levels when only the model is available. A closer look at the random forest implementation in R shows that the model keeps the level information in its forest$xlevels field. The following code snippet can be used to match levels from the model to the test data:

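model = readRDS(modelFileName)
for(colName in names(testData)) {
  modelLevels = model$forest$xlevels[[colName]]
  # Only factor predictors have character levels in the model; other columns are skipped
  if(is.character(modelLevels)) {
    testData[[colName]] = factor(testData[[colName]], levels = modelLevels)
  }
}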

Saturday, August 6, 2016

R Code Snippets

Handling properties file

Sample File: test.properties
key1=value1
key2=value2
key3=value3

Read properties from file
filePath = "/path/to/properties/file"
props = read.table(filePath, header=FALSE, sep="=", row.names=1, strip.white=TRUE, na.strings="NA", stringsAsFactors=FALSE)

Properties can be accessed using their keys by props[key, 1]

Example:-
value = props["key1", 1]
print(value)

Prints value1

Loading choices for a Shiny app drop down from properties file
loadChoicesFromPropertiesFile = function(filePath) {
props = read.table(filePath, header=FALSE, sep="=", row.names=1, strip.white=TRUE, na.strings="NA", stringsAsFactors=FALSE)
choices = list()
for(key in row.names(props)) {
choices[[paste0(props[key, 1])]] = key
}
return (choices)
}

Defining the select drop down in ui.R
myOptions = loadChoicesFromPropertiesFile(filePath)
selectInput("myOptions", label = h4("Options"), choices = myOptions)

Wednesday, October 2, 2013

Remote debugging Java programs using jdb

The command line tool jdb can be a quick and convenient option for debugging Java programs, particularly in environments where running an IDE is slow or too much overhead.

It can be used with any Java application, such as one started from a main method or a web application.

Steps:

Enable remote debugging for the Java application by adding the following JVM parameter:

-agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=n

The jdb tool can then be invoked on the same machine as the running Java program, or on a remote machine that can reach it:

jdb -connect com.sun.jdi.SocketAttach:hostname=hostname,port=8000

Refer to https://docs.oracle.com/javase/7/docs/technotes/tools/windows/jdb.html for more details.
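To try the steps above, a small target program helps. The following is only a sketch: the class DebugTarget and its fib method are made up for illustration and are not part of jdb or the JDK.

```java
// Hypothetical target program for trying out a jdb session.
// Start it with the -agentlib:jdwp=... parameter shown above.
final class DebugTarget {

    // Iterative Fibonacci; a convenient method to stop in and inspect.
    static long fib(int n) {
        long previous = 0;
        long current = 1;
        for (int i = 0; i < n; i++) {
            long next = previous + current;
            previous = current;
            current = next;
        }
        return previous;
    }

    public static void main(String[] args) throws InterruptedException {
        // Keep the JVM busy for a while so that there is time to attach jdb.
        for (int round = 0; round < 600; round++) {
            int n = round % 40;
            System.out.println("fib(" + n + ") = " + fib(n));
            Thread.sleep(1000);
        }
    }
}
```

Once jdb is attached, commands such as stop in DebugTarget.fib, cont, next, where and print n can be used to walk through the loop. The locals command additionally needs the class to be compiled with javac -g.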

Saturday, September 28, 2013

Remote debugging Java applications using Eclipse IDE

Java applications can be debugged remotely from an IDE that has their source attached, following a client-server approach. The running Java application is considered the server and the IDE with the source attached is considered the client.

Server (Java application)
The application to be debugged should be started with the following JVM argument for Java 5.0 and beyond.
-agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=n

With this setting the JVM listens for a debugger connection on port 8000.

Executing java -agentlib:jdwp=help on the command prompt shows the help and list of options.

For JVMs older than Java 5.0, the server should be started with
-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=8000,suspend=n
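If it is unclear whether the application really picked up the debug parameter, the JVM can be asked for its own start-up options. The following is a minimal sketch; the class DebugAgentCheck and its method are hypothetical names, while ManagementFactory.getRuntimeMXBean().getInputArguments() is the standard JDK call that returns the JVM options.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports whether a JVM was started with a JDWP debug agent.
final class DebugAgentCheck {

    // True if any JVM option enables JDWP, in either the current or the legacy form.
    static boolean hasJdwpAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Input arguments are the JVM options, not the program arguments.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP agent enabled: " + hasJdwpAgent(jvmArgs));
    }
}
```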

Client (Eclipse IDE)

Click the Debug Configurations in the Debug button menu.
In Debug Configurations window, right click Remote Java Application and click New.
Give an appropriate name for the remote debug configuration.
The Connection Type should be Standard (Socket Attach)
Enter the Host on which the Java application is running.
Enter the Port which was used as address= while starting the Java application.
In the Source tab select a project or source jar.
Click Apply to save the settings.
Set appropriate break points in the source and start debugging by clicking on Debug in the Debug Configurations window.

Saturday, January 7, 2012

Hexspeak: Making hexadecimal numbers speak

http://en.wikipedia.org/wiki/Hexspeak

Monday, October 31, 2011

Google Interview Questions

140 Google Interview Questions

Answers To 15 Google Interview Questions That Will Make You Feel Stupid

Answers To 15 More Google Interview Questions That Will Make You Feel Stupid

Monday, July 26, 2010

DCI Architecture

DCI (Data Context Interaction) is a concept that could help us model our business solutions better. It was proposed by Trygve Reenskaug (the creator of MVC) and Jim Coplien (a well known expert on OOP and patterns).

DCI aims at a simplified and more commonsense approach to "Object-Oriented Design". The key observation is that while most object-oriented approaches capture the structural aspects of systems well, they are not so good at handling the behavioral and interaction aspects.

DCI stands for: D: Data, C: Context, I: Interaction.

Most software systems have these three aspects: the structural parts (Data) interact with each other (Interaction) in well defined Contexts.
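The three aspects can be sketched in Java with a money transfer, an example often used to explain DCI. This is only an illustration under assumptions: Java has no built-in way to attach a role to an object at runtime, so the role is modelled here as a small wrapper, and all class names are made up.

```java
// Data: a plain domain object that knows nothing about use cases.
final class Account {
    private long balance;

    Account(long balance) { this.balance = balance; }

    long balance() { return balance; }
    void increase(long amount) { balance += amount; }
    void decrease(long amount) { balance -= amount; }
}

// Context: one use case (a money transfer) that casts data objects into roles.
final class TransferContext {

    // Interaction: the behaviour of the "source" role lives here, not in Account.
    private static final class SourceRole {
        private final Account self;

        SourceRole(Account self) { this.self = self; }

        void transferTo(Account sink, long amount) {
            if (amount <= 0 || self.balance() < amount) {
                throw new IllegalArgumentException("invalid transfer amount: " + amount);
            }
            self.decrease(amount);
            sink.increase(amount);
        }
    }

    private final SourceRole source;
    private final Account sink;

    TransferContext(Account source, Account sink) {
        this.source = new SourceRole(source);
        this.sink = sink;
    }

    // Runs the use case with the roles bound in the constructor.
    void execute(long amount) {
        source.transferTo(sink, amount);
    }
}
```

The Account class stays free of transfer logic; a different use case would get its own context and roles over the same data.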

The main article by Jim Coplien could be found here.
A presentation and interview by Jim Coplien.

Thursday, May 13, 2010

Embedded Jetty Server

A simple utility class to embed Jetty web server:

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.FilterHolder;
import org.eclipse.jetty.servlet.FilterMapping;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.servlet.ServletHolder;

public enum WebContainer {
 INSTANCE;
 private Server server;

 public void startServer(String resourceBase, String contextPath, int port) {
  try {
   server = new Server(port);
   ServletContextHandler handler = new ServletContextHandler(ServletContextHandler.SESSIONS);
   handler.setContextPath(contextPath);
   handler.setResourceBase(resourceBase);
   server.setHandler(handler);
////////////////////////////////////////////////////////
//   add servlet context init params using the following
//   handler.getServletContext().setInitParameter(, );
////////////////////////////////////////////////////////

////////////////////////////////////////////////////////
//   add event listeners using
//   handler.addEventListener();
////////////////////////////////////////////////////////

////////////////////////////////////////////////////////
//   add filters using
//   handler.addFilter(, , );
////////////////////////////////////////////////////////
////////////////////////////////////////////////////////
//   add servlets using
//   handler.addServlet(, );
////////////////////////////////////////////////////////
   server.start();
   server.join();
  } catch(Exception e) {
   e.printStackTrace();
  }
 }
 
 public void stopServer() {
  try {
   server.stop();
  } catch(Exception e) {
   e.printStackTrace();
  }
 }
}

Use WebContainer.INSTANCE.startServer(...); and WebContainer.INSTANCE.stopServer(); to start and stop the server respectively.

Thursday, April 29, 2010

Implementing a simple deterministic rule engine

The following is my experience of implementing a simple deterministic rule engine (if at all it can be called such) in JavaScript using Mozilla Rhino. This was part of the enterprise platform we are building in our company. We tried using Rete based JBoss Drools earlier but felt that a simpler deterministic rule engine would better suit our purpose.

The need for such a thing was due to various reasons:-
i) Rete based stateful rule engine was an overkill for our use case
ii) The non deterministic and asynchronous nature of Rete based rule engines makes debugging and testing very difficult (as pointed out by Martin Fowler)
iii) Most of the rules in our case were of the nature of validations or calculations which makes Rete an overkill
iv) A tailor made custom rule engine gives more flexibility and power
v) Our consumers (product developers) were not at ease with the programming style of traditional rule based approach

The approach went something like this:-
i) Enable server side scripting in JavaScript using Mozilla's Rhino engine
ii) Capture rules as metadata in the form of JavaScript statements (majority of them will be of the form if condition then action)
iii) Expose domain objects and other relevant APIs into the rule scripts at well defined lifecycle phases of entities (CREATE, UPDATE, DELETE, etc.)
iv) Execute the rule scripts within the context of domain objects and relevant APIs during the life cycle of domain objects (typically CRUD phases)
v) We also support a MANUAL context wherein developers can set relevant domain objects explicitly and invoke the rule engine
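The approach above can be sketched in a few lines. Note the assumptions: this is not the engine described in the post, it uses plain Java lambdas in place of Rhino scripts, and the class SimpleRuleEngine, its Phase enum and its method names are all hypothetical. What it does show is the deterministic part: rules registered for a lifecycle phase run one after another, in registration order.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch of a deterministic rule engine.
final class SimpleRuleEngine<T> {

    enum Phase { CREATE, UPDATE, DELETE, MANUAL }

    // A rule of the form "if condition then action".
    private static final class Rule<E> {
        final Predicate<E> condition;
        final Consumer<E> action;

        Rule(Predicate<E> condition, Consumer<E> action) {
            this.condition = condition;
            this.action = action;
        }
    }

    private final Map<Phase, List<Rule<T>>> rulesByPhase = new EnumMap<>(Phase.class);

    void addRule(Phase phase, Predicate<T> condition, Consumer<T> action) {
        rulesByPhase.computeIfAbsent(phase, p -> new ArrayList<>())
                    .add(new Rule<>(condition, action));
    }

    // Evaluates the rules of one phase against the entity, in registration
    // order, and returns how many of them fired.
    int fire(Phase phase, T entity) {
        int fired = 0;
        List<Rule<T>> rules = rulesByPhase.get(phase);
        if (rules == null) {
            return fired;
        }
        for (Rule<T> rule : rules) {
            if (rule.condition.test(entity)) {
                rule.action.accept(entity);
                fired++;
            }
        }
        return fired;
    }
}
```

Because the order is fixed, a later rule can rely on what an earlier rule did, which is what makes such an engine easy to test and debug.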

The beauty of LISP (and Clojure)

A couple of years ago, I accidentally came across the strange word Clojure, which I later came to know as a LISP style programming language on the JVM. At that point I was a total newbie to functional programming (and LISP), having spent most of my programming time on C, C++, Java, etc.

The first impression I had when looking at a medium size Clojure program was that of a mess of parentheses. Why in the world would anybody want to write programs of this sort?

Like many others, it took a while for me to truly appreciate the simplicity, elegance and beauty of functional programming in general and LISP in particular.

Clojure might have a popular future due to its clean way of handling concurrency and shared state (through Software Transactional Memory, aka STM).

Hats off to John McCarthy for creating LISP and Rich Hickey for creating Clojure.

Thursday, February 25, 2010

"Hello World" in different languages

An interesting collection of the standard "Hello World" program in a number of languages.

Bit manipulation techniques

Some simple bit manipulation techniques

Set a bit: x |= (1 << position);
Clear a bit: x &= ~(1 << position);
Toggle a bit: x ^= (1 << position);
Test a bit: (((x >> position) & 1) != 0)
Test if power of two: ((x != 0) && ((x & (x - 1)) == 0))
Divide 'x' by 2 'n' times: x = x >> n;
Multiply 'x' by 2 'n' times: x = x << n;
Swap two numbers:
void swap(int& a, int& b)
{
if (&a == &b) return; // XOR-swapping a variable with itself would zero it
a ^= b;
b ^= a;
a ^= b;
}
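The same techniques can be tried out in Java. The following is a small sketch; the class BitTricks and its method names are made up for illustration. Java has no reference parameters, so the XOR swap works on two slots of an array instead.

```java
// Hypothetical helper class restating the bit tricks above as Java methods.
final class BitTricks {

    static int setBit(int x, int position)    { return x | (1 << position); }
    static int clearBit(int x, int position)  { return x & ~(1 << position); }
    static int toggleBit(int x, int position) { return x ^ (1 << position); }
    static boolean testBit(int x, int position) { return ((x >> position) & 1) != 0; }

    // Non-positive values are rejected: 0 & (0 - 1) is 0 as well.
    static boolean isPowerOfTwo(int x) { return x > 0 && (x & (x - 1)) == 0; }

    // XOR swap of two array slots; the guard matters because XOR-ing a slot
    // with itself would zero it.
    static void swap(int[] values, int i, int j) {
        if (i == j) {
            return;
        }
        values[i] ^= values[j];
        values[j] ^= values[i];
        values[i] ^= values[j];
    }

    public static void main(String[] args) {
        int[] pair = {3, 5};
        swap(pair, 0, 1);
        System.out.println(pair[0] + " " + pair[1]);   // prints "5 3"
        System.out.println(setBit(0b1010, 0));          // prints 11
        System.out.println(isPowerOfTwo(64));           // prints true
    }
}
```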

Wednesday, February 24, 2010

Customizing Spring's Routing Datasource

Changing the data source of your application dynamically at runtime is a very desirable feature for complex enterprise applications and frameworks. If you are using Spring, one of the viable options is Spring's AbstractRoutingDataSource, which allows dynamic data sources based on a lookup key. It uses the well known Decorator pattern to provide a javax.sql.DataSource instance dynamically.

The following article by Mark Fisher from Spring team clearly explains how to go about it:
http://blog.springsource.com/2007/01/23/dynamic-datasource-routing/

The default usage of Spring's AbstractRoutingDataSource is roughly as follows:
i) You have a set of data sources among which you want to switch dynamically.
ii) You have a well known dynamic key that can pick the right data source for you.

But what if you want to change the set of data sources themselves dynamically? The default usage of AbstractRoutingDataSource won't allow this. After digging into Spring code for some time I found that the following piece of code can do the job.


import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class RoutingDataSource extends AbstractRoutingDataSource {
 // own copy of the target map, so that it can be replaced at runtime
 private Map targetDataSources = new HashMap();
 public void setTargetDataSources(Map targetDataSources) {
  this.targetDataSources = targetDataSources;
 }

 @Override
 protected DataSource determineTargetDataSource() {
  Object lookupKey = determineCurrentLookupKey();
  DataSource dataSource = (DataSource) this.targetDataSources.get(lookupKey);
  if (dataSource == null) {
   throw new IllegalStateException("Cannot determine target DataSource for lookup key [" + lookupKey + "]");
  }
  return dataSource;
 }

 @Override
 protected Object determineCurrentLookupKey() {
  return /*a lookup key that can pick the actual datasource from the Map*/;
 }

 @Override
 public void afterPropertiesSet() {
  // do nothing
  // overridden to avoid datasource validation error by Spring
 }
}
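The routing idea itself can be tried without Spring. The following is a dependency-free sketch under assumptions: the class RoutingRegistry and its methods are hypothetical, the per-thread lookup key is just one common way to choose the current target, and a ConcurrentHashMap is used because the set of targets is meant to change while other threads are reading it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, Spring-free sketch: targets can be added or removed at
// runtime, and a per-thread key selects the current one.
final class RoutingRegistry<T> {

    private final Map<Object, T> targets = new ConcurrentHashMap<>();
    private final ThreadLocal<Object> currentKey = new ThreadLocal<>();

    // Change the set of targets while the application is running.
    void register(Object key, T target) { targets.put(key, target); }
    void unregister(Object key) { targets.remove(key); }

    // Select (or clear) the lookup key for the calling thread.
    void use(Object key) { currentKey.set(key); }
    void clear() { currentKey.remove(); }

    // Mirrors the lookup-then-fail logic of determineTargetDataSource() above.
    T determineTarget() {
        Object key = currentKey.get();
        T target = (key == null) ? null : targets.get(key);
        if (target == null) {
            throw new IllegalStateException("Cannot determine target for lookup key [" + key + "]");
        }
        return target;
    }
}
```

In the Spring class above, the role of currentKey would be played by whatever determineCurrentLookupKey() returns.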

Sunday, February 21, 2010

Some JavaScript utility functions


/**
 * Trim a string
 */
function trim(str) {
    str = str + '';
    return str.replace(/^\s+|\s+$/g, '');
}

/**
 * Merge two JSON objects, with controllable overwrite.
 */
function mergeJson(target, source, overwrite) {
    for (var item in source) {
        if (target[item] && typeof target[item] === 'object' && source[item] && typeof source[item] === 'object') {
            // recurse into nested objects, passing the overwrite flag along
            mergeJson(target[item], source[item], overwrite);
        } else if (!target[item] || overwrite === true) {
            target[item] = source[item];
        }
    }
    return target;
}

/**
 * Check if an array contains a given element.
 */
function arrayContainsElement(arr, ele) {
    var contains = false;
    if (arr) {
        for (var i = 0; i < arr.length; i++) {
            if (trim(arr[i]) === trim(ele)) {
                contains = true;
                break;
            }
        }
    }
    return contains;
}

Monday, January 11, 2010

JavaScript on Java

This post illustrates two versions of a simple Java helper class that enables scripting on Java. It is assumed that the reader is aware of the basics of running JavaScript on Java.

The first version uses "Mozilla's Rhino Engine" directly, the second uses the more standard "Java 6" scripting interface (JSR-223).

Using "Mozilla's Rhino Engine" directly:-
import java.util.HashMap;
import java.util.Map;
import org.mozilla.javascript.Context;
import org.mozilla.javascript.Scriptable;

/**
* High level wrapper to Mozilla Rhino Engine.
*/
public class ScriptSession {
    private Map<String, Object> scopeObjects = new HashMap<String, Object>();

    public void put(String key, Object value) {
        scopeObjects.put(key, value);
    }

    public Object execute(String script) {
        Object result = null;
        // enter the context before the try block, so that exit() always pairs with a successful enter()
        Context ctx = Context.enter();
        try {
            Scriptable scope = ctx.initStandardObjects();
            for (String key : scopeObjects.keySet()) {
                scope.put(key, scope, scopeObjects.get(key));
            }
            result = ctx.evaluateString(scope, script, "embedded_script", 1, null);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            Context.exit();
        }
        return result;
    }

    public static ScriptSession createInstance() {
        return new ScriptSession();
    }
}

Using Java 6 standard interface:-
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

/**
* High level wrapper to JDK 6 scripting API.
*/
public class ScriptSession {
    private ScriptEngine engine;

    private ScriptSession() {
        ScriptEngineManager factory = new ScriptEngineManager();
        engine = factory.getEngineByName("JavaScript");
    }

    public void put(String key, Object value) {
        engine.put(key, value);
    }

    public Object execute(String script) {
        Object result = null;
        try {
            result = engine.eval(script);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }

    public static ScriptSession createInstance() {
        return new ScriptSession();
    }
}

Usage:-
ScriptSession session = ScriptSession.createInstance();
// put all the context objects you want using
// session.put("", );
Object result = session.execute("javascript as text");

Inside JavaScript you can access all your context objects the same way you would access Java objects.
