Use efficient streaming to upload your files to a server

I was trying to figure out whether I could upload files using a streaming process.

What is a streaming process? A byte-by-byte upload.

Advantage: if the upload is interrupted by a bad/slow network connection, you do not need to resend bytes that have already reached the server/storage disk.

Disadvantage: every byte has to be sent individually, which is slow.

Now let's take the middle way and use buffers: fill a buffer with bytes and send it to storage/the server whenever the buffer is full.

Let's look at the code:

final byte[] bytesRead = new byte[bufferSize];
int noOfBytesRead = 0;
long totalNoOfBytesRead = 0; // total bytes read so far (for logging/accounting)
long beginOffset = startingOffset; // start from an existing offset if we have one, else 0
long endOffset = 0;
final ByteArrayOutputStream baos = new ByteArrayOutputStream(bufferSize);

// Keep reading until we reach the end of the input stream.
while ((noOfBytesRead = stream.read(bytesRead)) != -1) {
    totalNoOfBytesRead += noOfBytesRead;

    // Copy only the bytes actually read into the buffer.
    baos.write(bytesRead, 0, noOfBytesRead);

    // When the buffer reaches the defined limit, flush it to the server/storage.
    if (baos.size() >= bufferSize) {
        endOffset = (beginOffset + baos.size()) - 1;
        uploadBytes(baos.toByteArray(), beginOffset, endOffset);
        beginOffset = endOffset + 1;
        baos.reset();
    }
}

// End of file reached: upload whatever is left in the buffer.
if (baos.size() > 0) {
    endOffset = (beginOffset + baos.size()) - 1;
    uploadBytes(baos.toByteArray(), beginOffset, endOffset);
    baos.reset();
}

Simple use of buffers – a few things to remember:

  1. Keep reading bytes until the end of the file is reached.
  2. Load the data into a buffer; here a ByteArrayOutputStream is used to copy data from the input stream.
  3. Calculate the begin and end offset positions as the bytes move along.
  4. Keep checking the buffer size; once it reaches the defined limit, send all buffered bytes to the server (storage) together with the start and end offsets, then clear the buffer (a sketch of such an uploadBytes method follows this list).
  5. After the while loop, i.e. once the end of the file is reached (read() returns -1), send the remaining bytes to the server/storage with the correct start and end offsets.
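The uploadBytes method itself depends on your server or storage API, which the write-up does not show. Purely as an illustration, here is a minimal sketch of what it might look like against a hypothetical HTTP endpoint that accepts ranged uploads; the URL and the use of the Content-Range header are assumptions, not part of the original code.

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical uploadBytes: PUTs one buffer to a server that accepts ranged uploads.
void uploadBytes(final byte[] chunk, final long beginOffset, final long endOffset) throws IOException {
    // Hypothetical endpoint; replace with your own server/storage URL.
    final URL url = new URL("http://storage.example.com/upload/myfile");
    final HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("PUT");
    connection.setDoOutput(true);
    // Tell the server which byte range of the file this buffer covers, so an
    // interrupted upload can later resume from the last acknowledged offset.
    connection.setRequestProperty("Content-Range", "bytes " + beginOffset + "-" + endOffset + "/*");
    connection.setFixedLengthStreamingMode(chunk.length);

    final OutputStream out = connection.getOutputStream();
    try {
        out.write(chunk);
    } finally {
        out.close();
    }

    final int responseCode = connection.getResponseCode();
    if (responseCode / 100 != 2) {
        throw new IOException("Upload of range " + beginOffset + "-" + endOffset
                + " failed with HTTP " + responseCode);
    }
}

The essential point is only that the begin and end offsets travel with every buffer, so the server always knows which part of the file it has already received and an interrupted upload can resume from there.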

This buffered-upload approach can be optimized a bit further, but it works well for us, so I have stuck with it.

Rotating the Tomcat catalina.out file

In some cases Tomcat prints huge, and often quite redundant, logs to the catalina.out file. The problem comes when the size of this file keeps growing and triggers disk-space alarms (if the logs are not written to a separate partition/mounted disk).

You need to know about three pieces to implement and configure logrotate:

1. /etc/cron.daily/ folder – contains the logrotate script, which is executed on a daily basis and loads the configuration file '/etc/logrotate.conf'.

2. /etc/logrotate.conf file – the configuration file for the cron job. It holds the entries for all logs to be rotated on a daily basis.

3. /etc/logrotate.d  folder – logrotate.conf loads all configuration files from this folder.

Hence, to configure log rotation, you can either drop a configuration file into the 'logrotate.d' folder OR add the entries directly to the 'logrotate.conf' file.

What does the configuration look like?

Create a new file (any name) in the 'logrotate.d' folder:

<path-to-log-folder>/catalina.out {
copytruncate
daily
rotate 7
compress
missingok
size 10M
}

'logrotate' has many useful directives, which can be checked with 'man logrotate' on a Unix machine. A few of them:

  • copy – Make a copy of the log file, but don’t change the original at all.
  • mail <email@address> – When a log is rotated out-of-existence, it is mailed to address.
  • olddir <directory> – Logs are moved into <directory> for rotation.
  • postrotate/endscript – The lines between postrotate and endscript are executed after the log file is rotated.

Reference: http://linuxconfig.org/setting-up-logrotate-on-redhat-linux

A simple MongoDB-based web application

I was watching the MongoDB M101J tutorial, where they create a simple web application using the following technical components:

1. Spark Java embedded platform (for the web server) – http://www.sparkjava.com/index.html

2. FreeMarker (front-end page templating tool)

3. MongoDB – backend database

Let's start with a simple web server that exposes a simple String:

1. Add the Spark repository and dependency to the pom.xml file:

<repositories>
  <repository>
    <id>Spark repository</id>
    <url>http://www.sparkjava.com/nexus/content/repositories/spark/</url>
  </repository>
</repositories>

<dependency>
  <groupId>spark</groupId>
  <artifactId>spark</artifactId>
  <version>0.9.9.4-SNAPSHOT</version>
</dependency>

2. Create a simple Java file:

import static spark.Spark.*;
import spark.*;

public class HelloWorld {

    public static void main(String[] args) {
        get(new Route("/hello") {
            @Override
            public Object handle(Request request, Response response) {
                return "Hello World!";
            }
        });
    }
}

We can access this using the URL http://localhost:4567/hello.

There is no XML or properties configuration needed to start the web application.

3. Integrate it with FreeMarker

Add the FreeMarker dependency:

<dependency>
<groupId>org.freemarker</groupId>
<artifactId>freemarker</artifactId>
<version>2.3.19</version>
</dependency>

Update the Java code:

final Configuration configuration = new Configuration();

// The "/" path in the second parameter is relative to the class passed as the first parameter.
configuration.setClassForTemplateLoading(Week1Homework4.class, "/");

Create template:

<html>
<head>
<title>The Answer</title>
</head>
<body>
<h1>The answer is: ${answer}</h1>
</body>
</html>

Fill the template with a Java Map whose keys match the placeholders already defined on the HTML page:

Template helloTemplate = configuration.getTemplate("answer.ftl");

Map<String, String> answerMap = new HashMap<String, String>();

answerMap.put("answer", Integer.toString(answer));

helloTemplate.process(answerMap, writer); // writer can be any Writer, e.g. a StringWriter

return writer;

4. Combining MongoDB with FreeMarker

The great advantage of FreeMarker is that it uses a Map of name-value pairs as the data model for presentation, and MongoDB's BasicDBObject is also an implementation of Map, representing JSON-format data.

Hence, passing a BasicDBObject directly will replace the placeholder tags in the FreeMarker template with the corresponding values from MongoDB (see the sketch below).
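To make that concrete, here is a minimal sketch that ties the three pieces together, in the spirit of the course example. It assumes a local mongod with a 'course.hello' collection containing a document such as { "name" : "MongoDB" }, and a 'hello.ftl' template on the classpath containing a ${name} placeholder; these names are my own illustration, not from the course code.

import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import freemarker.template.Configuration;
import freemarker.template.Template;
import spark.Request;
import spark.Response;
import spark.Route;

import java.io.StringWriter;

import static spark.Spark.get;

public class MongoSparkFreemarker {

    public static void main(String[] args) throws Exception {
        // Assumed local MongoDB with database 'course' and collection 'hello'.
        final MongoClient client = new MongoClient("localhost", 27017);
        final DBCollection collection = client.getDB("course").getCollection("hello");

        final Configuration configuration = new Configuration();
        configuration.setClassForTemplateLoading(MongoSparkFreemarker.class, "/");

        get(new Route("/") {
            @Override
            public Object handle(Request request, Response response) {
                try {
                    // findOne() returns a BasicDBObject, which implements Map,
                    // so it can be passed straight to FreeMarker as the data model.
                    final DBObject document = collection.findOne();
                    final Template template = configuration.getTemplate("hello.ftl");
                    final StringWriter writer = new StringWriter();
                    template.process(document, writer);
                    return writer.toString();
                } catch (Exception e) {
                    return "Error: " + e.getMessage();
                }
            }
        });
    }
}

As in the HelloWorld example, this runs on http://localhost:4567/ with no further configuration.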

The video is available at: https://education.mongodb.com/courses/10gen/M101J/2013_October/courseware/Week_1_-_Introduction/All_together_now_MongoDB_Spark_and_Freemarker/

Pig – Experiments

To start with Pig, just download the Pig distribution from the Apache site. I tried version 0.11.

Pig runs in 2 modes:

Local mode – start Pig with "pig -x local". You then get the 'grunt>' shell for executing further commands.

Map-Reduce mode – the default mode. If no Hadoop environment is available, you will see the following message on the command prompt:

ERROR 4010: Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath).

Dump and Store: the 'dump' command shows data on the screen; the 'store' command saves output to a file.

Exec and Run commands: 'exec' runs a whole script with no interaction with the grunt shell (aliases defined in the script are not available afterwards); 'run' executes the script as if it were typed into the shell, so part of the job can live in the script and the rest can keep interacting with the shell.

grunt> cat myscript.pig
a = LOAD 'student' AS (name, age, gpa);
b = LIMIT a 3;
DUMP b;

grunt> exec myscript.pig

Embedded Java program:

A Java program can be written to execute Pig commands. This is helpful for integrating Pig code with other Java code/logic available in your solution.

import java.io.IOException;
import org.apache.pig.PigServer;

public class idlocal {

    public static void main(String[] args) {
        try {
            // "local" or "mapreduce"
            PigServer pigServer = new PigServer("local");
            runIdQuery(pigServer, "passwd");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
        pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(':');");
        pigServer.registerQuery("B = foreach A generate $0 as id;");
        pigServer.store("B", "id.out");
    }
}

Compile and execute the above program using:

javac -cp pig.jar idlocal.java

java -cp pig.jar:. idlocal

For a Hadoop environment:

export HADOOPDIR=/yourHADOOPsite/conf
java -cp pig.jar:.:$HADOOPDIR idmapreduce

Debugging Pig Latin:

(A Pig Latin script is essentially LOAD > transform > STORE.)

  1. Use the DESCRIBE operator to review the schema of a relation.
  2. Use the EXPLAIN operator to view the logical, physical, or map reduce execution plans to compute a relation.
  3. Use the ILLUSTRATE operator to view the step-by-step execution of a series of statements.

Operators:

FILTER operator to work with tuples or rows of data.

FOREACH operator to work with columns of data.

GROUP operator to group data in a single relation.

COGROUP and JOIN operators to group or join data in two or more relations.

UNION operator to merge the contents of two or more relations.

SPLIT operator to partition the contents of a relation into multiple relations.

Error handling:

pig -F myscript.pig
or
pig -stop_on_failure myscript.pig

With either flag, Pig stops execution as soon as the first failed job is detected.

MongoDB is great, but…

I did several experiments with MongoDB. I love its simplicity as a developer, especially while working with the shell.

Positives:

Several good features:

  • Schemaless: easy to add fields without changing the existing data structure. This is really helpful when it is difficult to design a fully grown schema in one go for a big application.
  • Mongo shell: simple and intuitive. You can verify and unit test your code's output using shell commands or by writing a simple JavaScript function.
  • Simple Java driver: quite simple and well-documented APIs, easy to work with, and almost no learning curve. Coding is mostly similar to normal SQL, except that you think in key-value pairs rather than values alone (see the short driver example after this list).
  • Simple setup: start coding in a few minutes. No boilerplate configuration files, and no complexity in setting up replica sets.
  • Easy map-reduce and composite functions.
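As a small illustration of that key-value style, here is a sketch using the old 2.x Java driver; the database and collection names ('blog', 'users') are made up for the example.

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

public class DriverExample {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        DB db = client.getDB("blog");                    // made-up database name
        DBCollection users = db.getCollection("users");  // made-up collection name

        // Insert: the document is a set of key-value pairs, not positional columns.
        users.insert(new BasicDBObject("name", "alice").append("age", 30));

        // Query: the filter is itself a document of key-value pairs.
        DBObject alice = users.findOne(new BasicDBObject("name", "alice"));
        System.out.println(alice);

        client.close();
    }
}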

Negatives:

Oh yes. Coming from a Cassandra background, I feel a few architectural design choices limit the use of MongoDB when comparing it with Cassandra.

  • Memory management – MongoDB maps the entire data set into memory, which causes page faults and is a probable performance bottleneck.
  • Written in C/C++ – coming from a Java background, I miss the comfort of open-source code I can read when chasing issues/bugs/concepts that lead me into the internal libraries.
  • Performance – Mongo uses B-trees to index the underlying data, so it needs a traversal to reach a data point. For a simple 'count' on a big data set, the data has to be faulted into memory. The count-traversal issue is solved in recent MongoDB versions, but loading all the indexed data may purge recently used hot data from memory and can lead to performance problems.
  • Huge document size with repeated data – suppose your documents have the key 'user name'; it could be stored as the single letter 'u' to save a good amount of disk space (if you have a large number of such documents). MongoDB could also support a logical field mapping ('u' means 'user name') to return documents transparently, without any conversion code at the client end.
  • Write lock: a pathetic feature. If my collection A is connected to collection B, a write on collection A will lock collection B as well. I think reads are also blocked on these collections; I am not sure about the latest MongoDB version. I was reading a blog that suggests a practical limit of about 200 write transactions per second. Mongo suggests sharding for such cases, and you need to structure your shards before you start storing data.
  • Not a fully distributed architecture: unlike Cassandra, it is not fully distributed. Mongo treats nodes differently, e.g. master-slave and active-passive.
  • Durability: unless you run with safe mode enabled, data is not guaranteed to be saved.
  • Document re-writes: a minor field update in a document requires re-writing the full document. Over time this may fragment the data files.
  • Compaction: it is tough to decide when to run compaction. It is a time-consuming process and blocks all operations. MongoDB suggests running it on a secondary first (taking it offline) and then on the primary.
  • Poor throughput during failover: since the secondary database is not serving any transactions, failing over to it means all the required data has to be loaded into memory again, causing many page faults.

I feel MongoDB is a perfect fit for a small, read-heavy, write-light data load. However, I am not really comparing MongoDB with Cassandra, as both have different objectives.

Few References:

http://schmichael.com/files/schmongodb/Scaling%20with%20MongoDB%20(with%20notes).pdf

http://www.datastax.com/dev/blog/2012-in-review-performance

Mocking Sample – ManagementFactory

Some time back I was googling for a way to mock my JMX layer. It was not my lucky day; Google's servers could not read my mind. I tried different search combinations to get help with my issue, but all that hardship led me back to writing a simple JMX ManagementFactory test case myself.

Code to be tested:

final MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
final ObjectName oname = new ObjectName("com.package:type=Summary");
final String[] databases = (String[]) mbs.getAttribute(oname, "Databases");
for (final String database : databases) {
    final CompositeData compositeData = (CompositeData) mbs.invoke(oname, "getDatabaseStatus",
            new Object[] { database }, new String[] { String.class.getName() });
    ... // some functional code
}

The mocking I finally ended up with (wrapped in a proper test method; 'ClassUnderTest' in @PrepareForTest is a placeholder for the class that actually contains the JMX code above):

import static org.easymock.EasyMock.cmp;
import static org.easymock.EasyMock.createMock;
import static org.easymock.EasyMock.expect;
import static org.easymock.EasyMock.isA;
import static org.junit.Assert.assertEquals;

import java.lang.management.ManagementFactory;
import java.util.Comparator;

import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

import org.easymock.LogicalOperator;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.powermock.api.easymock.PowerMock;
import org.powermock.core.classloader.annotations.PowerMockIgnore;
import org.powermock.core.classloader.annotations.PrepareForTest;
import org.powermock.modules.junit4.PowerMockRunner;

@RunWith(PowerMockRunner.class)
// Hypothetical name: prepare the class under test, since it is the class that calls
// ManagementFactory.getPlatformMBeanServer() and news up ObjectName.
@PrepareForTest(ClassUnderTest.class)
@PowerMockIgnore({ "javax.management.*" })
public class MockedClass {

    @Test
    public void testDatabaseStatus() throws Exception {
        final MBeanServer mbs = createMock(MBeanServer.class);
        PowerMock.mockStatic(ManagementFactory.class);
        expect(ManagementFactory.getPlatformMBeanServer()).andReturn(mbs);

        final ObjectName oname = PowerMock.createMockAndExpectNew(ObjectName.class, "com.package:type=Summary");
        final String[] databases = new String[] { "Alpha", "Beta" };
        expect(mbs.getAttribute(oname, "Databases")).andReturn(databases);

        final CompositeData mockCompositeData = createMock(CompositeData.class);
        // Add expectations as per the functional logic
        for (final String database : databases) {
            expect(mbs.invoke(isA(ObjectName.class), isA(String.class),
                    cmp(new Object[] { database }, new Comparator<Object[]>() {
                        @Override
                        public int compare(Object[] params1, Object[] params2) {
                            if ((params1 != null) && (params2 != null)) {
                                assertEquals("Size of passing parameters should be same",
                                        params1.length, params2.length);
                            }
                            return 0;
                        }
                    }, LogicalOperator.EQUAL),
                    cmp(new String[] { String.class.getName() }, new Comparator<String[]>() {
                        @Override
                        public int compare(String[] params1, String[] params2) {
                            if ((params1 != null) && (params2 != null)) {
                                assertEquals("Size of signature should be same",
                                        params1.length, params2.length);
                            }
                            return 0;
                        }
                    }, LogicalOperator.EQUAL))).andReturn(mockCompositeData);
        }
        PowerMock.replayAll(mbs, mockCompositeData);

        // System under test
        // Call the method under test here

        PowerMock.verifyAll();
        PowerMock.resetAll();
    }
}

Unit Testing – Mocking complexities

I know a unit test is a critical piece of code that proves your function works. But for it to catch failures in your functional code, the unit test should be strict and cover all (or most) positive and negative boundary cases.

I mostly faced issues while mocking Java objects. Mocking an object and passing it into the program is simple: just create a mock using EasyMock (or any other mocking framework), expect your method call, return whatever you want to inject into your functional flow, and test the output on the basis of the injected values.

Let's think through a few cases:

1. Your program creates an instance in the middle of your functional code, and there is no handle to get/set this object.

- The only option for mocking this object is byte-code instrumentation; PowerMock is very handy for such cases.

Ex:

myMethod() {
    // ...
    ComplexClass compx = new ComplexClass();
    Instance instance = compx.init();
    // ...
}

Suppose you want to mock this internally created object and/or control it; use PowerMock:

// Add @RunWith(PowerMockRunner.class) on the test class
// Add @PrepareForTest({ ClassToBeMocked.class, Class_that_needs_instrumentation.class })
ComplexClass cls = EasyMock.createMock(ComplexClass.class);
PowerMock.expectNew(ComplexClass.class).andReturn(cls);
// or: PowerMock.createMockAndExpectNew(ComplexClass.class)

In the above case, the new instance created inside the actual program will be replaced with the mocked object.

You now have control:

expect(cls.init()).andReturn(instance); // and similar expectations
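Putting case 1 together, here is a minimal sketch of a complete test. MyService, ComplexClass and Instance are hypothetical names standing in for your own classes; MyService is assumed to create the ComplexClass internally inside myMethod().

import static org.easymock.EasyMock.createMock;
import static org.easymock.EasyMock.expect;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.powermock.api.easymock.PowerMock;
import org.powermock.core.classloader.annotations.PrepareForTest;
import org.powermock.modules.junit4.PowerMockRunner;

@RunWith(PowerMockRunner.class)
@PrepareForTest(MyService.class) // MyService is the (hypothetical) class that calls 'new ComplexClass()'
public class MyServiceTest {

    @Test
    public void myMethodUsesMockedComplexClass() throws Exception {
        final ComplexClass cls = createMock(ComplexClass.class);
        final Instance instance = createMock(Instance.class);

        // Whenever MyService does 'new ComplexClass()', hand back our mock instead.
        PowerMock.expectNew(ComplexClass.class).andReturn(cls);
        expect(cls.init()).andReturn(instance);

        PowerMock.replayAll(cls, instance);

        new MyService().myMethod(); // internally creates ComplexClass and calls init()

        PowerMock.verifyAll();
    }
}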

2. Static class – use mockStatic

MBeanServer mbs = createMock(MBeanServer.class);
PowerMock.mockStatic(ManagementFactory.class);
expect(ManagementFactory.getPlatformMBeanServer()).andReturn(mbs);

3. Static initializer in class

Ex: 
private static final String PROPERTIES_QUERY = QueryConfig.formatRepositorySQL("select propval from " + PROPS_TABLE_NAME +
 " where propkey = ? and podname = ?");

Better to avoid executing such static initializers altogether by using:

@SuppressStaticInitializationFor({ "com.package.QueryConfig" }) // at class level

4. You need to create an instance of a class but do not want to invoke its constructor:

Whitebox.newInstance(ClassWithEvilConstructor.class)

5. Inject/get internal state (static or instance fields) of a class:

Whitebox.setInternalState(tested, "reportTemplateService", reportTemplateServiceMock);
Set<String> services = Whitebox.getInternalState(tested, "services");

6. Want to create a mock but need to suppress a constructor call:

PowerMock.suppress(MemberMatcher.constructor(NicerSingleton.class));

7. When setting an expectation on a method, all arguments of that method must exactly match those of the actual call. This is sometimes a problem when the arguments are objects created internally by the program.

The mocking platform matches such objects by object identity, so the object used in the expectation must be the very same instance as the one passed in the actual call.

Ex: setting an expectation on mbs.invoke(ObjectName, String, Object[], String[]):

expect(mbs.invoke(mockObjectName, "abc", new Object[]{ mockObject }, new String[]{ String.class.getName() })).andReturn(xxxx);

Here you cannot simply pass objects created only in the test program (mockObjectName, mockObject). If you create a new object in the test case, it also has to be passed into the real program, so that the actual method executes with the same object identities.

Otherwise, use argument matchers:

expect(mbs.invoke(isA(ObjectName.class), isA(String.class), isA(Object[].class), isA(String[].class))).andReturn(mockCompositeData);

If you do want to test the parameters, use the 'cmp' matcher and compare them:

expect(mbs.invoke(isA(ObjectName.class), isA(String.class),
        cmp(new Object[] { database }, new Comparator<Object[]>() {
            @Override
            public int compare(Object[] params1, Object[] params2) {
                if ((params1 != null) && (params2 != null)) {
                    assertEquals("Size of passing parameters should be same",
                            params1.length, params2.length);
                }
                return 0;
            }
        }, LogicalOperator.EQUAL), .....)).andReturn(mockCompositeData);