A simple Mongo based Web application

I was watching the MongoDB M101J tutorial, in which a simple web application is built using the following technical components:

1. Spark Java embedded platform (for web server) – http://www.sparkjava.com/index.html

2. FreeMarker (front-end templating tool)

3. MongoDB – backend database

Let's start with a simple web server that exposes a plain string:

1. Add the Spark repository and dependency in the pom.xml file:

<repositories>
    <repository>
        <id>Spark repository</id>
        <url>http://www.sparkjava.com/nexus/content/repositories/spark/</url>
    </repository>
</repositories>

<dependency>
    <groupId>spark</groupId>
    <artifactId>spark</artifactId>
    <version>0.9.9.4-SNAPSHOT</version>
</dependency>

2. Create a simple Java file:

import static spark.Spark.*;
import spark.*;

public class HelloWorld {

    public static void main(String[] args) {
        get(new Route("/hello") {
            @Override
            public Object handle(Request request, Response response) {
                return "Hello World!";
            }
        });
    }
}

We can access this using the URL http://localhost:4567/hello.

There is no XML or properties configuration needed to start the web application.

3. Integrate it with FreeMarker

Add the dependency:

<dependency>
<groupId>org.freemarker</groupId>
<artifactId>freemarker</artifactId>
<version>2.3.19</version>
</dependency>

Update the Java code:

final Configuration configuration = new Configuration();

configuration.setClassForTemplateLoading(Week1Homework4.class, "/"); // "/" is the template path, relative to the class passed as the first parameter

Create a template (answer.ftl):

<html>
<head>
<title>The Answer</title>
</head>
<body>
<h1>The answer is: ${answer}</h1>
</body>
</html>

Fill the template with a Java Map whose keys match the placeholders defined in the HTML page:

Template helloTemplate = configuration.getTemplate("answer.ftl");

Map<String, String> answerMap = new HashMap<String, String>();
answerMap.put("answer", Integer.toString(answer));

helloTemplate.process(answerMap, writer); // writer can be any Writer, e.g. a StringWriter

return writer;

4. Combining MongoDB with FreeMarker

The great advantage of FreeMarker is that it uses a Map of name-value pairs for data presentation. MongoDB's BasicDBObject is also an implementation of Map, representing JSON-format data.

Hence, passing a BasicDBObject directly will replace the placeholder tags in the FreeMarker template with the respective values from MongoDB.
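
Putting the three pieces together, here is a minimal sketch along the lines of the course example. It assumes the 2.x Java driver, a local mongod, a hypothetical database 'course' with a collection 'hello', and a template hello.ftl whose ${...} placeholders match the keys of the stored document:

import static spark.Spark.get;

import java.io.StringWriter;

import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

import freemarker.template.Configuration;
import freemarker.template.Template;
import spark.Request;
import spark.Response;
import spark.Route;

public class MongoSparkFreemarker {

    public static void main(String[] args) throws Exception {
        final Configuration configuration = new Configuration();
        configuration.setClassForTemplateLoading(MongoSparkFreemarker.class, "/");

        final DBCollection collection = new MongoClient("localhost").getDB("course").getCollection("hello");

        get(new Route("/") {
            @Override
            public Object handle(Request request, Response response) {
                try {
                    final Template template = configuration.getTemplate("hello.ftl");
                    // findOne() returns a BasicDBObject, which is a Map;
                    // FreeMarker fills the ${...} placeholders straight from it.
                    final DBObject document = collection.findOne();
                    final StringWriter writer = new StringWriter();
                    template.process(document, writer);
                    return writer;
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        });
    }
}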

Video is available: https://education.mongodb.com/courses/10gen/M101J/2013_October/courseware/Week_1_-_Introduction/All_together_now_MongoDB_Spark_and_Freemarker/

Pig – Experiments

To start with Pig, just download the Pig installation from the Apache site. I tried version 0.11.

Pig runs in 2 modes:

Local mode – start Pig with "pig -x local". You get a 'grunt>' shell for executing further commands.

MapReduce mode – the default mode. If there is no Hadoop environment available, you will see the following message on the command prompt:

ERROR 4010: Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath).

Dump and Store: The DUMP command shows data on the screen; the STORE command saves output to a file.
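
The same distinction is visible from an embedded Java program. A minimal sketch, assuming a local 'student' file: PigServer's openIterator plays the role of DUMP, and store plays the role of STORE.

import java.util.Iterator;

import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class DumpVsStore {
    public static void main(String[] args) throws Exception {
        PigServer pigServer = new PigServer("local");
        pigServer.registerQuery("a = LOAD 'student' AS (name, age, gpa);");

        // DUMP equivalent: iterate over the relation and print it to the screen
        Iterator<Tuple> it = pigServer.openIterator("a");
        while (it.hasNext()) {
            System.out.println(it.next());
        }

        // STORE equivalent: save the relation's output to a file
        pigServer.store("a", "student.out");
    }
}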

Exec and Run commands: EXEC runs a script with no interaction with the grunt shell. RUN executes a script as if its statements were typed into the shell, so a partial job can be written in the script and the rest can interact with the shell (e.g. 'grunt> run myscript.pig').

grunt> cat myscript.pig
a = LOAD 'student' AS (name, age, gpa);
b = LIMIT a 3;
DUMP b;

grunt> exec myscript.pig

Embedded Java program:

A Java program can be written to execute Pig commands. This is helpful when Pig code needs to interact with other pieces of Java code/logic in your solution.

import java.io.IOException;
import org.apache.pig.PigServer;

public class idlocal {
    public static void main(String[] args) {
        try {
            PigServer pigServer = new PigServer("local"); // "local" or "mapreduce"
            runIdQuery(pigServer, "passwd");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
        pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(':');");
        pigServer.registerQuery("B = foreach A generate $0 as id;");
        pigServer.store("B", "id.out");
    }
}

Compile and execute the above program using:

javac -cp pig.jar idlocal.java

java -cp pig.jar:. idlocal

For a Hadoop environment:

export HADOOPDIR=/yourHADOOPsite/conf
java -cp pig.jar:.:$HADOOPDIR idmapreduce

Debugging Pig Latin:

(Pig Latin follows the LOAD > Transform > STORE pattern.)

  1. Use the DESCRIBE operator to review the schema of a relation.
  2. Use the EXPLAIN operator to view the logical, physical, or MapReduce execution plans used to compute a relation.
  3. Use the ILLUSTRATE operator to view the step-by-step execution of a series of statements.
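
The same information is reachable from an embedded program. A minimal sketch using PigServer (assuming a local 'student' file), where dumpSchema corresponds to DESCRIBE and explain to EXPLAIN:

import org.apache.pig.PigServer;

public class DebugDemo {
    public static void main(String[] args) throws Exception {
        PigServer pigServer = new PigServer("local");
        pigServer.registerQuery("a = LOAD 'student' AS (name:chararray, age:int, gpa:double);");

        // DESCRIBE equivalent: print the schema of the relation
        pigServer.dumpSchema("a");

        // EXPLAIN equivalent: print the logical/physical/MapReduce plans
        pigServer.explain("a", System.out);
    }
}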

Operators:

FILTER operator to work with tuples or rows of data.

FOREACH operator to work with columns of data.

GROUP operator to group data in a single relation.

COGROUP and JOIN operators to group or join data in two or more relations.

UNION operator to merge the contents of two or more relations.

SPLIT operator to partition the contents of a relation into multiple relations.
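
A minimal embedded sketch exercising a few of these operators through PigServer (assuming a local 'student' file with name, age, and gpa fields):

import org.apache.pig.PigServer;

public class OperatorDemo {
    public static void main(String[] args) throws Exception {
        PigServer pigServer = new PigServer("local");
        pigServer.registerQuery("students = LOAD 'student' AS (name:chararray, age:int, gpa:double);");

        // FILTER works on rows: keep only students with a gpa above 3.0
        pigServer.registerQuery("good = FILTER students BY gpa > 3.0;");

        // GROUP collects the rows of a single relation by a key
        pigServer.registerQuery("byAge = GROUP good BY age;");

        // FOREACH works on columns: compute the average gpa per group
        pigServer.registerQuery("avgGpa = FOREACH byAge GENERATE group AS age, AVG(good.gpa);");

        pigServer.store("avgGpa", "avg_gpa.out");
    }
}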

Error Handling:

pig -F myscript.pig

or: $ pig -stop_on_failure myscript.pig

In either case, Pig will stop execution when the first failed job is detected.

MongoDB is great, but…

I did several experiments with MongoDB. As a developer, I love its simplicity, especially while working with the shell.

Positives:

Several good features:

  • Schemaless: Easy to add fields without changing the existing data structure. It is really helpful when it is difficult to design a fully grown schema in one go for a big application.
  • Mongo Shell: Simple and intuitive. You can verify and unit test your code's output using shell commands or by writing a simple JavaScript function.
  • Simple Java Driver: Quite simple and well-documented APIs. Easy to work with; there is hardly any learning curve. Coding is mostly similar to normal SQL, except you think in key-value pairs rather than only values. (A small example follows this list.)
  • Simple Setup: Start coding in a few minutes. No boilerplate configuration files. No complexity in replica set setup.
  • Easy map-reduce and composite functions.
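
A minimal sketch of the Java driver along these lines, assuming the 2.x driver, a local mongod, and a hypothetical 'test.users' collection:

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;

public class DriverDemo {
    public static void main(String[] args) throws Exception {
        DBCollection users = new MongoClient("localhost").getDB("test").getCollection("users");

        // Insert: think in key-value pairs rather than columns
        users.insert(new BasicDBObject("name", "alice").append("age", 30));

        // Query: the filter document is itself a key-value map
        DBCursor cursor = users.find(new BasicDBObject("age", new BasicDBObject("$gte", 18)));
        while (cursor.hasNext()) {
            System.out.println(cursor.next());
        }
    }
}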

Negatives:

Oh yes. Coming from a Cassandra background, I feel a few architectural design choices limit MongoDB's usage when compared with Cassandra.

  • Memory management – MongoDB maps the entire data set into memory, which causes page faults and probably creates a performance bottleneck.
  • Written in C/C++ – Coming from a Java background, I enjoy the comfort of reading open-source code when chasing issues/bugs/concepts down into the internal libraries; with C/C++ that comfort is gone.
  • Performance – Mongo uses B-trees to index the underlying data, so MongoDB needs a traversal to reach a data point. For a simple 'count' on big data sets, the data needs to be faulted into memory. The count traversal issue is solved in recent MongoDB versions, but loading all the indexed data may purge recently used hot data from memory and can lead to performance issues.
  • Huge document size with repeated data – Suppose your documents have the key 'user name'; it could be saved as the single letter 'u' to save a good amount of disk space (if you have a large number of such documents). MongoDB could also support a logical field mapping ('u' means 'user name') to return documents transparently, without any conversion code at the client end.
  • Write Lock: A pathetic feature. If my collection A is connected to collection B, a write on collection A will lock collection B as well. I think reads are also blocked on these collections; I am not sure about the latest version of MongoDB. I was reading a blog that suggests a practical limit of about 200 write transactions per second. Mongo suggests sharding for such cases, and you need to structure your shards before starting data storage.
  • Not a fully distributed architecture: Unlike Cassandra, it is not fully distributed. Mongo treats nodes differently, e.g. master-slave, active-passive.
  • Durability: Unless you have safe mode enabled, data is not guaranteed to be saved. (A small example follows this list.)
  • Document rewrites: A minor field update to a document requires rewriting the full document. There is a possibility of document fragmentation over the period.
  • Compaction: It is tough to decide when to run compaction. It is a time-consuming process and blocks all operations. MongoDB suggests running it on a secondary first (taking it offline) and then on the primary.
  • Poor throughput during failover: As the secondary database is not serving any transactions, a failover to the secondary needs to load all required data into memory, incurring many page faults.
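
On the durability point, a minimal sketch of what "safe mode" looks like with the 2.x Java driver (the 'test.events' collection is hypothetical):

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.WriteConcern;

public class SafeWriteDemo {
    public static void main(String[] args) throws Exception {
        DBCollection events = new MongoClient("localhost").getDB("test").getCollection("events");

        // Older default: fire-and-forget, no acknowledgment from the server
        events.insert(new BasicDBObject("type", "unacknowledged"));

        // "Safe mode": subsequent writes wait for server acknowledgment before returning
        events.setWriteConcern(WriteConcern.SAFE);
        events.insert(new BasicDBObject("type", "acknowledged"));
    }
}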

I feel MongoDB is a perfect fit for small, read-heavy, write-light data loads. That said, I am not strictly comparing MongoDB with Cassandra, as both have different objectives.

A few references:

http://schmichael.com/files/schmongodb/Scaling%20with%20MongoDB%20(with%20notes).pdf

http://www.datastax.com/dev/blog/2012-in-review-performance

Mocking Sample – ManagementFactory

Some time back I was googling how to mock my JMX layer. It was not my lucky day; Google's servers were not able to read my mind. I tried different search combinations to get help on my issue, but all that hardship led me to think again about a simple JMX ManagementFactory test case.

Code to be tested:

final MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
final ObjectName oname = new ObjectName("com.package:type=Summary");
final String[] databases = (String[]) mbs.getAttribute(oname, "Databases");
for (final String database : databases) {
    final CompositeData compositeData = (CompositeData) mbs.invoke(oname, "getDatabaseStatus",
            new Object[] { database }, new String[] { String.class.getName() });
    ... // some functional code
}

Finally, my mocking:

import static org.easymock.EasyMock.cmp;
import static org.easymock.EasyMock.createMock;
import static org.easymock.EasyMock.expect;
import static org.easymock.EasyMock.isA;
import static org.junit.Assert.assertEquals;

import java.lang.management.ManagementFactory;
import java.util.Comparator;

import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

import org.easymock.LogicalOperator;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.powermock.api.easymock.PowerMock;
import org.powermock.core.classloader.annotations.PowerMockIgnore;
import org.powermock.core.classloader.annotations.PrepareForTest;
import org.powermock.modules.junit4.PowerMockRunner;

@RunWith(PowerMockRunner.class)
@PrepareForTest({ ManagementFactory.class, ClassUnderTest.class }) // ClassUnderTest is a placeholder for the class that calls new ObjectName(...)
@PowerMockIgnore({ "javax.management.*" })
public class MockedClass {

    @Test
    public void testDatabaseStatus() throws Exception {
        final MBeanServer mbs = createMock(MBeanServer.class);
        PowerMock.mockStatic(ManagementFactory.class);
        expect(ManagementFactory.getPlatformMBeanServer()).andReturn(mbs);
        final ObjectName oname = PowerMock.createMockAndExpectNew(ObjectName.class, "com.package:type=Summary");
        final String[] databases = new String[] { "Alpha", "Beta" };
        expect(mbs.getAttribute(oname, "Databases")).andReturn(databases);
        final CompositeData mockCompositeData = createMock(CompositeData.class);
        // Add expectations as per the logic under test
        for (final String database : databases) {
            expect(mbs.invoke(isA(ObjectName.class), isA(String.class),
                    cmp(new Object[] { database }, new Comparator<Object[]>() {
                        @Override
                        public int compare(Object[] params1, Object[] params2) {
                            if ((params1 != null) && (params2 != null)) {
                                assertEquals("Size of passed parameters should be same",
                                        params1.length, params2.length);
                            }
                            return 0;
                        }
                    }, LogicalOperator.EQUAL),
                    cmp(new String[] { String.class.getName() }, new Comparator<String[]>() {
                        @Override
                        public int compare(String[] params1, String[] params2) {
                            if ((params1 != null) && (params2 != null)) {
                                assertEquals("Size of signature should be same",
                                        params1.length, params2.length);
                            }
                            return 0;
                        }
                    }, LogicalOperator.EQUAL))).andReturn(mockCompositeData);
        }
        PowerMock.replayAll(mbs, mockCompositeData);

        // System under test
        // Call the test method

        PowerMock.verifyAll();
        PowerMock.resetAll();
    }
}

Unit Testing – Mocking complexities

I know a unit test is a critical piece of code that validates that your function works. But to catch failures in your functional code, your unit test cases should be strict and cover all (or most) positive and negative boundary cases.

I mostly faced issues while mocking Java objects. Mocking an object and passing it into the program is simple: just create a mock using EasyMock (or any other mocking platform), expect your method call, return what you want to inject into your functional flow, and test the output on the basis of the injected values.
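
A minimal EasyMock flow of that simple case (the UserService interface here is a hypothetical example):

import static org.easymock.EasyMock.createMock;
import static org.easymock.EasyMock.expect;
import static org.easymock.EasyMock.replay;
import static org.easymock.EasyMock.verify;
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class BasicMockTest {

    interface UserService {
        String findName(int id);
    }

    @Test
    public void testGreeting() {
        UserService service = createMock(UserService.class);
        expect(service.findName(42)).andReturn("Alice"); // value injected into the flow
        replay(service);

        // The functional code under test would call service.findName(42)
        String greeting = "Hello, " + service.findName(42);

        assertEquals("Hello, Alice", greeting);
        verify(service); // confirm the expected call actually happened
    }
}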

Let's think of a few cases:

1. Your program creates an instance in the middle of your functional code, and there is no handle to get/set this object.

The only option to mock this object is bytecode instrumentation. PowerMock is very handy for such cases.

Ex: Code

myMethod() {
    // ...
    ComplexClass compx = new ComplexClass();
    Instance instance = compx.init();
    // ...
}

Suppose you want to mock this internally created object and/or control it; use PowerMock:

// Add @RunWith(PowerMockRunner.class) on the test class
// Add @PrepareForTest({ ClassToBeMocked.class, Class_that_needs_instrumentation.class })
ComplexClass cls = EasyMock.createMock(ComplexClass.class);
PowerMock.expectNew(ComplexClass.class).andReturn(cls);
// or: PowerMock.createMockAndExpectNew(ComplexClass.class)

In the above case, the instance created with new in the actual program will be replaced by the mocked object.

You now have control:

expect(cls.init()).andReturn(instance); // and similar expectations

2. Static methods – use mockStatic:

MBeanServer mbs = createMock(MBeanServer.class);
PowerMock.mockStatic(ManagementFactory.class);
expect(ManagementFactory.getPlatformMBeanServer()).andReturn(mbs);

3. Static initializer in a class

Ex:
private static final String PROPERTIES_QUERY = QueryConfig.formatRepositorySQL("select propval from " + PROPS_TABLE_NAME +
    " where propkey = ? and podname = ?");

It is better to avoid running these initializers in tests by using:

@SuppressStaticInitializationFor({ "com.package.QueryConfig" }) // at the class level

4. Need to create an instance of a class, but do not want to invoke a constructor of that class

Whitebox.newInstance(ClassWithEvilConstructor.class)

5. Inject/get static variables in class

Whitebox.setInternalState(tested, "reportTemplateService", reportTemplateServiceMock);
Set<String> services = Whitebox.getInternalState(tested, "services");

6. Want to create a mock but need to suppress a constructor call:

PowerMock.suppress(MemberMatcher.constructor(NicerSingleton.class));

7. When setting an expectation on a method, all arguments must exactly match those of the actual call. Sometimes this is an issue when the passed arguments are objects created internally in the program.

The mocking platform matches these arguments by object identity, hence the object IDs should be the same for all passed arguments.

Ex: expect(mbs.invoke(objectName, operationName, params, signature)).andReturn(xxxx);

In the above method, you cannot pass objects created in the test program, like:

expect(mbs.invoke(mockObjectName, "abc", new Object[] { mockObject })).andReturn(xxxx);

If you are creating any new object in the test case, it should be passed to the real program so the actual method executes with the same object IDs.

Otherwise, use argument matchers:

expect(mbs.invoke(isA(ObjectName.class), isA(String.class), isA(Object[].class), isA(String[].class))).andReturn(mockCompositeData);

In case you want to test the parameters, use the 'cmp' matcher and compare both parameter arrays:

expect(mbs.invoke(isA(ObjectName.class), isA(String.class),
        cmp(new Object[] { database }, new Comparator<Object[]>() {
            @Override
            public int compare(Object[] params1, Object[] params2) {
                if ((params1 != null) && (params2 != null)) {
                    assertEquals("Size of passed parameters should be same", params1.length, params2.length);
                }
                return 0;
            }
        }, LogicalOperator.EQUAL), .....)).andReturn(mockCompositeData);

A Few Recently Visited Places

I compiled this list for one of my friends who recently came to the Bay Area and is looking for nearby places.

We have visited lots of places (mostly nearby, just to avoid hectic traveling with a kid).

1. Oakland Zoo $$ – The Oakland Zoo offers annual memberships via discount sites like Groupon. It is worth it; the membership pays for itself if you go twice, or once with another family of four.

2. Big Sur – A long, misty coastline in California – Hwy 1 – A scenic drive

3. Japanese Garden – San Jose

4. History Museum – San Jose – Free on weekdays – The Japanese Garden is right next to it

5. Stanford University – See if you can get a 'cart' tour booking – Daily tours are at 11 AM – Please check the website.

6. SFO visit – Golden Gate Bridge

7. SFO – Just visit near the sea (Embarcadero Road, Pier 39) – Parking is $10-$12 for a day – Lots of activities can be done/seen here

8. SFO – Purchase Groupon/discount tickets for a cruise

9. SFO – Golden Gate Park – For picnics, free parking; Twin Peaks

10. SFO – Chinatown and lots of spots within it – But it needs a good amount of walking

11. SFO – Zoo $$

12. Santa Cruz – Boardwalk

13. Mystery Spot, Santa Cruz (not worth the money, but it is OK to go one time)

14. Sausalito – There is a good lighthouse (Point Reyes) and a beach nearby

15. Lemos Farm – On the way to Half Moon Bay; free entry, paid rides

16. Parks – There are lots of good parks here – Good for parents and a small picnic – Vasona Park ($6 parking), Blackberry Farm (Cupertino), Memorial Park

17. Temples – Shirdi Sai; Fremont, Sunnyvale, Livermore (our favorite)

18. Lawrence Hall of Science and its museum (Good if you can get free tickets; please check the website)

19. Shoreline Park, Mountain View

20. Monterey Bay (the aquarium is OK but costly; check if you can get an offer), Dennis the Menace Park

21. Half Moon Bay – Several beaches; try going to the Ritz-Carlton hotel (free parking at the hotel, and their beachside private area is open to the public for free)

22. Palo Alto Zoo (free and small)

23. Stevens Creek Dam, Cupertino (small, just for a day picnic)

24. Yosemite (There are good hotel offers in the Yosemite area, but it is a little expensive), Hetch Hetchy Dam, Route 180 (if open), and a few other places

25. Lake Tahoe (The distance is significant, but it is good after winter)

26. Muir Woods, Sausalito

27. Gilroy Gardens (Good but $$$)

28. Napa Valley – You can purchase winery tickets if you like to drink (also Sonoma Valley)

29. CuriOdyssey, San Mateo (the museum is free on certain days – Sundays – a good photographic point)

30. Flea Market, San Jose – Just a time pass but needs walking; check for events; $5 parking on weekends

31. Watch a few Christmas events, like Christmas in the Park in San Jose in late December

32. Diwali festivals

33. Winter Wonderland – Around December – Watch Groupon for cheap tickets

34. A few pumpkin patches (Spina Farms Pumpkin Patch is our favorite) – small farms with a few rides. Good with kids.

35. There are several museums that regularly provide free/discounted entry, like the San Jose Tech Museum, Hiller Aviation, the Children's Discovery Museum (San Jose), the Discovery Museum (Sausalito), and Ripley's Believe It or Not (SFO)

SIGAR – Access operating system and hardware level information

SIGAR (System Information Gatherer and Reporter) provides an efficient way to access information about the underlying hardware/operating system.

Information about Sigar is available here:

http://www.hyperic.com/products/sigar

I am just touching on a few areas while working with Sigar.

1. Testing independently

After downloading the Sigar bundle, you will want to explore the possibilities of Sigar. The best way is to run the command-line program on the available hardware/operating system:

java -jar sigar.jar

It presents the following prompt for further commands. Type help for details of the commands.

Sigar>

2. Configure the project with Maven:

Sigar has two sets of files to be downloaded from the Maven repository:

a. The Sigar jar, which can be used in a Java program to execute the respective commands.

b. A zip of operating system and hardware specific JNI files. These files are used at runtime to interact with the operating system.

For that, you need to add the following dependencies in your pom.xml file:

<dependency>
    <groupId>org.hyperic</groupId>
    <artifactId>sigar</artifactId>
    <version>${sigar.version}</version>
</dependency>
<dependency>
    <groupId>org.hyperic</groupId>
    <artifactId>sigar-dist</artifactId>
    <version>${sigar.version}</version>
    <type>zip</type>
</dependency>

Here, ${sigar.version} is whichever version is available for download.

Now, to load the Sigar files at runtime, you need to unzip the downloaded sigar-dist zip to a defined folder (preferably a folder on the classpath; otherwise update your classpath).

Add the Maven dependency plugin to unzip all the files:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-dependency-plugin</artifactId>
    <version>2.2</version>
    <executions>
        <execution>
            <id>unpack-dependencies</id>
            <phase>compile</phase>
            <goals>
                <goal>unpack-dependencies</goal>
            </goals>
            <configuration>
                <includes>**/sigar-bin/lib/*</includes>
                <excludes>**/sigar-bin/lib/*jar</excludes>
                <includeGroupIds>org.hyperic</includeGroupIds>
                <includeArtifactIds>sigar-dist</includeArtifactIds>
                <outputDirectory>${project.build.directory}/depends</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>

In the above configuration, choose the output directory where you want to copy all the JNI-based files.

Internally, the Sigar API uses 'java.library.path' to load the Sigar libs/jars, so all your JNI files should be available on that path. For tests, configure the surefire plugin accordingly:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.8</version>
    <configuration>
        <systemProperties>
            <property>
                <name>java.library.path</name>
                <value>${project.build.directory}/depends/hyperic-sigar-1.6.5/sigar-bin/lib</value>
            </property>
        </systemProperties>
    </configuration>
</plugin>

3. Assemble it for different hardware

For assembly, update your assembly.xml (or the respective file):

Step 1: Include the Sigar jar

<include>org.hyperic:sigar</include>

Step 2: Include all system-dependent JNI files. For example, I am adding all Linux-based files in this build:

<fileSet>
    <directory>${project.build.directory}/sigar/hyperic-sigar-1.6.5/sigar-bin/lib</directory>
    <outputDirectory>/lib/sigar/</outputDirectory>
    <includes>
        <include>**/*-linux.so</include>
    </includes>
</fileSet>

4. Program with it

Sigar APIs can be used in two ways:

a. Implement against the SigarProxy interface, which provides caching at the Java level. This is helpful if you want to cache fetched hardware information for some time, to avoid unnecessary calls to the hardware for each request. It depends on the application use case, e.g. a static/fixed load pattern where the chance of an alarming change in hardware parameters is not significant. (A small sketch follows the direct-access example below.)

b. Direct access without a cache – create an instance of the Sigar class:

Sigar sigar = new Sigar();

final Mem mem = sigar.getMem(); // memory
CpuPerc cpuPerc = sigar.getCpuPerc(); // CPU percentages
Swap swap = sigar.getSwap(); // swap space
FileSystemUsage fsu = sigar.getFileSystemUsage("."); // file system usage for a directory; "." for the current directory

There are several methods available in each class; these can be used as per your requirement.

Ex: mem.getTotal(), mem.getActualUsed(), fsu.getTotal(), cpuPerc.getIdle(), swap.getTotal()
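
For option (a), a minimal sketch, assuming the SigarProxyCache factory from the org.hyperic.sigar package (the expiry value is illustrative):

import org.hyperic.sigar.Mem;
import org.hyperic.sigar.Sigar;
import org.hyperic.sigar.SigarException;
import org.hyperic.sigar.SigarProxy;
import org.hyperic.sigar.SigarProxyCache;

public class CachedSigarDemo {
    public static void main(String[] args) throws SigarException {
        Sigar sigar = new Sigar();

        // Wrap Sigar in a caching proxy; results are reused until the expiry (here 30 seconds) elapses
        SigarProxy proxy = SigarProxyCache.newInstance(sigar, 30000);

        Mem first = proxy.getMem();  // hits the JNI layer
        Mem second = proxy.getMem(); // served from the cache
        System.out.println(first.getTotal() + " / " + second.getTotal());

        sigar.close();
    }
}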

Sigar provides PTQL (Process Table Query Language). PTQL is especially useful to identify processes, much as we use a PID to record/identify a Unix process.

All possible commands, combinations, and regular expression details are provided in the PTQL reference (see the references below).

Ex: I want to know the number of servers running on a host:

Sigar sigar = new Sigar();

final ProcessFinder processFinder = new ProcessFinder(sigar);

final long[] pids = processFinder.find("State.Name.eq=java,Args.-1.ct=myserver");

System.out.println("Total number of servers on this host: " + pids.length);

5. Test it

For testing Sigar-based APIs, make sure you have configured the surefire plugin as given in step 2.

You can simply do mock for Sigar APIs to avoid making JNI calls to hardware:

private final long totalMemory = 1000;

final Sigar sigar = createMock(Sigar.class);

// Use any mocking platform like easy mock, powermock.

PowerMock.expectNew(Sigar.class).andReturn(sigar);

final Mem mem = createMock(Mem.class);

expect(sigar.getMem()).andReturn(mem);

expect(mem.getTotal()).andReturn(totalMemory);

replay(…)

// System In test – Call method to be tested

mySigarData = methodInCall();

// Asset it

assertEquals(“Total memory is not correct.”, totalMemory, mySigarData.getTotalMemory() );

References:

https://support.hyperic.com/display/SIGAR/Home

https://support.hyperic.com/display/SIGAR/PTQL
