#14 GSoC’18 – Wrapping up

The summer is nearing its end, so this week was dedicated to wrapping up.

Changing JUnit tests to add plot support:

In one of the early blog posts, I talked about how I added plot support to SBSCL using JFreeChart. I added a new plotting class called PlotProcessedSedmlResults to support the output of a SED-ML file. This output object is a list of data generators wrapped in an IProcessedRawSimulations object. Using this additional plotting class, the JUnit tests were modified to generate an output graph for every existing SED-ML example, which is a great way to verify that my feature implementations are correct. Most of the SED-ML specification test cases work nicely. The few that don’t are discussed in detail in a recent GitHub issue, and they would make for a great GSoC project next year; one example is adding a stochastic ODE solver to the library.
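For reference, the plotting boils down to the standard JFreeChart pattern sketched below. This is not the actual PlotProcessedSedmlResults code; the class name PlotSketch and the series values are made up for illustration:

import java.io.File;
import java.io.IOException;

import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.data.xy.XYSeries;
import org.jfree.data.xy.XYSeriesCollection;

public class PlotSketch {
	public static void main(String[] args) throws IOException {
		// Fill a series with (time, value) points; in the JUnit tests these
		// come from the processed data generators, not hard-coded values.
		XYSeries series = new XYSeries("species S1");
		series.add(0.0, 1.0);
		series.add(1.0, 0.6);
		series.add(2.0, 0.35);

		JFreeChart chart = ChartFactory.createXYLineChart(
				"SED-ML output", "time", "value",
				new XYSeriesCollection(series),
				PlotOrientation.VERTICAL, true, false, false);

		// Save the graph so each test run leaves an inspectable artifact
		ChartUtilities.saveChartAsPNG(new File("output.png"), chart, 800, 600);
	}
}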

The file pom.xml was modified to add the following plugins:

maven-project-info-reports-plugin
maven-site-plugin
maven-javadoc-plugin
license-maven-plugin

Making the library website using Maven:

During code clean-up, we realized that Maven can be used for a lot more than just building code. One of the interesting features offered by Maven is building a website for the API.

mvn site

With this command, Maven automatically takes all the project information from pom.xml and generates a nice website. This led to the second round of major project restructuring (the first round was discussed here). The contents of the doc folder were either deleted or moved to src/site, since Maven uses that information when building the site. A markup.md file was added to src/site/markdown, and it is now the landing page of our website: https://shalinshah1993.github.io/SBSCL/

Note that a folder-copy plugin was used to copy target/site/ to the docs/ folder so that GitHub Pages serves the Maven-generated website. This means we don’t need the gh-pages branch anymore.

Generating javadoc using Maven:

In addition to making the website, the maven-javadoc-plugin was used to generate the javadoc and add it to the website. Having the javadoc online is highly desirable: prior to this, the generated javadoc resided in the doc/ folder and was only accessible offline after downloading the library.
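If you only want the javadoc without rebuilding the whole site, the plugin’s standalone goal can be run directly:

mvn javadoc:javadoc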

Generating a license file using Maven:

The license-maven-plugin was used to generate the software license. This means we no longer need the licenses/ folder that originally resided in the repo. To see a list of third-party licenses, the following commands can be used:

mvn license:download-licenses
mvn license:license-list -Ddetail
mvn license:add-third-party

The generated license file THIRD-PARTY.txt resides in the target/generated-sources/ folder.

Updating the README.md file:

All the folder restructuring meant the original README.md file also had to be updated. Several changes were made to this file, and the updated directory structure is as follows:

 /
 |- docs           -> Contains code for the maven built website
 |- src            -> The main source folder containing all the code and test files
    |- assembly    -> assembly files for maven plugins
    |- lib         -> 3rd party libraries needed for compilation and execution
    |- main        -> Core java files of simulation library
    |- test        -> JUnit test files along with resources required
    |- site        -> Markup files, old javadoc, site.xml, and other website resources
 |- LICENSE.txt    -> the license, under which this project is distributed
 |- pom.xml        -> Maven file for building the project
 |- README.md      -> this file

This week the coding phase for GSoC came to an end. All the students will be evaluated one last time next week by their mentors and organizations, and then the program ends. Good luck to all the other students, and to myself, for the final evaluations.

I had a great summer working with Andreas, Matthias, Niko, and others. I will try to keep contributing to NRNB and other open-source communities whenever I can.

Adios!

 


#13 GSoC’18 – OMEX archive support

Since the entire systems biology community is moving towards supporting OMEX files, one of the features I proposed was adding support for reading OMEX files in SBSCL. OMEX files are archives that contain all the information necessary to run a simulation. The official definition of OMEX, as described in the original publication, is:

OMEX is the basis of the COMBINE Archive, a single file that supports the exchange of all the information necessary for a modeling and simulation experiment in biology. An OMEX file is a ZIP container that includes a manifest file, listing the content of the archive, an optional metadata file adding information about the archive and its content, and the files describing the model.

Adding OMEX support:

There are several Java libraries available for working with OMEX archives. After discussing with my mentors, we decided to use the CombineArchive SEMS package. Using this package, I can read an OMEX file and extract the required information from the metadata file, such as the locations of the SED-ML file and the SBML model. Once I have those locations, the rest of the process is handled by the existing routines of the simulation library. A simple code example looks as follows:

// read the OMEX archive from file
OMEXArchive archive = new OMEXArchive(new File(file));

if (archive.containsSBMLModel() && archive.containsSEDMLDescp()) {
	// Execute SED-ML file and run simulations
	SEDMLDocument doc = Libsedml.readDocument(archive.getSEDMLDescp());
	SedML sedml = doc.getSedMLModel();

	Output wanted = sedml.getOutputs().get(0);
	SedMLSBMLSimulatorExecutor exe = new SedMLSBMLSimulatorExecutor(sedml, wanted, null);

	Map<AbstractTask, IRawSedmlSimulationResults> res = exe.runSimulations();
	if ((res == null) || res.isEmpty() || !exe.isExecuted()) {
		fail("Simulation failed: " + exe.getFailureMessages().get(0).getMessage());
	}

	for (IRawSedmlSimulationResults re : res.values()) {
		// do something with the results
	}
}

// close the file object
archive.close();

In addition to a simple example, a JUnit test to check the feature was also added in src/test/java.

Let me know if you have any questions.

 

#12 GSoC’18 – Finding the closest simulation algorithm

I moved slightly away from the proposed work this week. Currently, SBSCL supports the following simulation algorithms:

final static String[] SUPPORTED_KISAO_IDS = new String[] {
  "KISAO:0000033",  // Rosenbrock method
  "KISAO:0000030",  // Euler forward method
  "KISAO:0000087",  // Dormand-Prince method
  "KISAO:0000088",  // LSODA
  "KISAO:0000019"   // CVODE
};

Although these techniques cover a wide range of simulation algorithms, they don’t cover all of them. For example, the SED-ML test file parameter-scan-2D from the official SED-ML specification requires KISAO:0000282, which is not supported by the simulation core.

This week’s work involved finding the closest matching algorithm when the desired algorithm is not supported. To do this, I am using libKiSAO to find the algorithm with minimal distance in the KiSAO ontology tree. A short code snippet to create such a query looks as follows:

IKiSAOQueryMaker kisaoQuery = new KiSAOQueryMaker();
IRI wanted = kisaoQuery.searchById(kisaoID);

// Start with the first supported algorithm as the fallback
IRI simAlgo = kisaoQuery.searchById(SUPPORTED_KISAO_IDS[0]);
double minDist = kisaoQuery.distance(wanted, simAlgo);

// Find the closest available algorithm to the one requested
for (String supported : SUPPORTED_KISAO_IDS) {
	IRI offered = kisaoQuery.searchById(supported);
	double curDist = kisaoQuery.distance(wanted, offered);

	if (curDist < minDist) {
		minDist = curDist;
		simAlgo = offered;
	}
}
sim.setAlgorithm(new Algorithm(kisaoQuery.getId(simAlgo)));

 

Let me know if you have any questions.

#11 GSoC’18 – Post-processing output results

Last time, I talked about several changes implemented as part of incorporating repeated tasks in the simulation core. The final aspect of this implementation is post-processing the raw data returned by tasks using data generators, so that the results can be displayed as output.

Post-processing SED-ML results:

Since the output of task execution was changed to return a List<IRawSimulationResults>, the post-processing had to be modified as well. The input to post-processing is now a list of results, so each entry of the list has to be processed and concatenated, similar to existing tools such as Tellurium. This means processing the MathML for each result of an abstract task and, if a task produced more than one entry, merging the post-processed entries to generate one final data generator list.

Concretely, iterate over all the data generator specifications and do the following (a code sketch follows the list):

  • Map variables and parameters with their IDs to resolve MathML.
  • Resolve each identifier of MathML.
  • Generate new post-processed data by solving the equations described using MathML.
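To make this concrete, here is a rough sketch of that loop. The helpers resolveValues and evaluateMath are hypothetical placeholders for the library’s internal routines, not real SBSCL or jlibsedml API:

// Hypothetical sketch; resolveValues and evaluateMath are placeholder helpers
Map<String, double[]> processed = new HashMap<>();
for (DataGenerator dg : sedml.getDataGenerators()) {
	Map<String, double[]> values = new HashMap<>();
	// 1) map each variable/parameter id to concrete values from the raw results
	for (Variable var : dg.getListOfVariables()) {
		values.put(var.getId(), resolveValues(var, res));
	}
	for (Parameter par : dg.getListOfParameters()) {
		values.put(par.getId(), new double[] { par.getValue() });
	}
	// 2) + 3) resolve the MathML identifiers and evaluate the equation,
	// yielding one post-processed column per data generator
	processed.put(dg.getId(), evaluateMath(dg.getMath(), values));
}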

Interfacing with the library:

With the repeated tasks and their post-processing fully implemented, the next question is how to use them. A simple code sample that takes a SED-ML file along with the corresponding SBML file and executes the simulation can be found here. To execute the simulation and post-process the results, run the following lines:

SedMLSBMLSimulatorExecutor exe = new SedMLSBMLSimulatorExecutor(sedml, wanted, sedmlDir);

...

Map<AbstractTask, List<IRawSedmlSimulationResults>> res = exe.run();

...

IProcessedSedMLSimulationResults prRes = exe.processSimulationResults(wanted, res);

Here, wanted is the desired output specified in the Output element of the SED-ML file, sedml is the object containing the parsed SED-ML tree, and res contains a mapping from each task to its list of results.

#10 GSoC’18 – Second evaluation

This week has been a wrap-up for the second evaluations, which are due next week. This means rigorous testing and finding issues. I have been working on repeated tasks on and off for several weeks now, mainly because of the amount of work they required.

In one of my past posts, I described what repeated tasks are and which challenges had to be addressed to implement them successfully.

Implementation of different simulation types:

As per the SED-ML specification, there are three different ways to run a simulation: (a) UniformTimeCourse, (b) OneStep, and (c) SteadyState. A UniformTimeCourse simulation specifies the start and end times along with a step size, which can be either linear or logarithmic. This was already implemented in the simulation-core library; however, there was a strange bug concerning the total number of output points. The OneStep simulation was implemented by running a UniformTimeCourse for just two steps, the start point and the end point. The steady-state implementation repeatedly checks the change in the solver output until that change falls below a certain tolerance; every time the output doesn’t converge, the total simulation time is increased logarithmically.
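To make the steady-state strategy concrete, here is a minimal sketch of that convergence loop. runOneRound and maxAbsChange are hypothetical helpers, and the tolerance and initial end time are assumed values:

double tol = 1e-6;      // convergence tolerance (assumed value)
double simTime = 1.0;   // initial simulation end time (assumed value)

double[] prev = runOneRound(model, simTime);
while (true) {
	simTime *= 10.0;    // increase the total simulation time logarithmically
	double[] cur = runOneRound(model, simTime);
	if (maxAbsChange(prev, cur) < tol) {
		break;          // change fell below tolerance: steady state reached
	}
	prev = cur;
}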

Implementation of repeated tasks:

Briefly, a repeated task is a looping construct provided by SED-ML to run a set of sub-tasks repeatedly. The implementation is divided into three parts:

a) Sort the sub-tasks within one repeated task

b) Reset the model, if required, and apply any changes

c) Run all the sub-tasks in sorted order over the specified range

The current implementation does all of this (see the sketch below) and combines all sub-task results in a large hashmap that maps each abstract task to its corresponding list of results. Note that the simulation library currently doesn’t support nested repeated tasks; they are outside the scope of this GSoC project.
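The overall control flow can be sketched roughly as follows. All the helper methods (sortSubTasks, rangeValuesOf, resetToInitialState, applyChanges, runSubTask) are hypothetical stand-ins for the library’s internal routines:

// Illustrative sketch only; the helper methods are hypothetical stand-ins
Map<AbstractTask, List<IRawSedmlSimulationResults>> results = new HashMap<>();
for (RepeatedTask rt : repeatedTasks) {
	List<SubTask> ordered = sortSubTasks(rt);         // (a) sort the sub-tasks
	for (double value : rangeValuesOf(rt)) {          // iterate over the range
		if (rt.getResetModel()) {
			model = resetToInitialState(model);       // (b) reset the model
		}
		applyChanges(model, rt, value);               // (b) apply the changes
		for (SubTask st : ordered) {                  // (c) run in sorted order
			results.computeIfAbsent(rt, k -> new ArrayList<>())
			       .add(runSubTask(model, st));
		}
	}
}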

Post-processing SED-ML results:

One of the reasons why the implementation of repeated tasks took so long is the changes required in the downstream code. Once I implemented repeated tasks, the simulation output changed from Map<AbstractTask, IRawSimulationResults> to Map<AbstractTask, List<IRawSimulationResults>>. This means that the processing of raw results by the data generators has to be changed as well.

More on the results post-processing and data generators next week.

 

#9 GSoC’18 – Support for the comp package

SBML Level 3 supports several packages that add modeling functionality; some examples are arrays, hierarchical model composition (comp), and flux balance constraints. A detailed list of all the packages available for SBML Level 3, along with their mailing lists and discussion forums, can be found here.

This week’s work mainly focused on adding support for the hierarchical model composition package. The official description of the comp package is as follows:

A means for defining how a model is composed from other models

The comp package is a powerful tool for model representation: it allows the nesting of submodels within a model. The complete description of all its SBML elements can be found in the latest specification.

How to deal with hierarchical SBML models?

According to SBML experts, the simplest way to deal with submodels is to flatten them. A complete description of the flattening strategy can be found here. In my case, I don’t need to flatten the models myself, since flattening has already been implemented by Christoph Blessing as part of the jSBML library. My goal, instead, is the reverse mapping, i.e. mapping flattened submodels back to their original model ids.

How to map submodels to their original ids?

The jSBML library can parse SBML files. After it reads an SBML file, a tree-like object is constructed that can be traversed recursively to find any child element. Originally, I wrote some very clumsy code; after some guidance from my mentor Andreas, however, we now have a short class, AddMetaInfo.java, whose static methods are called before a model is flattened. It stores some meta-information about submodel ids in a HashMap, from which it can later be retrieved efficiently.
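To give an idea of the mechanism, the id-recording step can be sketched as a recursive traversal that stores each element’s original id as a user object. This is an assumed shape for illustration, not the actual AddMetaInfo implementation:

import javax.swing.tree.TreeNode;

import org.sbml.jsbml.SBMLDocument;
import org.sbml.jsbml.SBase;

public class AddMetaInfoSketch {
	public static final String ORIG_ID = "ORIGINAL_ID"; // HashMap key (assumed name)

	// Record the original id of every element before flattening
	public static SBMLDocument putOrigId(SBMLDocument doc) {
		storeIds(doc.getModel());
		return doc;
	}

	private static void storeIds(TreeNode node) {
		if (node instanceof SBase) {
			SBase sbase = (SBase) node;
			if (sbase.isSetId()) {
				// user objects live in a HashMap attached to the tree node
				sbase.putUserObject(ORIG_ID, sbase.getId());
			}
		}
		for (int i = 0; i < node.getChildCount(); i++) {
			storeIds(node.getChildAt(i));
		}
	}
}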

A simple example looks as follows. First, read an SBML file that contains submodels and store meta-information about the original ids:

// Read original SBML file and add meta-info about original ID
SBMLDocument origDoc = SBMLReader.read(file);
origDoc = AddMetaInfo.putOrigId(origDoc);

Then flatten the model and solve it using one of the solvers available in SBSCL:

// Flatten the model; the extra meta-information is kept in userObjects
CompFlatteningConverter compFlatteningConverter = new CompFlatteningConverter();
SBMLDocument flatDoc = compFlatteningConverter.flatten(origDoc);

/* call SBSCL's solvers for solving the flattened model */

After running the solvers, simply map the submodels back to their original ids:

// Map the output ids back to the original model
AbstractTreeNode node = (AbstractTreeNode) origDoc.getElementBySId(solution.getColumnIdentifier(index));

if (node.isSetUserObjects()) {
	System.out.println("flat id: " + solution.getColumnIdentifier(index)
		+ "\t old id: " + node.getUserObject(AddMetaInfo.ORIG_ID)
		+ "\t model enclosing it: " + node.getUserObject(AddMetaInfo.MODEL_ID));
}

That’s all, folks. The full example can be found on GitHub.

Let me know if you have any issues or questions.

#8 GSoC’18 – Removing CPLEX dependency

This week’s work focused on removing the CPLEX dependency. SBSCL’s current version depends on the IBM ILOG CPLEX library to solve constraint-based models. These models are represented by a linear objective function together with a set of constraints.

There is an alternative open-source solver, SCPSolver, which can solve the same problem. In fact, it can work with multiple linear programming libraries, namely LPSolve, IBM CPLEX, and GlpkSolver. Therefore, this week was mainly spent working on the COBRASolver.java file, replacing all routines that call IBM CPLEX with SCPSolver.
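For illustration, a small standalone SCPSolver example looks like the sketch below. This is not SBSCL’s actual COBRASolver code; the objective and constraints are made up:

import scpsolver.constraints.LinearSmallerThanEqualsConstraint;
import scpsolver.lpsolver.LinearProgramSolver;
import scpsolver.lpsolver.SolverFactory;
import scpsolver.problems.LinearProgram;

public class LPExample {
	public static void main(String[] args) {
		// maximize 5x + 10y subject to 3x + y <= 8 and x + 2y <= 4
		LinearProgram lp = new LinearProgram(new double[] { 5.0, 10.0 });
		lp.addConstraint(new LinearSmallerThanEqualsConstraint(new double[] { 3.0, 1.0 }, 8.0, "c1"));
		lp.addConstraint(new LinearSmallerThanEqualsConstraint(new double[] { 1.0, 2.0 }, 4.0, "c2"));
		lp.setMinProblem(false); // maximize

		// The factory picks whichever backend (LPSolve, GLPK, ...) is available
		LinearProgramSolver solver = SolverFactory.newDefault();
		double[] solution = solver.solve(lp);
		System.out.println("x = " + solution[0] + ", y = " + solution[1]);
	}
}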

A known issue with SCPSolver is that it doesn’t work on 64-bit Linux machines, as seen here and here.

A simple test file was added to the org/simulator/fba folder to check the SCPSolver-based implementation. CobraSolverTest runs the ecoli_core.xml model, while BiGGTest can download and run all the constraint-based models available in the database here.

Let me know if you have any questions.