Google_Summer_of_Code_2009/Chinmoy

Student Name:  	 Chinmoy Bhatiya
Mentor Name: 	Peng-Liang Wang
Title: 	Phylogenetic tree plugin for Cytoscape
Abstract: 	Proposed here is a plan for a plugin for Cytoscape which takes in a phylogenetic tree as
input and displays it in Cytoscape. Several of Cytoscape’s original features are supported in this plugin,
such as the ability to manipulate nodes, zooming and exporting a view of the network. New functions that
would be beneficial to analyzing phylogenetic trees are proposed such as the ability to collapse subtrees,
swap subtrees with the same ancestors and display included information about nodes in the tree.

Overview


Coordination

E-mail: chinmoy.bhatiya@… Blog:  http://phyloscape.blogspot.com

Scheduled meetings

Every Friday 3:00PM PST

Weekly reports

Every Thursday night/Friday morning

22nd May, 2009
- Obtained Eclipse and set it up for development of Cytoscape plugins
- Obtained Cytoscape source code
- Compiled and ran sample plugins from the tutorial in Cytoscape. Focused on the plugin that constructs a new network and the plugin that implements a layout algorithm. I think these will be the two biggest steps involved in the phylogenetic tree plugin: constructing a network from XML and then changing layouts as required.
- Ran into some difficulties getting plugins to run: I forgot to compile the plugin into a jar file and place it in the plugins folder, and was instead attempting to run the plugin from Eclipse directly by invoking the Cytoscape main function.
- Started looking at ways to read XML into Java. Found a couple of useful links (http://www.developerfusion.com/code/2064/a-simple-way-to-read-an-xml-file-in-java/). Will be working on a plugin to construct a network directly from the XML file this week.

29th May 2009

1. Researched parsing phyloXML files
- phyloXML stores phylogenetic trees in the XML format
- the tag <clade> is used to represent each node
- edges are not explicitly represented: a clade nested within a clade indicates an edge between the parent clade and the child
- the phylogenetic tree can be parsed into a tree object simply by parsing the XML tree
- information about a clade (such as branch length) is stored as children of the clade node in the tree object
- a CyNetwork can be built from the tree object, and children that represent information about the clade can be stored as attributes of the node instead
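
The nesting rule above can be sketched with the JDK's built-in DOM parser. This is only an illustration, not the plugin's code: the class name CladeWalker, the "clade1" style labels for unnamed clades and the "parent -> child" edge strings are all assumptions made for the example. In a real plugin each edge would become a CyEdge, and other child elements such as <branch_length> would become attributes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: derives parent/child edges from nested <clade> elements.
class CladeWalker {
    /** Returns one "parent -> child" string per clade nested directly inside another clade. */
    static List<String> edges(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            collect(doc.getDocumentElement(), null, out, new int[] {0});
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void collect(Element el, String parentLabel, List<String> out, int[] unnamed) {
        String label = parentLabel;
        if (el.getTagName().equals("clade")) {
            // Use the clade's own <name> if present, otherwise generate a label (assumption).
            label = directChildText(el, "name");
            if (label == null) label = "clade" + (++unnamed[0]);
            if (parentLabel != null) out.add(parentLabel + " -> " + label);
        }
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) collect((Element) n, label, out, unnamed);
        }
    }

    private static String directChildText(Element el, String tag) {
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ((Element) n).getTagName().equals(tag)) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }
}
```

For a root clade with two named children A and B, edges() yields one edge from the generated root label to each child.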

2. Switched to researching parsing of Phylip format files using BioJava

- the BioJava phylogeny package uses the Phylip format to store sequences and alignments, not phylogenetic trees
- the Nexus format, which is very similar to the Phylip format, is used for phylogenetic trees
- obtained the BioJava phylogeny package and am currently researching how Nexus format phylogenetic trees are parsed
- it appears that a text parser is used that pushes each character onto a stack until a ) parenthesis is encountered. Once this happens, each character is popped off the stack until a ( parenthesis is encountered. The characters between the parentheses represent a node

6th June, 2009
- I have developed a parser that takes as input a phylogenetic tree as a PHYLIP format string.
Step 1: Take in a string and parse it into a list, separating the string into node names, branch lengths and parentheses. For example, if the input is (B:3.4,(A:2.9,C:1.5,E:4.3), D:2.2), the resulting list would be [(, B, 3.4, (, A, 2.9, C, 1.5, E, 4.3, ), D, 2.2, ) ]
Step 2: Parse the list into a CyNetwork

  • Iterate through the list, pushing each element onto a stack
  • Every time a ')' is encountered, start popping elements off the stack till a '(' is encountered. Replace the nodes between '(' and ')' with a parentNode
  • Try using the parseDouble() function to convert the popped string into a double. If it fails, create a new CyNode and add it to the network.
  • If parseDouble() succeeds, the double becomes the branch length attribute of the CyNode
  • Create a CyEdge connecting each created CyNode to the parentNode.
  • Look at the next element in the list
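
The two steps can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the plugin's parser: the class NewickSketch, the "internalN" names given to parent nodes and the string form of the edges are invented for the example, and strings stand in for CyNode and CyEdge objects. As in the description, any token that Double.parseDouble() accepts is treated as a branch length, so a purely numeric node name would be misread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the parser: strings replace CyNode/CyEdge objects.
class NewickSketch {
    /** Step 1: split the input into parentheses, node names and branch lengths. */
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : s.toCharArray()) {
            boolean paren = c == '(' || c == ')';
            boolean sep = c == ',' || c == ':' || c == ';' || Character.isWhitespace(c);
            if (paren || sep) {
                if (cur.length() > 0) { tokens.add(cur.toString()); cur.setLength(0); }
                if (paren) tokens.add(String.valueOf(c));
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) tokens.add(cur.toString());
        return tokens;
    }

    /** Step 2: stack-based pass that emits one edge per child of each parenthesised group. */
    static List<String> edges(List<String> tokens) {
        List<String> edges = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        int internal = 0;
        for (String tok : tokens) {
            if (!tok.equals(")")) { stack.push(tok); continue; }
            String parent = "internal" + (++internal);   // generated name (assumption)
            String pendingLength = null;
            while (!stack.isEmpty()) {
                String top = stack.pop();
                if (top.equals("(")) break;
                if (isDouble(top)) {
                    pendingLength = top;   // a length is popped just before the node it belongs to
                } else {
                    edges.add(parent + " -> " + top
                            + (pendingLength == null ? "" : " length=" + pendingLength));
                    pendingLength = null;
                }
            }
            stack.push(parent);            // the group is replaced by its parent node
        }
        return edges;
    }

    static boolean isDouble(String s) {
        try { Double.parseDouble(s); return true; } catch (NumberFormatException e) { return false; }
    }
}
```

Run on the example string above, tokenize() reproduces the list shown in Step 1, and edges() produces five edges: three from the inner group's parent to A, C and E, and three from the outer parent to B, D and the inner parent, the last without a branch length.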

12th June, 2009
Coding:
- The actual parsing works much as it did in last week's version
- I created a wrapper for the plugin so that it could be run from the Cytoscape menu.
- It first prompts the user to select a file in the Phylip/Nexus format, then parses it and creates a network, which can be viewed.
- When the wrapper is invoked, it creates a Parser object and runs its parse() method. Nodes and edges can be retrieved from the Parser methods getNodeList( ) and getEdgeList( ). These lists are then used by the wrapper to create the actual network.

Testing:
- I tested the changes made to the parser using the trees obtained from a link that I had received earlier.
- To test if the created network was correct, I ran the same files through an application called Archaeopteryx which displays phylogenetic trees. The output produced by this application on each of the test files was similar to that produced by the Cytoscape plugin (except for the layout of course).
- PNG files as well as the tree files are available in my repository

SVN:
- Finally managed to set up a repository. The 'phylotree' folder has all the code.
- The testing folder has the PNGs and the tree files

19th June, 2009

JUnit:

- Read up on unit testing paradigms
- Read up on the documentation for JUnit
- Looked at the Test stub that Peng had created
- Tweaked the stub and created a testSuite for it
- Need to add more tests

Interfaces:

- The stub code really helped me understand what was required. I read up about how code should be organized and I clearly understand the workflow now

Code:
- Updated the stub code with working code

The only problem I have had this week has been with build files. I understand how they are supposed to work but I encountered some problems creating my own.

26th June, 2009
- Completed functionality of getNodeList(), getEdges() and getEdgeAttributes()

In order to implement getEdges() for a node, I added a private list of edges associated with each node; getEdges() returns this list.
getEdgeAttributes() is implemented to return a list of Objects. The first element in this list is the branch length. I implemented it this way so that other implementations of the parser interface can return other edge attributes as well.
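
A rough sketch of this design is shown below. The class names PhyloNode and PhyloEdge, the connect() helper and the placement of getEdgeAttributes() on the edge are assumptions made for illustration; the actual parser interface may organise these methods differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical edge type: carries the branch length parsed from the tree file.
class PhyloEdge {
    final PhyloNode source;
    final PhyloNode target;
    final double branchLength;

    PhyloEdge(PhyloNode source, PhyloNode target, double branchLength) {
        this.source = source;
        this.target = target;
        this.branchLength = branchLength;
    }

    /** Attributes as a list of Objects; element 0 is the branch length. */
    List<Object> getEdgeAttributes() {
        List<Object> attributes = new ArrayList<>();
        attributes.add(branchLength);
        return attributes;
    }
}

// Hypothetical node type: keeps a private list of the edges that touch it.
class PhyloNode {
    final String name;
    private final List<PhyloEdge> edges = new ArrayList<>();

    PhyloNode(String name) { this.name = name; }

    /** Read-only view of the edges associated with this node. */
    List<PhyloEdge> getEdges() { return Collections.unmodifiableList(edges); }

    /** Creates an edge to a child and registers it with both endpoints. */
    PhyloEdge connect(PhyloNode child, double branchLength) {
        PhyloEdge edge = new PhyloEdge(this, child, branchLength);
        edges.add(edge);
        child.edges.add(edge);
        return edge;
    }
}
```

Returning a list of Objects keeps the first slot reserved for the branch length while leaving room for other parsers to append further attributes.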

- JUnit test cases for these functions

I tested getNodeList(), getEdges() and getEdgeAttributes(), basically by testing the number of nodes, the number of edges for a select few nodes and their expected branch lengths.

- Added comments to explain how some pieces of code work

- Split up the parser into two smaller methods to simplify the workflow. I don't know if I should split up these methods any further; they are pretty simple to understand.

- Added sample layout plugin worked on during earlier weeks, complete with functioning build file.

30th June, 2009

- I updated the ImportTreeFromFile method so that the parser can be used from the GUI. The file gets parsed, the network created and the attributes assigned to the edges

- Added more test cases for JUnit testing


Project

Timeline

Acceptance - May 23rd: Get acquainted with source code, mentors and Cytoscape 3. Research similar algorithms such as DRAWTREE.

May 23rd - Mid-term evaluations: Implement skeletons for all planned features (see design ideas).

Mid-term evaluations - August 10th: Complete functionality of all added features. Final testing and cleaning up code. Complete documentation.

Updated TIMELINE
Week 1:
Research and prep-work before coding begins

Week 2:
Parser for phyloXML, research for PHYLIP

Week 3:
Implement parsing algorithm for PHYLIP

Week 4:
SVN cleanup and package creation for parser

Week 5:
JUnit testing and edge attribute parsing

Week 6:
UI, network creation from parsed file

Week 7:
Implement basic layout algorithm for network

Week 8:
Finish layout algorithm for other types of dendrograms (more time may be required)

Week 9:
Implement collapsing subtrees

Week 10:
Implement subtree swapping, constant branch length

Design Ideas

Functionality to be supported:

- Toggling branch length proportional to the evolutionary distance/alignment score between ancestor and node.

- Zooming in and out of the dendrogram.

- Rotating the dendrogram.

- Moving nodes around with restricted branch length if.

- Exporting an image of the dendrogram at set zoom, rotation and position.

- Uncollapse/Collapse subtrees.

- Coloring subtrees.

- Swapping subtrees/nodes with same ancestors.

- Choosing what to display as the label of a node (name, ID, function, confidence values).

Support via VizMapper:

- Toggling display of labels, edge lengths.

- Changing font size, thickness of edges, and size of nodes.

- Toggling visibility/opacity of nodes.

Implementation Plan

1. Implement reading tree data into a CyNetwork from XML file.

  • Create a new network, parse XML file and add nodes and edges to the network accordingly.

2. Implement a set of CyLayout algorithms that display the CyNetwork as a dendrogram. A separate CyLayout algorithm is needed for each dendrogram type; users pick the layout algorithm from the menu.

  • Compute (x,y) co-ordinate for each node.
  • x co-ordinate for all other nodes = max x co-ordinate of children + constant a.
  • y co-ordinate for leaves = constant b * number of leaves visited before it
  • y co-ordinate for other nodes = mean of y co-ordinates of all its children
  • To improve display of reticulate nodes (nodes connected to multiple subtrees), add edge to least single ancestor and remove reticulate edges.
  • Change y co-ordinate of reticulate node to order it between the subtrees it was connected to via reticulate edges.
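
The coordinate rules for ordinary (non-reticulate) nodes can be sketched as a single post-order pass. This is an illustration only: LayoutNode and DendrogramLayout are hypothetical names, a and b are the constants from the rules above, and placing leaves at x = 0 is an assumption, since the plan does not state the x co-ordinate of leaves. Reticulate nodes are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node holding the computed position.
class LayoutNode {
    final String name;
    final List<LayoutNode> children = new ArrayList<>();
    double x;
    double y;

    LayoutNode(String name, LayoutNode... kids) {
        this.name = name;
        for (LayoutNode k : kids) children.add(k);
    }
}

class DendrogramLayout {
    /** Assigns (x, y) to every node below and including root. */
    static void layout(LayoutNode root, double a, double b) {
        place(root, a, b, new int[] {0});
    }

    private static void place(LayoutNode n, double a, double b, int[] leavesSeen) {
        if (n.children.isEmpty()) {
            n.x = 0;                       // assumption: all leaves share x = 0
            n.y = b * leavesSeen[0]++;     // b * number of leaves visited before this one
            return;
        }
        double maxX = Double.NEGATIVE_INFINITY;
        double sumY = 0;
        for (LayoutNode c : n.children) {
            place(c, a, b, leavesSeen);    // children first (post-order)
            maxX = Math.max(maxX, c.x);
            sumY += c.y;
        }
        n.x = maxX + a;                    // max x of children + constant a
        n.y = sumY / n.children.size();    // mean y of children
    }
}
```

With these rules the root receives the largest x value, so the tree is drawn with leaves aligned on one side and the root on the other; mirroring x would flip the direction.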

3. Add compatibility for features already supported by Cytoscape (toggle labels, edge lengths, VizMapper functions, exporting images).

4. Add features that Cytoscape doesn’t support yet:

  • Collapsing subtrees: Make all edges and nodes in subtree invisible sequentially starting at leaf level and moving up to level below the root of the subtree.
  • Uncollapsing subtrees: Make all edges and nodes in subtree visible sequentially, starting at level below root of the subtree and moving down to the leaves.
  • Swap subtrees/nodes with same ancestor: Swap co-ordinates of all nodes in one subtree with those of the other subtree and redraw the subtree rooted at their common ancestor.
  • Choice of label: Replace/append current node label with fields chosen by the user.
  • Color subtree: Color each edge in the subtree (Might be supported by Cytoscape already).
  • Branch length to reflect evolutionary distance: Replace each edge in selected subtree with a new edge of length equal to a factor of its evolutionary distance from the ancestor.
  • Constant branch lengths: Replace each edge in selected subtree with a new edge of constant length.
    • Restricting edge lengths would be a question of disabling manipulation of the node X co-ordinate
      (in a horizontal cladogram for example). Just getXPosition for a node and setXPosition to the same value
      every time a user attempts to move the node. Allow the YPosition to be based on user input (allowing the
      node to be dragged any which way without changing its edge length.)