Google_Summer_of_Code_2009/Srinivas

Student Name: Srinivasarao Vundavalli
Mentor Name: Maital Ashkenazi
Title: Integrated data mining in Cytoscape 3.0
Abstract: Cytoscape is an open-source platform for complex network analysis and visualization. Filters
and Search are part of its analysis features. Filters are currently based on the Quick Find plugin. For
searching there are two plugins, Quick Find and Enhanced Search Plugin (ESP). I will integrate
Quick Find and ESP, which when combined can offer faster and more efficient indexing and searching.
The Filters interface will be modified to support complex queries. Auto-completion may also be implemented.

Overview


Coordination

Scheduled meetings

9:00 am - 11:00 am California == 19:00 - 21:00 Israel == 21:30 - 23:30 India

Weekly reports

5th May - 12th May

After discussion with Maital and Peng, I created three new bundles (search-api, search-impl and search-ui) according to their instructions. I explored the source code of QuickFind and EnhancedSearchPlugin to get familiar with it.

13th May - 19th May

I started with the EnhancedSearchPlugin (ESP) code. I copied the ESP source code into the three bundles, separating the API, implementation and UI. I am now resolving dependency problems: many classes and functions from Cytoscape 2 have been renamed and/or modified, so I am getting some errors. I am fixing these errors using the suggestions on this page: http://cytoscape.org/cgi-bin/moin.cgi/TipsOnPortingTo3.0 .

20th May - 27th May

As I mentioned last week, I got around 120 errors after starting with ESP. I reduced them to 54 by modifying the ESP code according to the latest changes in Cytoscape 3.

3rd June - 9th June

I modified ESP so that it works with Cytoscape 3. I checked the code in to http://chianti.ucsd.edu/svn/csplugins/trunk/soc . My code can be found at http://chianti.ucsd.edu/svn/csplugins/trunk/soc/srinivasarao . I tested the code by creating a sample network and adding nodes, edges and attributes.

10th June - 16th June

I separated the API and implementation this week. I created interfaces for classes such as EnhancedSearchIndex, EnhancedSearchQuery and ReindexTask. There is a problem with the EnhancedSearchFactory class: it contains a static method, getGlobalEnhancedSearchInstance, which returns an instance of the EnhancedSearch class. As an interface cannot contain static methods, I couldn't keep it static. Instead I kept a static variable of type EnhancedSearch in the EnhancedSearchImpl class, so that only one instance of the EnhancedSearch class is created. However, to get that instance, an instance of the EnhancedSearchFactory class must be created first.
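The arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the checked-in code: the class names come from the report, but the method bodies, the String network ids, the isIndexed method and the getShared accessor are assumptions made for the example. (Java 8 and later do allow static methods in interfaces, which was not the case when this was written.)

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins for the classes named in the report.
interface EnhancedSearch {
    void addNetwork(String networkId);
    boolean isIndexed(String networkId);
}

interface EnhancedSearchFactory {
    // An instance method, since the static method could not live in the interface.
    EnhancedSearch getGlobalEnhancedSearchInstance();
}

final class EnhancedSearchImpl implements EnhancedSearch {
    // The one shared instance is held in a static field of the implementation.
    private static final EnhancedSearchImpl INSTANCE = new EnhancedSearchImpl();

    private final Set<String> indexedNetworks = new HashSet<>();

    private EnhancedSearchImpl() { }

    // Assumed accessor name; used only by the factory below.
    static EnhancedSearchImpl getShared() {
        return INSTANCE;
    }

    @Override
    public void addNetwork(String networkId) {
        indexedNetworks.add(networkId);
    }

    @Override
    public boolean isIndexed(String networkId) {
        return indexedNetworks.contains(networkId);
    }
}

final class EnhancedSearchFactoryImpl implements EnhancedSearchFactory {
    @Override
    public EnhancedSearch getGlobalEnhancedSearchInstance() {
        // Every factory hands out the same EnhancedSearch object.
        return EnhancedSearchImpl.getShared();
    }
}
```

A caller would create a factory (new EnhancedSearchFactoryImpl()) and ask it for the instance. Two different factories return the same object, which is the drawback noted above: a factory has to be instantiated just to reach the singleton.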

17th June - 23rd June

Break

24th June - 30th June

I faced some problems while importing the search-api package in search-impl. When I ran mvn install on search-api, it compiled all the classes and created the jar, but did not deploy it to the Maven repository. After adjusting the settings in the pom.xml and osgi.bnd files, I was able to fix it.

1st July - 7th July

I worked on designing the GUI for the filter panel. I looked at the filter panel used in Spotfire and at the MCODE plugin in Cytoscape for ideas, and designed a basic draft of the UI. After discussions with my mentor I will modify it and send it to the Cytoscape mailing list for feedback. Once the design is finalized I will start implementing the UI.

8th July - 14th July

I have written unit tests for all the functions in the search-api and search-impl modules. While testing I identified some problems with my implementation of these bundles, which I solved after discussions with my mentors. I made a draft of the UI and sent it to the Cytoscape user mailing list for suggestions. I have started building the search-ui module. At present I have built a panel with a simple search box and search button. As the attribute browser module for Cytoscape 3 is not completed yet, I can't integrate the module into Cytoscape 3 to test the functionality. So for now I have set it up such that when I enter queries in the search box, I can see the results in the console. Once the attribute browser module is completed, I will be able to integrate and test the UI.

15th July - 21st July

While testing I identified a problem with CustomMultiFieldQueryParser for queries like 900, where no field is specified in the query. The number of results returned is less than the expected number. This is due to a change in MultiFieldQueryParser in Lucene 2.4; as we use this class to parse the query, some queries return fewer results. The problem occurs only when the field is null. To solve it I now handle the null-field case in CustomMultiFieldQueryParser. I implemented the interface of the UI draft we finalized last week. I have written event listeners for some of the functionalities. Others were not implemented yet because I had doubts about them, which have since been clarified in discussions with my mentors. This week I will implement the remaining functionalities, integrate this UI and test it.
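The intended behaviour for a query without a field can be illustrated with a small, Lucene-free sketch. It is not the actual CustomMultiFieldQueryParser fix; the class name FieldlessQueryExpander and its expand method are hypothetical and only show the idea of searching every indexed field when none is given.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rewrites a bare term so that it is searched in every field.
final class FieldlessQueryExpander {

    private FieldlessQueryExpander() { }

    static String expand(String term, List<String> fields) {
        // A term such as "degree:900" already names its field; leave it alone.
        if (term.contains(":") || fields.isEmpty()) {
            return term;
        }
        List<String> clauses = new ArrayList<>();
        for (String field : fields) {
            clauses.add(field + ":" + term);
        }
        // A bare term such as "900" matches if any field contains it.
        return "(" + String.join(" OR ", clauses) + ")";
    }
}
```

For the fields name and degree, the bare query 900 is rewritten to (name:900 OR degree:900), while degree:900 is passed through unchanged.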

22nd July - 28th July

I implemented the GUI with all the functionalities except for history. I have written the tests for all the attribute panels and the main search panel. I have implemented the draggable panel which is used to drag the attribute panels to change their order of appearance in the main panel. If the user clicks in the space between label and any checkbox or just above the label the panel can be dragged. When the user moves a panel around it is immediately reflected in the query box also. If the user needs to expand/collapse a specific attribute panel, he can do so by clicking on that attribute label. This week I am planning to implement the history part and the topology filter and test them.

29th July - 4th August

This week I worked on implementing the history part and integrating the UI into the main Cytoscape UI. I have implemented the search history: the search field is now a combo box which displays the user's query history for the session. However, I am not yet able to integrate the UI into Cytoscape 3. After discussion with my mentors I will integrate it and test it. After integrating the UI into Cytoscape 3, I will also save the history whenever the user saves a session.
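A per-session history of this kind could be modelled as below. This is only a sketch under assumptions: the SearchHistory class, its capacity limit and the most-recent-first ordering are illustrative choices, not details taken from the implementation. The returned list is what a combo box model would display.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical in-memory history backing the search combo box.
final class SearchHistory {
    private final LinkedList<String> entries = new LinkedList<>();
    private final int capacity;

    SearchHistory(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    // Records a query; repeated queries move to the front instead of duplicating.
    void record(String query) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return;
        }
        entries.remove(trimmed);
        entries.addFirst(trimmed);
        if (entries.size() > capacity) {
            entries.removeLast(); // drop the oldest query
        }
    }

    // Most recent query first; a copy, so callers cannot modify the history.
    List<String> entries() {
        return new ArrayList<>(entries);
    }
}
```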

5th August - 11th August

This week I worked on event handling. I have added listeners for the following 6 events and handle them as explained below. For some of the events (those whose description is blank) I have not yet decided what should be done. Since session handling hasn't been implemented in Cytoscape 3, there is no provision for saving and loading a session, so I am unable to receive those events. After discussion with my mentors I will complete the handling of all these events.

1) NetworkAddedEvent

When a new network is added I have indexed it by calling addNetwork function in EnhancedSearch class.

2) NetworkViewAddedEvent

3) SetCurrentNetworkEvent

Enabled the SearchPanel, Cleared the attribute panel and loaded the new attributes of the current network.

4) SetCurrentNetworkViewEvent

5) NetworkAboutToBeDestroyedEvent

I have removed the network index from the EnhancedSearch class using the removeNetwork function and disabled the SearchPanel.

6) NetworkViewAboutToBeDestroyedEvent

If the networkViewAboutToBeDestroyed is the current network's view I just disabled the SearchPanel.
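The handling described for the four events with a description can be summarized in a small state sketch. It does not use the Cytoscape 3 event API; the SearchEventHandler class, its method names and the use of String network ids are assumptions for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the listener that reacts to network events.
final class SearchEventHandler {
    private final Set<String> indexedNetworks = new HashSet<>();
    private String currentNetwork;
    private boolean panelEnabled;

    // NetworkAddedEvent: index the new network.
    void onNetworkAdded(String networkId) {
        indexedNetworks.add(networkId);
    }

    // SetCurrentNetworkEvent: enable the panel for the new current network.
    // (The real handler also reloads the attribute panel.)
    void onSetCurrentNetwork(String networkId) {
        currentNetwork = networkId;
        panelEnabled = true;
    }

    // NetworkAboutToBeDestroyedEvent: drop the index and disable the panel.
    void onNetworkAboutToBeDestroyed(String networkId) {
        indexedNetworks.remove(networkId);
        panelEnabled = false;
    }

    // NetworkViewAboutToBeDestroyedEvent: only the current network's view matters.
    void onNetworkViewAboutToBeDestroyed(String viewedNetworkId) {
        if (viewedNetworkId.equals(currentNetwork)) {
            panelEnabled = false;
        }
    }

    boolean isIndexed(String networkId) {
        return indexedNetworks.contains(networkId);
    }

    boolean isPanelEnabled() {
        return panelEnabled;
    }
}
```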

I have also started working on the topology filter panel. I hope to complete it in 2 days.

12th August - 19th August

I have completed implementing the Topology Filter, Edge interaction Filter, Node Interaction Filter. I have also made some minor changes to the interface. I have also tested all the functionalities using a sample network.


Project

The main aim of this project is to produce multiple OSGi bundles. They are:

  1. QuickFind_API – a bundle defining the search APIs.
  2. QuickFind_impl – implementation of the APIs, which does the indexing and searching.
  3. Filters – implementation of the UIs, including the QuickFind and Enhanced Search boxes and the filters. With these UIs, the user will be able to construct search queries and pass them to QuickFind.

Design Ideas

Implementation Plan

Integration of data mining modules:

Quick Find and Enhanced Search Plugin (ESP) provide similar functionality. Since OSGi lets us separate interface and implementation, the Search Service could be an interface with Quick Find and ESP as two different implementations of it. But keeping both search techniques wastes memory, since both plugins load their indexes into memory. Integrating the two plugins should therefore offer faster and more efficient indexing and searching.

As Lucene ( http://lucene.apache.org ) offers fast indexing and searching, it is a good idea to implement these modules using Lucene. The current ESP uses Lucene's RAMDirectory to index a network. The new version of Lucene provides much faster searching than previous versions through a new RAM-based index called InstantiatedIndex ( http://issues.apache.org/jira/browse/LUCENE-550 ). InstantiatedIndex is an all-in-memory index store implementation that delivers search results up to 100 times faster than the file-centric RAMDirectory, at the cost of greater RAM consumption. I am planning to use InstantiatedIndex for indexing the networks.

The functionalities that can be provided in this API are:

  • Adding a network for indexing
  • Removing a network index
  • Get a network index
  • Re-index a network
  • Search a network index

API:

1) void addNetwork(CyNetwork network)
2) void removeNetwork(CyNetwork network)
3) InstantiatedIndex getIndex(CyNetwork network)
4) InstantiatedIndex reIndex(CyNetwork network)
5) ArrayList<String> searchNetwork(String queryString) Returns an ArrayList containing Identifiers of the nodes and edges that match the given query.
6) void addSubNetwork(CyNetwork subnetwork)
7) void removeSubNetwork(CyNetwork subnetwork)
8) ArrayList<String> searchSubNetwork(String queryString) Returns an ArrayList containing Identifiers of the nodes and edges that match the given query.

Indexing and Searching Subnetworks:

1) Subnetwork can be treated as a new network and it can be indexed and searched just like the original network.
2) When the IndexsubNetwork function is called, I will modify the index of the main network in the following way. I will add a new field ("subId") to the document which contains the id of the subnetwork indicating that the document representing a node or edge belongs to that subnetwork.
3) For searching, the query will be modified by adding a lucene term subId:<id of the subnetwork> to the query and then search using the searchNetwork function.
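The query rewriting in step 3 can be sketched as a plain string operation. The SubNetworkQuery class and restrictTo method are hypothetical names; only the subId field comes from the plan above, and the choice to wrap the user's query in parentheses before adding the term is an assumption.

```java
// Hypothetical helper that restricts a query to one subnetwork.
final class SubNetworkQuery {

    private SubNetworkQuery() { }

    static String restrictTo(String subnetworkId, String userQuery) {
        // Parenthesize the user's query so its own AND/OR operators
        // cannot interact with the added subId term.
        return "(" + userQuery + ") AND subId:" + subnetworkId;
    }
}
```

For example, restricting the query name:abc OR degree:5 to subnetwork 42 gives (name:abc OR degree:5) AND subId:42, which would then be passed to searchNetwork.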

As part of the implementation, I treat each node and edge as a Lucene document and their attributes as the document's field values. I will implement this as a separate module so that other modules, such as filters, can access it. A given network will be indexed using InstantiatedIndex for faster searching. The query will be parsed using the Lucene query parser syntax ( http://lucene.apache.org/java/2_4_1/queryparsersyntax.html ).
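The document model can be illustrated without Lucene by a toy index in which each node or edge is a map from attribute name to value. ToyAttributeIndex and its methods are hypothetical and stand in for the real index only to show the mapping; exact-match lookup replaces Lucene's query parsing and scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for the index: one "document" per node or edge.
final class ToyAttributeIndex {
    // Document id (node or edge identifier) -> attribute name -> attribute value.
    private final Map<String, Map<String, String>> documents = new LinkedHashMap<>();

    void addDocument(String id, Map<String, String> attributes) {
        documents.put(id, new HashMap<>(attributes));
    }

    // Returns the ids of all documents whose field has exactly this value,
    // in the order the documents were added.
    List<String> search(String field, String value) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> doc : documents.entrySet()) {
            if (value.equals(doc.getValue().get(field))) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }
}
```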

Interface Changes:

With the present filters interface, it is somewhat difficult to enter complex queries. For example, if a user wants to enter a query of type ‘(A AND B) OR C’, he has to first create a filter X for ‘A AND B’ and then create another filter for ‘X OR C’. The interface can be modified so that the user can enter queries of this type using just one filter.
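One way to represent such a nested query inside a single filter is a small expression tree that renders itself as a query string. The QueryNode, TermNode and BooleanNode types below are hypothetical and only illustrate the idea.

```java
// Hypothetical expression tree for a single, nested filter query.
interface QueryNode {
    String render();
}

// A leaf such as type:protein.
final class TermNode implements QueryNode {
    private final String field;
    private final String value;

    TermNode(String field, String value) {
        this.field = field;
        this.value = value;
    }

    @Override
    public String render() {
        return field + ":" + value;
    }
}

// An inner node joining two sub-queries with AND or OR.
final class BooleanNode implements QueryNode {
    private final String operator;
    private final QueryNode left;
    private final QueryNode right;

    BooleanNode(String operator, QueryNode left, QueryNode right) {
        this.operator = operator;
        this.left = left;
        this.right = right;
    }

    @Override
    public String render() {
        // Parentheses keep the grouping explicit in the rendered query.
        return "(" + left.render() + " " + operator + " " + right.render() + ")";
    }
}
```

Building OR over (AND over a:1 and b:2) and c:3 renders as ((a:1 AND b:2) OR c:3), so the whole query lives in one filter rather than in a chain of filters.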

After the user completes creating a query using the filter, he will be shown the resultant query so that the user can learn how to enter the advanced queries directly in the search box, over a period of time.

I am planning to implement the following interface for the searching.

The main window contains a search box like the one in the present version, which accepts Boolean queries as ESP does. There will be an icon beside the box which opens a small new window that looks like the following:

http://research.iiit.ac.in/~srinivasarao/search_box.png

Box1 is an editable box showing the query. The ~ box before the box1 is a check box which applies a NOT operation on the final query. Box2 shows a drop down list of all possible attributes like node.attrName, edge.attrName to be searched on. The ~ box between box2 and box3 is a check box which applies a NOT operation on the attribute value when checked. When the user selects an attribute in Box2, Box3 shows a drop down list of all possible values of that attribute. Box4 shows a drop down list containing AND, OR, None. When the add button is clicked query attrName:value (AND|OR| ) will be added to the Box1 at the present cursor position. On clicking the search button this window will be closed and the network will be searched using the query and the resulting nodes and edges will be highlighted.
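The behaviour of the add button can be sketched as text insertion at a cursor position. QueryBox and its methods are hypothetical; rendering the checked ~ box as a NOT prefix, and moving the cursor to the end of the inserted clause, are assumptions made for the example.

```java
// Hypothetical model of Box1: editable query text plus a cursor position.
final class QueryBox {
    private final StringBuilder text = new StringBuilder();
    private int cursor;

    // Moves the cursor, as the user would by clicking inside Box1.
    void setCursor(int position) {
        if (position < 0 || position > text.length()) {
            throw new IllegalArgumentException("cursor out of range");
        }
        cursor = position;
    }

    // Add button: inserts "attrName:value (AND|OR| )" at the cursor.
    // An empty connective corresponds to the "None" choice in Box4.
    void add(String attrName, String value, boolean negate, String connective) {
        String clause = (negate ? "NOT " : "") + attrName + ":" + value;
        if (!connective.isEmpty()) {
            clause += " " + connective + " ";
        }
        text.insert(cursor, clause);
        cursor += clause.length();
    }

    String text() {
        return text.toString();
    }
}
```

Adding node.type:protein with AND, then edge.interaction:pp with the NOT box checked and no connective, yields node.type:protein AND NOT edge.interaction:pp in the query box.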

Other:

The current version of Cytoscape doesn’t support authentication for proxy servers, but some proxy servers require it. I am thinking of implementing this for Cytoscape 3. ‘Auto completion’ of queries would also be a nice feature to implement. If time permits, I will implement both.

Timeline

Till May 23: Get familiar with the community, mentor and OSGi, Spring technologies.

Week 1: Discussions with mentors and finalizing the modules as search-api, search-impl, search-ui.

Week 2-3: Porting the ESP to Cytoscape 3.0, fixing the errors due to change of many functions of cytoscape 2.6.

Week 4: Break

Week 5: Separation of the code into two OSGi bundles search-api and search-impl.

Week 6: Design UI. A rough draft of the UI will be designed and changes will be made by discussions with mentor.

Week 7: Send email to users group on cytoscape-discuss and gather feedback on UI. Start UI implementation. Deliverable: A right panel (like MCode panel) that displays a very basic UI, such as ESP search box.

Week 8: Continue with UI implementation. Implement UI and test it. Further improvements and testing the bundles. Deliverable: Advanced Query Builder UI. Progress report of July 21st is expected to include a screenshot of the UI. On the teleconference we will discuss minor tweaks to the UI.

Week 9: Finalize the integrated UI for ESP, Filters and QuickFind. Make sure it enables all functionalities that the old plugins offered. Deliverable: a complete UI, a list of use cases to test the UI and their expected results.

Week 10: Add Topology search in a new tab. Deliverable: topology search UI and implementation.

Week 11: Add additional features, such as query autocompletion, search history and authentication for proxy server. Test Lucene's InstantiatedIndex on large networks.

Week 12: Final evaluation and submission.
