Google_Summer_of_Code_2009/Chetan

Student Name: Chetan Bansal
Mentor Name: Alexander Pico
Title: Ontologies for Wiki Pathways
Abstract: In any application, be it a website or a video game, UI and accessibility are the most important things to keep in mind. In my view, any information should be properly organized and indexed so that people can make the best use of it.

Overview

Ontologies define relationships between concepts, so tagging pathways with ontology terms lets users easily find pathways that are related to each other. Users will also be able to search pathways by ontology term, which provides another way of finding the pathways they are looking for. Since there are many biological ontology providers, they could link to our pathways (using the API) from their own sites, giving their users more features and giving us potential contributors.


Coordination

Scheduled meetings

  • May 26th
    • Project overview
    • Scheduling
  • June 9th
  • June 23rd
  • July 7th

Weekly reports

  • Due on Mondays (by the end of the day)
  • Should include progress on ideas and code over the past week and projected work for the coming week.



19/05/09 - 26/05/09
Setup

NetBeans IDE

I have installed the NetBeans IDE to speed up coding and to get into the habit of using a dedicated IDE rather than Notepad or Dreamweaver. To get familiar with the features NetBeans offers, such as code completion and built-in database connectivity, I wrote a simple mailing list application.

Local Test Server

So far I have been using Windows for all my needs, and my experience with Linux has been fairly limited. I tried to set up a local server on Windows following the instructions, and the attempt was largely successful: it will do for the PHP coding I need to do. But as we discussed, it will be better to have a completely working test server rather than a partial one, so I have installed ‘Kubuntu’ on my laptop and intend to complete the installation in the coming days. I would like to mention, though, that the coding I need to do (PHP / JavaScript / SQL) is platform independent, so in my opinion it is not a critical issue at this point.

Bioportal Documentation

As I mentioned before, most of the documentation kindly given to us by Mr. Shah concerns an annotation tool which, in my opinion, is quite useful for automated annotation of biomedical journals but not so much for us. Since most of our pathways lack elaborate descriptions, I expect we will not be using it for now.

However, the ‘REST Services’ offered by BioPortal to browse and search ontologies will form the backbone of our plugin. These services are reasonably well documented, and I expect we can use them without any problem.

Prototype

Most of the past week went into setting up the local server and getting familiar with MediaWiki. I have, however, been through the MediaWiki documentation for developers and studied the important parts, such as the global variables used throughout and concepts like ‘Hooks’.

To prepare for our conference call with Mr. Shah, I spent the last couple of days studying the documentation and putting it into practice in a plugin. As part of that, I built a ‘REST Client’ to fetch and traverse ontologies using their API. It consists of a ‘PHP Proxy Script’ on the backend, which parses the XML output of BioPortal and converts it into JSON, and a frontend built with the YUI framework. The plugin has AJAX functionality, and I believe that with some minor modifications and additions it can be reused. The prototype can be viewed here: http://dingo.ucsf.edu/~apico/demo.html.



27/05/09 - 2/06/09

BioPortal REST Client
I built upon the initial prototype I coded last week and added an AJAX-based autocomplete box for searching the ontologies. I also rewrote some of the functions to use PHP's built-in functions.
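The ranking behind an autocomplete box like this can be sketched in a few lines. This is a minimal client-side illustration, not the prototype's code: it assumes the term labels have already been fetched (for example through the PHP proxy) into an array, and `suggestTerms` and its `limit` parameter are hypothetical names.

```javascript
// Hypothetical helper: rank ontology term labels for an autocomplete box.
// Prefix matches come first, then other substring matches; matching is case-insensitive.
function suggestTerms(query, terms, limit = 10) {
  const q = query.trim().toLowerCase();
  if (q.length < 2) return []; // avoid flooding the list on one-character input
  const prefix = [];
  const contains = [];
  for (const term of terms) {
    const t = term.toLowerCase();
    if (t.startsWith(q)) prefix.push(term);
    else if (t.includes(q)) contains.push(term);
  }
  return prefix.concat(contains).slice(0, limit);
}

const terms = ["apoptosis", "regulation of apoptosis", "apoptotic process", "cell cycle"];
console.log(suggestTerms("apopt", terms).join(", "));
// → apoptosis, apoptotic process, regulation of apoptosis
```

Ranking prefix matches first keeps the most likely completion at the top of the list, while substring matches still surface terms such as "regulation of apoptosis".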

WikiPathways Plugin - Prototype
The next step was to integrate the 'REST Client' with the MediaWiki engine, which I began about four days ago. I started by going through the MediaWiki documentation; however, studying other available extensions proved more helpful than the documentation! I have come up with a basic extension that has the AJAX-based autocomplete search box and lets you tag pathways. However, the tags are not yet permanent: they are stored in cookies. I believe storing the tags in the GPML still requires a lot of work and discussion with Alex!
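Keeping a temporary tag list in a cookie mainly requires serializing it into a cookie-safe string. The sketch below assumes a hypothetical cookie name `otag_tags` and JSON encoding; the prototype's actual cookie format is not described here.

```javascript
// Hypothetical cookie name; the prototype's real naming is not documented.
const COOKIE_NAME = "otag_tags";

// Serialize a tag list into a "name=value" cookie pair.
// encodeURIComponent keeps ';', ',', '=' and spaces out of the raw cookie value.
function tagsToCookie(tags) {
  return COOKIE_NAME + "=" + encodeURIComponent(JSON.stringify(tags));
}

// Parse the tag list back out of a document.cookie-style string ("a=1; b=2").
function tagsFromCookie(cookieString) {
  for (const part of cookieString.split(";")) {
    const [name, ...rest] = part.trim().split("=");
    if (name === COOKIE_NAME) {
      return JSON.parse(decodeURIComponent(rest.join("=")));
    }
  }
  return []; // no tags stored yet
}

const pair = tagsToCookie(["apoptosis", "cell cycle; G1 phase"]);
console.log(tagsFromCookie("session=abc; " + pair));
```

In a browser, the pair would be stored with something like `document.cookie = pair + "; path=/";`. Cookies are per-user and size-limited, which is one reason they only suit testing and not permanent tags.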

Also, since we will be using open source libraries like YUI for the plugin, I briefly studied popular source code licenses such as GPL, MIT and BSD in order to comply with their respective terms and conditions.


2/06/09 - 8/06/09

This week, I concentrated on WikiPathways! I studied the important classes I will need to use; Thomas' documentation came in handy. I also checked out the WPI RPC and the SOAP API and used their functions to fetch, update and edit pathways to gain a better understanding. I had a brief discussion with Thomas about several important issues on the mailing list. In the first version of the extension I used cookies to store the data so that I could test the AJAX / PHP part of the plugin. I am now changing it to use the GPML instead of cookies, and will complete this by the day after tomorrow.

I also spent some time debugging the local server. At one point I even tried to make it use a standalone copy of PathVisio instead of the one bundled with WikiPathways. That at least gave me a clearer idea of how PathVisio and WikiPathways work together. The errors that keep popping up are caused by PathVisio being unable to convert GPML files to SVG and other formats. So I commented out the part of pathways.php that converts GPML files into SVG; the tradeoff is that no actual pathways are displayed.

However, other functions, such as drawing and editing the pathway in the applet, updating pathways, editing descriptions, curation tags and categories, work flawlessly. I really don't need to view the SVG files or convert GPML to other formats. I wish I had thought of it earlier :).

In the coming week, I will finish moving tag storage from cookies to the GPML and work on the frontend for deleting and editing tags.

8/06/09 - 15/06/09

This past week I spent most of my time on the following two things:

  1. Local server - After some debugging, the local server is now fully functional. Apparently, the problem was in how Java treats single quotes in the arguments passed to it.
  2. The prototype - I believe it now has a fair amount of functionality: we can add, delete and view tags using a proper interface, and the tags are stored in and fetched directly from the GPML. You can add or delete as many tags as you want and save them all at once. We also agreed on using BioPAX to store the tags. Perhaps we can also finalize the XML tags and structure to be used; I have already started a discussion about that on the mailing list.
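Whatever structure is finally agreed on, writing a tag into the GPML comes down to emitting a small, properly escaped XML fragment. The element names below are an assumption, modeled on the BioPAX Level 2 `openControlledVocabulary` class, and are not a finalized format; `ontologyTagToBiopax` is a hypothetical helper.

```javascript
// Escape the five XML special characters so term names cannot break the markup.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Assumed element names, modeled on BioPAX Level 2's openControlledVocabulary;
// the final GPML structure was still under discussion on the mailing list.
const BP_NS = "http://www.biopax.org/release/biopax-level2.owl#";

function ontologyTagToBiopax(tag) {
  return [
    `<bp:openControlledVocabulary xmlns:bp="${BP_NS}">`,
    `  <bp:TERM>${escapeXml(tag.term)}</bp:TERM>`,
    `  <bp:ID>${escapeXml(tag.id)}</bp:ID>`,
    `  <bp:Ontology>${escapeXml(tag.ontology)}</bp:Ontology>`,
    `</bp:openControlledVocabulary>`,
  ].join("\n");
}

console.log(ontologyTagToBiopax({ term: "apoptotic process", id: "GO:0006915", ontology: "Gene Ontology" }));
```

Storing both the term ID and its label keeps the tag meaningful even if the label later changes at the provider.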

Next week, I will improve the interface based on your feedback and add a caching feature for the ontology tags (metacache).

15/06/09 - 22/06/09

I spent the first half of the week optimizing the current code and the cache. At first I used the existing caching method (metadatacache), which is used for names and publication references. But, as Thomas said, it will be more convenient and much faster to use a separate table. The new version of the prototype (3.0) has the same UI but uses JSON instead of CSV for all data transferred through AJAX. The terms are also now fetched from the cache instead of the GPML.
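One concrete reason JSON is a safer transport than CSV here is that ontology term names can themselves contain commas. The comparison below uses sample data and a naive comma split standing in for a simple CSV format; it illustrates the problem and is not the prototype's code.

```javascript
// Sample payload: each tag has an ID and a human-readable term name.
const tags = [
  { id: "GO:0006355", term: "regulation of transcription, DNA-templated" },
  { id: "GO:0006915", term: "apoptotic process" },
];

// Naive CSV: one "id,term" line per tag, split back on commas.
const csv = tags.map((t) => `${t.id},${t.term}`).join("\n");
const fromCsv = csv.split("\n").map((line) => {
  const [id, term] = line.split(",");
  return { id, term };
});

// JSON: the same data survives the round trip unchanged.
const fromJson = JSON.parse(JSON.stringify(tags));

console.log(fromCsv[0].term);  // → regulation of transcription   (truncated at the comma)
console.log(fromJson[0].term); // → regulation of transcription, DNA-templated
```

Proper CSV quoting would also fix this, but JSON gets escaping for free and maps directly onto JavaScript objects on the client.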

I have started on the 'Index page' for browsing pathways by tag. It lets us see which pathways share common tags and also search for tags (possibly with AJAX). It should be completed within 2-3 days. I am attaching the new code: please replace the existing folder "wpi->extension->otag" and execute the query in cache.sql.
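At its core, a browse page like this needs an inverted index from tag to pathways. The sketch assumes tag rows shaped like `{ pathway, termId }`, roughly what a cache table might return; the row format, the function names and the IDs `WP1` to `WP3` are all hypothetical placeholders.

```javascript
// Build an inverted index: ontology term ID -> set of pathway IDs tagged with it.
function buildTagIndex(rows) {
  const index = new Map();
  for (const { pathway, termId } of rows) {
    if (!index.has(termId)) index.set(termId, new Set());
    index.get(termId).add(pathway);
  }
  return index;
}

// Pathways sharing at least one tag with `pathway`, paired with the number of
// shared tags and sorted so the most closely related pathways come first.
function relatedPathways(pathway, rows) {
  const index = buildTagIndex(rows);
  const counts = new Map();
  for (const { pathway: p, termId } of rows) {
    if (p !== pathway) continue;
    for (const other of index.get(termId)) {
      if (other !== pathway) counts.set(other, (counts.get(other) || 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const rows = [
  { pathway: "WP1", termId: "GO:0006915" },
  { pathway: "WP1", termId: "GO:0007049" },
  { pathway: "WP2", termId: "GO:0006915" },
  { pathway: "WP2", termId: "GO:0007049" },
  { pathway: "WP3", termId: "GO:0006915" },
];
console.log(JSON.stringify(relatedPathways("WP1", rows))); // → [["WP2",2],["WP3",1]]
```

Listing all pathways for a single tag is then just `buildTagIndex(rows).get(termId)`.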

In the coming week, depending on our discussion tomorrow, I will at least try to finish this 'Browse Page' and integrate the tree view with the UI.


Project

Timeline / Implementation

Phase 1: Homework (20th April – 20th May)

  1. Set up a local server for WikiPathways.
  2. Get familiar with SVN.
  3. Search the internet for ontology providers such as BioPortal. Study their APIs and the data formats they use, and check the prospective libraries in PHP which I will be using to query those services.
  4. Study the coding practices and notations used in the WikiPathways code.

Phase 2: Backend Coding (23rd May to 25th June)

  1. Using the APIs of the ontology providers to query their databases and fetch the tags.
  2. Elimination of repeated terms between multiple providers.
  3. Automatic recommendation of tags based on the genes and proteins in a given pathway.
  4. Caching of the data for a set period of time in order to save bandwidth. This is also useful in case a provider is facing downtime.
  5. An option for users to suggest custom terms, which are added to the database if approved.
  6. A search/indexing feature with which all the pathways tagged with a given term can be listed or exported.
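The caching in item 4, including its use during provider downtime, can be sketched as a small time-to-live (TTL) cache: serve an entry while it is fresh, refetch after it expires, and fall back to the stale copy if the provider cannot be reached. This sketch is in-memory only and uses a hypothetical `fetchTerms` callback in place of a real BioPortal request; a production version would keep the entries in a database table instead.

```javascript
// TTL cache that falls back to stale data when the provider is unavailable.
class TermCache {
  constructor(fetchTerms, ttlMs, now = () => Date.now()) {
    this.fetchTerms = fetchTerms; // hypothetical async (query) => terms callback
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for testing
    this.entries = new Map(); // query -> { terms, storedAt }
  }

  async get(query) {
    const entry = this.entries.get(query);
    if (entry && this.now() - entry.storedAt < this.ttlMs) {
      return entry.terms; // fresh hit: no request to the provider
    }
    try {
      const terms = await this.fetchTerms(query);
      this.entries.set(query, { terms, storedAt: this.now() });
      return terms;
    } catch (err) {
      if (entry) return entry.terms; // provider down: serve the stale copy
      throw err; // nothing cached to fall back on
    }
  }
}

// Example with a fake provider, so the sketch runs without network access.
const cache = new TermCache(async (query) => [`${query} (from provider)`], 24 * 60 * 60 * 1000);
cache.get("apoptosis").then((terms) => console.log(terms));
```

The TTL value corresponds to the "time interval after which the terms will be updated" that the admin module is meant to expose.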

Phase 3: Interface for Annotation (25th June to 10th July)

  1. Admins will have the option to annotate multiple pathways at once or one by one.
  2. A two-tiered system in which users first select an ontology and then terms from it, so that they can enter terms from different ontologies.
  3. An AJAX-based interface, coded using and extending the existing AJAX functions in the MediaWiki engine.
  4. Implementation of 'submission throttling' and caching of the user's input to improve the user experience and make optimal use of server resources.
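'Submission throttling' in the AJAX sense means not sending a request on every keystroke. A common variant is debouncing: wait until the user pauses typing for a given delay, then send one request for the latest input. The sketch below is a generic debounce; the 300 ms delay and the `sendQuery` placeholder are assumptions, not values from the plan.

```javascript
// Debounce: postpone calling `fn` until `delayMs` have passed without another call.
// Only the arguments of the most recent call are used.
function debounce(fn, delayMs) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer);
    timer = setTimeout(() => {
      timer = null;
      fn.apply(this, args);
    }, delayMs);
  };
}

// Placeholder for the AJAX call to the term search proxy.
function sendQuery(text) {
  console.log("querying for:", text);
}

const onKeyUp = debounce(sendQuery, 300);
// Three quick keystrokes produce a single request, for "apo".
onKeyUp("a");
onKeyUp("ap");
onKeyUp("apo");
```

A strict throttle (at most one request per interval, even while typing continues) is the alternative if results should keep updating during long inputs.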

Phase 4: Admin Module (10th July – 5th August)

  1. Code an admin module, integrated with the WikiPathways admin panel, which enables administrators to:
    1. Enable/disable an ontology term provider.
    2. Add new providers from the panel itself by listing their API links and the data format they use.
    3. Enable/disable caching of the terms, and set a time interval after which the terms will be updated.
    4. Implement an API with which providers can query WikiPathways for ontology terms and get the output in formats such as JSON, XML or CSV.
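Serving the same results in JSON, XML or CSV mostly comes down to one serializer per format. The sketch assumes hypothetical result rows shaped as `{ pathway, termId, term }` and made-up XML element names; no API design had been fixed at this stage.

```javascript
// Hypothetical result rows: which pathway is tagged with which ontology term.
const results = [
  { pathway: "WP1", termId: "GO:0006355", term: "regulation of transcription, DNA-templated" },
  { pathway: "WP2", termId: "GO:0006915", term: "apoptotic process" },
];

const FIELDS = ["pathway", "termId", "term"];

// CSV field quoting in the style of RFC 4180: quote when needed, double inner quotes.
function csvField(value) {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Escape characters that are special inside XML attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// One serializer per supported output format.
const serializers = {
  json: (rows) => JSON.stringify(rows),
  csv: (rows) =>
    [FIELDS.join(","), ...rows.map((r) => FIELDS.map((f) => csvField(r[f])).join(","))].join("\r\n"),
  xml: (rows) =>
    "<results>" +
    rows.map((r) => `<tag ${FIELDS.map((f) => `${f}="${escapeXml(r[f])}"`).join(" ")}/>`).join("") +
    "</results>",
};

function formatResults(rows, format) {
  if (!Object.prototype.hasOwnProperty.call(serializers, format)) {
    throw new Error(`Unsupported format: ${format}`);
  }
  return serializers[format](rows);
}

console.log(formatResults(results, "csv"));
```

Keeping the serializers in a lookup table means adding a format is one new entry, and unknown formats are rejected explicitly.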

Phase 5: Testing (August 6th – August 16th)

Attachments