Wednesday, May 6, 2015





Wikipedia Scholar




Overview:  In this post, I describe what I call Wikipedia Scholar.  If it is ever implemented, Wikipedia Scholar would be a way for academics to link their work to Wikipedia articles.  The methods I describe would simultaneously make academic research easier to conduct, increase public access to peer-reviewed journal information, and give everyday people a means of checking that the information they have is reliable.


Wikipedia is already a great way to peruse information on various topics, so I think it makes sense to augment the current Wikipedia format with a "scholar" option that would allow researchers to connect a piece of work to the topics it relates to.  The underlying idea is to add a layer to Wikipedia reserved for academic papers.  Nothing really prevents this layer from operating as an independent website, but I think it would catch on more quickly if it were officially recognized by Wikipedia.

The way this layer would work is that researchers could link their work to associated Wikipedia articles.  These articles can be connected to others through keywords and citations, as is standard, but also through their links to hyper-articles.  A hyper-article is a standard Wikipedia page on a more general topic, of which the linked article addresses a specific sub-topic.  Similarly, a hypo-article is an article that falls under a given hyper-article.

As an example, assume someone is interested in natural language processing (NLP), i.e. how we get computers to "understand" what people say.  Some example hyper-articles for NLP could be artificial intelligence (AI) and linguistics in general.  Some example hypo-articles under NLP could be automatic machine translation (translating one spoken language to another), automatic summarization of text documents, and parsing (splitting sentences up into nouns, verbs, etc.).
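The hyper-article and hypo-article links in this example can be sketched as a small two-way index.  This is only an illustration, not a design for the real site; the class name TopicGraph and its methods are made up here.

```python
class TopicGraph:
    """Toy index of hyper-article / hypo-article links (hypothetical names)."""

    def __init__(self):
        self._hyper = {}  # topic -> set of more general topics
        self._hypo = {}   # topic -> set of more specific topics

    def add_link(self, hypo_topic, hyper_topic):
        # Record the link in both directions so either side can be looked up.
        self._hyper.setdefault(hypo_topic, set()).add(hyper_topic)
        self._hypo.setdefault(hyper_topic, set()).add(hypo_topic)

    def hyper_articles(self, topic):
        return sorted(self._hyper.get(topic, set()))

    def hypo_articles(self, topic):
        return sorted(self._hypo.get(topic, set()))


graph = TopicGraph()
for general in ("Artificial intelligence", "Linguistics"):
    graph.add_link("Natural language processing", general)
for specific in ("Machine translation", "Automatic summarization", "Parsing"):
    graph.add_link(specific, "Natural language processing")

print(graph.hyper_articles("Natural language processing"))
print(graph.hypo_articles("Natural language processing"))
```

Storing each link in both directions makes it cheap to walk up to the general topics or down to the specific ones from any article.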



While searching, say the person becomes interested in doing research on parsing text.  To this end, they could check the Wikipedia articles linked with this hyper-article by clicking on that node in the network.  The network can be presented in a Collapsible Force Layout (from the D3 website) so that users can expand or collapse branches to the desired degree.  The size of the displayed network can be controlled by collapsing everything except one expanded dot.  Additionally, articles can be clustered into article communities, both to help organize them and to keep the number of expanded nodes to a minimum.  To illustrate why forming such communities is necessary, there are over 40,000 articles related to parsing and NLP accessible from Google Scholar; it simply is not practical to display them all at once.
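As a rough sketch of how clustered articles could be handed to a collapsible layout, the snippet below groups articles by a community label and emits the nested name/children shape commonly fed to D3 hierarchy layouts.  The community labels are assumed to come from some earlier clustering step that is not shown, and the function name build_tree and the "size" field are my own choices for illustration.

```python
import json
from collections import defaultdict


def build_tree(topic, articles):
    """Nest articles under community nodes.

    `articles` is a list of (title, community_label) pairs; the labels are
    assumed to come from a separate clustering step (not shown here).
    """
    groups = defaultdict(list)
    for title, community in articles:
        groups[community].append(title)
    return {
        "name": topic,
        "children": [
            {
                "name": community,
                "size": len(titles),  # lets a collapsed node show its article count
                "children": [{"name": title} for title in sorted(titles)],
            }
            for community, titles in sorted(groups.items())
        ],
    }


tree = build_tree("Parsing", [
    ("Paper A", "dependency parsing"),
    ("Paper B", "constituency parsing"),
    ("Paper C", "dependency parsing"),
])
print(json.dumps(tree, indent=2))
```

With this shape, a viewer only needs to draw one dot per community until the user expands it.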

What I can personally do:  This clustering process is a task for which my academic background is well suited.  I am familiar with natural language processing techniques, and a chapter of my PhD thesis was on community detection in networks.



Expanding the network all the way down to the articles would allow researchers (or anyone, really) to see which research topics are open (ideally represented as Wikipedia articles) and which papers are connected to them.  The representation of these articles can be set so that the size of each dot reflects the article's PageRank value with respect to the local citation network (as defined below) of which the article is a part.
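To make the dot-sizing idea concrete, here is a minimal power-iteration PageRank over a toy citation network, plus a helper that turns a rank into a dot radius.  The damping factor of 0.85, the fixed iteration count, and the dot_radius scaling are all assumptions picked for illustration.

```python
import math


def pagerank(citations, damping=0.85, iterations=100):
    """Power-iteration PageRank; `citations` maps a paper to the papers it cites."""
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Papers that cite nothing spread their rank evenly over all papers.
        dangling = sum(rank[node] for node in nodes if not citations.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for citing, cited_list in citations.items():
            if cited_list:
                share = damping * rank[citing] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
        rank = new_rank
    return rank


def dot_radius(rank_value, base=4.0, scale=40.0):
    # Hypothetical scaling: the square root keeps dot area roughly proportional to rank.
    return base + scale * math.sqrt(rank_value)


ranks = pagerank({"A": ["C"], "B": ["C"], "C": []})
for paper in sorted(ranks):
    print(paper, round(ranks[paper], 3), round(dot_radius(ranks[paper]), 1))
```

In this toy network, paper C is cited by both of the others, so it gets the largest dot.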

Local Citation Network:  The local citation network for a Wikipedia Scholar article should focus on articles linked to the same hyper-article.  When expanding the network around a given article, the network representation should include:  articles citing the article being expanded, articles cited by the article being expanded, and the counts for the inlinks and outlinks for all of those articles.
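A minimal sketch of this definition follows.  It assumes the citation data has already been restricted to articles under the same hyper-article, and the function name and the keys of the returned dictionary are hypothetical.

```python
def local_citation_network(article, citations):
    """Neighbours of `article` plus inlink/outlink counts for each of them.

    `citations` maps an article to the articles it cites, and is assumed to
    be already restricted to articles under the same hyper-article.
    """
    cites = set(citations.get(article, ()))
    cited_by = {other for other, refs in citations.items() if article in refs}
    shown = cites | cited_by | {article}

    outlinks = {a: len(set(citations.get(a, ()))) for a in shown}
    inlinks = {a: 0 for a in shown}
    for refs in citations.values():
        for ref in set(refs):
            if ref in inlinks:
                inlinks[ref] += 1

    return {
        "cites": sorted(cites),
        "cited_by": sorted(cited_by),
        "inlinks": inlinks,
        "outlinks": outlinks,
    }


toy = {"A": ["B"], "B": ["C"], "D": ["B", "C"], "C": []}
local = local_citation_network("B", toy)
print(local)
```

The counts are taken over the whole supplied network, so a dot can show how well connected a neighbour is even when most of its links are not drawn.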

The dots for articles can be augmented with a Wordle word cloud to help convey the content of each dot.  Below is an example Wordle image for the introductory paragraph of the Wikipedia article on NLP; ideally this would all sit within the NLP dot, but here it is just for illustration purposes.  For articles, the Wordle should pull its text from the abstract and conclusions sections as well as the keywords.  To make this easier to parse visually, the words can be color-coded by source (or set in different fonts) to help differentiate what is what; e.g., keywords in black, abstract in cold colors, and conclusions in warm colors.
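The word counting and color assignment behind such a cloud could look roughly like the sketch below.  This is not Wordle's own code; the stopword list, the specific colors, and the function name word_cloud_entries are placeholders.

```python
import re
from collections import Counter

# Placeholder choices: keywords in black, abstract in a cold color,
# conclusions in a warm color.
SECTION_COLORS = {"keywords": "black", "abstract": "blue", "conclusions": "red"}
STOPWORDS = {"a", "and", "in", "is", "of", "the", "to", "we"}


def word_cloud_entries(sections, top_n=10):
    """Return one entry per frequent word in each section, tagged with a color."""
    entries = []
    for section, text in sections.items():
        words = re.findall(r"[a-z]+", text.lower())
        counts = Counter(w for w in words if w not in STOPWORDS)
        for word, count in counts.most_common(top_n):
            entries.append(
                {"word": word, "count": count, "color": SECTION_COLORS[section]}
            )
    return entries


entries = word_cloud_entries({
    "keywords": "parsing grammar",
    "abstract": "We study parsing of the text. Parsing is hard.",
    "conclusions": "Parsing remains an open problem.",
})
for entry in entries:
    print(entry)
```

A word that appears in several sections shows up once per section, each in that section's color, which is what lets the reader tell the sources apart.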





Altogether, I think this would be a substantially better research tool than current academic search engines (e.g., Google Scholar):  it gives a quick overview of the surrounding field of interest and lets one quickly extract important features of an article (e.g., PageRank value and keywords) in a visual manner.  Additionally, this would allow everyday folks to see whether the information presented in Wikipedia has any academic backing; it is a way to help people ensure they are getting reliable information.




Implementation

-  Scholar users upload PDF documents of their papers.  Additionally, they provide the following information:  keywords, citations, and hypernym links (i.e., links to hyper-articles).

-  Each submission is saved, and further edits can also be submitted.  Each iteration of the paper is stored to make its development transparent.

-  A citation network is formed from the set of articles.

-  The site can also have a Y!A-like (Yahoo! Answers-like) feature so that Scholar users can ask and answer questions in a game-like fashion that allows them to get feedback on their work.  This Y!A-like feature can also be extended to non-Scholar Wikipedia users so that everyday folks can get better feedback on questions they have.
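The first three steps above can be sketched as a small data model.  Everything here is hypothetical (the Submission class, its field names, and storing each PDF as raw bytes); it only shows how keeping every iteration and deriving a citation network might fit together.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Submission:
    """One paper plus the metadata its author supplies (hypothetical fields)."""
    title: str
    keywords: List[str]
    citations: List[str]       # titles of cited papers
    hyper_articles: List[str]  # Wikipedia topics the paper falls under
    revisions: List[bytes] = field(default_factory=list)

    def add_revision(self, pdf_bytes):
        # Every iteration is kept, never overwritten, so development stays transparent.
        self.revisions.append(pdf_bytes)
        return len(self.revisions)


def citation_network(submissions):
    """Map each submitted paper to the cited papers that are also on the site."""
    known = {s.title for s in submissions}
    return {s.title: [c for c in s.citations if c in known] for s in submissions}


paper_a = Submission("Paper A", ["parsing"], ["Paper B", "External paper"], ["Parsing"])
paper_b = Submission("Paper B", ["grammar"], [], ["Parsing"])
paper_a.add_revision(b"%PDF draft 1")
paper_a.add_revision(b"%PDF draft 2")

network = citation_network([paper_a, paper_b])
print(network)
```

Citations to papers that have not been uploaded are dropped from the network in this sketch; a real system might instead keep them as placeholder nodes.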




Peer Review / Q&A Feature


Overview:  We want a way for people to get information that is factually reliable.  To this end, we can create a Q&A network that allows Wikipedia users to ask questions and get feedback containing links to additional information.  The idea is that researchers can get feedback from other researchers, with pointers to additional information and potential papers of interest.  This can also serve as the peer review of the article, with the individual reviewers voting on the academic merit of the article.  This review can include aspects such as novelty, reliability of information (to take care of crackpots who post to the site), and sections needing clarification.
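One way the reviewer votes could be combined is sketched below.  The 1-to-5 scale, the three aspect names, and the passing threshold are assumptions made for illustration; the post does not fix any of them.

```python
from statistics import mean

# Assumed aspects, each scored from 1 (poor) to 5 (excellent).
ASPECTS = ("novelty", "reliability", "clarity")


def summarize_reviews(reviews):
    """Average each aspect over all reviewers (needs at least one review)."""
    return {aspect: mean(review[aspect] for review in reviews) for aspect in ASPECTS}


def passes_review(summary, threshold=3.0):
    # Hypothetical rule: every aspect must average at least `threshold`.
    return all(score >= threshold for score in summary.values())


reviews = [
    {"novelty": 4, "reliability": 5, "clarity": 2},
    {"novelty": 2, "reliability": 5, "clarity": 4},
]
summary = summarize_reviews(reviews)
print(summary, passes_review(summary))
```

Requiring every aspect to clear the bar, rather than averaging them together, means a paper cannot make up for unreliable information with a high novelty score.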







