Internet-Based Emerging Infectious Disease Intelligence and the HealthMap Project

Knowledge Sources
Online news sources (e.g., Google News)
Expert-curated discussion (e.g., ProMED-mail)
Validated official reports (e.g., WHO)

Internet Search Criteria
    Disease names (scientific and common)
    Keywords (?)
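As a rough illustration of how these criteria might become search strings, this sketch ORs each disease's scientific and common names together and, optionally, ANDs on a group of context keywords. The name table, the keyword list, and the quoted boolean query syntax are all assumptions for illustration; real search services differ in the syntax they accept.

```python
# Hypothetical disease-name table: scientific (pathogen) name -> common names.
DISEASE_NAMES = {
    "Vibrio cholerae": ["cholera"],
    "Yersinia pestis": ["plague"],
}


def build_queries(disease_names, keywords=()):
    """Build one boolean query string per disease (assumed OR/AND syntax)."""
    queries = {}
    for scientific, common in disease_names.items():
        # Any of the disease's names may match.
        name_group = " OR ".join(f'"{name}"' for name in [scientific, *common])
        query = f"({name_group})"
        if keywords:
            # Optionally require at least one context keyword as well.
            keyword_group = " OR ".join(f'"{kw}"' for kw in keywords)
            query += f" AND ({keyword_group})"
        queries[scientific] = query
    return queries


if __name__ == "__main__":
    # Hypothetical keywords; the notes leave the keyword list open.
    for disease, query in build_queries(DISEASE_NAMES, ["outbreak", "cases"]).items():
        print(f"{disease}: {query}")
```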

Knowledge Extraction
Text-mining algorithm (described in "Global Infectious Disease Monitoring through Automated Classification and Visualization of Internet Media Reports")
Characterization Stages:
a) Identifying the disease and location
b) Determining relevance (whether a given report refers to a current outbreak)
c) Grouping similar reports and removing exact duplicates
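A minimal sketch of stages a) and c), assuming simple dictionary (gazetteer) lookup for disease and place names and a hash of case- and whitespace-normalized text for exact duplicates. The dictionaries and function names are hypothetical, and stage b) is left out because it depends on a trained relevance model.

```python
import hashlib
import re

# Hypothetical lookup tables: matched phrase -> canonical name.
# A real system would use far larger disease and place-name dictionaries.
DISEASES = {"cholera": "Cholera", "vibrio cholerae": "Cholera", "plague": "Plague"}
LOCATIONS = {"haiti": "Haiti", "madagascar": "Madagascar"}


def _matches(text, table):
    """Canonical names whose phrase occurs as whole words in text."""
    return {canon for phrase, canon in table.items()
            if re.search(r"\b" + re.escape(phrase) + r"\b", text)}


def tag_report(text):
    """Stage a): return (diseases, locations) found by phrase lookup."""
    lowered = text.lower()
    return _matches(lowered, DISEASES), _matches(lowered, LOCATIONS)


def fingerprint(text):
    """SHA-256 of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def remove_exact_duplicates(reports):
    """Stage c), duplicates only: keep the first report per fingerprint."""
    seen = set()
    unique = []
    for report in reports:
        fp = fingerprint(report)
        if fp not in seen:
            seen.add(fp)
            unique.append(report)
    return unique
```

Hashing only catches verbatim copies (up to case and spacing); grouping merely similar reports needs a similarity measure instead.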

HealthMap uses a Bayesian machine learning algorithm to automatically tag reports and separate breaking news from other reports.
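The notes do not specify which Bayesian model is used, so this sketch assumes a multinomial naive Bayes classifier with add-one (Laplace) smoothing, a common Bayesian text classifier. The two labels and the toy training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per label
        self.word_counts = defaultdict(Counter)      # word frequencies per label
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n / n_docs)
            for word in tokenize(doc):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    docs = [
        "new outbreak of cholera reported today",
        "officials confirm new cases this week",
        "history of the 1918 influenza pandemic",
        "researchers review past epidemics",
    ]
    labels = ["breaking", "breaking", "other", "other"]
    model = NaiveBayes().fit(docs, labels)
    print(model.predict("new cholera cases confirmed"))
```

Working in log space avoids underflow when multiplying many small word probabilities.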

Using a similarity-score threshold, the system groups related articles into clusters that together describe a given outbreak.
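The similarity measure and the clustering procedure are not given in the notes. This sketch assumes bag-of-words cosine similarity and a greedy single pass: each article joins the first cluster whose seed article scores at or above the threshold, otherwise it starts a new cluster. The threshold value is hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two texts as word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def cluster(articles, threshold=0.5):
    """Greedy clustering; each cluster's first article acts as its seed."""
    clusters = []
    for article in articles:
        for group in clusters:
            if cosine(article, group[0]) >= threshold:
                group.append(article)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            clusters.append([article])
    return clusters


if __name__ == "__main__":
    for group in cluster([
        "cholera outbreak in haiti",
        "haiti cholera outbreak spreads",
        "plague cases in madagascar",
    ]):
        print(group)
```

Comparing only against each seed keeps the pass cheap, but the result depends on article order; a higher threshold yields more, tighter clusters.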

Knowledge Integration and Dissemination
False alarms can be reduced by weighing:
a) The reliability of the data source (e.g., WHO > local media)
b) The number of unique data sources (discussion sites and media)
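One way to combine both criteria, offered as an assumption rather than HealthMap's documented rule, is a noisy-OR score: each unique source contributes once, weighted by the reliability of its type, so confidence rises with both source quality and source count. The reliability numbers and the alert threshold below are hypothetical; only their ordering follows the notes.

```python
# Hypothetical reliability weights in [0, 1]; the ordering follows the notes
# (official > curated > local media) but the numbers are illustrative only.
RELIABILITY = {
    "official": 0.9,  # validated official reports, e.g. WHO
    "curated": 0.7,   # expert-curated discussion, e.g. ProMED-mail
    "news": 0.4,      # online / local news media
}


def outbreak_confidence(reports):
    """Score one outbreak cluster from (source_name, source_type) pairs.

    Each unique source is counted once (criterion b), weighted by the
    reliability of its type (criterion a), and the weights are combined
    with a noisy-OR: 1 - product(1 - r_i).
    """
    reliability_by_source = {name: RELIABILITY[kind] for name, kind in reports}
    miss = 1.0  # probability that every source is a false alarm
    for r in reliability_by_source.values():
        miss *= 1.0 - r
    return 1.0 - miss


def is_alert(reports, threshold=0.8):
    """Hypothetical rule: raise an alert only if confidence meets the threshold."""
    return outbreak_confidence(reports) >= threshold


if __name__ == "__main__":
    print(outbreak_confidence([("WHO", "official")]))
    print(outbreak_confidence([("Paper A", "news"), ("Paper B", "news")]))
```

Under these weights a single WHO report (0.9) clears the threshold, while local media need four independent outlets (about 0.87) to do the same.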
