Hepatitis C virus (HCV) is one of the major etiologic agents of liver cancer. HCV is an RNA virus that, unlike hepatitis B virus, is unable to integrate into the host genome.
Method:
We use Metamap to find biological semantics in the literature. TF and TF-IDF are used to evaluate the importance of entities respectively. TF represents word frequency, and IDF in TF-IDF represents inverse document frequency, which can be used to filter high-frequency words that cannot represent a certain field. Here, the TF-IDF calculation is to construct the semantics of a carcinogenic viruses and eight carcinogenic viruses into text respectively for calculation.