biobertApi

Example Texts

Create a Single Text Block That Is Easily Re-separated



create_com_with_idx

 create_com_with_idx (df)

This function takes a pandas DataFrame with a column titled “comments” that holds text strings (e.g., user comments). It prefixes each string with its index number followed by ‘::’, appends ‘::’ to the end for later parsing, and returns all the strings combined into a single text block.

Details
df pandas DataFrame containing a column titled “comments” of text elements; in this case, YouTube comments.
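The description above, together with the text_block shown further down this page, suggests each comment is written as '<index>:: <comment> ::' and the pieces are joined with single spaces. A minimal sketch under that assumption; create_com_with_idx_sketch is a hypothetical name, not the package's actual source:

```python
import pandas as pd


def create_com_with_idx_sketch(df: pd.DataFrame) -> str:
    """Hypothetical sketch: tag each comment with its index and join them into one block."""
    # Each comment becomes "<idx>:: <comment> ::" so it can be split apart again later
    tagged = [f"{idx}:: {com} ::" for idx, com in df["comments"].items()]
    # Comments are joined with a single space, matching the text_block example
    return " ".join(tagged)


coms = pd.DataFrame({"comments": ["maybe its a tumour.", "take some tylenol."]})
print(create_com_with_idx_sketch(coms))
# 0:: maybe its a tumour. :: 1:: take some tylenol. ::
```

Because the index comes from the DataFrame's own index, a filtered DataFrame keeps its original row numbers in the tags.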

create_text_block example



make_5k_sections

 make_5k_sections (df)

This function assigns a section number to each row so that the rows can later be grouped into text blocks of fewer than 5,000 characters each.
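One way to do the sectioning described above is a greedy pass that starts a new section whenever the next row would push the joined text to 5,000 characters or more. This is a sketch under assumptions: the function name, the col parameter (which should hold the strings exactly as they will appear in the block, index tags included) and the output column name "section" are all hypothetical.

```python
import pandas as pd


def make_5k_sections_sketch(df: pd.DataFrame, col: str = "comments", limit: int = 5000) -> pd.DataFrame:
    """Hypothetical sketch: greedily number sections so each section's joined text stays under `limit`."""
    sections = []
    section, running = 0, 0  # running = chars in the current section, counting one joining space per row
    for text in df[col]:
        n = len(text) + 1  # +1 for the space that joins this row to the next
        # start a new section if this row would make the joined text reach `limit` or more
        if running > 0 and running + n > limit:
            section += 1
            running = 0
        sections.append(section)
        running += n
    out = df.copy()
    out["section"] = sections
    return out


rows = pd.DataFrame({"comments": ["a" * 3000, "b" * 1500, "c" * 1000, "d" * 10]})
print(make_5k_sections_sketch(rows)["section"].tolist())  # [0, 0, 1, 1]
```

A single row that is already 5,000 characters or longer still ends up alone in an oversized section, so real code may need to truncate or split such rows.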



create_one_large_text_block

 create_one_large_text_block (df)


create_all_text_blocks

 create_all_text_blocks (df)

This function creates multiple text blocks, each shorter than 5,000 characters.
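Once each row carries a section number, building the blocks reduces to a groupby-and-join. A hypothetical sketch, assuming the tagged strings live in a "comments" column and the section numbers in a "section" column (both names are assumptions, as is the function name):

```python
import pandas as pd


def create_all_text_blocks_sketch(df: pd.DataFrame, col: str = "comments", section_col: str = "section") -> list[str]:
    """Hypothetical sketch: join the strings in each section into one text block."""
    # groupby keeps the original row order inside each group, so comment order is preserved
    return [" ".join(group[col]) for _, group in df.groupby(section_col, sort=True)]


sectioned = pd.DataFrame({
    "comments": ["0:: first ::", "1:: second ::", "2:: third ::"],
    "section": [0, 0, 1],
})
blocks = create_all_text_blocks_sketch(sectioned)
print(blocks)  # ['0:: first :: 1:: second ::', '2:: third ::']
```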

text_block = create_one_large_text_block(df=coms); text_block[:250]
# show_doc(create_text_block)

Get Start and End Doc Indices

Here we’ll use regex to find the character indices of each doc or comment. This tells us which doc or comment is the parent of each biomedical term returned by BERN2.



get_comment_spans_textblock

 get_comment_spans_textblock (text_block:str)

This function returns a DataFrame with the start, end, and span of each comment/doc in text_block.

Type Details
text_block str single block of text in this structure: '0:: text text text. :: 1:: text text text. ::'
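The regex step described above can be sketched with re.finditer: each match of \d+::.*?:: covers one '<index>:: ... ::' comment, and match.span() gives its (start, end) offsets. This is a hypothetical reimplementation, not the package's get_comment_spans_textblock, and it assumes comments never contain '::' (or a number followed by '::') themselves.

```python
import re

import pandas as pd


def get_comment_spans_sketch(text_block: str) -> pd.DataFrame:
    """Hypothetical sketch: locate each '<idx>:: ... ::' comment and record its character span."""
    rows = []
    # \d+:: matches the index prefix; the lazy .*?:: runs only to the next '::', the closing marker
    for m in re.finditer(r"\d+::.*?::", text_block, flags=re.DOTALL):
        rows.append({"text": m.group(), "start": m.start(), "end": m.end(), "span": m.span()})
    return pd.DataFrame(rows, columns=["text", "start", "end", "span"])


dfi_demo = get_comment_spans_sketch("0:: a b. :: 1:: c d ::")
print(dfi_demo)
```

The lazy quantifier is the key design choice: a greedy .* would run to the last '::' in the block and swallow every comment into a single match.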
dfi = get_comment_spans_textblock(text_block); dfi
text start end span
0 0:: autophagy maintains tumour growth through ... 0.0 80.0 (0, 80)
1 1:: x-rays were negative and physical assessme... 81.0 305.0 (81, 305)
2 2:: it is a skin disease causing much itchines... 306.0 450.0 (306, 450)
3 3:: maybe its a tumour. maybe take some tyleno... 451.0 541.0 (451, 541)

Send Combined Text to BERN2

Query Function:

This function sends our text to BERN2 through its API so that the biomedical terms in it get labeled.

text_block
"0:: autophagy maintains tumour growth through circulating the great arginine. :: 1:: x-rays were negative and physical assessment determined soft tissue damage to the lateral aspect of her ankle. she was initially treated with ice, an ace wrap, crutches and mild pain medications ,tylenol with codeine, :: 2:: it is a skin disease causing much itchiness. scratching leads to redness, swelling, cracking, weeping clear fluid, crusting, and scaling. :: 3:: maybe its a tumour. maybe take some tylenol. don't worry i'm not a doctor. i'm dave ::"


query_plain

 query_plain (text:str, url='http://bern2.korea.ac.kr/plain')

This function sends your text_block to the BERN2 API and returns a JSON of the biomedical terms labeled in text_block, along with their character offsets.

Type Default Details
text str single block of biomedical text
url str http://bern2.korea.ac.kr/plain the API address
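The usage documented for the BERN2 web service POSTs a JSON body of the form {"text": ...} to the /plain endpoint. A minimal standard-library sketch under that assumption; query_plain_sketch and its timeout parameter are hypothetical, and the package's own query_plain may well use the requests library instead:

```python
import json
import urllib.request


def query_plain_sketch(text: str, url: str = "http://bern2.korea.ac.kr/plain", timeout: float = 60.0) -> dict:
    """Hypothetical sketch: POST text to BERN2's /plain endpoint and return the parsed JSON."""
    # Assumed request format: a JSON object with a single "text" field
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# output = query_plain_sketch(text_block)  # needs network access to the BERN2 server
```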
# show_doc(query_plain)

Example of query_plain Output


Send It All to BERN2

# # Send it all to Bern2
# output = query_plain(text=text_block);

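An example of the BERN2 output. Its text field reads '0: autophagy ... arginine.::', which differs slightly from text_block ('0:: autophagy ... arginine. ::'), so these offsets may not line up exactly with the dfi spans computed above.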

output = {'annotations': [{'id': ['mesh:D009369'],
   'is_neural_normalized': False,
   'mention': 'tumour',
   'obj': 'disease',
   'prob': 0.9999957084655762,
   'span': {'begin': 23, 'end': 29}},
  {'id': ['mesh:D001120'],
   'is_neural_normalized': False,
   'mention': 'arginine',
   'obj': 'drug',
   'prob': 0.9939362406730652,
   'span': {'begin': 67, 'end': 75}},
  {'id': ['mesh:D000082'],
   'is_neural_normalized': False,
   'mention': 'tylenol',
   'obj': 'drug',
   'prob': 0.9972689747810364,
   'span': {'begin': 278, 'end': 285}},
  {'id': ['mesh:D003061'],
   'is_neural_normalized': False,
   'mention': 'codeine',
   'obj': 'drug',
   'prob': 0.947392463684082,
   'span': {'begin': 291, 'end': 298}},
  {'id': ['mesh:D012871'],
   'is_neural_normalized': False,
   'mention': 'skin disease',
   'obj': 'disease',
   'prob': 0.9998037815093994,
   'span': {'begin': 313, 'end': 325}},
  {'id': ['mesh:D011537'],
   'is_neural_normalized': False,
   'mention': 'itchiness',
   'obj': 'disease',
   'prob': 0.9898108243942261,
   'span': {'begin': 339, 'end': 348}},
  {'id': ['mesh:D000080822'],
   'is_neural_normalized': False,
   'mention': 'redness',
   'obj': 'disease',
   'prob': 0.9481215476989746,
   'span': {'begin': 370, 'end': 377}},
  {'id': ['mesh:D004487'],
   'is_neural_normalized': True,
   'mention': 'swelling',
   'obj': 'disease',
   'prob': 0.9774566292762756,
   'span': {'begin': 379, 'end': 387}},
  {'id': ['mesh:D012135'],
   'is_neural_normalized': True,
   'mention': 'cracking',
   'obj': 'disease',
   'prob': 0.8271865248680115,
   'span': {'begin': 389, 'end': 397}},
  {'id': ['mesh:D002862'],
   'is_neural_normalized': True,
   'mention': 'crusting',
   'obj': 'disease',
   'prob': 0.9943530559539795,
   'span': {'begin': 420, 'end': 428}},
  {'id': ['mesh:D012871'],
   'is_neural_normalized': True,
   'mention': 'scaling',
   'obj': 'disease',
   'prob': 0.9980024695396423,
   'span': {'begin': 434, 'end': 441}},
  {'id': ['mesh:D009369'],
   'is_neural_normalized': False,
   'mention': 'tumour',
   'obj': 'disease',
   'prob': 0.9999805688858032,
   'span': {'begin': 460, 'end': 466}},
  {'id': ['mesh:D000082'],
   'is_neural_normalized': False,
   'mention': 'tylenol',
   'obj': 'drug',
   'prob': 0.9799597263336182,
   'span': {'begin': 484, 'end': 491}}],
 'text': "0: autophagy maintains tumour growth through circulating the great arginine.:: 1: x-rays were negative and physical assessment determined soft tissue damage to the lateral aspect of her ankle. she was initially treated with ice, an ace wrap, crutches and mild pain medications ,tylenol with codeine,:: 2: it is a skin disease causing much itchiness. scratching leads to redness, swelling, cracking, weeping clear fluid, crusting, and scaling.:: 3: maybe its a tumour. maybe take some tylenol. don't worry i'm not a doctor. i'm dave::",
 'timestamp': 'Mon Nov 14 18:00:04 +0000 2022'}
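Each annotation's span indexes into the returned text field. A quick sanity check is to slice the text at every span and compare the result with the mention. A hypothetical helper (check_bern2_spans is not part of BERN2 or this package), which treats end as exclusive like Python slicing:

```python
def check_bern2_spans(output: dict) -> list[tuple[str, str, bool]]:
    """Hypothetical helper: slice output['text'] at each annotation's span and compare with its mention."""
    text = output["text"]
    results = []
    for ann in output["annotations"]:
        begin, end = ann["span"]["begin"], ann["span"]["end"]
        sliced = text[begin:end]  # end treated as exclusive, like Python slicing
        results.append((ann["mention"], sliced, sliced == ann["mention"]))
    return results
```

On the sample output above, the first two entries come back as ('tumour', 'tumour', True) and ('arginine', 'arginine', True), which is consistent with zero-based, end-exclusive offsets.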

Example of overall df

pd.DataFrame(output)[-3:-1]
annotations text timestamp
10 {'id': ['mesh:D012871'], 'is_neural_normalized... 0: autophagy maintains tumour growth through c... Mon Nov 14 18:00:04 +0000 2022
11 {'id': ['mesh:D009369'], 'is_neural_normalized... 0: autophagy maintains tumour growth through c... Mon Nov 14 18:00:04 +0000 2022

Example of df from just the annotations column

pd.DataFrame(output['annotations'])[-3:-1]
id is_neural_normalized mention obj prob span
10 [mesh:D012871] True scaling disease 0.998002 {'begin': 434, 'end': 441}
11 [mesh:D009369] False tumour disease 0.999981 {'begin': 460, 'end': 466}

Add BERN2 Labels to Docs

We sent all the separate documents to BERN2 as one big text block. Now we’ll re-separate the labeled text to show which biomedical terms appear in which documents.

dfa = pd.DataFrame(output['annotations'])  # dfa: DataFrame of BERN2 annotations

# str_end: each entity's end offset, taken from its span dict, e.g. {'begin': 23, 'end': 29} -> 29
dfa['str_end'] = dfa.span.map(lambda s: s['end']).astype(int)

# add a dfi_idx column holding each doc's number
dfi.reset_index(inplace=True)
dfi.rename(columns={'index': 'dfi_idx'}, inplace=True)

# label each annotation with the doc whose (start, end) span contains the entity's end offset
for o, (x, y) in zip(dfi.dfi_idx, dfi.span):
    conds = (dfa.str_end > x) & (dfa.str_end < y)
    dfa.loc[conds, 'dfi_idx'] = o

# attach each doc's text, start, end and span to its annotations
df = dfa.merge(dfi, on='dfi_idx');  df.head(2)
id is_neural_normalized mention obj prob span_x str_end dfi_idx text start end span_y
0 [mesh:D009369] False tumour disease 0.999996 {'begin': 23, 'end': 29} 29 0.0 0:: autophagy maintains tumour growth through ... 0.0 80.0 (0, 80)
1 [mesh:D001120] False arginine drug 0.993936 {'begin': 67, 'end': 75} 75 0.0 0:: autophagy maintains tumour growth through ... 0.0 80.0 (0, 80)
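The loop above tests every doc span against every annotation. Because the doc spans are sorted and non-overlapping, the same open-interval test (start < str_end < end) can instead be done with a binary search via numpy.searchsorted. A sketch of that swapped-in technique; assign_doc_idx_sketch is a hypothetical name and assumes at least one span:

```python
import numpy as np


def assign_doc_idx_sketch(ends, spans):
    """Hypothetical alternative to the loop: map each end offset to the doc whose
    (start, end) span strictly contains it, or -1 if none does.
    Assumes spans are sorted by start and do not overlap."""
    starts = np.array([s for s, _ in spans])
    stops = np.array([e for _, e in spans])
    ends = np.asarray(ends)
    # index of the last doc whose start is strictly less than each end offset
    pos = np.searchsorted(starts, ends, side="left") - 1
    safe_pos = np.clip(pos, 0, None)  # avoid indexing stops with -1
    inside = (pos >= 0) & (ends < stops[safe_pos])
    return np.where(inside, pos, -1)


print(assign_doc_idx_sketch([29, 75, 298, 491], [(0, 80), (81, 305), (306, 450), (451, 541)]))
# [0 0 1 3]
```

It returns positions in spans, which coincide with dfi_idx when dfi started from a default 0..n-1 index. Annotations left at -1 match no dfi_idx, so they would drop out of the inner merge just as unassigned rows do in the loop version.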

Output df

df = df[['dfi_idx','mention','obj','text']]; df[-4:-1]
dfi_idx mention obj text
9 2.0 crusting disease 2:: it is a skin disease causing much itchines...
10 2.0 scaling disease 2:: it is a skin disease causing much itchines...
11 3.0 tumour disease 3:: maybe its a tumour. maybe take some tyleno...

Show the Number of Comments That Contain Each Term

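# count distinct comments (dfi_idx) per term; .count() would also count repeat mentions within one comment
dfwords = df.groupby('mention').dfi_idx.nunique().reset_index().sort_values('dfi_idx', ascending=False)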
# !pip install seaborn
# import seaborn as sns
# import matplotlib.pyplot as plt

# # Set the width and height of the figure
# plt.figure(figsize=(8,6))

# ax = sns.barplot(x=dfwords.dfi_idx, y=dfwords.mention)

# #title
# ax.set_title('Biomedical Terms in Comments')

# # Add label for x axis
# ax.set(xlabel='Number of comments mentioning the term')
# plt.show()