text_block = create_one_large_text_block(df=coms); text_block[:250]
NameError: name 'create_one_large_text_block' is not defined
create_com_with_idx (df)
This function takes a pandas DataFrame with a column titled “comments” that holds text strings, i.e. the comments. It adds an index number to each text string and appends ‘::’ at the end for later parsing. Finally, it returns all text elements combined into a single text block.
| | Details |
---|---|
df | pandas DataFrame containing a column, titled “comments,” of text elements. In this case these are YouTube comments. |
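To make the indexing scheme concrete, here is a minimal sketch of the idea. It is not the library's implementation: the name `create_com_with_idx_sketch` is hypothetical, the `"{idx}:: {text} ::"` layout is inferred from the example text block shown later on this page, and it takes a plain list of strings (e.g. `df["comments"].tolist()`) rather than a DataFrame.

```python
def create_com_with_idx_sketch(comments):
    """Hypothetical sketch: prefix each comment with its index and '::',
    append ' ::' as an end marker, and join everything into one block."""
    # Assumed layout, inferred from the example output: "0:: text :: 1:: text ::"
    return " ".join(f"{i}:: {text} ::" for i, text in enumerate(comments))


comments = ["maybe its a tumour.", "take some tylenol."]
text_block = create_com_with_idx_sketch(comments)
print(text_block)
```

The index prefix and the trailing ‘::’ give every comment a recognisable start and end, which is what the later span-finding step relies on.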
make_5k_sections (df)
This function assigns a section number to each row so that the rows can be grouped later into text blocks of fewer than 5k characters each.
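One way to picture the section numbering is a greedy pass over the texts. The sketch below is an assumption about how such numbering could work, not the function's actual code; `make_sections_sketch` and `max_chars` are hypothetical names, and it works on a plain list of strings.

```python
def make_sections_sketch(texts, max_chars=5000):
    """Hypothetical sketch: give each text a section number so that the texts
    of one section, joined by single spaces, stay under max_chars."""
    sections = []
    section, used = 0, 0
    for text in texts:
        cost = len(text) + 1  # +1 for the space that joins it to the next text
        # Start a new section when this text would push the block past the limit
        if used and used + cost > max_chars:
            section += 1
            used = 0
        used += cost
        sections.append(section)
    return sections


print(make_sections_sketch(["a" * 6, "b" * 6, "c" * 6], max_chars=15))
```

Note that in this sketch a single text longer than `max_chars` still gets a section of its own, so it would need to be truncated or split separately.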
create_one_large_text_block (df)
create_all_text_blocks (df)
This function creates multiple text blocks, each shorter than 5k characters.
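As a rough illustration, the blocks can be built by joining the texts that share a section number. This is only a sketch under assumptions: `create_all_text_blocks_sketch` is a hypothetical name, and it expects already-indexed texts plus a parallel list of section numbers in ascending order.

```python
from itertools import groupby


def create_all_text_blocks_sketch(texts, sections):
    """Hypothetical sketch: join the texts of each section into one block."""
    blocks = []
    # groupby only merges consecutive items, so sections must already be sorted
    for _, group in groupby(zip(sections, texts), key=lambda pair: pair[0]):
        blocks.append(" ".join(text for _, text in group))
    return blocks


texts = ["0:: a ::", "1:: b ::", "2:: c ::"]
print(create_all_text_blocks_sketch(texts, sections=[0, 0, 1]))
```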
Here we’ll use regex to find the text indices of each doc or comment. This will tell us which doc or comment is the parent of each biomedical term returned from BERN2.
get_comment_spans_textblock (text_block:str)
This function returns a DataFrame with the start, end, and span of each comment/doc in the text_block.
| | Type | Details |
---|---|---|
text_block | str | single block of text in this structure: '0:: text text text. ::' |
| | text | start | end | span |
---|---|---|---|---|
0 | 0:: autophagy maintains tumour growth through ... | 0.0 | 80.0 | (0, 80) |
1 | 1:: x-rays were negative and physical assessme... | 81.0 | 305.0 | (81, 305) |
2 | 2:: it is a skin disease causing much itchines... | 306.0 | 450.0 | (306, 450) |
3 | 3:: maybe its a tumour. maybe take some tyleno... | 451.0 | 541.0 | (451, 541) |
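The span lookup can be sketched with `re.finditer`. This is a minimal illustration, not the function's real code: the name `get_comment_spans_sketch` and the pattern are assumptions based on the `'0:: text ::'` structure, and it returns plain tuples instead of a DataFrame.

```python
import re

# Assumed pattern: an index, '::', the comment text, then the closing ' ::'
DOC_PATTERN = re.compile(r"\d+:: .*? ::", re.DOTALL)


def get_comment_spans_sketch(text_block):
    """Hypothetical sketch: return (text, start, end) for each indexed comment."""
    return [(m.group(0), m.start(), m.end()) for m in DOC_PATTERN.finditer(text_block)]


for text, start, end in get_comment_spans_sketch("0:: aa bb. :: 1:: cc. ::"):
    print(start, end, text)
```

The lazy `.*?` stops at the first ‘ ::’ it meets, so a comment that itself contains ‘ ::’ would be cut short under this pattern.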
This function sends our text to BERN2 through its API to get the text labeled.
"0:: autophagy maintains tumour growth through circulating the great arginine. :: 1:: x-rays were negative and physical assessment determined soft tissue damage to the lateral aspect of her ankle. she was initially treated with ice, an ace wrap, crutches and mild pain medications ,tylenol with codeine, :: 2:: it is a skin disease causing much itchiness. scratching leads to redness, swelling, cracking, weeping clear fluid, crusting, and scaling. :: 3:: maybe its a tumour. maybe take some tylenol. don't worry i'm not a doctor. i'm dave ::"
query_plain (text:str, url='http://bern2.korea.ac.kr/plain')
This function sends your text_block to the BERN2 API and returns a JSON of labeled biomedical terms from text_block with their indices.
| | Type | Default | Details |
---|---|---|---|
text | str | | single block of biomedical text |
url | str | http://bern2.korea.ac.kr/plain | the API address |
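For orientation, a request like this can be sketched with the standard library alone. The sketch assumes the endpoint accepts a POST with a JSON body of the form `{"text": ...}`; the names `build_bern2_request` and `query_plain_sketch` and the `timeout` parameter are hypothetical and not part of this package.

```python
import json
from urllib import request


def build_bern2_request(text, url="http://bern2.korea.ac.kr/plain"):
    """Hypothetical helper: build the POST request without sending it."""
    payload = json.dumps({"text": text}).encode("utf-8")  # assumed body format
    return request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_plain_sketch(text, url="http://bern2.korea.ac.kr/plain", timeout=60):
    """Hypothetical sketch: send the text and decode the JSON reply."""
    with request.urlopen(build_bern2_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending makes it possible to inspect the payload without touching the network.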
"0:: autophagy maintains tumour growth through circulating the great arginine. :: 1:: x-rays were negative and physical assessment determined soft tissue damage to the lateral aspect of her ankle. she was initially treated with ice, an ace wrap, crutches and mild pain medications ,tylenol with codeine, :: 2:: it is a skin disease causing much itchiness. scratching leads to redness, swelling, cracking, weeping clear fluid, crusting, and scaling. :: 3:: maybe its a tumour. maybe take some tylenol. don't worry i'm not a doctor. i'm dave ::"
output = {'annotations': [{'id': ['mesh:D009369'],
'is_neural_normalized': False,
'mention': 'tumour',
'obj': 'disease',
'prob': 0.9999957084655762,
'span': {'begin': 23, 'end': 29}},
{'id': ['mesh:D001120'],
'is_neural_normalized': False,
'mention': 'arginine',
'obj': 'drug',
'prob': 0.9939362406730652,
'span': {'begin': 67, 'end': 75}},
{'id': ['mesh:D000082'],
'is_neural_normalized': False,
'mention': 'tylenol',
'obj': 'drug',
'prob': 0.9972689747810364,
'span': {'begin': 278, 'end': 285}},
{'id': ['mesh:D003061'],
'is_neural_normalized': False,
'mention': 'codeine',
'obj': 'drug',
'prob': 0.947392463684082,
'span': {'begin': 291, 'end': 298}},
{'id': ['mesh:D012871'],
'is_neural_normalized': False,
'mention': 'skin disease',
'obj': 'disease',
'prob': 0.9998037815093994,
'span': {'begin': 313, 'end': 325}},
{'id': ['mesh:D011537'],
'is_neural_normalized': False,
'mention': 'itchiness',
'obj': 'disease',
'prob': 0.9898108243942261,
'span': {'begin': 339, 'end': 348}},
{'id': ['mesh:D000080822'],
'is_neural_normalized': False,
'mention': 'redness',
'obj': 'disease',
'prob': 0.9481215476989746,
'span': {'begin': 370, 'end': 377}},
{'id': ['mesh:D004487'],
'is_neural_normalized': True,
'mention': 'swelling',
'obj': 'disease',
'prob': 0.9774566292762756,
'span': {'begin': 379, 'end': 387}},
{'id': ['mesh:D012135'],
'is_neural_normalized': True,
'mention': 'cracking',
'obj': 'disease',
'prob': 0.8271865248680115,
'span': {'begin': 389, 'end': 397}},
{'id': ['mesh:D002862'],
'is_neural_normalized': True,
'mention': 'crusting',
'obj': 'disease',
'prob': 0.9943530559539795,
'span': {'begin': 420, 'end': 428}},
{'id': ['mesh:D012871'],
'is_neural_normalized': True,
'mention': 'scaling',
'obj': 'disease',
'prob': 0.9980024695396423,
'span': {'begin': 434, 'end': 441}},
{'id': ['mesh:D009369'],
'is_neural_normalized': False,
'mention': 'tumour',
'obj': 'disease',
'prob': 0.9999805688858032,
'span': {'begin': 460, 'end': 466}},
{'id': ['mesh:D000082'],
'is_neural_normalized': False,
'mention': 'tylenol',
'obj': 'drug',
'prob': 0.9799597263336182,
'span': {'begin': 484, 'end': 491}}],
'text': "0: autophagy maintains tumour growth through circulating the great arginine.:: 1: x-rays were negative and physical assessment determined soft tissue damage to the lateral aspect of her ankle. she was initially treated with ice, an ace wrap, crutches and mild pain medications ,tylenol with codeine,:: 2: it is a skin disease causing much itchiness. scratching leads to redness, swelling, cracking, weeping clear fluid, crusting, and scaling.:: 3: maybe its a tumour. maybe take some tylenol. don't worry i'm not a doctor. i'm dave::",
'timestamp': 'Mon Nov 14 18:00:04 +0000 2022'}
Example of overall df
| | annotations | text | timestamp |
---|---|---|---|
10 | {'id': ['mesh:D012871'], 'is_neural_normalized... | 0: autophagy maintains tumour growth through c... | Mon Nov 14 18:00:04 +0000 2022 |
11 | {'id': ['mesh:D009369'], 'is_neural_normalized... | 0: autophagy maintains tumour growth through c... | Mon Nov 14 18:00:04 +0000 2022 |
Example of df from just the annotations column
We sent all separate text documents as one big text document to bern2. Now we’ll re-separate the labeled text to show which biomedical words were in which documents.
import pandas as pd

dfa = pd.DataFrame(output['annotations'])  # create dfa, the DataFrame of annotations
# create str_end col: the end offset of each annotation, parsed from its span
dfa.span = dfa.span.astype(str)
dfa['str_end'] = dfa.span.str.replace(r".*'end': (\d+)}", r"\1", regex=True)
dfa.str_end = dfa.str_end.astype(int)  # make str_end type int
# add dfi_idx col (dfi is the DataFrame of comment spans)
dfi.reset_index(inplace=True)
dfi.rename(columns={'index': 'dfi_idx'}, inplace=True)
for o, m in zip(dfi.index, dfi.span):  # add dfi_idx col to dfa
    x, y = m  # unpack the span tuple
    # select the annotations whose end offset falls inside this comment's span
    conds = (dfa.str_end > x) & (dfa.str_end < y)
    dfa.loc[conds, 'dfi_idx'] = o  # save the index of the matching dfi span to dfa
# dfa.merge(dfi, left_on='dfi_idx',right_index=True)
df = dfa.merge(dfi, left_on='dfi_idx', right_on='dfi_idx'); df.head(2)
| | id | is_neural_normalized | mention | obj | prob | span_x | str_end | dfi_idx | text | start | end | span_y |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | [mesh:D009369] | False | tumour | disease | 0.999996 | {'begin': 23, 'end': 29} | 29 | 0.0 | 0:: autophagy maintains tumour growth through ... | 0.0 | 80.0 | (0, 80) |
1 | [mesh:D001120] | False | arginine | drug | 0.993936 | {'begin': 67, 'end': 75} | 75 | 0.0 | 0:: autophagy maintains tumour growth through ... | 0.0 | 80.0 | (0, 80) |
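The span-matching logic can also be read in plain Python. The sketch below uses the same strict comparison on each annotation's end offset, with spans and offsets taken from the tables on this page; the function name `assign_parent_docs` is hypothetical.

```python
def assign_parent_docs(annotation_ends, doc_spans):
    """Hypothetical sketch: for each annotation end offset, return the index
    of the doc whose (start, end) span contains it, or None if none does."""
    parents = []
    for end in annotation_ends:
        parent = None
        for idx, (start, stop) in enumerate(doc_spans):
            if start < end < stop:  # same strict comparison as the pandas version
                parent = idx
                break
        parents.append(parent)
    return parents


doc_spans = [(0, 80), (81, 305), (306, 450), (451, 541)]
annotation_ends = [29, 75, 285, 325, 491]  # tumour, arginine, tylenol, skin disease, tylenol
print(assign_parent_docs(annotation_ends, doc_spans))  # → [0, 0, 1, 2, 3]
```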
| | dfi_idx | mention | obj | text |
---|---|---|---|---|
9 | 2.0 | crusting | disease | 2:: it is a skin disease causing much itchines... |
10 | 2.0 | scaling | disease | 2:: it is a skin disease causing much itchines... |
11 | 3.0 | tumour | disease | 3:: maybe its a tumour. maybe take some tyleno... |
# import seaborn as sns
# import matplotlib.pyplot as plt
# # Set the width and height of the figure
# plt.figure(figsize=(8,6))
# ax = sns.barplot(x=dfwords.dfi_idx, y=dfwords.mention)
# #title
# ax.set_title(f'Biomedical Terms in Comments')
# # Add label for axis
# ax.set(xlabel='Number of commenters mentioning the term')
# plt.show()