#!pip install transformers
Zero Shot Classification
This notebook shows how to use zero-shot classification on data from a LinkedIn dataset.
Here we’ll load everything we need into the notebook.
!pip install plotly
Requirement already satisfied: plotly in /opt/conda/lib/python3.7/site-packages (5.11.0)
Requirement already satisfied: tenacity>=6.2.0 in /opt/conda/lib/python3.7/site-packages (from plotly) (8.1.0)
from transformers import pipeline
from fastai.tabular.all import *
import plotly.express as px
Now we’ll load a classifier via the Hugging Face transformers pipeline
classifier = pipeline("zero-shot-classification")
No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)
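For intuition, the zero-shot pipeline treats classification as natural language inference: each candidate label is slotted into a hypothesis sentence, and the model scores how strongly the input text entails it. The sketch below uses only the standard library. The logits are made-up numbers, and `build_hypotheses` and `softmax` are hypothetical helper names for illustration, not part of transformers. The template string is the pipeline's default, `"This example is {}."`.

```python
import math

# Default hypothesis template used by the zero-shot pipeline.
HYPOTHESIS_TEMPLATE = "This example is {}."

def build_hypotheses(labels, template=HYPOTHESIS_TEMPLATE):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]

def softmax(logits):
    """Normalize entailment logits into scores that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["addiction", "Mickey Mouse"]
hypotheses = build_hypotheses(labels)

# Made-up entailment logits, one per hypothesis (illustrative only).
fake_entailment_logits = [2.0, -1.0]
scores = softmax(fake_entailment_logits)

for hypothesis, score in zip(hypotheses, scores):
    print(f"{hypothesis!r}: {score:.3f}")
```

In the default single-label mode the scores are normalized across all labels, so adding unrelated labels redistributes the total rather than scoring each label independently.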
Here we’ll load our top bio terms and all comments
path = Path('./text_sample')
df_terms = pd.read_csv(path/'top_bio_terms.csv')
dftext = pd.read_csv(path/'found_my_fitness_UCWF8SqJVNlx-ctXbLswcTcA_youtube_comments_only_121rows.csv')
We’ll use this to join all the comments into one text block, then put that block into a DataFrame with an “input” column so it can be passed to the “analyze_one” function
def create_one_large_text_block(df):
    text_block = ' '.join(df.comment.tolist())
    return text_block
#create the textblock
textblock = create_one_large_text_block(dftext)
#create df with an input column
df = pd.DataFrame({'input':[textblock]})
df.input = df.input.astype(str)
df.input = df.input.str.replace('\\n',' ',regex=False) #replace any \n text
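One detail of the cleanup step is easy to miss: `'\\n'` with `regex=False` matches the literal two characters backslash and `n`, not an actual newline character. Here is a small sketch with made-up comments, using plain `str.replace` as a stand-in for the pandas `.str` accessor:

```python
# Made-up comments: the first contains a literal backslash + n,
# the second contains a real newline character.
comments = ["Great episode!\\nThanks Rhonda", "Very helpful\nvideo"]

# Join all comments into one block, as create_one_large_text_block does.
text_block = " ".join(comments)

# '\\n' is the two-character sequence backslash + n, not a newline.
cleaned = text_block.replace("\\n", " ")

print(repr(cleaned))
```

The literal `\n` text is replaced with a space, while the real newline in the second comment is left as it was.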
Here we’ll create a list from the bio_terms we pulled in the “Found My Fitness Example”
candidate_labels = df_terms.bio_term.tolist()
And now we’ll add a few more to see how the model analyzes the whole text
candidate_labels.extend(['rhonda','Mickey Mouse','addiction','Howard Taft']); candidate_labels
['alcohol (drug)',
'fgf21 (gene)',
'alcoholism (disease)',
'humans (species)',
'naltrexone (drug)',
'rhonda',
'Mickey Mouse',
'addiction',
'Howard Taft']
This is the function that will run the classifier on one row of the DataFrame and plot the score for each label
def analyze_one(df, candidate_labels, index):
    i = index
    sequence = df.input[i]  # text of the selected row
    answer = classifier(sequence, candidate_labels)
    dfo = pd.DataFrame(answer)  # one row per label, with its score
    dfo.sort_values('scores', inplace=True)
    fig = px.bar(dfo, x="scores", y="labels", orientation='h')
    txtblck = dfo.sequence[0]
    print(str(txtblck[:400])+'...')  # preview the first 400 characters
    fig.show()
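To make the data flow inside `analyze_one` easier to follow, here is a sketch of the result shape the zero-shot pipeline returns for a single sequence: a dict with `sequence`, `labels`, and `scores`, with labels ordered from highest to lowest score. The values below are invented for illustration, and plain Python stands in for the DataFrame sort.

```python
# Invented result in the shape returned by the zero-shot pipeline.
fake_answer = {
    "sequence": "some comment text",
    "labels": ["addiction", "rhonda", "Mickey Mouse"],
    "scores": [0.7, 0.25, 0.05],
}

# One (label, score) pair per label, like the rows of pd.DataFrame(answer).
rows = list(zip(fake_answer["labels"], fake_answer["scores"]))

# Sort ascending by score, like dfo.sort_values('scores').
rows_ascending = sorted(rows, key=lambda row: row[1])

for label, score in rows_ascending:
    print(f"{label:>12}  {score:.2f}")
```

Sorting ascending puts the highest-scoring label last, which should place it at the top of the horizontal bar chart.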
Print Graphs from Labels
analyze_one(df,candidate_labels,index=0)
If exercise and/or the interaction with alcohol behaviors has really piqued your interest, make sure to check out my recent "Sober October" post series found on my Instagram: https://www.instagram.com/foundmyfitness Support the show as a premium member: https://www.foundmyfitness.com/crowdsponsor Thanks for watching! Hi Rhonda, could this mean that "one time" phsychedelic treatments with e.g.: ay...