Commit 8c333a5

Update 06-lsa.md
1 parent 47cdd42 commit 8c333a5

1 file changed: 5 additions, 3 deletions
_episodes/06-lsa.md

@@ -279,6 +279,8 @@ We don't know *why* they are getting arranged this way, since we don't know what
 Let's write a helper to get the strongest words for each topic. This will show the terms with the *highest* and *lowest* association with a topic. In LSA, each topic is a spectra of subject matter, from the kinds of terms on the low end to the kinds of terms on the high end. So, inspecting the *contrast* between these high and low terms (and checking that against our domain knowledge) can help us interpret what our model is identifying.

 ```python
+import pandas as pd
+
 def show_topics(topic, n):
     # Get the feature names (terms) from the vectorizer
     terms = vectorizer.get_feature_names_out()
@@ -287,7 +289,7 @@ def show_topics(topic, n):
     weights = svdmodel.components_[topic]

     # Create a DataFrame with terms and their corresponding weights
-    df = pandas.DataFrame({"Term": terms, "Weight": weights})
+    df = pd.DataFrame({"Term": terms, "Weight": weights})

     # Sort the DataFrame by weights in descending order to get top n terms
     tops = df.sort_values(by=["Weight"], ascending=False)[0:n]
@@ -296,7 +298,7 @@ def show_topics(topic, n):
     bottoms = df.sort_values(by=["Weight"], ascending=False)[-n:]

     # Concatenate top and bottom terms into a single DataFrame and return
-    return pandas.concat([tops, bottoms])
+    return pd.concat([tops, bottoms])

 # Get the top 5 and bottom 5 terms for each specified topic
 topic_words_x = show_topics(1, 5) # Topic 1
@@ -361,7 +363,7 @@ print(topic_words_y)
 Now that we have names for our first two topics, let's redo the plot with better axis labels.

 ```python
-lsa_plot(data, svdmodel, groupby="Author", colors=colormap, xlabel="Victorian vs. Elizabethan", ylabel="English vs. French")
+lsa_plot(data, svdmodel, groupby="author", colors=colormap, xlabel="Victorian vs. Elizabethan", ylabel="English vs. French")
 ```

 ![Plot results of our LSA model, revised with new axis labels](../images/05-lsa-plot-labeled.png)
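
For readers who do not have the lesson's fitted `vectorizer` and `svdmodel` at hand, the selection logic that `show_topics` relies on (rank terms by weight, then keep the `n` highest and `n` lowest) can be sketched in plain Python. This is only an illustration: the function name `top_and_bottom_terms` and the toy terms and weights are hypothetical and do not come from the lesson or from this commit.

```python
def top_and_bottom_terms(terms, weights, n):
    """Return the n highest- and n lowest-weighted (term, weight) pairs.

    Assumes n >= 1. If 2 * n exceeds the number of terms, the two
    slices overlap and some terms appear twice.
    """
    # Sort from highest to lowest weight, analogous to
    # df.sort_values(by=["Weight"], ascending=False) in the diff.
    ranked = sorted(zip(terms, weights), key=lambda pair: pair[1], reverse=True)
    tops = ranked[:n]      # strongest positive association
    bottoms = ranked[-n:]  # strongest negative association
    return tops + bottoms


# Toy data, invented for illustration only
terms = ["thou", "king", "said", "mr", "monsieur", "the"]
weights = [0.42, 0.31, -0.05, -0.27, 0.08, 0.01]

result = top_and_bottom_terms(terms, weights, 2)
for term, weight in result:
    print(f"{term:10s} {weight:+.2f}")
```

Reading the two ends of the ranked list side by side is what lets the lesson name an axis as a contrast, such as "Victorian vs. Elizabethan".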
