2017 Workshop at the Academy of Management Annual Meeting

First time setup in R

In order to follow the code developed for the workshop (shown in full below under Workshop example), two pieces of software need to be installed beforehand. The first is R, a free software environment for statistical computing and graphics. The second is RStudio, which is a graphical user interface that goes ‘over’ R, making it more user friendly. It is adamant that R is installed first, and RStudio second.

Before running the code shown below, install R on your system by going to the following page: https://cran.r-project.org/ Here, OS-specific versions of R can be found. For example, by clicking here, you can download the executable for Windows (version 3.4.1 is the latest version at the time of writing). Installation using the default settings should do the trick.

Then, after the installation of R is complete, navigate to the following page: https://www.rstudio.com/ You can download the free version of RStudio on this page, with this link pointing directly to the executable for Windows. Again, the default settings should do the trick.

Then, after these steps are completed, it is advisable to run the following two lines of code in RStudio before coming to the workshop.

if (!require("tm")) install.packages("tm")
if (!require("topicmodels")) install.packages("topicmodels")

These will install the two core packages that we will use in the workshop, and their installation may take some time on the standard conference internet connection. After this is done, you’re all set to participate in the workshop! It is also possible to run the code shown below at home beforehand, but note that the actual topic model takes a LONG time to finish on most PCs.

You can install packages by entering them to your script (you can start a new script on Windows via “Shift+Ctrl+N” or by navigating to “File” –> “New File” –> “R Script”. You can run code by selecting the code and pressing “Ctrl + Enter” or the “Run” button at the top of the script window. Packages can also be installed by navigating to “Packages” (which should be on the right half of your screen), or by selecting “Tools” at the top of your screen and selecting “Install packages” from there.

Workshop example

Abstracts of five journals ——————- Code tested and written for R version 3.4, tm package version 0.7-1, topicmodels package version 0.2-6. Code prepared on May 10, 2017 by Richard Haans (haans@rsm.nl). Data obtained from the Web of Science.

Package installation

### Open the R code of this workshop (needs to be copy-pasted into an R script after loading):
url.show("https://raw.githubusercontent.com/RFJHaans/topicmodeling/master/Data/2017/2017%20AoM%20LDA%20Workshop%20-%20abstracts.R")


# The "tm" package enables the text mining infrastructure that we will use for LDA.
    if (!require("tm")) install.packages("tm")

# The "topicmodels" package enables LDA analysis.
    if (!require("topicmodels")) install.packages("topicmodels")

Get the data, turn into a corpus, and clean it up

### Load the output of the 200-topic model (we cannot run it during the workshop due to time constraints). 
load(url("https://github.com/RFJHaans/topicmodeling/blob/master/Data/2017/LDA200.RData?raw=true"))

### Load data from a URL
    data = read.csv(url("https://raw.githubusercontent.com/RFJHaans/topicmodeling/master/Data/2017/ASQ_AMJ_AMR_OS_SMJ.csv"))

### Create a corpus. 
    corpus = VCorpus((VectorSource(data[, "AB"])))

### Basic cleaning (step-wise)
# We write everything to a new corpus called "corpusclean" so that we do not lose the original data.
# 1) Remove numbers
    corpusclean = tm_map(corpus, removeNumbers)
# 2) Remove punctuation
    corpusclean = tm_map(corpusclean, removePunctuation)
# 3) Transform all upper-case letters to lower-case.
    corpusclean = tm_map(corpusclean,  content_transformer(tolower))
# 4) Remove stopwords which do not convey any meaning.
    corpusclean = tm_map(corpusclean, removeWords, stopwords("english"))
# this stopword file is at C:\Users\[username]\Documents\R\win-library\[rversion]\tm\stopwords 

# i	me	my	myself	we	our	ours	ourselves	you	your	yours	yourself	yourselves	he	him	his	himself	
# she	her	hers	herself	it	its	itself	they	them	their	theirs	themselves	what	which	who	whom	this
# that	these	those	am	is	are	was	were	be	been	being	have	has	had	having	do	does	did	doing	would	should
# could	ought	i'm	you're	he's	she's	it's	we're	they're	i've	you've	we've	they've	i'd	you'd	he'd	she'd	we'd
# they'd	i'll	you'll	he'll	she'll	we'll	they'll	isn't	aren't	wasn't	weren't	hasn't	haven't	hadn't	doesn't	
# don't	didn't	won't	wouldn't	shan't	shouldn't	can't	cannot	couldn't	mustn't	let's	that's	who's	what's	here's
# there's	when's	where's	why's	how's	a	an	the	and	but	if	or	because	as	until	while	of	at	by	for	with	about	
# against	between	into	through	during	before	after	above	below	to	from	up	down	in	out	on	off	over	under	again
# further	then	once	here	there	when	where	why	how	all	any	both	each	few	more	most	other	some	such	no	nor	
# not	only	own	same	so	than	too	very

# 5) And strip whitespace. 
    corpusclean = tm_map(corpusclean , stripWhitespace)

# See the help of getTransformations for more possibilities, such as stemming. 

# To speed up the computation process for this tutorial, I have selected some choice words that were very common:
# We update the corpusclean corpus by removing these words. 
    corpusclean = tm_map(corpusclean, removeWords, c("also","based","can","data","effect",
                                                  "effects","elsevier","evidence","examine",
                                                  "find","findings","high","low","higher","lower",
                                                  "however","impact","implications","important",
                                                  "less","literature","may","model","one","paper",
                                                  "provide","research","all rights reserved",
                                                  "results","show","studies","study","two","use",
                                                  "using","rights","reserved","new","analysis","three",
                                                  "associated","firm","firms","copyright","sons","john","ltd","wiley"))

### Adding metadata from the original database
# This needs to be done because transforming things into a corpus only uses the texts.
    i = 0
    corpusclean = tm_map(corpusclean, function(x) {
      i <<- i +1
      meta(x, "id") = as.character(data[i,"ID"])
      x
    })

    i = 0
    corpusclean = tm_map(corpusclean, function(x) {
      i <<- i +1
      meta(x, "journal") = as.character(data[i,"SO"])
      x
    })

# The above is a loop that goes through all files ("i") in the corpus
# and then maps the information of the metadata dataframe 
# (the "ID" column, et cetera) to a new piece of metadata in the corpus
# which we also call "id", et cetera.

# This enables making selections of the corpus based on metadata now.
# Let's say we want to only look at articles from the AMJ, then we do the following:
    keep = meta(corpusclean, "journal") == "ACADEMY OF MANAGEMENT JOURNAL"
    corpus.AMJ = corpusclean[keep]

# We then convert the corpus to a "Document-term-matrix" (dtm)
    dtm =DocumentTermMatrix(corpusclean)  
    dtm

<<DocumentTermMatrix (documents: 1530, terms: 11744)>>
Non-/sparse entries: 89755/17878565
Sparsity           : 100%
Maximal term length: 30
Weighting          : term frequency (tf)

# dtms are organized with rows being documents and columns being the unique words.
# We can see here that the longest word in the corpus is 30 characters long.
# There are 1530 documents, containing 11744 unique words.

# Let's check out the sixth and seventh abstract in our data (rows in the DTM) and the 4000th to 4010th words:
    inspect(dtm[6:7,4000:4010])

    Terms
Docs  facilitation facilitative facilitators facilities facility facing fact factbased factions facto
  ID6            0            0            0          0        0      1    0         0        0     0
  ID7            0            0            0          0        0      0    1         0        0     0

# Abstract six contains "facing", once. Abstract seven contains "fact" once.

# The step below is done to ensure that after removing various words, no documents are left empty 
# (LDA does not know how to deal with empty documents). 
    rowTotals = apply(dtm , 1, sum)
# This sums up the total number of words in each of the documents, e.g.:
    rowTotals[1:10]
# shows the number of words for the first ten abstracts.

 ID1  ID2  ID3  ID4  ID5  ID6  ID7  ID8  ID9 ID10 
  40   73   85  111  103   98   97   91   82  124



# Then, we keep only those documents where the sum of words is greater than zero.
    dtm = dtm[rowTotals> 0, ]
    dtm
# Shows no abstracts were lost due to our cleaning.

<<DocumentTermMatrix (documents: 1530, terms: 11744)>>
Non-/sparse entries: 89755/17878565
Sparsity           : 100%
Maximal term length: 30
Weighting          : term frequency (tf)

Infrequent words and frequent words

# Next, we will assess which words are most frequent:
    highfreq500 = findFreqTerms(dtm,500,Inf)
# This creates a vector containing words from the dtm that occur 500 or more time (100 to infinity times)
# In the top-right window, we can see that there are six words occurring more than 500 times.
# Let's see what words these are:
    highfreq500

[1] "knowledge"      "organizational" "organizations"  "performance"    "social"        
[6] "theory"

# We can create a smaller dtm that makes the following two selections on the words in the corpus:
# This greatly saves on computing time, but infrequent words may also provide valuable information,
# so one needs to be careful when selecting cut-off values.
# 1) Keep only those words that occur more than 50 times.
    minoccur = 50
# 2) Keep only those words that occur in at least 10 of the documents. 
    mindocs = 10
# Note that this is completed on the corpus, not the DTM. 
    smalldtm = DocumentTermMatrix(corpusclean, control=list(dictionary = findFreqTerms(dtm,minoccur,Inf), 
                                                                  bounds = list(global = c(mindocs,Inf))))

    rowTotals = apply(smalldtm , 1, sum)
    smalldtm   = smalldtm[rowTotals> 0, ]
    smalldtm

<<DocumentTermMatrix (documents: 1530, terms: 518)>>
Non-/sparse entries: 41211/751329
Sparsity           : 95%
Maximal term length: 19
Weighting          : term frequency (tf)

# This reduces the number of words to 518 (from 11744, so a very large reduction).
# No abstracts are removed, however.

LDA: Running the model

# We first fix the random seed for future replication.
    SEED = 123456789

# Here we define the number of topics to be estimated. I find fifty provides decent results, while much lower 
# leads to messy topics with little variation. 
# However, little theory or careful investigation went into this so be wary.
    k = 200

# We then create a variable which captures the starting time of this particular model.
    t1_LDA200 = Sys.time()
# And then we run a LDA model with 200 topics (k = 200).
# Note that the input is the dtm
    LDA200 = LDA(dtm, k = k, control = list(seed = SEED))
# The default command uses the VEM algorithm, but an alternative is Gibbs sampling (see the documentation of the topicmodels package)
# And we create a variable capturing the end time of this model.
    t2_LDA200 = Sys.time()

# We can then check the time difference to see how long the model took. 
    t2_LDA200 - t1_LDA200

Time difference of 23.0202 mins

    k2 = 20
# We then create a variable which captures the starting time of this particular model.
    t1_LDA20 = Sys.time()
    LDA20 = LDA(smalldtm, k = k2, control = list(seed = SEED))
    t2_LDA20 = Sys.time()

    t2_LDA20 - t1_LDA20

Time difference of 11.29924 secs

LDA: The output

# We then create a variable that captures the top ten terms assigned to the 15-topic model:
    topics_LDA200 = terms(LDA200, 10)

# We can write the results of the topics to a .csv file as follows:
# write.table(topics_LDA200, file = "200_topics", sep=',',row.names = FALSE)
# This writes to the directory of the .R script, but the 'file = ' can be changed to any directory.

# And show the results:
    topics_LDA200

      Topic 1         Topic 2          Topic 3     Topic 4        Topic 5        Topic 6       
 [1,] "efforts"       "social"         "family"    "experience"   "embeddedness" "ceo"         
 [2,] "organizations" "exchange"       "venture"   "learning"     "network"      "ceos"        
 [3,] "activities"    "justice"        "control"   "performance"  "knowledge"    "support"     
 [4,] "work"          "perceptions"    "markets"   "prior"        "countries"    "corporate"   
 [5,] "behavioral"    "behavior"       "ownership" "relational"   "transfer"     "management"  
 
      Topic 7          Topic 8           Topic 9         Topic 10        Topic 11        
 [1,] "control"        "relationship"    "incentives"    "cognitive"     "communication" 
 [2,] "organizational" "strategy"        "value"         "capabilities"  "processes"     
 [3,] "attention"      "country"         "network"       "processes"     "process"       
 [4,] "organizations"  "partners"        "relationships" "control"       "coordination"  
 [5,] "technology"     "characteristics" "formal"        "managerial"    "organizational"
 
      Topic 12      Topic 13     Topic 14        Topic 15       Topic 16         Topic 17      
 [1,] "strategic"   "market"     "events"        "team"         "change"         "knowledge"   
 [2,] "knowledge"   "value"      "across"        "creativity"   "institutional"  "innovation"  
 [3,] "public"      "growth"     "influence"     "teams"        "field"          "external"    
 [4,] "political"   "negative"   "positive"      "individual"   "actors"         "creativity"  
 [5,] "private"     "will"       "focus"         "member"       "organizational" "ties"        
 
      Topic 18              Topic 19        Topic 20         Topic 21       Topic 22    
 [1,] "organizational"      "exit"          "team"           "performance"  "technology"
 [2,] "organizations"       "work"          "structure"      "social"       "knowledge" 
 [3,] "prior"               "analysts"      "exchange"       "diversity"    "initial"   
 [4,] "search"              "boundary"      "learning"       "relationship" "learning"  
 [5,] "relationship"        "types"         "relationship"   "team"         "capacity"  
 
      Topic 23        Topic 24      Topic 25         Topic 26       Topic 27     Topic 28        
 [1,] "process"       "innovation"  "control"        "target"       "experience" "organizational"
 [2,] "organizations" "corporate"   "organizational" "acquisition"  "role"       "units"         
 [3,] "attention"     "network"     "acquisitions"   "acquisitions" "process"    "unit"          
 [4,] "activities"    "product"     "performance"    "market"       "groups"     "strategies"    
 [5,] "entrepreneurs" "industry"    "managerial"     "focal"        "knowledge"  "relationship"  
 
      Topic 29        Topic 30     Topic 31         Topic 32     Topic 33      Topic 34     
 [1,] "knowledge"     "collective" "employees"      "network"    "ties"        "performance"
 [2,] "innovation"    "community"  "work"           "ties"       "networks"    "exploration"
 [3,] "challenges"    "management" "employee"       "networks"   "choices"     "search"     
 [4,] "many"          "context"    "time"           "social"     "executives"  "problem"    
 [5,] "form"          "knowledge"  "related"        "structure"  "make"        "information"
 
      Topic 35         Topic 36      Topic 37     Topic 38           Topic 39       
 [1,] "attention"      "mechanisms"  "media"      "employee"         "performance"  
 [2,] "theory"         "theoretical" "voice"      "entrepreneurship" "relationships"
 [3,] "organizations"  "different"   "product"    "knowledge"        "ceo"          
 [4,] "organizational" "orientation" "managerial" "events"           "influence"    
 [5,] "processes"      "strategies"  "control"    "behavior"         "types"        
 
      Topic 40       Topic 41         Topic 42         Topic 43         Topic 44        
 [1,] "social"       "leadership"     "events"         "mechanisms"     "organizational"
 [2,] "performance"  "behaviors"      "organizational" "organizations"  "theory"        
 [3,] "justice"      "leaders"        "work"           "theory"         "routines"      
 [4,] "diversity"    "work"           "framework"      "agency"         "business"      
 [5,] "corporate"    "performance"    "creative"       "organizational" "capabilities"  
 
      Topic 45      Topic 46         Topic 47          Topic 48         Topic 49       
 [1,] "integration" "knowledge"      "strategy"        "learning"       "innovation"   
 [2,] "capital"     "transfer"       "fit"             "organizational" "technology"   
 [3,] "boundaries"  "network"        "past"            "value"          "network"      
 [4,] "process"     "boundary"       "innovation"      "theory"         "organizations"
 [5,] "innovation"  "organizational" "develop"         "processes"      "networks"     
 
      Topic 50       Topic 51        Topic 52        Topic 53      Topic 54        
 [1,] "market"       "performance"   "ties"          "networks"    "identity"      
 [2,] "governance"   "product"       "logics"        "network"     "identities"    
 [3,] "capabilities" "strategic"     "building"      "groups"      "organizational"
 [4,] "companies"    "development"   "market"        "group"       "work"          
 [5,] "executives"   "case"          "categories"    "ties"        "individuals"   
 
      Topic 55         Topic 56         Topic 57         Topic 58           Topic 59        
 [1,] "organizational" "social"         "complexity"     "uncertainty"      "strategy"      
 [2,] "risk"           "theory"         "task"           "different"        "affect"        
 [3,] "social"         "models"         "organizational" "choice"           "management"    
 [4,] "making"         "organizations"  "degree"         "governance"       "top"           
 [5,] "perspective"    "institutional"  "complex"        "structure"        "role"          
 
      Topic 60      Topic 61        Topic 62     Topic 63          Topic 64        Topic 65       
 [1,] "feedback"    "entry"         "power"      "team"            "institutional" "different"    
 [2,] "performance" "industry"      "source"     "entrepreneurial" "local"         "argue"        
 [3,] "creative"    "technologies"  "ownership"  "teams"           "logics"        "performance"  
 [4,] "workers"     "power"         "theory"     "changes"         "search"        "among"        
 [5,] "internal"    "business"      "management" "products"        "investment"    "private"      
 
      Topic 66       Topic 67        Topic 68      Topic 69      Topic 70     Topic 71        
 [1,] "performance"  "performance"   "activities"  "performance" "work"       "information"   
 [2,] "coordination" "social"        "performance" "leader"      "change"     "social"        
 [3,] "team"         "capabilities"  "learning"    "negative"    "action"     "status"        
 [4,] "teams"        "quality"       "activity"    "actions"     "mechanisms" "within"        
 [5,] "management"   "status"        "patterns"    "theory"      "practices"  "individuals"   
 
      Topic 72       Topic 73         Topic 74        Topic 75      Topic 76         Topic 77     
 [1,] "search"       "voice"          "institutional" "women"       "organizations"  "decision"   
 [2,] "likelihood"   "workplace"      "institutions"  "men"         "organizational" "making"     
 [3,] "engagement"   "employees"      "business"      "gender"      "success"        "ethical"    
 [4,] "joint"        "outcomes"       "countries"     "social"      "framework"      "group"      
 [5,] "models"       "identification" "ownership"     "differences" "strategy"       "process"    
 
      Topic 78      Topic 79        Topic 80        Topic 81       Topic 82         
 [1,] "turnover"    "design"        "communication" "job"          "pay"            
 [2,] "performance" "products"      "team"          "satisfaction" "managers"       
 [3,] "job"         "technological" "members"       "relationship" "corporate"      
 [4,] "employees"   "product"       "managers"      "turnover"     "social"         
 [5,] "theory"      "industry"      "teams"         "individual"   "discuss"        
 
      Topic 83           Topic 84      Topic 85         Topic 86         Topic 87    
 [1,] "logic"            "employees"   "organizational" "institutional"  "alliances" 
 [2,] "field"            "performance" "employee"       "differences"    "knowledge" 
 [3,] "organizations"    "likely"      "employees"      "environments"   "governance"
 [4,] "social"           "negative"    "individuals"    "theory"         "reputation"
 [5,] "logics"           "job"         "leadership"     "legitimacy"     "alliance"  
 
      Topic 88      Topic 89     Topic 90        Topic 91     Topic 92        Topic 93     
 [1,] "market"      "industry"   "market"        "directors"  "entrepreneurs" "team"       
 [2,] "ties"        "corporate"  "product"       "board"      "information"   "teams"      
 [3,] "value"       "ventures"   "complementary" "boards"     "product"       "members"    
 [4,] "exit"        "venture"    "knowledge"     "corporate"  "search"        "time"       
 [5,] "form"        "reputation" "exploration"   "governance" "actions"       "motivation" 
 
      Topic 94          Topic 95         Topic 96        Topic 97              Topic 98      
 [1,] "theory"          "organizational" "creative"      "learning"            "decision"    
 [2,] "practice"        "theory"         "decisions"     "foreign"             "makers"      
 [3,] "characteristics" "executives"     "managers"      "entry"               "performance" 
 [4,] "process"         "employee"       "strategic"     "industry"            "information" 
 [5,] "organizational"  "units"          "relational"    "interorganizational" "strategic"   
 
      Topic 99         Topic 100     Topic 101   Topic 102        Topic 103       Topic 104    
 [1,] "organizational" "ceos"        "parties"   "corporate"      "capability"    "projects"   
 [2,] "work"           "ceo"         "third"     "organizational" "capabilities"  "within"     
 [3,] "trust"          "will"        "positive"  "managers"       "dynamic"       "theory"     
 [4,] "institutional"  "performance" "perceived" "attention"      "development"   "investments"
 [5,] "peers"          "theory"      "social"    "industries"     "resources"     "economic"   
 
      Topic 105      Topic 106         Topic 107        Topic 108         Topic 109    
 [1,] "work"         "performance"     "logics"         "performance"     "networks"   
 [2,] "interactions" "costs"           "institutional"  "managers"        "network"    
 [3,] "online"       "diversification" "organizational" "among"           "social"     
 [4,] "team"         "search"          "different"      "decisions"       "entry"      
 [5,] "coordination" "market"          "outcomes"       "projects"        "structure"  
 
      Topic 110     Topic 111          Topic 112        Topic 113        Topic 114      
 [1,] "performance" "industry"         "theory"         "attention"      "women"        
 [2,] "status"      "behavioral"       "management"     "organizational" "gender"       
 [3,] "groups"      "learning"         "theories"       "evolution"      "men"          
 [4,] "group"       "subsequent"       "framework"      "capabilities"   "employees"    
 [5,] "work"        "acquisition"      "approach"       "management"     "psychological"
 
      Topic 115    Topic 116         Topic 117         Topic 118        Topic 119       
 [1,] "innovation" "work"            "diversification" "development"    "organizations" 
 [2,] "industry"   "framework"       "international"   "organizational" "community"     
 [3,] "business"   "dimensions"      "relationship"    "justice"        "relationships" 
 [4,] "capital"    "within"          "financial"       "routines"       "organizational"
 [5,] "target"     "understanding"   "product"         "time"           "social"        
 
      Topic 120    Topic 121        Topic 122        Topic 123        Topic 124       
 [1,] "alliance"   "status"         "status"         "decisions"      "knowledge"     
 [2,] "alliances"  "performance"    "market"         "regulatory"     "performance"   
 [3,] "partners"   "negative"       "positive"       "institutional"  "capabilities"  
 [4,] "prior"      "individuals"    "organization"   "differences"    "external"      
 [5,] "within"     "social"         "reputation"     "foreign"        "learning"      
 
      Topic 125       Topic 126             Topic 127     Topic 128        Topic 129        
 [1,] "institutional" "exchange"            "csr"         "learning"       "conflict"       
 [2,] "capabilities"  "partners"            "ceos"        "organizations"  "focus"          
 [3,] "reputation"    "relationships"       "political"   "organizational" "experience"     
 [4,] "complexity"    "social"              "corporate"   "collective"     "regulatory"     
 [5,] "decision"      "prior"               "stakeholder" "status"         "process"        
 
      Topic 130        Topic 131       Topic 132     Topic 133      Topic 134     Topic 135  
 [1,] "performance"    "social"        "performance" "capabilities" "innovation"  "target"   
 [2,] "decision"       "women"         "approaches"  "market"       "performance" "strategy" 
 [3,] "makers"         "men"           "resources"   "markets"      "financial"   "first"    
 [4,] "context"        "organization"  "competitive" "likely"       "knowledge"   "corporate"
 [5,] "employees"      "interactions"  "empirical"   "costs"        "changes"     "type"     
 
      Topic 136      Topic 137        Topic 138   Topic 139      Topic 140       Topic 141       
 [1,] "family"       "capital"        "alliance"  "ideas"        "work"          "corporate"     
 [2,] "strategic"    "human"          "alliances" "diversity"    "control"       "governance"    
 [3,] "behavioral"   "categories"     "partners"  "investments"  "system"        "stakeholders"  
 [4,] "management"   "market"         "learning"  "group"        "resources"     "organizational"
 [5,] "strategy"     "employee"       "value"     "creative"     "organizations" "framework"     
 
      Topic 142     Topic 143       Topic 144         Topic 145       Topic 146      
 [1,] "work"        "groups"        "teams"           "behavioral"    "across"       
 [2,] "support"     "power"         "services"        "family"        "external"     
 [3,] "develop"     "financial"     "strategic"       "superior"      "collaboration"
 [4,] "cognitive"   "group"         "diverse"         "strategy"      "practices"    
 [5,] "role"        "business"      "diversity"       "performance"   "levels"       
 
      Topic 147     Topic 148        Topic 149        Topic 150       Topic 151        
 [1,] "performance" "ethical"        "acquisition"    "focus"         "analysts"       
 [2,] "members"     "moral"          "growth"         "international" "diversification"
 [3,] "collective"  "organizations"  "risk"           "ceos"          "competitive"    
 [4,] "family"      "influence"      "market"         "performance"   "future"         
 [5,] "individual"  "leadership"     "test"           "early"         "status"         
 
      Topic 152     Topic 153       Topic 154        Topic 155      Topic 156       
 [1,] "social"      "job"           "cultural"       "field"        "knowledge"     
 [2,] "individuals" "search"        "organizational" "theory"       "management"    
 [3,] "role"        "jobs"          "organization"   "innovations"  "learning"      
 [4,] "influence"   "performance"   "practices"      "capabilities" "organizational"
 [5,] "managers"    "organizations" "organizations"  "capital"      "routines"      
 
      Topic 157       Topic 158     Topic 159     Topic 160        Topic 161     Topic 162     
 [1,] "opportunities" "business"    "options"     "technology"     "types"       "social"      
 [2,] "process"       "diversity"   "performance" "learning"       "positive"    "markets"     
 [3,] "sources"       "positive"    "technology"  "organizational" "concept"     "online"      
 [4,] "within"        "core"        "logic"       "experiences"    "better"      "behaviors"   
 [5,] "external"      "managerial"  "markets"     "knowledge"      "used"        "community"   
 
      Topic 163      Topic 164      Topic 165      Topic 166       Topic 167     Topic 168   
 [1,] "development"  "performance"  "ceo"          "decisions"     "values"      "jobs"      
 [2,] "power"        "social"       "compensation" "time"          "performance" "dependence"
 [3,] "technologies" "companies"    "directors"    "experience"    "work"        "power"     
 [4,] "positive"     "economic"     "outside"      "influence"     "management"  "job"       
 [5,] "address"      "relationship" "risk"         "communication" "identity"    "theory"    
 
      Topic 169          Topic 170        Topic 171        Topic 172       Topic 173       
 [1,] "institutional"    "fit"            "organizational" "network"       "performance"   
 [2,] "entrepreneurship" "performance"    "adoption"       "collaboration" "organizational"
 [3,] "organizational"   "services"       "practice"       "actors"        "organizations" 
 [4,] "innovation"       "institutional"  "status"         "knowledge"     "individuals"   
 [5,] "analysts"         "products"       "organizations"  "social"        "innovation"    
 
      Topic 174     Topic 175       Topic 176     Topic 177       Topic 178        Topic 179    
 [1,] "resource"    "environmental" "decision"    "activities"    "organizational" "benefits"   
 [2,] "market"      "teams"         "knowledge"   "dependence"    "managers"       "strategy"   
 [3,] "performance" "team"          "decisions"   "investment"    "identification" "context"    
 [4,] "governance"  "performance"   "making"      "communication" "members"        "mechanisms" 
 [5,] "political"   "information"   "foreign"     "social"        "social"         "performance"
 
      Topic 180     Topic 181       Topic 182       Topic 183        Topic 184    
 [1,] "leadership"  "resources"     "team"          "strategic"      "business"   
 [2,] "leaders"     "institutional" "creative"      "actions"        "systems"    
 [3,] "theory"      "market"        "innovation"    "network"        "existing"   
 [4,] "behavior"    "complementary" "development"   "actors"         "relational" 
 [5,] "leader"      "context"       "knowledge"     "organizational" "investment" 
 
      Topic 185         Topic 186        Topic 187       Topic 188        Topic 189       
 [1,] "employees"       "organizational" "relationship"  "organizational" "change"        
 [2,] "behaviors"       "organizations"  "political"     "innovation"     "strategic"     
 [3,] "opportunities"   "forms"          "psychological" "technology"     "organizational"
 [4,] "opportunity"     "practices"      "need"          "technological"  "develop"       
 [5,] "voice"           "organization"   "professional"  "will"           "distinct"      
 
      Topic 190     Topic 191     Topic 192       Topic 193      Topic 194        Topic 195       
 [1,] "value"       "market"      "work"          "performance"  "identity"       "team"          
 [2,] "performance" "competitive" "ties"          "business"     "organizational" "members"       
 [3,] "competitive" "industry"    "relationships" "unit"         "identification" "among"         
 [4,] "strategy"    "advantage"   "strategies"    "leadership"   "social"         "performance"   
 [5,] "advantage"   "resource"    "outcomes"      "team"         "organizations"  "teams"         
 
      Topic 196      Topic 197        Topic 198     Topic 199       Topic 200    
 [1,] "influence"    "groups"         "group"       "resources"     "knowledge"  
 [2,] "political"    "online"         "routines"    "resource"      "field"      
 [3,] "association"  "routines"       "ties"        "portfolio"     "practice"   
 [4,] "performance"  "competition"    "identity"    "value"         "personal"   
 [5,] "public"       "innovation"     "groups"      "communication" "differences"
 
# We can also look at the 20-topic model's topics: 
topics_LDA20 = terms(LDA20, 10)
# And show the results:
topics_LDA20

# How to show term weights:
    word_assignments200 <- t(posterior(LDA200)[["terms"]])
    word_assignments200[1:10,1:10]

                         1             2             3             4             5             6             7             8             9            10
ability      1.003658e-110  2.402186e-03 4.524415e-119  2.529326e-03  4.519954e-03  2.650128e-26 2.189112e-217 5.126056e-119  7.675248e-03  3.162048e-03
access       2.890976e-119 1.803954e-119 6.775001e-267 1.872564e-119  2.681844e-80 1.362547e-119  2.844907e-03 2.806136e-119 1.093941e-305 1.889166e-110
account       6.454202e-03  2.402186e-03 2.525859e-119  3.720076e-44 2.064195e-119 1.389545e-119  3.713690e-88 2.861738e-119 3.174732e-119 2.160706e-119
acquisition  2.260776e-119 1.410712e-119 3.309212e-267 1.464367e-119 3.486398e-107 4.915035e-220 1.670706e-119 2.194431e-119 2.434440e-119 1.656867e-119
acquisitions 1.453408e-119 1.882662e-194 1.245177e-119 2.032038e-194  2.941731e-03 2.026206e-311 2.670178e-194 1.410756e-119 1.565053e-119 1.065277e-119
across        8.255325e-03  1.357002e-29  4.557410e-03 9.262651e-222  9.999588e-03 3.090933e-119  1.267400e-44  3.736716e-03 4.189305e-123  2.172829e-03
action        3.849690e-03 1.987888e-119 1.709013e-193 2.063494e-119 2.230468e-119 1.501475e-119  2.420920e-67 3.092255e-119  2.743137e-82  9.900292e-58
actions      4.556416e-119  6.600636e-83 3.903614e-119 2.951316e-119  2.640163e-03 2.147487e-119  1.733362e-76  7.671477e-03 4.906422e-119 1.609519e-132
activities    2.136604e-02 2.440501e-119 3.350746e-119  7.728052e-03 2.738313e-119 4.171621e-123  5.894147e-03  3.736716e-03 4.211526e-119  1.462414e-02
activity      3.945952e-87 1.705369e-119 2.341428e-119  2.117941e-02 1.913474e-119 1.288085e-119 3.773279e-267  3.124542e-93 2.942923e-119 9.183462e-194

# Let's now create a file containing the topic loadings for all articles:
    gammaDF_LDA200 = as.data.frame(LDA200@gamma) 
# This creates a dataframe containing for every row the articles and for every column the per-topic loading.
    gammaDF_LDA200$ID = smalldtm$dimnames$Docs
# We add the ID from the metadata for merging with the metadata file. Of course
# any other type of data can be added.

# If we are not necessarily interested in using the full range of topic loadings,
# but only in keeping those loadings that exceed a certain threshold, 
# then we can use the code below:
    majortopics = topics(LDA200, threshold = 0.3)
    majortopics = as.data.frame(vapply(majortopics, 
       paste, collapse = ", ", character(1L)))
    colnames(majortopics) = "topic"   
    majortopics$topic = sub("^$", 0, majortopics$topic)
     
# Here, we state that we want to show all topics that load greater than 0.3, per paper.
# Of course, the higher the threshold, the fewer topics will be selected per paper.
# The flattening (the second and third line) is done to clean this column up (from e.g. "c(1,5,7)" to "1,5,7")
# The fourth line replaces those without a topic with the value zero.
# The last line renames the column.

# Some abstracts may have no topics assigned to them.
# NOTE: It will report the topics in a sequential manner - the first topic in this list is not
# necessarily the most important topic.   

# We can also select the highest loading topic for every paper by changing the threshold-subcommand
# to k = (k refers to the number of highest loading topics per paper):
    highest = as.data.frame(data$SO)
# I first make a column containing the journal, since we're going to do some follow-up checks on this dataframe. 
    highest$maintopic = topics(LDA200, k = 1)
# I then add the highest loading topic of each abstract. We can do this because the order of the data is identical. 
# Otherwise, we'd need to match the two using the ID variable (e.g. if some abstracts had been removed due to cleaning).

Plotting topic usage across journals and over time

# We cross-tabulate journals and highest loading topics
    crosstabtable = table(highest)
    crosstabtable
    
# The following topics are most distinctive for each journal:
# AMJ: Topic 69  (distance = 6.5) --> performance, leader, negative, actions, theory
# AMR: Topic 56  (distance = 5.5) --> social, theory, models, organizations, institutional
# ASQ: Topic 100 (distance = 2) --> ceos, ceo, will, performance, theory
# OS:  Topic 32  (distance = 8.25) --> network, ties, networks, social, structure
# SMJ: Topic 133 (distance = 10) --> capabilities, market, markets, likely, costs

                                  maintopic
data$SO                             1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
  ACADEMY OF MANAGEMENT JOURNAL     1  3  1  0  3  4  3  0  1  1  1  0  2  0  4  6  0  0  1  1
  ACADEMY OF MANAGEMENT REVIEW      2  1  0  0  1  0  0  0  0  4  0  0  0  1  0  0  0  1  2  1
  ADMINISTRATIVE SCIENCE QUARTERLY  0  1  1  0  2  3  0  0  0  0  0  0  0  1  0  0  1  2  0  0
  ORGANIZATION SCIENCE              4  4  2  4  2  0  4  1  2  3  5  3  0  1  0  4  5  4  3  2
  STRATEGIC MANAGEMENT JOURNAL      0  0  6  3  2  9  1  6  5  5  2  2  5  3  1  2  2  1  1  0
                                  maintopic
data$SO                            21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
  ACADEMY OF MANAGEMENT JOURNAL     4  0  2  2  2  0  1  2  0  3  7  0  2  0  2  7  3  0  0  2
  ACADEMY OF MANAGEMENT REVIEW      0  0  1  0  0  0  0  0  2  1  0  0  1  0  1  1  0  2  1  1
  ADMINISTRATIVE SCIENCE QUARTERLY  0  0  0  0  1  2  1  1  0  0  1  0  2  0  2  0  0  0  0  0
  ORGANIZATION SCIENCE              1  3  5  2  1  0  1  2  7  2  5  9  5  3  4  2  1  2  2  2
  STRATEGIC MANAGEMENT JOURNAL      3  3  1  6  2  6  1  2  1  1  0  3  2  4  2  1  0  3  3  4
                                  maintopic
data$SO                            41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
  ACADEMY OF MANAGEMENT JOURNAL     5  3  5  2  1  2  0  1  0  0  1  2  3  5  0  2  0  0  2  3
  ACADEMY OF MANAGEMENT REVIEW      1  0  0  0  0  0  0  1  0  0  0  1  1  5  0  6  2  1  0  0
  ADMINISTRATIVE SCIENCE QUARTERLY  0  0  1  0  2  0  0  1  1  2  0  0  0  0  1  0  0  0  0  1
  ORGANIZATION SCIENCE              1  4  3  3  3  3  4  3  2  3  4  3  2  3  5  0  1  1  2  1
  STRATEGIC MANAGEMENT JOURNAL      1  1  1  2  1  2  4  2  3  2  6  1  1  0  2  0  0  6  3  1
                                  maintopic
data$SO                            61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
  ACADEMY OF MANAGEMENT JOURNAL     2  2  2  2  3  4  2  1  7  3  1  1  2  3  3  0  3  6  0  1
  ACADEMY OF MANAGEMENT REVIEW      0  2  1  0  1  0  1  1  0  1  1  0  1  0  0  2  1  2  1  0
  ADMINISTRATIVE SCIENCE QUARTERLY  0  0  1  1  0  0  3  0  0  0  0  2  0  1  1  0  1  1  0  0
  ORGANIZATION SCIENCE              2  1  1  1  2  4  0  6  1  2  4  2  3  0  2  3  4  3  2  4
  STRATEGIC MANAGEMENT JOURNAL      3  3  2  2  4  3  4  2  1  3  2  1  0  3  0  2  0  1  5  0
                                  maintopic
data$SO                            81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
  ACADEMY OF MANAGEMENT JOURNAL     6  0  5  4  0  0  0  1  1  1  2  2  3  0  0  3  2  1  2
  ACADEMY OF MANAGEMENT REVIEW      0  0  5  0  1  3  0  1  0  0  0  1  1  4  1  1  0  1  1
  ADMINISTRATIVE SCIENCE QUARTERLY  0  0  0  0  0  0  0  1  0  0  1  0  0  0  1  0  0  0  0
  ORGANIZATION SCIENCE              1  3  0  1  2  1  2  1  2  4  3  2  2  2  3  2  4  2  2
  STRATEGIC MANAGEMENT JOURNAL      1  4  0  3  1  4  3  0  5  1  4  1  1  1  1  1  6  7  0
                                  maintopic
data$SO                            100 101 102 103 104 105 106 107 108 109 110 111 112 113 114
  ACADEMY OF MANAGEMENT JOURNAL      2   1   2   0   2   0   1   2   1   2   2   2   1   0   1
  ACADEMY OF MANAGEMENT REVIEW       0   1   0   0   0   1   1   0   0   0   0   0   5   2   0
  ADMINISTRATIVE SCIENCE QUARTERLY   5   0   0   0   0   0   0   1   0   1   0   1   0   0   1
  ORGANIZATION SCIENCE               2   3   3   3   5   3   1   3   2   1   5   0   1   3   1
  STRATEGIC MANAGEMENT JOURNAL       8   0   2   3   2   1   5   1   1   1   2   3   1   2   2
                                  maintopic
data$SO                            115 116 117 118 119 120 121 122 123 124 125 126 127 128 129
  ACADEMY OF MANAGEMENT JOURNAL      1   0   1   3   3   1   3   1   0   2   1   2   0   0   2
  ACADEMY OF MANAGEMENT REVIEW       0   2   0   1   1   0   0   1   0   0   1   0   0   2   1
  ADMINISTRATIVE SCIENCE QUARTERLY   1   0   0   2   3   0   0   3   2   0   0   0   1   0   0
  ORGANIZATION SCIENCE               0   2   4   1   0   3   4   2   6   1   1   2   1   5   1
  STRATEGIC MANAGEMENT JOURNAL       4   0   6   1   1   3   0   0   1   4   3   1   4   0   4
                                  maintopic
data$SO                            130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
  ACADEMY OF MANAGEMENT JOURNAL      3   0   2   0   0   1   1   1   1   0   0   1   2   0   1
  ACADEMY OF MANAGEMENT REVIEW       0   0   0   0   0   0   0   2   0   0   1   2   2   1   0
  ADMINISTRATIVE SCIENCE QUARTERLY   0   0   0   0   0   2   0   0   0   2   0   0   0   1   0
  ORGANIZATION SCIENCE               2   4   3   0   2   0   1   5   3   3   1   2   2   2   4
  STRATEGIC MANAGEMENT JOURNAL       1   0   7  10   8   2   9   1   5   5   1   2   1   2   1
                                  maintopic
data$SO                            145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
  ACADEMY OF MANAGEMENT JOURNAL      0   1   1   4   1   1   3   2   3   1   0   1   2   1   1
  ACADEMY OF MANAGEMENT REVIEW       0   0   0   1   0   0   1   1   0   2   1   0   0   1   1
  ADMINISTRATIVE SCIENCE QUARTERLY   0   0   0   0   1   0   0   1   0   0   1   0   0   0   0
  ORGANIZATION SCIENCE               3   7   3   2   3   1   1   3   3   7   0   5   3   1   1
  STRATEGIC MANAGEMENT JOURNAL       3   2   2   2   3   3   3   2   0   1   4   1   2   4   1
                                  maintopic
data$SO                            160 161 162 163 164 165 166 167 168 169 170 171 172 173 174
  ACADEMY OF MANAGEMENT JOURNAL      2   2   0   2   3   3   3   1   2   0   0   3   3   0   1
  ACADEMY OF MANAGEMENT REVIEW       0   0   1   0   0   0   1   1   2   1   0   1   0   0   0
  ADMINISTRATIVE SCIENCE QUARTERLY   1   2   0   1   0   1   1   2   3   1   0   1   0   0   0
  ORGANIZATION SCIENCE               3   1   3   2   2   4   2   1   2   2   3   4   3   6   0
  STRATEGIC MANAGEMENT JOURNAL       1   1   1   3   1   8   0   0   2   3   2   1   0   0  10
                                  maintopic
data$SO                            175 176 177 178 179 180 181 182 183 184 185 186 187 188 189
  ACADEMY OF MANAGEMENT JOURNAL      2   2   2   2   1   4   1   1   1   0   3   3   2   1   2
  ACADEMY OF MANAGEMENT REVIEW       1   0   0   0   1   1   0   0   0   2   1   0   1   1   1
  ADMINISTRATIVE SCIENCE QUARTERLY   0   0   0   2   0   0   0   1   0   0   0   0   1   2   0
  ORGANIZATION SCIENCE               1   1   1   1   3   1   2   1   2   2   3   4   1   3   3
  STRATEGIC MANAGEMENT JOURNAL       0   0   5   0   2   1   8   2   3   2   1   1   0   0   2
                                  maintopic
data$SO                            190 191 192 193 194 195 196 197 198 199 200
  ACADEMY OF MANAGEMENT JOURNAL      0   3   2   4   0   3   0   2   1   1   2
  ACADEMY OF MANAGEMENT REVIEW       2   0   1   0   5   0   1   0   0   0   2
  ADMINISTRATIVE SCIENCE QUARTERLY   0   0   1   0   0   1   0   0   1   0   0
  ORGANIZATION SCIENCE               1   3   2   1   2   1   2   4   2   1   4
  STRATEGIC MANAGEMENT JOURNAL       9   6   0   5   1   2   4   1   0   6   2

# And create a barplot (first line) with ticks at every X value (second line)
    bar1 <- barplot(crosstabtable,legend.text =  c("AMJ", "AMR","ASQ","OS","SMJ"),col = c("gray0","gray20","gray60","gray80","gray100"), axisnames=FALSE)
    axis(3,at=bar1,labels=seq(1,200,by=1))

# We can also check only for the SMJ, for example.
    bar2 <-barplot(crosstabtable["ACADEMY OF MANAGEMENT JOURNAL",], axisnames=FALSE)
    axis(3,at=bar2,labels=seq(1,200,by=1))

# We can take a similar approach to look at trends over time.
    highest_year = as.data.frame(data$PY)
    highest_year$maintopic = topics(LDA200, k = 1)

    crosstabtable_year = table(highest_year)
    bar3 <-barplot(crosstabtable_year,legend.text =  c("2011", "2012","2013","2014","2015"),col = c("gray0","gray20","gray60","gray80","gray100"), axisnames=FALSE)
    axis(3,at=bar3,labels=seq(1,200,by=1))

    bar4 <-barplot(crosstabtable[1,], axisnames=FALSE)
    axis(3,at=bar4,labels=seq(1,200,by=1))