‘...automated journalism is expected to substantially increase the amount of available news. [...]
				To cope with the resulting information overload, the importance of search engines and personalized news aggregators, such as
				Google News, are likely to increase further. Search engine providers claim to analyze individual user data (e.g., location and historical
				search behavior). [As a result] different news consumers might receive different results for the same keyword searches.
				[...] personalization will lead individuals to consume more and more of the same information, as algorithms provide
				only content that users like to read or agree with. Consequently, people would be less likely to encounter information
				that challenges their views or contradicts their interests, which could carry risks for the formation of public opinion
				in a democratic society.’
				
				'Guide to Automated Journalism', Andreas Graefe, 2016
				The text-generating tool GPT-2 is used in various automated journalism services. The user of such a service experiences only the end result: a piece of text. As with other technologies whose inner workings are unknown or unavailable to the user, the output is frequently perceived as impartial and objective, since it is made by a 'machine'. This project attempts to take a closer look at the very much human-created contents of the dataset used to train GPT-2, and to expose their subjective character.
			
			

				Memes
				Fake News
				Conservative Media
				Liberal Media
				Entertainment Media
				Other Media
				Science
				Shopping
				Gaming
				Finance
				Search Engines
				Fiction
				
				
			
			
				
					Memes
					qkme.me: 433189
					quickmeme.com: 257995
					livememe.com: 77424
					memegenerator.net: 19971
					makeameme.org: 17779
					m.quickmeme.com: 9607
					memedad.com: 5903
					memecrunch.com: 4785
					knowyourmeme.com: 1318
					memegen.com: 1261
					quickmeme.com: 1140
					mememaker.net: 848
					weknowmemes.com: 772
					memegenerator.co: 701
					memecreator.org: 594
					memecenter.com: 503
					memefive.com: 499
					whatsmeme.com: 373
					zipmeme.com: 311
					memeslide.com: 307
					memegur.com: 300
					slowmeme.com: 209
					memes.com: 185
					us.memebox.com: 180
					ismeme.com: 176
					memegene.net: 165
					memegeneokerlund.com: 163
					m.memegenerator.co: 136
					memeburn.com: 133
					dubaimemes.com: 130
					meme.li: 127
					memeorandum.com: 121
					memepile.com: 117
					memegrator.com: 116
					memecaptain.com: 108
					es.memegenerator.net: 98
					images.memes.com: 97
					ru.memegenerator.net: 94
					atheistmemebase.com: 86
					cartoonmeme.com: 86
					img.memecdn.com: 80
					memebox.com: 70
					memebase.com: 70
					senorgif.memebase.com: 68
					memeful.com: 60
					funmeme.com: 60
					scumbag.memegenerator.net: 55
					whatireally.memegenerator.net: 54
					techmeme.com: 54
					macromeme.com: 52
					memerush.com: 50
					memeblender.com: 50
					memerial.net: 48
					homemediamagazine.com: 47
					logicalmeme.com: 45
					homemediatech.net: 44
					memez.com: 44
					memesandcomics.com: 43
					memeshuffle.com: 42
					memecreator.net: 42
					dankmeme.net: 41
					memebox.com: 40
					memecreator.com: 38
					linuxmemes.com: 37
					images.memegenerator.net: 36
					m.memegenerator.net: 36
					motivatingmemes.net: 35
					departmentofmemes.com: 33
					ratemymeme.com: 32
					memebase.cheezburger.com: 30
					memepile.com: 30
					verydemotivational.memebase.com: 29
					whatdoumeme.com: 29
					graphjam.memebase.com: 28
					nememe.com: 27
					memecloud.co: 26
					pokememes.memebase.com: 26
					bronies.memebase.com: 25
					quick-meme.tumblr.com: 25
					memetrashblog.wordpress.com: 24
					hotmeme.net: 23
					memelite.com: 23
					refriedmemes.com: 22
					memegene.org: 22
					exmomemes.blogspot.com: 21
					meme-rage.tumblr.com: 21
					preview.images.memegenerator.net: 20
					memedaily.com: 20
					memeinnovation.com: 20
					popularmeme.com: 19
					memecreator.eu: 19
					memeshuffle.com: 19
					meme5.net: 18
					meme-city.tumblr.com: 18
					memeboat.com: 17
					memeguy.com: 17
					livememe.com: 17
					memepix.com: 17
					hsmemes.com: 17
					global3.memecdn.com: 16
					minimemes.net: 16
					memebuster.net: 16
					thoughtcrimemedia.squarespace.com: 16
					memegene.net: 16
					comixed.memebase.com: 16
					memelatte.com: 16
					meme-depot.tumblr.com: 16
					memesly.com: 16
					russ.unwashedmeme.com: 16
					memeheroes.com: 15
					wbb-memelord.tumblr.com: 15
					mememachine.viralvideochart.com: 15
					starwarsmemes.com: 15
					easymemes.com: 15
					memeWOW.com: 15
					mememerchant.blogspot.ca: 15
					memedroid.com: 14
					quickmeme.com: 14
					memeinsider.co: 14
					memecreator.net: 14
					memedroid.com: 14
					mementoexclusives.com: 14
					memewhore.tumblr.com: 14
					memeblock.com: 14
					mozillamemes.tumblr.com: 14
					catholicmemes.com: 14
					suchmemes.com: 14
					ryepup.unwashedmeme.com: 13
					su-memelord.tumblr.com: 13
					leaguememe.com: 13
					memeslap.com: 13
					genomemedicine.biomedcentral.com: 13
					rs1img.memecdn.com: 13
					catholicmemes.com: 12
					meme4u.tumblr.com: 12
					thinkchristian.reframemedia.com: 12
					meme.zenfs.com: 12
					memeviewer.com: 12
					memehunter.tumblr.com: 12
					stevensdankmemes.tumblr.com: 12
					slowmeme.com: 12
					artoftrolling.memebase.com: 12
					atcmemes.com: 12
					wantmeme.com: 11
					fantasticmemes.wordpress.com: 11
					mlbmemes.lockerdome.com: 11
					memedokis.tumblr.com: 11
					rocketmemes.com: 11
					memesarefun.tumblr.com: 11
					memealert.org: 11
					memedroid.novagecko.com: 11
					memefo.tumblr.com: 11
					meme.is-great.net: 11
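
The list above repeats some hosts with different counts (quickmeme.com, for instance, appears three times), and the page does not say why. The page states that its figures were compiled by hand, so the following Python sketch is only an illustration for readers who want to re-total entries written in the `host: count` form used here. The function name `tally` and the `sample` string are hypothetical; the sample copies a handful of entries from the list rather than the full data.

```python
from collections import Counter


def tally(lines):
    """Sum counts per host from 'host: count' lines (hypothetical helper)."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last colon so the count is always the final field.
        host, sep, count = line.rpartition(":")
        if not sep:
            continue  # skip lines that are not in 'host: count' form
        # Host names are case-insensitive, so normalise before summing.
        totals[host.strip().lower()] += int(count)
    return totals


# A few entries copied from the list above (not the full list).
sample = """
qkme.me: 433189
quickmeme.com: 257995
m.quickmeme.com: 9607
quickmeme.com: 1140
quickmeme.com: 14
"""

totals = tally(sample.splitlines())
for host, count in totals.most_common():
    print(f"{host}: {count}")
```

Subdomains such as m.quickmeme.com are deliberately kept separate here; whether they should be merged with their parent domain is a judgment call the page itself leaves open.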
				
			 
			
				
					*according to factcheck.org
															Fake news, deceptive content, misinformation, conspiracy theories, and fiction, including sites registered through a company that hides the identity and location of the website's owner.
					americasfreedomfighters.com: 393
					americanjournalreview.com: 20
					rightwingtribune.com: 28
					americatalks.com: 6
					bannedinformation.com: 4
					cbinfo24.com: 2
					channel23news.com: 6
					conservativeangle.com: 29
					consnation.com: 4
					theconservativepost.com: 1
					theconservativetreehouse.com: 1
					daily-vine.com: 4
					empirenews.net: 46
					en-volve.com: 74
					fbnewscycle.com: 60
					fellowshipoftheminds.com: 1
					gellerreport.com: 525
					infowars.com: 146
					kagfeed.com: 4
					londonwebnews.com: 4
					mminfo24.com: 2
					neonnettle.com: 277
					newspunch.com: 40
					nyeveningnews.com: 22
					the-postillon.com: 74
					puppetstringnews.com: 37
					realnewsrightnow.com: 9
					rwnofficial.com: 84
					therightists.com: 3
					stgeorgegazette.com: 7
					stuppid.com: 26
					teddystick.com: 22
					topalertnews.com: 2
					truthfeednews.com: 106
					www.universaleinfo.com: 5
					ussanews.com: 45
					viralcords.com: 1
					yournewswire.com: 1332
					theonion.com: 1684
					entertainment.theonion.com: 225
					politics.theonion.com: 692
					sports.theonion.com: 270
				 
			 
			
				
					*according to this website
															Conservative, far-right, Republican media sources
					foxnews.com: 53999
					wallstreetjournal.de: 4
					drudgereport.com: 64
					newsmax.com: 5505
					freerepublic.com: 1013
					washingtontimes.com: 467
					m.townhall.com: 446
					townhall.com: 81
					rushlimbaugh.com: 1388
					realclearpolitics.com: 46
					nationalreview.com: 9694
					hotair.com: 7224
					michellemalkin.com: 368
					glennbeck.com: 228
					humanevents.com: 701
					heritage.org: 402
					newsbusters.org: 4164
					lewrockwell.com: 6514
					weeklystandard.com: 2935
					newswithviews.com: 298
					pajamasmedia.com: 312
					americanthinker.com: 7389
					cnsnews.com: 1535
					reason.com: 1182
					lucianne.com: 20
					anncoulter.com: 256
					cato.org: 1978
					dailypaul.com: 2728
					volokh.com: 461
					billoreilly.com: 99
					redstate.com: 5609
					conservapedia.com: 341
					powerlineblog.com: 1744
					frontpagemag.com: 3369
					littlegreenfootballs.com: 461
					campaignforliberty.org: 114
					spectator.org: 2441
					gopusa.com: 156
					lileks.com: 34
					rightwingnews.com: 486
					wizbangblog.com: 51
					moonbattery.com: 357
				 
			 
			
				
					*according to this website
															Liberal, left-wing, Democratic media sources
					cnn.com: 90438
					sportsillustrated.cnn.com: 4337
					politicalticker.blogs.cnn.com: 3460
					www.cnn.co.jp: 3136
					amp.cnn.com: 1781
					us.cnn.com: 1731
					religion.blogs.cnn.com: 1282
					cnn.it: 1237
					news.blogs.cnn.com: 1108
					cnnphilippines.com: 884
					cnn.com: 627
					m.huffpost.com: 6245
					huffpost.com: 271
					huffpost.com: 140
					time.com: 27571
					newsfeed.time.com: 1728
					npr.org: 59984
					slate.com: 3280
					mobile.slate.com: 136
					newsweek.com: 18291
					europe.newsweek.com: 774
					usnews.com: 10329
					health.usnews.com: 638
					money.usnews.com: 307
					politico.com: 47515
					salon.com: 35655
					indymedia.org.uk: 95
					boston.indymedia.org: 77
					portland.indymedia.org: 60
					democraticunderground.com: 1129
					theatlantic.com: 36488
					villagevoice.com: 1272
					blogs.villagevoice.com: 737
					dailykos.com: 22892
					m.dailykos.com: 748
					dailykos.com: 383
					eschatonblog.com: 47
					m.newyorker.com: 149
					video.newyorker.com: 50
					newyorker.com: 49
					thedailybeast.com: 30124
					alternet.org: 35329
					alternet.org: 978
					commondreams.org: 18011
					commondreams.org: 298
					crooksandliars.com: 6452
				 
			 
			
				
					Entertainment Media
					accessonline.com: 27
					buzzfeednews.com: 984
					etonline.com: 1017
					newyorker.com: 149
					hollywoodlife.com: 638
					variety.com: 17136
					billboard.com: 6943
					vanityfair.com: 6943
				
			 
			
				
					Other Media
					foxnews.com: 53999
					wallstreetjournal.de: 4
					nytimes.com: 235235
					news.yahoo.com: 57665
					washingtonpost.com: 208974 
					theguardian.com: 180157 
					bbc.co.uk: 158433
					cbc.ca: 111557 
					bbc.com: 83937 
					dailymail.co.uk: 79965
					thehill.com: 73980
					independent.co.uk: 73278 
					telegraph.co.uk: 70498
					liveleak.com: 34593
					wired.com: 31763
					vice.com: 16687
					motherboard.vice.com: 10977
					newyorker.com: 15866
					mobile.nytimes.com: 12079
					theregister.co.uk: 12112
					thenation.com: 10810
					mediaite.com: 10944
					baltimoresun.com: 10503
					vanityfair.com: 6897
				
			 
			
				
						Science
						sciencedaily.com: 17374
						newscientist.com: 11513
						space.newscientist.com: 356
						technology.newscientist.com: 212
						environment.newscientist.com: 159
						blogs.scientificamerican.com: 3961
						scientificamerican.com: 15
						sciencenews.org: 3010
						independentsciencenews.org: 44
						sciencenews.org: 27
						livescience.com: 8333
						m.livescience.com: 170
						amp.livescience.com: 38
						phys.org: 21314
						m.phys.org: 529
						advancedsciencenews.com: 5
						eurekalert.org: 8079
						sci-news.com: 1863
						sciencenewsforstudents.org: 51
						news.sciencemag.org: 3480
						sciencemag.org: 3255
						science.sciencemag.org: 778
						advances.sciencemag.org: 426
						sciencenow.sciencemag.org: 223
						popsci.com: 4886
						scitechdaily.com: 823
						undark.org: 199
						eos.org: 261
						scimex.org: 20
						scienceandtechnologyresearchnews.com: 9
						independentsciencenews.org: 4
						sciworthy.com: 19
						sciencenewsjournal.com: 272
				
			 
			
			
				
					Gaming
					boardgamegeek.com: 2218
					engadget.com: 27944
					kotaku.com: 11664
					kotaku.com.au: 856
					compete.kotaku.com: 202
					steamed.kotaku.com: 192
					cosplay.kotaku.com: 99
					m.kotaku.com: 70
					metacritic.com: 1033
					mobygames.com: 474
					pcgamer.com: 11351
					japanese.engadget.com: 1180
					escapistmagazine.com: 4089
					gamefaqs.gamespot.com: 84
				
			 
			
				
					Finance
					money.cnn.com: 20756
					investopedia.com: 1029
					budgetsaresexy.com: 29
					finance.yahoo.com: 14231
					ca.finance.yahoo.com: 581
					uk.finance.yahoo.com: 388
					au.finance.yahoo.com: 130
					sg.finance.yahoo.com: 126
					in.finance.yahoo.com: 85
					thepennyhoarder.com: 60
					fool.com: 2024
					daveramsey.com: 36
					freakonomics.com: 323
					moneycrashers.com: 44
				
			 
			
				
					Search Engines
					answers.yahoo.com: 2316
					bing.com: 1238
					google.com.au: 1448
					google.com.ar: 120
					images.duckduckgo.com: 671
					img-fotki.yandex.ru: 213
				
			 
			
			
			*This project provides a glimpse into the contents of the training dataset for GPT-2.
															 All counts presented here were compiled manually, deliberately avoiding automated analysis.
															 It is therefore not a full overview of the dataset's contents, but is rather
															 intended to give an idea of what types of information can go into the development of
															 text-generating tools.