‘...automated journalism is expected to substantially increase the amount of available news. [...]
To cope with the resulting information overload, the importance of search engines and personalized news aggregators, such as
Google News, are likely to increase further. Search engine providers claim to analyze individual user data (e.g., location and historical
search behavior). [As a result] different news consumers might receive different results for the same keyword searches.
[...] personalization will lead individuals to consume more and more of the same information, as algorithms provide
only content that users like to read or agree with. Consequently, people would be less likely to encounter information
that challenges their views or contradicts their interests, which could carry risks for the formation of public opinion
in a democratic society.’
'Guide to Automated Journalism', Andreas Graefe, 2016
The text-generating tool GPT-2 is used in various automated journalism services. The user of such a service experiences only the end result: a piece of text. As with other technologies whose inner workings are unknown or unavailable to the user, the output is frequently perceived as impartial and objective because it is made by a 'machine'. This project takes a closer look at the very much human-created contents of the dataset used to train GPT-2 and attempts to expose their subjective character.


Categories: Memes, Fake News, Conservative Media, Liberal Media, Entertainment Media, Other Media, Science, Shopping, Gaming, Finance, Search Engines, Fiction

The number after each domain is the number of times it was scraped.

Memes
qkme.me: 433189
quickmeme.com: 257995
livememe.com: 77424
memegenerator.net: 19971
makeameme.org: 17779
m.quickmeme.com: 9607
memedad.com: 5903
memecrunch.com: 4785
knowyourmeme.com: 1318
memegen.com: 1261
quickmeme.com: 1140
mememaker.net: 848
weknowmemes.com: 772
memegenerator.co: 701
memecreator.org: 594
memecenter.com: 503
memefive.com: 499
whatsmeme.com: 373
zipmeme.com: 311
memeslide.com: 307
memegur.com: 300
slowmeme.com: 209
memes.com: 185
us.memebox.com: 180
ismeme.com: 176
memegene.net: 165
memegeneokerlund.com: 163
m.memegenerator.co: 136
memeburn.com: 133
dubaimemes.com: 130
meme.li: 127
memeorandum.com: 121
memepile.com: 117
memegrator.com: 116
memecaptain.com: 108
es.memegenerator.net: 98
images.memes.com: 97
ru.memegenerator.net: 94
atheistmemebase.com: 86
cartoonmeme.com: 86
img.memecdn.com: 80
memebox.com: 70
memebase.com: 70
senorgif.memebase.com: 68
memeful.com: 60
funmeme.com: 60
scumbag.memegenerator.net: 55
whatireally.memegenerator.net: 54
techmeme.com: 54
macromeme.com: 52
memerush.com: 50
memeblender.com: 50
memerial.net: 48
homemediamagazine.com: 47
logicalmeme.com: 45
homemediatech.net: 44
memez.com: 44
memesandcomics.com: 43
memeshuffle.com: 42
memecreator.net: 42
dankmeme.net: 41
memebox.com: 40
memecreator.com: 38
linuxmemes.com: 37
images.memegenerator.net: 36
m.memegenerator.net: 36
motivatingmemes.net: 35
departmentofmemes.com: 33
ratemymeme.com: 32
memebase.cheezburger.com: 30
memepile.com: 30
verydemotivational.memebase.com: 29
whatdoumeme.com: 29
graphjam.memebase.com: 28
nememe.com: 27
memecloud.co: 26
pokememes.memebase.com: 26
bronies.memebase.com: 25
quick-meme.tumblr.com: 25
memetrashblog.wordpress.com: 24
hotmeme.net: 23
memelite.com: 23
refriedmemes.com: 22
memegene.org: 22
exmomemes.blogspot.com: 21
meme-rage.tumblr.com: 21
preview.images.memegenerator.net: 20
memedaily.com: 20
memeinnovation.com: 20
popularmeme.com: 19
memecreator.eu: 19
memeshuffle.com: 19
meme5.net: 18
meme-city.tumblr.com: 18
memeboat.com: 17
memeguy.com: 17
livememe.com: 17
memepix.com: 17
hsmemes.com: 17
global3.memecdn.com: 16
minimemes.net: 16
memebuster.net: 16
thoughtcrimemedia.squarespace.com: 16
memegene.net: 16
comixed.memebase.com: 16
memelatte.com: 16
meme-depot.tumblr.com: 16
memesly.com: 16
russ.unwashedmeme.com: 16
memeheroes.com: 15
wbb-memelord.tumblr.com: 15
mememachine.viralvideochart.com: 15
starwarsmemes.com: 15
easymemes.com: 15
memeWOW.com: 15
mememerchant.blogspot.ca: 15
memedroid.com: 14
quickmeme.com: 14
memeinsider.co: 14
memecreator.net: 14
memedroid.com: 14
mementoexclusives.com: 14
memewhore.tumblr.com: 14
memeblock.com: 14
mozillamemes.tumblr.com: 14
catholicmemes.com: 14
suchmemes.com: 14
ryepup.unwashedmeme.com: 13
su-memelord.tumblr.com: 13
leaguememe.com: 13
memeslap.com: 13
genomemedicine.biomedcentral.com: 13
rs1img.memecdn.com: 13
catholicmemes.com: 12
meme4u.tumblr.com: 12
thinkchristian.reframemedia.com: 12
meme.zenfs.com: 12
memeviewer.com: 12
memehunter.tumblr.com: 12
stevensdankmemes.tumblr.com: 12
slowmeme.com: 12
artoftrolling.memebase.com: 12
atcmemes.com: 12
wantmeme.com: 11
fantasticmemes.wordpress.com: 11
mlbmemes.lockerdome.com: 11
memedokis.tumblr.com: 11
rocketmemes.com: 11
memesarefun.tumblr.com: 11
memealert.org: 11
memedroid.novagecko.com: 11
memefo.tumblr.com: 11
meme.is-great.net: 11
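
Several hosts in the list above appear more than once with different counts (for example `quickmeme.com` and `memegene.net`). The page does not say why; one possible reading is that different URL variants of the same site were tallied separately. Under that assumption only, a minimal Python sketch for merging repeated entries and adding up a category total could look like this. The helper name `merge_counts` is hypothetical, and the sample contains just a few lines copied from the list.

```python
from collections import Counter

def merge_counts(lines):
    """Sum the 'times scraped' counts per host from 'host: count' lines.

    Assumption: repeated hosts are variants of one site, so their
    counts can simply be added together.
    """
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        host, _, count = line.rpartition(":")
        totals[host.strip().lower()] += int(count)
    return totals

# A few entries copied from the Memes list above.
sample = [
    "quickmeme.com: 257995",
    "quickmeme.com: 1140",
    "quickmeme.com: 14",
    "memegene.net: 165",
    "memegene.net: 16",
]

totals = merge_counts(sample)
for host, count in totals.most_common():
    print(f"{host}: {count}")
print("sample total:", sum(totals.values()))
```

If the repeated entries are in fact unrelated listings, the counts should not be merged and the list should be read as printed.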
Fake News*
*according to factcheck.org
Sites publishing fake news, deceptive content, misinformation, conspiracy theories, or fiction, or registered through a company that hides the identity and location of the website's owner.
americasfreedomfighters.com: 393
americanjournalreview.com: 20
rightwingtribune.com: 28
americatalks.com: 6
bannedinformation.com: 4
cbinfo24.com: 2
channel23news.com: 6
conservativeangle.com: 29
consnation.com: 4
theconservativepost.com: 1
theconservativetreehouse.com: 1
daily-vine.com: 4
empirenews.net: 46
en-volve.com: 74
fbnewscycle.com: 60
fellowshipoftheminds.com: 1
gellerreport.com: 525
infowars.com: 146
kagfeed.com: 4
londonwebnews.com: 4
mminfo24.com: 2
neonnettle.com: 277
newspunch.com: 40
nyeveningnews.com: 22
the-postillon.com: 74
puppetstringnews.com: 37
realnewsrightnow.com: 9
rwnofficial.com: 84
therightists.com: 3
stgeorgegazette.com: 7
stuppid.com: 26
teddystick.com: 22
topalertnews.com: 2
truthfeednews.com: 106
www.universaleinfo.com: 5
ussanews.com: 45
viralcords.com: 1
yournewswire.com: 1332
theonion.com: 1684
entertainment.theonion.com: 225
politics.theonion.com: 692
sports.theonion.com: 270
Conservative Media*
*according to this website
Conservative, far-right, Republican media sources
foxnews.com: 53999
wallstreetjournal.de: 4
drudgereport.com: 64
newsmax.com: 5505
freerepublic.com: 1013
washingtontimes.com: 467
m.townhall.com: 446
townhall.com: 81
rushlimbaugh.com: 1388
realclearpolitics.com: 46
nationalreview.com: 9694
hotair.com: 7224
michellemalkin.com: 368
glennbeck.com: 228
humanevents.com: 701
heritage.org: 402
newsbusters.org: 4164
lewrockwell.com: 6514
weeklystandard.com: 2935
newswithviews.com: 298
pajamasmedia.com: 312
americanthinker.com: 7389
cnsnews.com: 1535
reason.com: 1182
lucianne.com: 20
anncoulter.com: 256
cato.org: 1978
dailypaul.com: 2728
volokh.com: 461
billoreilly.com: 99
redstate.com: 5609
conservapedia.com: 341
powerlineblog.com: 1744
frontpagemag.com: 3369
littlegreenfootballs.com: 461
campaignforliberty.org: 114
spectator.org: 2441
gopusa.com: 156
lileks.com: 34
rightwingnews.com: 486
wizbangblog.com: 51
moonbattery.com: 357
Liberal Media*
*according to this website
Liberal, left-wing, Democratic media sources
cnn.com: 90438
sportsillustrated.cnn.com: 4337
politicalticker.blogs.cnn.com: 3460
www.cnn.co.jp: 3136
amp.cnn.com: 1781
us.cnn.com: 1731
religion.blogs.cnn.com: 1282
cnn.it: 1237
news.blogs.cnn.com: 1108
cnnphilippines.com: 884
cnn.com: 627
m.huffpost.com: 6245
huffpost.com: 271
huffpost.com: 140
time.com: 27571
newsfeed.time.com: 1728
npr.org: 59984
slate.com: 3280
mobile.slate.com: 136
newsweek.com: 18291
europe.newsweek.com: 774
usnews.com: 10329
health.usnews.com: 638
money.usnews.com: 307
politico.com: 47515
salon.com: 35655
indymedia.org.uk: 95
boston.indymedia.org: 77
portland.indymedia.org: 60
democraticunderground.com: 1129
theatlantic.com: 36488
villagevoice.com: 1272
blogs.villagevoice.com: 737
dailykos.com: 22892
m.dailykos.com: 748
dailykos.com: 383
eschatonblog.com: 47
m.newyorker.com: 149
video.newyorker.com: 50
newyorker.com: 49
thedailybeast.com: 30124
alternet.org: 35329
alternet.org: 978
commondreams.org: 18011
commondreams.org: 298
crooksandliars.com: 6452
Entertainment Media
accessonline.com: 27
buzzfeednews.com: 984
etonline.com: 1017
newyorker.com: 149
hollywoodlife.com: 638
variety.com: 17136
billboard.com: 6943
vanityfair.com: 6943
Other Media
foxnews.com: 53999
wallstreetjournal.de: 4
nytimes.com: 235235
news.yahoo.com: 57665
washingtonpost.com: 208974
theguardian.com: 180157
bbc.co.uk: 158433
cbc.ca: 111557
bbc.com: 83937
dailymail.co.uk: 79965
thehill.com: 73980
independent.co.uk: 73278
telegraph.co.uk: 70498
liveleak.com: 34593
wired.com: 31763
vice.com: 16687
motherboard.vice.com: 10977
newyorker.com: 15866
mobile.nytimes.com: 12079
theregister.co.uk: 12112
thenation.com: 10810
mediaite.com: 10944
baltimoresun.com: 10503
vanityfair.com: 6897
Science
sciencedaily.com: 17374
newscientist.com: 11513
space.newscientist.com: 356
technology.newscientist.com: 212
environment.newscientist.com: 159
blogs.scientificamerican.com: 3961
scientificamerican.com: 15
sciencenews.org: 3010
independentsciencenews.org: 44
sciencenews.org: 27
livescience.com: 8333
m.livescience.com: 170
amp.livescience.com: 38
phys.org: 21314
m.phys.org: 529
advancedsciencenews.com: 5
eurekalert.org: 8079
sci-news.com: 1863
sciencenewsforstudents.org: 51
news.sciencemag.org: 3480
sciencemag.org: 3255
science.sciencemag.org: 778
advances.sciencemag.org: 426
sciencenow.sciencemag.org: 223
popsci.com: 4886
scitechdaily.com: 823
undark.org: 199
eos.org: 261
scimex.org: 20
scienceandtechnologyresearchnews.com: 9
independentsciencenews.org: 4
sciworthy.com: 19
sciencenewsjournal.com: 272
Gaming
boardgamegeek.com: 2218
engadget.com: 27944
kotaku.com: 11664
kotaku.com.au: 856
compete.kotaku.com: 202
steamed.kotaku.com: 192
cosplay.kotaku.com: 99
m.kotaku.com: 70
metacritic.com: 1033
mobygames.com: 474
pcgamer.com: 11351
japanese.engadget.com: 1180
escapistmagazine.com: 4089
gamefaqs.gamespot.com: 84
Finance
money.cnn.com: 20756
investopedia.com: 1029
budgetsaresexy.com: 29
finance.yahoo.com: 14231
ca.finance.yahoo.com: 581
uk.finance.yahoo.com: 388
au.finance.yahoo.com: 130
sg.finance.yahoo.com: 126
in.finance.yahoo.com: 85
thepennyhoarder.com: 60
fool.com: 2024
daveramsey.com: 36
freakonomics.com: 323
moneycrashers.com: 44
Search Engines
answers.yahoo.com: 2316
bing.com: 1238
google.com.au: 1448
google.com.ar: 120
images.duckduckgo.com: 671
img-fotki.yandex.ru: 213
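
Many entries in the lists above are subdomains or regional variants of one site (for instance the `finance.yahoo.com` hosts). One rough way to see a per-site figure is to group hosts under a parent domain. The sketch below uses a deliberately naive rule: keep the last two labels of the host, or the last three when the host ends in one of a few hand-picked two-part suffixes. That rule, the `TWO_PART_SUFFIXES` set, and the helper names are assumptions made for illustration; a thorough version would consult the Public Suffix List instead.

```python
from collections import defaultdict

# Hypothetical, hand-picked suffixes; this is not a complete list.
TWO_PART_SUFFIXES = {"co.uk", "com.au", "com.ar", "co.jp", "org.uk"}

def parent_domain(host):
    """Return a naive 'registered domain' for a host name."""
    labels = host.lower().split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_PART_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])

def roll_up(counts):
    """Add up counts of all hosts that share a parent domain."""
    totals = defaultdict(int)
    for host, n in counts.items():
        totals[parent_domain(host)] += n
    return dict(totals)

# Entries copied from the lists above.
sample = {
    "finance.yahoo.com": 14231,
    "ca.finance.yahoo.com": 581,
    "uk.finance.yahoo.com": 388,
    "au.finance.yahoo.com": 130,
    "sg.finance.yahoo.com": 126,
    "in.finance.yahoo.com": 85,
    "money.cnn.com": 20756,
    "google.com.au": 1448,
}

for domain, total in roll_up(sample).items():
    print(f"{domain}: {total}")
```

Rolling up in this way ignores the page's categories: `money.cnn.com` and `cnn.com`, for example, would land under the same parent even though they are listed in different groups.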
*This project offers a glimpse into the contents of the training dataset for GPT-2.
All counts presented here were compiled manually, deliberately avoiding automated analysis.
The project therefore does not give a full overview of the dataset's contents; it is
intended to give an idea of the types of information that can enter the development of
text-generating tools.
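
The counts on this page were compiled by hand. For readers who want to check a single figure, the following sketch shows one way a 'times scraped' count per host could be computed from a list of URLs. It is illustrative only: the URLs are made-up placeholders, not entries from the dataset, and `count_hosts` is a hypothetical helper.

```python
from collections import Counter
from urllib.parse import urlsplit

def count_hosts(urls):
    """Count how often each host name occurs in a list of URLs."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # lower-cased host, or None
        if host:
            counts[host] += 1
    return counts

# Placeholder URLs, not taken from the dataset.
urls = [
    "https://www.example.com/article-1",
    "http://example.com/article-2",
    "https://m.example.com/article-3",
    "https://example.org/post",
    "https://example.com/article-4",
]

counts = count_hosts(urls)
for host, n in counts.most_common():
    print(f"{host}: {n}")
```

Here `www.example.com`, `m.example.com` and `example.com` are counted as separate hosts, which matches the way subdomains are listed separately on this page.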