Efficient Learning of Sparse Representations
with an Energy-Based Model

Marc'Aurelio Ranzato    Christopher Poultney    Sumit Chopra    Yann LeCun
Courant Institute of Mathematical Sciences
New York University, New York, NY 10003
{ranzato,crispy,sumit,yann}@cs.nyu.edu
Abstract
We describe a novel unsupervised method for learning sparse, overcomplete features. The model uses a linear encoder, and a linear decoder preceded by a sparsifying non-linearity that turns a code vector into a quasi-binary sparse code vector. Given an input, the optimal code minimizes the distance between the output of the decoder and the input patch while being as similar as possible to the encoder output. Learning proceeds in a two-phase EM-like fashion: (1) compute the minimum-energy code vector, (2) adjust the parameters of the encoder and decoder so as to decrease the energy. The model produces "stroke detectors" when trained on handwritten numerals, and Gabor-like filters when trained on natural image patches. Inference and learning are very fast, requiring no preprocessing and no expensive sampling. Using the proposed unsupervised method to initialize the first layer of a convolutional network, we achieved an error rate slightly lower than the best reported result on the MNIST dataset. Finally, an extension of the method is described to learn topographical filter maps.
1 Introduction
Unsupervised learning methods are often used to produce pre-processors and feature extractors for image analysis systems. Popular methods such as Wavelet decomposition, PCA, Kernel-PCA, Non-Negative Matrix Factorization [1], and ICA produce compact representations with somewhat uncorrelated (or independent) components [2]. Most methods produce representations that either preserve or reduce the dimensionality of the input. However, several recent works have advocated the use of sparse-overcomplete representations for images, in which the dimension of the feature vector is larger than the dimension of the input, but only a small number of components are non-zero for any one image [3, 4]. Sparse-overcomplete representations present several potential advantages. Using high-dimensional representations increases the likelihood that image categories will be easily (possibly linearly) separable. Sparse representations can provide a simple interpretation of the input data in terms of a small number of "parts" by extracting the structure hidden in the data. Furthermore, there is considerable evidence that biological vision uses sparse representations in early visual areas [5, 6].
It seems reasonable to consider a representation "complete" if it is possible to reconstruct the input from it, because the information contained in the input would need to be preserved in the representation itself. Most unsupervised learning methods for feature extraction are based on this principle, and can be understood in terms of an encoder module followed by a decoder module. The encoder takes the input and computes a code vector, for example a sparse and overcomplete representation. The decoder takes the code vector given by the encoder and produces a reconstruction of the input. Encoder and decoder are trained in such a way that reconstructions provided by the decoder are as similar as possible to the actual input data, when these input data have the same statistics as the training samples. Methods such as Vector Quantization, PCA, auto-encoders [7], Restricted Boltzmann Machines [8], and others [9] have exactly this architecture but with different constraints on the code and learning algorithms, and different kinds of encoder and decoder architectures. In other approaches, the encoding module is missing but its role is taken by a minimization in code space which retrieves the representation [3]. Likewise, in non-causal models the decoding module is missing and sampling techniques must be used to reconstruct the input from a code [4]. In sec. 2, we describe an energy-based model which has both an encoder and a decoder. After training, the encoder allows very fast inference because finding a representation does not require solving an optimization problem. The decoder provides an easy way to reconstruct input vectors, thus allowing the trainer to assess directly whether the representation extracts most of the information from the input.
Most methods find representations by minimizing an appropriate loss function during training. In order to learn sparse representations, a term enforcing sparsity is added to the loss. This term usually penalizes those code units that are active, aiming to make the distribution of their activities highly peaked at zero with heavy tails [10][4]. A drawback of these approaches is that some action might need to be taken in order to prevent the system from always activating the same few units and collapsing all the others to zero [3].
An alternative approach is to embed a sparsifying module, e.g. a non-linearity, in the system [11]. This in general forces all the units to have the same degree of sparsity, but it also makes a theoretical analysis of the algorithm more complicated. In this paper, we present a system which achieves sparsity by placing a non-linearity between encoder and decoder. Sec. 2.1 describes this module, dubbed the "Sparsifying Logistic", which is a logistic function with an adaptive bias that tracks the average output of each code unit. This non-linearity is parameterized in a simple way which allows us to control the degree of sparsity of the representation as well as the entropy of each code unit.
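As a concrete illustration, the following is a minimal NumPy sketch of the Sparsifying Logistic in the form given in sec. 2.1, where each unit divides its exponentiated activation by a running average of its own past activations; the class name, default parameter values, and per-call update schedule are our assumptions, not part of the original text.

    import numpy as np

    class SparsifyingLogistic:
        """Logistic-style non-linearity whose normalizer tracks a running
        average of past activations, yielding quasi-binary sparse outputs
        in [0, 1] (a sketch of the module described in sec. 2.1)."""

        def __init__(self, code_dim, eta=0.02, beta=1.0):
            self.eta = eta                 # controls the degree of sparsity
            self.beta = beta               # controls the entropy of each unit
            self.zeta = np.ones(code_dim)  # running normalizer, one per unit

        def __call__(self, z):
            # A unit must be strongly active relative to its own history
            # for its output to approach 1; otherwise it stays near 0.
            num = self.eta * np.exp(self.beta * z)
            self.zeta = num + (1.0 - self.eta) * self.zeta
            return num / self.zeta

With a small eta the normalizer changes slowly, so only activations that are unusually large compared with a unit's recent history produce outputs near 1, which is what drives the code toward sparsity.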
Unfortunately,learningtheparametersinencoderanddecodercannotbeachievedbysimpleback-
propagationofthegradientsofthereconstructionerror:theSparsifyingLogisticishighlynon-linear
ore,insec.3wepropose
toaugmentthelossfunctionbyconsideringnotonlytheparametersofthesystembutalsothe
tingthefactthat1)itis
fairlyeasytodeterminetheweightsinencoderanddecoderwhen“good”codesaregiven,and2)
itisstraightforwardtocomputetheoptimalcodeswhentheparametersinencoderanddecoderare
fixed,wedescribeasimpleiterativecoordinatedescentoptimizationtolearntheparametersofthe
cedurecanbeseenasasortofdeterministicversionoftheEMalgorithminwhich
rningalgorithmdescribedturnsouttobe
particularlysimple,-processingisrequiredfortheinputimages,beyonda
.4wereportexperimentsoffeatureextractionon
esystemhasalinearencoderanddecoder
(rememberthattheSparsifyingLogisticisaseparatemodule),thefiltersresemble“objectparts”for
thenumerals,andlocalized,ngthesefeatures
fortheclassificationofthedigitsintheMNISTdataset,wehaveachievedbyasmallmarginthe
ludebyshowingahierarchicalextensionwhich
suggeststheformofsimpleandcomplexcellreceptivefields,andleadstoatopographiclayoutof
thefilterswhichisreminiscentofthetopographicmapsfoundinareaV1ofthevisualcortex.
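The following is a minimal NumPy sketch of the two-phase coordinate descent mentioned above, for a single input patch X; E_C and E_D are the code prediction and reconstruction energies defined in sec. 2 below. The learning rates, the number of code-descent steps, and the treatment of the Sparsifying Logistic's running normalizer as fixed within a step (which reduces the module to an ordinary logistic with derivative beta * zbar * (1 - zbar)) are our simplifying assumptions, not the exact recipe of sec. 3.

    import numpy as np

    def train_step(X, Wc, Wd, zeta_hist, eta=0.02, beta=1.0,
                   n_code_steps=20, lr_z=0.1, lr_w=0.01):
        """One EM-like step: (1) find a minimum-energy code Z with the
        parameters fixed, (2) update Wc and Wd to decrease the energy."""

        def zbar_of(z):
            # Sparsifying Logistic with its history term zeta_hist held
            # fixed, which makes it an ordinary logistic in z.
            num = eta * np.exp(beta * z)
            return num / (num + (1.0 - eta) * zeta_hist)

        # Phase 1: start from the encoder prediction, descend in code space.
        Z = Wc @ X
        for _ in range(n_code_steps):
            zbar = zbar_of(Z)
            grad = Z - Wc @ X                                            # from E_C
            grad += beta * zbar * (1 - zbar) * (Wd.T @ (Wd @ zbar - X))  # from E_D
            Z -= lr_z * grad

        # Phase 2: one gradient step on the parameters, with Z held fixed.
        zbar = zbar_of(Z)
        Wc += lr_w * np.outer(Z - Wc @ X, X)        # pull encoder output toward Z
        Wd += lr_w * np.outer(X - Wd @ zbar, zbar)  # improve the reconstruction
        return Z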
2TheModel
The proposed model is based on three main components, as shown in fig. 1:

• The encoder: A set of feed-forward filters parameterized by the rows of matrix W_C, that computes a code vector from an image patch X.
• The Sparsifying Logistic: A non-linear module that transforms the code vector Z into a sparse code vector Z̄ with components in the range [0, 1].
• The decoder: A set of reverse filters parameterized by the columns of matrix W_D, that computes a reconstruction of the input image patch from the sparse code vector Z̄.
The energy of the system is the sum of two terms:

E(X, Z, W_C, W_D) = E_C(X, Z, W_C) + E_D(X, Z, W_D)    (1)

The first term is the code prediction energy, which measures the discrepancy between the output of the encoder and the code vector Z. In our experiments, it is defined as:

E_C(X, Z, W_C) = (1/2) ||Z − Enc(X, W_C)||^2 = (1/2) ||Z − W_C X||^2    (2)

The second term is the reconstruction energy, which measures the discrepancy between the reconstructed image patch produced by the decoder and the input image patch X. In our experiments, it is defined as:

E_D(X, Z, W_D) = (1/2) ||X − Dec(Z̄, W_D)||^2 = (1/2) ||X − W_D Z̄||^2    (3)
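To make eqs. (1)-(3) concrete, here is a minimal NumPy sketch of the energy computation for the linear encoder and decoder above; the patch and code dimensions in the usage example are illustrative assumptions, and a plain logistic stands in for the Sparsifying Logistic.

    import numpy as np

    def energy(X, Z, zbar, Wc, Wd):
        """Total energy of eq. (1): code prediction energy (eq. 2) plus
        reconstruction energy (eq. 3); zbar is the sparse code produced
        by the Sparsifying Logistic from Z."""
        e_c = 0.5 * np.sum((Z - Wc @ X) ** 2)     # eq. (2)
        e_d = 0.5 * np.sum((X - Wd @ zbar) ** 2)  # eq. (3)
        return e_c + e_d

    # Usage: a 256-pixel patch with a 512-unit (overcomplete) code.
    rng = np.random.default_rng(0)
    X = rng.standard_normal(256)
    Wc = 0.01 * rng.standard_normal((512, 256))
    Wd = 0.01 * rng.standard_normal((256, 512))
    Z = Wc @ X
    zbar = 1.0 / (1.0 + np.exp(-Z))  # stand-in for the Sparsifying Logistic
    print(energy(X, Z, zbar, Wc, Wd))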