Is network design for Neural Networks as trial-and-errory as it seems...

Is network design for Neural Networks as trial-and-errory as it seems? Are there any tips/tricks I should know to do this better?

It feels like I'm just throwing shit at a wall until something works.

Attached: 1512976334066.jpg (800x1133, 254K)

Cute picture of me

No, you can read all the papers you want but all they'll tell you is "we tried out a bunch of shit and here's the one we found had the highest cross validated accuracy."

>as trial-and-errory
The word you're looking for is "tinkering", or "experimental"

this
neural network architecture is just mixing and matching aspects of other architectures and seeing how well it works.
Every once in a while you get a genius like Ian Goodfellow crafting the Columbus Egg and the field explodes all over again.

Aren't there at least techniques to make the trial-and-error part not brute force? The hypest, most promising application of CS wouldn't happen to be trivial, would it?

>not brute force
It isn't brute force and it never was brute force.
Stochastic methods aren't brute force, and even research into [redacted] yeah nah nevermind

What a cute pic

>Aren't there at least techniques to make the trial-and-error part not brute force?
Understanding what your data is like, the kind of relationships you're looking for, and where.

Explain...

What IDE or language do you guys use for neural nets?

Not that guy, but gradient descent is a pretty efficient way of optimizing a function with millions of parameters. Brute force would be more like a million nested for loops, which would basically never terminate.
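To make the contrast concrete, here's a minimal sketch (plain NumPy, a toy one-parameter problem, not anyone's actual training code) of why a gradient step is directed rather than exhaustive -- each update moves downhill instead of enumerating the parameter space:

```python
import numpy as np

# Toy example: fit y = w*x with a single parameter by gradient descent.
# The same update rule scales to millions of parameters; each step moves
# downhill along the gradient, it never enumerates candidate weights.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x  # true weight is 3

w = 0.0
lr = 0.1
for _ in range(100):
    grad = np.mean(2 * (w * x - y) * x)  # d/dw of mean squared error
    w -= lr * grad

# w ends up near 3 after ~100 steps; a brute-force grid search over a
# million-parameter space would need exponentially many evaluations.
```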

Jupyter for feature engineering, pycharm for production code

I've been running a bunch of iterated tests on MNIST to see if I can draw up some equations relating network size, learning rate, and other hyperparameters to eventual performance.
The ideal would be to be able to describe how a network will train given its architecture. I've got a feeling that the form of the equations will probably be about the same across datasets, and then we might be able to assign each dataset some parameters that describe what it will take to model it / what the optimal architecture is.
I've got a batch of 2000 training runs with learning rate / middle layer size / activation function that I need to write visualizations for...
Does anyone here have links to any papers in this area? I'm a bit surprised that I haven't seen any yet.

>Top is information gain (bits) vs network size (784-X-10), bottom is 1/time to converge vs same parameter
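For reference, one plausible way to compute an "information gain (bits)" metric like the one in that caption: the baseline entropy of guessing among 10 balanced classes, log2(10), minus the model's average test-set cross-entropy in bits. This is a guess at the poster's metric, not their actual code:

```python
import numpy as np

# Hypothetical sketch: information gain in bits over a uniform guesser.
# probs: (n, 10) predicted class probabilities; labels: (n,) int classes.
def info_gain_bits(probs, labels):
    # average cross-entropy of the model's predictions, in bits
    ce_bits = -np.mean(np.log2(probs[np.arange(len(labels)), labels]))
    # baseline entropy of 10 balanced classes minus the model's cross-entropy
    return np.log2(10) - ce_bits

# Sanity check: a uniform predictor gains zero bits.
uniform = np.full((5, 10), 0.1)
labels = np.array([0, 1, 2, 3, 4])
```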

Missing image

Attached: 1532866090896.png (823x851, 50K)

My husbando

Right now I'm using Keras and Vim.

>Is network design for Neural Networks as trial-and-errory as it seems
Work on it for a while and you'll develop insights (that mostly can't be put into words) on how to choose hyperparameters and designs

Decided to write the first of those visualizations

>you'll develop insights
Yeah, but insightful artisan hyperparameters don't scale

MNIST dataset, [784-10-10] neurons, 3000 steps, batch size 32, linear activation.
Geometric learning rates:
> 0.000122, 0.000244, 0.000488, 0.000977, 0.001953, 0.003906, 0.007812, 0.015625, 0.03125, 0.0625, 0.125, 0.25, 0.5
(Red is larger)
Test set info gain vs training steps

I ran it with smaller learning rates, but they didn't train. It's only with the linear activation that you get these nice staggered curves -- sigmoid and relu are a lot less noisy, but they go all over the place. Seems like I should explore the area between 0.25 and 1 a bit more, which is honestly higher than I was expecting.

Next test is probably variance over many training runs on the same architecture.

Attached: 1532434184030.png (757x752, 86K)
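Side note: those "geometric learning rates" are just consecutive powers of two, 2^-13 through 2^-1, which you can generate in one line (a sketch, not the poster's actual script):

```python
# The thirteen learning rates from the post are 2^-13 doubling up to 2^-1.
learning_rates = [2.0 ** -k for k in range(13, 0, -1)]
print([round(lr, 6) for lr in learning_rates])
# -> [0.000122, 0.000244, ..., 0.25, 0.5]
# Each run then trains the 784-10-10 linear model for 3000 steps at
# batch size 32 with one of these rates.
```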

>Yeah, but insightful artisan hyperparameters don't scale
You won't develop those insights experimenting with toy dataset like MNIST.

What should I use?

CIFAR10 is a good start after MNIST, and you'll need a semi-decent GPU

To add, people have found that network architecture designs that are good on CIFAR-10 (~100MB dataset) are almost automatically good on ImageNet (~100GB dataset), so the insights definitely do scale

1060 is fine?

It's surprisingly hard to find useful rules of thumb for this stuff, but here's what I found in my limited experience. A lot of it is pretty obvious in hindsight.

>the deeper your network, the more higher-order terms you get. A shallow network will tend to produce piecewise linear or quadratic functions; go deeper for weirder ones
>the size of a layer limits the information that can flow through it, so 128 -> 2 -> 64 is a waste
>going deeper than like 6 layers PROBABLY won't help unless your function is really weird
>make your model bigger. The more weights, the less random your performance will be, because you seem less likely to get caught in shitty local minima
>if you overfit too much, consider just slapping in regularization or dropout instead of shrinking the model

YMMV I'm hardly an expert at this shit.
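To illustrate the dropout tip, here's a minimal sketch of inverted dropout in NumPy (a hypothetical helper, not from any particular library): zero a random fraction of activations at train time and rescale the survivors, so test-time code needs no change.

```python
import numpy as np

# Inverted dropout: randomly zero a fraction `rate` of activations and
# scale the survivors by 1/(1-rate), keeping the expected activation
# unchanged. At test time you just skip this function entirely.
def dropout(activations, rate, rng):
    keep = (rng.random(activations.shape) >= rate).astype(activations.dtype)
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones((4, 8))
h_dropped = dropout(h, rate=0.5, rng=rng)
# Roughly half the units are zeroed; the survivors become 2.0, so the
# expected value of each unit is still 1.0.
```

In a Keras model this is what slapping a `Dropout` layer between your `Dense` layers does for you, without changing the model's capacity at test time.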

If it's the 3GB version: consider getting a better one.
If it's the 6GB version: barely OK.