I have a pretty interesting programming problem at work. I use Python, for the record.

So I have this external API for user-related data from a vendor, and it has a crap ton of records split across pages. I can query the REST API one page at a time (i.e. ?page=1). I need to grab this user data and join it with the MySQL instance on my remote server.

My machine is kind of limited... I can only load about 10% of the database and 10% of the API results at a time. How should I compute the join? What's the most efficient way?

What I've done so far: first I fetch the number of rows in the SQL database, then spawn 10 threads, one per block (roughly 10% of the rows per thread). I have a lock that prevents more than one thread from running at a time, so that only 10% of the rows are ever loaded. Each of those threads then spawns a new thread per API page and left-joins the page results against my SQL data. Is this approach bad?
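
Here's a rough, simplified sketch of what I mean (connection details, table, and endpoint names are made up, and in the real code each page fetch runs in its own thread):

import threading
import requests
import pymysql

API_URL = "https://vendor.example.com/users"   # placeholder endpoint
lock = threading.Lock()                        # so only one chunk is in RAM at a time

def process_chunk(offset, chunk_size):
    with lock:
        conn = pymysql.connect(host="db.example.com", user="me",
                               password="...", database="mydb")
        with conn.cursor() as cur:
            cur.execute("SELECT id, email, active FROM users LIMIT %s OFFSET %s",
                        (chunk_size, offset))
            rows = cur.fetchall()
        conn.close()

        page = 1
        while True:
            records = requests.get(API_URL, params={"page": page}).json()
            if not records:
                break
            # ... left-join `records` against `rows` in memory here ...
            page += 1

total_rows = 1_000_000                  # from a SELECT COUNT(*) beforehand
chunk = total_rows // 10
threads = [threading.Thread(target=process_chunk, args=(i * chunk, chunk))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()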


Is this API call specific to each user?

Nope, it's per page; multiple users come back per page (about 10,000), and it returns JSON records.

Bump for advice please. I'd like to at least know if my approach is correct.

Last bump, hopefully someone can help.

What do you mean, your machine is limited to only loading about 10% of the DB?

I only have about 512MB of RAM.

I can query the SQL database and bring about 10% of the records into my machine (as a result set in Python). Then I can join the data and look for matches, etc.

Dump what you have in RAM to the HDD, then load more, dump that to the HDD, and continue until you have the whole database. You want to copy the database down and work from it, right?
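
Something like this, roughly (column and connection details are made up, and CSV is just one option):

import csv
import pymysql

conn = pymysql.connect(host="db.example.com", user="me", password="...",
                       database="mydb")
chunk_size = 50_000
offset = 0

with open("users_dump.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "email", "active"])
    while True:
        with conn.cursor() as cur:
            cur.execute("SELECT id, email, active FROM users "
                        "ORDER BY id LIMIT %s OFFSET %s",
                        (chunk_size, offset))
            rows = cur.fetchall()
        if not rows:
            break
        writer.writerows(rows)       # spill this chunk to disk
        offset += chunk_size

conn.close()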

So the API gives me user activity (it specifies whether they are active with a third-party vendor or not).

And my database has user information and whether they are active on our platform.

I am just looking for which records I need to update.

Read the external API one page at a time and insert the raw data into your database.

Once done, join the raw data with whatever data you already have using SQL.
As a bonus, if you fuck up you always have the raw data to fall back on without having to request it again.
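
Roughly like this (the staging table, endpoint, and column names are invented):

import requests
import pymysql

conn = pymysql.connect(host="db.example.com", user="me", password="...",
                       database="mydb")

# 1. Page through the vendor API and dump the raw records into a staging table.
page = 1
while True:
    resp = requests.get("https://vendor.example.com/users", params={"page": page})
    resp.raise_for_status()
    records = resp.json()
    if not records:
        break
    with conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO vendor_users_raw (vendor_id, email, vendor_active) "
            "VALUES (%s, %s, %s)",
            [(r["id"], r["email"], r["active"]) for r in records])
    conn.commit()
    page += 1

# 2. Let MySQL do the join -- nothing big ever sits in your 512MB of RAM.
with conn.cursor() as cur:
    cur.execute("""
        SELECT u.id
        FROM users u
        JOIN vendor_users_raw v ON v.email = u.email
        WHERE u.active <> v.vendor_active
    """)
    ids_to_update = [row[0] for row in cur.fetchall()]
conn.close()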

At least cache the API results. And get more RAM, holy shit; what is the point of a DB if you can't query it?

I don't see why you can't use the HDD for this. Sure, it might be slower, but it will get the job done.
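
E.g. cache each API page to a JSON file so you only pay the network cost once (the URL and cache directory are placeholders):

import json
import os
import requests

CACHE_DIR = "api_cache"
os.makedirs(CACHE_DIR, exist_ok=True)

def get_page(page):
    path = os.path.join(CACHE_DIR, f"page_{page}.json")
    if os.path.exists(path):                      # already fetched, read from disk
        with open(path) as f:
            return json.load(f)
    resp = requests.get("https://vendor.example.com/users", params={"page": page})
    resp.raise_for_status()
    records = resp.json()
    with open(path, "w") as f:                    # cache the raw page on the HDD
        json.dump(records, f)
    return records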

So this is what I was thinking. Thanks, man.

Can I cache on-disk? Is the file I/O too expensive?

If there isn't a time constraint, like a real-time deadline, then what matters is solving the problem, not how it's done.

They are asking me to solve it in the most efficient manner possible.

>most efficient manner possible
Why are you using Python?

That's what our company uses the most.

>is the file I/O too expensive
Probably less than the network latency.

>I am just looking for which records I need to update
Why do you need lots of RAM then? Cycle through each record and only keep track of the ones you need to update.
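
Sketch of what I mean, assuming you've already got the vendor activity cached somewhere as an {email: active} mapping built from the cached API pages (a flag per user is tiny):

import pymysql
import pymysql.cursors

def find_stale_ids(vendor_active):
    # vendor_active: dict of {email: bool} built from the cached API pages.
    conn = pymysql.connect(host="db.example.com", user="me", password="...",
                           database="mydb",
                           cursorclass=pymysql.cursors.SSCursor)  # unbuffered, streams rows
    stale = []
    with conn.cursor() as cur:
        cur.execute("SELECT id, email, active FROM users")
        for user_id, email, active in cur:        # one row at a time, not 10% of the table
            flag = vendor_active.get(email)
            if flag is not None and flag != bool(active):
                stale.append(user_id)             # only keep the ids that need an update
    conn.close()
    return stale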

Bump

Are they sorted the same way? If not, doing it through the database like the posts above suggest is your only option. Anyway, get more RAM.