Database advice

Five Cent Head

64th note
Gold Supporter
SoSH Member
Jul 17, 2007
764
Seattle
I am looking for database advice. I am generally comfortable with computers but a complete novice with databases.

The setup:
  • We have a collection of data currently maintained as several spreadsheets, and data entry often requires entering the same information into several different spreadsheets. This is stupid and annoying and leads to mistakes. We also want to extract data in ways that it not currently easy to do.
  • Two of us control the data. My colleague does at least 75% of the data entry and I do the rest. The two of us each use the data, probably in different ways. No one else has access, but I could imagine extracting pieces of the data for wider distribution, if it was easy to do.
  • The data has this sort of flavor: we have people, products, and dates. One spreadsheet (for a given year, say) will have a row for each person (or maybe several rows for each person), a column for each month, and each entry will show a product connected to that person in that month. Another spreadsheet might correspond to a single month and have rows for each product, but with extra data (not present in the first spreadsheet) attached to the entries. We might also want to look at a given person and see what they've done over the past 5 years, or we might want to look at a given product and see who's been involved with it.
  • So it seems like converting the information to a database is the obvious choice. (Right?) Then we want ways of extracting data to produce the two current spreadsheets, plus get other information, and I'm hoping that's easy to do with standard database software.
  • I am comfortable extracting data from the current spreadsheets (dump to a csv file, write some Python code to manipulate it, then write a new csv file). And it's sort of a mess currently, so I'll probably have to do that before importing it into a new system.
I am looking for recommendations for database software to use. I'm a complete novice, as I said, but I guess we need a server to store the data centrally for the two users, and then we need a way to enter and extract data. I will be in charge of putting everything together, but I need it to be easy for my colleague to use; I also may not be in the same position in a few years, so I want something that is not hard for my successor to maintain. Finally, I don't want to spend huge amounts of money, either; if it costs too much, we will probably just stick with what we have.

In case it matters, I used a Mac, not Windows, while my main collaborator on this is somewhat platform agnostic. I think I can get our IT people to run a server on a shared machine, so that we can both access the data. I've seen websites (Macintouch, mainly) recommend different database software:
Something that works on multiple platforms is preferred, especially when planning for the future. We only need access for two of us right now, and while that number could grow, it will never be very large. Cheaper is preferred, too.

What advice do you have?
 

pokey_reese

Member
SoSH Member
Jun 25, 2008
16,247
Boston, MA
A few important questions:

Is this just for storing spreadsheet contents?
Do you think that there is a chance that services will want to interact directly with your data store at some point, and cut out the need for a person to stuff a CSV in there?
Do you need this data to be on prem, or is a cloud solution viable (and have you thought about security/GDPR/CA privacy law implications)?
 

Five Cent Head

64th note
Gold Supporter
SoSH Member
Jul 17, 2007
764
Seattle
A few important questions:

Is this just for storing spreadsheet contents?
There are also connections among the data points and some computations that need to be made based on the data. For example, a given product might have a number of points associated to it, and when a person is involved with that product, they get that many points added to their running total. We need to keep track of these totals, and we also have a separate spreadsheet showing the number of points attached to each product (which can vary year to year).

Do you think that there is a chance that services will want to interact directly with your data store at some point, and cut out the need for a person to stuff a CSV in there?
It is possible that we might want to pull some data from other databases, and it is possible that we would want to provide limited access to our data from the people involved (if they wanted to see their current running total, for instance).

Do you need this data to be on prem, or is a cloud solution viable (and have you thought about security/GDPR/CA privacy law implications)?
I think a cloud solution should be viable. Much of the information is essentially public anyway. We also have access to cloud storage which satisfies the organization's privacy constraints, if a product allows us to specify where things are stored.
 

cgori

Member
SoSH Member
Oct 2, 2004
3,999
SF, CA
This is pretty much what you use a database for. But the DB software usually requires you to write some code on top of it to create your schema and workflow. Your schema is basically the layout of the different spreadsheets you described (by product, by month, by salesperson, whatever). In a relational database the fields (columns, typically in a spreadsheet) that connect sheets together are usually called keys. Your workflow is basically the process of entering/validating the data, and generating the reports you need.

This sounds like sales cycle management data to me (or maybe sales comp analysis?), so there may be some canned product out there that does it already on top of another database layer. You might be able to use a CRM (HubSpot has a free tier?) to get most of this. Or use something else with HubSpot to create the comp/promo points - I found something called Sales Cookie that integrated with HubSpot @ $19.99/user/month, but I've never used that before. At my old job we used the classic Salesforce stuff, and it can do everything you are talking about (and about a million other things) but it's not cheap, can easily be $80/user/month or more. Even with Salesforce you end up having to customize the applications for the data you collect and your business rules (and thus the Salesforce Consultant enters the picture...)

Basically you have to decide if you want to own/develop this application yourself or just pay someone to do that part. But what you are describing is definitely a natural for a database, the spreadsheets will not scale above a certain size.
 

nvalvo

Member
SoSH Member
Jul 16, 2005
21,478
Rogers Park
Depending on the complexity and size of your data and what you need to do with it, I might recommend you consider Notion.so.

I've been using it to coordinate some freelance projects. It's designed for small teams to customize workflows together and share information, which... sounds like it's what you want to do. It's very cheap, quite easy to work with, cross-platform, multi-user, and has some deceptively robust capacity for multiple views of the same data, formulas, database relations, and the like.

It might be too lightweight, but it might not.
 

Five Cent Head

64th note
Gold Supporter
SoSH Member
Jul 17, 2007
764
Seattle
This is pretty much what you use a database for. But the DB software usually requires you to write some code on top of it to create your schema and workflow. Your schema is basically the layout of the different spreadsheets you described (by product, by month, by salesperson, whatever). In a relational database the fields (columns, typically in a spreadsheet) that connect sheets together are usually called keys. Your workflow is basically the process of entering/validating the data, and generating the reports you need.

This sounds like sales cycle management data to me (or maybe sales comp analysis?), so there may be some canned product out there that does it already on top of another database layer. You might be able to use a CRM (HubSpot has a free tier?) to get most of this. Or use something else with HubSpot to create the comp/promo points - I found something called Sales Cookie that integrated with HubSpot @ $19.99/user/month, but I've never used that before. At my old job we used the classic Salesforce stuff, and it can do everything you are talking about (and about a million other things) but it's not cheap, can easily be $80/user/month or more. Even with Salesforce you end up having to customize the applications for the data you collect and your business rules (and thus the Salesforce Consultant enters the picture...)

Basically you have to decide if you want to own/develop this application yourself or just pay someone to do that part. But what you are describing is definitely a natural for a database, the spreadsheets will not scale above a certain size.
Thanks for the summary in the first paragraph and the suggestions in the second paragraph. For the third paragraph, I don't think it's in our budget to pay someone, I will have to do it myself if I can't find a canned solution.
 

Five Cent Head

64th note
Gold Supporter
SoSH Member
Jul 17, 2007
764
Seattle
Depending on the complexity and size of your data and what you need to do with it, I might recommend you consider Notion.so.

I've been using it to coordinate some freelance projects. It's designed for small teams to customize workflows together and share information, which... sounds like it's what you want to do. It's very cheap, quite easy to work with, cross-platform, multi-user, and has some deceptively robust capacity for multiple views of the same data, formulas, database relations, and the like.

It might be too lightweight, but it might not.
I will look into it, thanks.