Alright, so this question seems simple up front, but there's a bit more to it than just loading the file.
The entire story goes like this: for some reason, our client sends us what can only be described as a relational database flattened into a single CSV file (except the delimiter is a tilde instead of a comma). It's kinda terrible, actually; the same data gets repeated ad infinitum throughout the file. So, to bring some order back to this data, I load it into an actual relational database. Given the volume of data, a database makes it much easier to inspect the data for issues, and much easier to export.
There are 53 fields per record and somewhere in the ballpark of 250,000 records per transmission. I want to split that into 6 normalized tables. I'm not sure whether to validate the data in the C# program or in the SQL Server 2016 LocalDB instance I'm using.
I'm not an experienced DBA; I'm a C# programmer who has dabbled in SQL a little. I feel comfortable enough with the syntax, but I want to make sure I'm doing this right.
Also, everything has to be completely automated: when a file comes in, the C# program picks it up and loads it into the database.
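For what it's worth, the trigger is just a watcher on the drop folder. A minimal sketch using .NET's `FileSystemWatcher` (the folder path, the `*.csv` filter, and `LoadTransmission` are placeholders, not my actual names):

```csharp
using System;
using System.IO;

class Watcher
{
    // Build a watcher over the drop folder; the filter is an assumption
    // for illustration (the real transmissions may be named differently).
    public static FileSystemWatcher Create(string dropFolder)
    {
        var watcher = new FileSystemWatcher(dropFolder, "*.csv");
        watcher.Created += (sender, e) =>
        {
            Console.WriteLine($"Loading {e.FullPath}...");
            // LoadTransmission(e.FullPath);  // hypothetical entry point into the loader
        };
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

Note that `Created` can fire before the sender finishes writing the file, so in practice the handler usually has to retry opening the file with exclusive access before parsing it.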
Let me explain the layout a bit more. Each line has 53 fields and is a detail line for a statement (what they are being charged for, how many times they were charged for the item, the total cost or credit for that item, etc.). The problem is that EVERY line also carries the information for the entire mailing: the payer, resident, property, and remittance. With that known, let me explain how I'm doing this now:
- Open the file
- For each line of the file, retrieve the keys for the tables that describe the mailing, payer, resident, property and remittance destinations.
- Compare that data against the cache. If it isn't cached (or the cached entry is invalid), query the DB to see whether that entity has already been added; if not, create it. Then cache it.
- Add the new detail line and relate it to the mailing, which has a one-to-many relationship with details. (The mailing itself has many-to-one relations to the payer, property, and remittance.)
- Close the file when finished.
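In code, the per-line steps above are roughly this shape. A minimal sketch, where the field positions, cache key, and DB call are placeholder assumptions rather than my real schema (payer shown; resident, property, remittance, and mailing follow the same pattern):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class Loader
{
    // One cache per parent entity, keyed by the entity's natural key from the
    // file and holding the surrogate key the database assigned.
    static readonly Dictionary<string, int> PayerCache = new();

    public static int GetOrCreatePayer(string naturalKey)
    {
        if (PayerCache.TryGetValue(naturalKey, out var id))
            return id;                  // cache hit: no round trip to the DB

        // Cache miss: SELECT by natural key, INSERT if absent. The DB call is
        // hypothetical; the stand-in key keeps the sketch self-contained.
        // id = db.SelectOrInsertPayer(naturalKey);
        id = PayerCache.Count + 1;
        PayerCache[naturalKey] = id;
        return id;
    }

    public static void Load(string path)
    {
        foreach (var line in File.ReadLines(path))     // streams line by line
        {
            var fields = line.Split('~');              // 53 tilde-delimited fields
            var payerId = GetOrCreatePayer(fields[0]); // field position is illustrative
            // ... resolve resident, property, remittance, and mailing the same
            // way, then insert the detail row related to the mailing's key ...
        }
    }
}
```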
It's not the slowest thing in the world, but right now everything is done in RAM, and the program comes dangerously close to running out of memory. That's why we've decided to load the data into a database instead of keeping it all in RAM. Hopefully that sheds some more light on this for potential answerers. Thank you!