eXtropia: the open web technology company
Technology | Support | Tutorials | Development | About Us | Users | Contact Us
Development resources
 ::   WebWare 2.1 (Perl)
 ::   WebWare 2.0 (Java)
 ::   Cool hacks
cool hack
Speeding up the WebStore Search  
sent us the following cool hack ...

Below you will find a hack for using split multiple data.files in case the flatfile database is getting very slow with e.g. over 6000 items. It speeds up drastically. I get results on database search in 5-10 seconds where it took over a minute when I used just one large file. Hope this can be usefull to others... Hack for WEB_STORE with large flatfile database to speed up the search drastically! created by: Ellio Martina. Eefde - the Netherlands ellio@elliomartina.com january 2001 ######################################################### This is for those WEBSTORE FANATICS who want to keep things simple and straightforward to maintain, but using a very large flatfile database and where speed is a problem. If you want to handle a sophisticated, multi lingual shop, you better check the City-Shop hack, which solves almost all problems. The basic principle is to split a large flatfile into small ones. (It speeded up a web_store with a >6000 item flatfile in my case) If you do books or CD's for example, you could e.g. split it up alfabetically on writers or artist names, using the first letter(s). Let's say you have over 6000 items and you want to split into about 500 items per file, matching abc, def, ghij, etc. Now if you link to an item, a product_id or a category, depending on how you intend to split up the large flatfile database, only the abc_data.file will be searched if you add: ..&data_file=abc_... into the <A HREF="web_store.cgi?category=a&cart_id=" ... string. The hack will generate the database name on the fly into the following structure xxx_data.file where the prefix xxx_ matches your file selection criteria. In case you do a keyword search on fields like Title and Description, all xxx_data.files have to be searched. It would be nice if also with the Order Checkout only the items in the cart are being checked against the specific xxx_data.files. This demands adding some logic to the cart. Well,for me this is a learning Perl/CGI programming process. I am still working on this issue, so for the time being you have to do with this. This hack runs OK with me after extensive test running the shop Nevertheless, if you run into a problem, please notify me! Ellio ############################################################## In the web_store.setup.frames.javascript file (or other) add the following criteria array for multiple databases .................. $sc_user_carts_directory_path = "./User_carts"; $sc_data_file_path = "./Data_files/"; # Change to this ########################## Added Database selection criteria $sc_default_db_file= "data.file"; # default extension upon your choice # # Database criteria using abc_ type prefix to divide category alfabetically # @shop_datafiles_accepted = ( "$sc_data_file_path/abc_$sc_default_db_file", "$sc_data_file_path/def_$sc_default_db_file", "$sc_data_file_path/ghij_$sc_default_db_file", "$sc_data_file_path/klm_$sc_default_db_file", "$sc_data_file_path/nopqrs_$sc_default_db_file", "$sc_data_file_path/tuv_$sc_default_db_file", "$sc_data_file_path/wxyz_$sc_default_db_file", "$sc_data_file_path/$sc_default_db_file"); ########################### end addition $sc_options_directory_path = "./Html/Options/"; $sc_html_product_directory_path = "./Html/Products"; ...................... In your web_store_toc.html file for example set the searchcriteria like using &data_file=abc_ to reference specific data.file ...................... <TD><A HREF = "cdshop.cgi?category=aaaa&data_file=abc_&cart_id=" TARGET = "main"> <IMG SRC="html/buttons/blue_btn_a.jpg" WIDTH="22" HEIGHT="22" BORDER="0" ALT="[ A ]"></A></TD> <TD><A HREF = "cdshop.cgi?category=bbbb&data_file=abc_&cart_id=" TARGET = "main"> <IMG SRC="html/buttons/blue_btn_b.jpg" WIDTH="22" HEIGHT="22" BORDER="0" ALT="[ B ]"></A></TD> <TD><A HREF = "cdshop.cgi?category=cccc&data_file=abc_&cart_id=" TARGET = "main"> <IMG SRC="html/buttons/blue_btn_c.jpg" WIDTH="22" HEIGHT="22" BORDER="0" ALT="[ C ]"></A></TD> .......................... Next make an extra hidden field in the make_hidden_fields subroutine in web_store_html_lib.pl .................... $hidden = qq! <INPUT TYPE = "hidden" NAME = "cart_id" VALUE = "$cart_id"> <INPUT TYPE = "hidden" NAME = "page" VALUE = "$form_data{'page'}">!; if ($form_data{'data_file'} ne "") # Add this if statement. { $hidden .= qq! <INPUT TYPE = "hidden" NAME = "data_file" VALUE = "$form_data{'data_file'}">!; } # End addition if ($form_data{'keywords'} ne "") { $hidden .= qq! <INPUT TYPE = "hidden" NAME = "keywords" VALUE = "$form_data{'keywords'}">!; } ......................... Now move to the web_store_db_lib.pl file to the subroutine - submit_query In my following code the Next X-hits hack is applied ! Look for: ################## # CODECHANGES # ################# # Each row contains the fields in a PIPE delimited # form. # ############################################################ sub submit_query { # local(*database_rows) = @_; # replacement for NEXT X Hits local(*database_rows, $hits_seen) = @_; local($status); local(@fields); local($row_count); local(@not_found_criteria); local($line); # Read line from database # Added variable NOHITS $no_hits_found="yes"; # # exact_match and case_sensitive # are special form variables # which alter the behavior of # keyword searches (string data # type with the = operator). # # Normally keyword searches are # case insensitive and are not # exact match searches. # local($exact_match) = $form_data{'exact_match'}; local($case_sensitive) = $form_data{'case_sensitive'}; # We initialize row count to 0. # $row_count = 0; # Added NOHITS $no_hits_found = "yes"; # # The first thing we need to do is # open the data file and then check to # see if there was an error doing this. # ####### Previous code ##################### # open(DATAVIRTUAL, "$sc_data_file") || # # &file_open_error("$sc_data_file", # # "Read Database",__VIRTUAL__,__LINE__); # # # ######### New Code ####################### ##################################### local($sc_selected_data_file); # Added local variable # # Add if statement # if ($form_data{'data_file'} ne "") { # Check for prefix data file # # Build up file name # # If present, use $sc_selected_data_file # #################################### $sc_selected_data_file = "$sc_data_file_path$form_data{'data_file'}$sc_default_db_file"; open(DATAVIRTUAL, "$sc_selected_data_file") || &file_open_error("$sc_selected_data_file", "Read Database",__VIRTUAL__,__LINE__); # No change till }else{ condition from here # # If there was no error opening it, # then we read each line into $line # until the file ends or the row count # exceeds the maximum rows returned plus # 1. # ## while(($line = <DATAVIRTUAL> ) && ## ($row_count < $sc_db_max_rows_returned + 1)) # vervangen voor NEXT X HITS door while(($line = <DATAVIRTUAL> ) && ($row_count < $sc_db_max_rows_returned + $hits_seen)) { chop($line); # Chop off extraneous newline # Each field is split based on the pipe # delimiter. @fields = split(/\|/, $line); # First, we set not_found to zero # which indicates that we are assuming # the criteria was satisfied for the # row. # # Then, for each criteria # specified in @sc_db_query_criteria, # we call a routine to apply the # criteria. If the criteria is # not satisfied, it keeps returning # 1 which would increment $not_found. # # Thus, $not_found will end up being # the number of criteria that were # not found. 0 means success. # $not_found = 0; foreach $criteria (@sc_db_query_criteria) { $not_found += &flatfile_apply_criteria( $exact_match, $case_sensitive, *fields, $criteria); } # If not found is 0, and # the row count has not exceeded # the amount of rows that we # promised to return, # the row is pushed into the # @db_rows array. # ## if (($not_found == 0) && ## ($row_count <= $sc_db_max_rows_returned)) ## vervanging NEXT X Hits door $adjusted_line_number = $sc_db_max_rows_returned + $hits_seen; if (($not_found == 0) && ($row_count <= $adjusted_line_number) && $row_count >= $hits_seen) { push(@database_rows, join("\|", @fields)); } # # We always want to increment row count even # if we exceeded the maximum amount of rows # being returned. # # When not_found = 0, that means that the # criteria was satisfied for the row. if ($not_found == 0) { $row_count++; } } # End of while datafile has data # Finally, we close the datafile when # we are done with it. if ($row_count > 0) { $no_hits_found = "no"; } close (DATAVIRTUAL); ################################# } else { # Added }else{ of the if else statement # # # # Begin of added foreach # # This runs search on all files # ################################# foreach $data_file_acc (@shop_datafiles_accepted) { # Added $data_file_acc (@shop_datafiles_accepted) open(DATAVIRTUAL, "$data_file_acc") || # Open each data.file successively &file_open_error("$data_file_acc", "Read Database",__VIRTUAL__,__LINE__); # No changes until end of routine # # If there was no error opening it, # then we read each line into $line # until the file ends or the row count # exceeds the maximum rows returned plus # 1. # ## while(($line = <DATAVIRTUAL> ) && ## ($row_count < $sc_db_max_rows_returned + 1)) # replaced for NEXT X HITS while(($line = <DATAVIRTUAL> ) && ($row_count < $sc_db_max_rows_returned + $hits_seen)) { chop($line); # Chop off extraneous newline # Each field is split based on the pipe # delimiter. @fields = split(/\|/, $line); # First, we set not_found to zero # which indicates that we are assuming # the criteria was satisfied for the # row. # # Then, for each criteria # specified in @sc_db_query_criteria, # we call a routine to apply the # criteria. If the criteria is # not satisfied, it keeps returning # 1 which would increment $not_found. # # Thus, $not_found will end up being # the number of criteria that were # not found. 0 means success. # $not_found = 0; foreach $criteria (@sc_db_query_criteria) { $not_found += &flatfile_apply_criteria( $exact_match, $case_sensitive, *fields, $criteria); } # If not found is 0, and # the row count has not exceeded # the amount of rows that we # promised to return, # the row is pushed into the # @db_rows array. # ## if (($not_found == 0) && ## ($row_count <= $sc_db_max_rows_returned)) ## vervanging NEXT X Hits door $adjusted_line_number = $sc_db_max_rows_returned + $hits_seen; if (($not_found == 0) && ($row_count <= $adjusted_line_number) && $row_count >= $hits_seen) { push(@database_rows, join("\|", @fields)); } # # We always want to increment row count even # if we exceeded the maximum amount of rows # being returned. # # When not_found = 0, that means that the # criteria was satisfied for the row. if ($not_found == 0) { $row_count++; } } # End of while datafile has data # Finally, we close the datafile when # we are done with it. if ($row_count > 0) { $no_hits_found = "no"; } close (DATAVIRTUAL); ######################## } # add } End of For Each # ######################## } ############################################################ # add } End of added if statement to select single or multiple # # DATAVIRTUALS # ############################################################ # We passed database rows by reference so that # no extra copying of the array is needed when # we return the status. # if ($row_count > $sc_db_max_rows_returned) { $status = "max_rows_exceeded"; } # Finally, we return the status and # the row count. # return($status,$row_count); } # End of submit query