  1. #1
    Visual Artist & ABW Ambassador lostdeviant's Avatar
    Join Date
    September 7th, 2007
    Location
    Cuautitlán, Edo. de México
    Posts
    1,725
    Is there a Script for filtering datafeed entries by Keyword?
    Is there a Script for filtering a datafeed entry by Keyword? I think this is harder than my previous datafeed question about processing a SAS datafeed file and getting it on my server.

    Let's say:
    1. I already have the CSV file in .txt with my SAS/performics ID on the server.
    2. I have a clothing merchant and I only want to show Women's clothing from their feed on my women's clothing site. I don't want to show girl's, boy's, or men's clothing.
    3. Is there an existing script that will get rid of complete entries based on keywords, for example "men", and then re-save the file using a name I choose?

  2. #2
    ABW Ambassador PatrickAllmond's Avatar
    Join Date
    September 20th, 2005
    Location
    OKC
    Posts
    1,219
    Is this in a database yet ?
    ---
    This response was masterly crafted via the fingers of Patrick Allmond who believe you should StopDoingNothing starting today.
    ---
    Focus Consulting is where I roll | Follow @patrickallmond on Twitter
    Search Engine Marketing | Search Engine Optimization | Social Media | Online Video

  3. #3
    Visual Artist & ABW Ambassador lostdeviant's Avatar
    Join Date
    September 7th, 2007
    Location
    Cuautitlán, Edo. de México
    Posts
    1,725
    Quote Originally Posted by patrick24601
    Is this in a database yet ?
    This is pre-database. It would also help with those feeds that are too big when all you really need is a section of it.


    There is a file say "merchantname.txt" already on my server ready to import to my database... I just want to make it smaller and more relevant first. Maybe I'd want in this case to call the processed file "merchantnamewomen.txt" since I would have filtered out entries with keywords like men, girls, and boys. Then I'd import the merchantnamewomen.txt into my database with another script that I like a lot.

  4. #4
    ABW Ambassador PatrickAllmond's Avatar
    Join Date
    September 20th, 2005
    Location
    OKC
    Posts
    1,219
    Well for the programmer in me it is actually a helluva lot easier to import it all and delete what I don't want. That would take me longer than...

    What you *might* be able to do is open it with something like notepad++, search for certain lines, keep them and delete the rest. Man - not really sure. You could probably do the same thing in excel. Sort and filter by a certain column.

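    In SQL it is a one-line command (your_table is a placeholder, and desc needs backticks in MySQL because it is a reserved word):
    select * from your_table where `desc` like '%women%' or `desc` like '%boys%' or `desc` like '%men%';
    Keep in mind that '%men%' also matches "women".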

    FYI... I have yet to see a feed that is too big. IMO - 50K-60K products? No big whoop.

  5. #5
    Visual Artist & ABW Ambassador lostdeviant's Avatar
    Join Date
    September 7th, 2007
    Location
    Cuautitlán, Edo. de México
    Posts
    1,725
    Quote Originally Posted by patrick24601

    What you *might* be able to do is open it with something like notepad++, search for certain lines, keep them and delete the rest. Man - not really sure. You could probably do the same thing in excel. Sort and filter by a certain column.
    I know I could manually search, delete, and save in Excel or another editor, then re-upload the datafeed, but that would take hours of work. Considering that I might want to use the same datafeed for different sites (and therefore different niches), that would get a little crazy.

  6. #6
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    I found a fantastic freeware utility that does just that ... but you should be comfortable with old school command line stuff (i.e. no GUI). You run it on your own PC, against a local file.

    It allows you to extract lines from a text file that meet your criteria... and you don't have to learn complicated awk stuff (shudder). There are a number of expensive & very advanced text-conversion apps out there if you have higher needs (TextPipe Pro is amazing), but this fits the bill for me perfectly.

    I have no affiliation with this site (don't know why I say that every time I post a helpful link!):

    http://www.stahlforce.com/dev/index.php?tool=filter

  7. #7
    Visual Artist & ABW Ambassador lostdeviant's Avatar
    Join Date
    September 7th, 2007
    Location
    Cuautitlán, Edo. de México
    Posts
    1,725
    I visited that site and I have to admit that it is way over my head.
    Anyway, I need something that I can run on the server since my internet connection is very slow. I have lost plenty of time downloading and uploading feeds.
    Now that I have the script that gets the SAS feeds and inserts the aff. code, I don't want to ruin that advantage by having to download the feed and upload it again later.




  8. #8
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    It looks more complicated than it is.. and server side is going to be more challenging as that will require access to run scripts, whether by automated cron, or manually using SSH.

    Here is an example of how I use the app I listed (from a DOS command line):

    sfk filter afffile.txt -+"TV" -+"CD" -+"DVD" >tv-cd-dvd.txt

    This extracts each line that contains *either* TV, CD or DVD (I've simplified this, you would insert the actual merchant category name). And the output is saved in a local file "tv-cd-dvd.txt"

    Text-conversion stuff is easy for programmers (I'm not one, by the way, it's not easy for me!), so you may want to throw the project on Rent-A-Coder - you will ensure it gets done properly, without having to spend too much for the job.

  9. #9
    Full Member
    Join Date
    January 18th, 2005
    Posts
    396
    Might want to take another look at that 'Swiss Army Knife' program - it does come in a Linux flavor and a Windows one, so you might be able to run it on your server, and while its instructions are substantial, what you want seems fairly simple ...

    sfk filter mytestfile -!men -!boy -write ===> mytestfile now contains all lines except those with 'men' or 'boy'

    or so my guess is - I haven't run it just looked at it

  10. #10
    ABW Veteran Mr. Sal's Avatar
    Join Date
    January 18th, 2005
    Posts
    6,795
    lostdeviant,

    I don't know if this is what you need. I see you say "filtering a datafeed entry by Keyword"; if what you mean by that is filtering by categories, then this will work for you.

    I just tried it on my site, and I can get just the category I want from the datafeed without even touching the datafeed that is already uploaded on the server.

    I used SAS merchant #4313 for this test. They have 23 different categories. Here is the SELECT statement from my category menu before and after I tweaked it:

    Before:

    $result = mysql_query("select DISTINCT replace(merchantCategory,' ','-') merchantCategory from $storetable ORDER by merchantCategory ASC") or die (mysql_error());

    23 categories will show up.
    ---------------------------------

    After:

    $result = mysql_query("select DISTINCT replace(merchantCategory,' ','-') merchantCategory from $storetable WHERE merchantCategory = 'Hot Deals' ORDER by merchantCategory ASC") or die (mysql_error());

    Just the Hot Deals category will show up.

    --------------

    This way too, After:

    $result = mysql_query("select DISTINCT replace(merchantCategory,' ','-') merchantCategory from $storetable WHERE merchantCategory LIKE '%Hot Deals%' ORDER by merchantCategory ASC") or die (mysql_error());

    Just the Hot Deals category will show up with this one too.

    I hope that's what you need.

  11. #11
    Visual Artist & ABW Ambassador lostdeviant's Avatar
    Join Date
    September 7th, 2007
    Location
    Cuautitlán, Edo. de México
    Posts
    1,725
    Mr Sal.,

    I'd really like to basically do negative keywords, but a category exclusion might work. I don't want just one category at least not normally. I'd like to filter out the stuff that isn't related.

    How would I get that code you mentioned into a PHP script? It looks like you are manipulating the database. I want to edit the feed file itself, which I already have on my hosted server, before adding it to the database.

  12. #12
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Just modify the script you used for the SAS feed download. I took Donk's snippet and added a line that would take care of this:

    Code:
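    $fp = fopen($local_file, "r");
    $fp1 = fopen("tempfile", "w");
    // read one line at a time; compare with false so a "0" line is not mistaken for end of file
    while (($data = fgets($fp)) !== false)
    {
        // insert the affiliate ID into the row
        $replace = str_replace("YOURUSERID", $sasid, $data);
        // keep only the rows that mention Women's Clothing
        if (preg_match("/Women\'s Clothing/", $data) == 1)
        {
            fwrite($fp1, $replace);
        }
    }
    fclose($fp);
    fclose($fp1);
    // overwrite the original feed with the filtered copy, then get rid of the tempfile
    copy("tempfile", $local_file);
    unlink("tempfile");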
    It'll only write the lines that contain "Women's Clothing" to your resulting file. Just be sure to backslash any single quotes in the preg_match pattern like I did with Women's (strictly speaking, that is only required when the pattern string itself is wrapped in single quotes). This isn't the ideal solution because you're checking the entire row rather than just the category field, but it should get the job done.

    - Scott
    Hatred stirs up strife, But love covers all transgressions.
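The caveat above about checking the entire row rather than just the category field could be handled along these lines. This is only a sketch: the pipe delimiter, the column index 4, and the function name row_in_category are assumptions made for illustration, so check them against the actual feed layout.

```php
<?php
// Sketch: test only the category column instead of the whole row.
// ASSUMPTIONS: pipe-delimited feed, category in column index 4 (hypothetical values).
function row_in_category(string $line, string $category, int $categoryColumn = 4, string $delimiter = '|'): bool
{
    // Split the row into fields after stripping the line ending
    $fields = explode($delimiter, rtrim($line, "\r\n"));
    if (!isset($fields[$categoryColumn])) {
        return false; // short or malformed row
    }
    // Case-insensitive comparison against the category field only
    return strcasecmp(trim($fields[$categoryColumn]), $category) === 0;
}

// Made-up rows: the second one mentions "Women's Clothing" only in its description
$rows = [
    "101|Red Dress|http://example.com/a|19.99|Women's Clothing|A dress\n",
    "102|Blue Tie|http://example.com/b|9.99|Men's Clothing|Goes with Women's Clothing gifts\n",
];
foreach ($rows as $row) {
    echo row_in_category($row, "Women's Clothing") ? "keep" : "skip", "\n";
}
```

A whole-row preg_match would keep both sample rows, because the second one mentions the phrase in its description; comparing the category field alone keeps only the first.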

  13. #13
    Visual Artist & ABW Ambassador lostdeviant's Avatar
    Join Date
    September 7th, 2007
    Location
    Cuautitlán, Edo. de México
    Posts
    1,725
    Thank you Snib,

    That modification will work when I want to be very restrictive. I'd prefer some way to use negative keywords...

    If, in this example, the keywords men, boy, or girl are in the line, it doesn't get included. Perhaps set the negative keywords as a variable at the beginning of the script? Would something like $negkeywords = "men,girl,boy"; or something of the sort work along with the if line?

    I guess I'm afraid that if I only match lines with "women's clothing" I'd miss out on many women's products that just don't use that word.
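One way the negative-keyword idea could look, as a rough sketch rather than a tested production script: the keywords live in an array at the top, and the filtered rows go to a new file name so the original feed stays untouched. The function name filter_feed, the sample rows, and the temp-file names are all made up for illustration. The \b in the pattern is a word boundary, which keeps "men" from matching inside "women".

```php
<?php
// Sketch: copy a feed to a new file, dropping rows that contain a negative keyword.
// ASSUMPTIONS: function name, keyword list, and sample data are illustrative only.
function filter_feed(string $inPath, string $outPath, array $negKeywords): int
{
    if (count($negKeywords) === 0) {
        throw new InvalidArgumentException('Need at least one negative keyword');
    }
    // Build one pattern such as /\b(?:men|boy|girl)/i
    // \b = word boundary at the start, so "men" does not match inside "women";
    // there is no boundary at the end, so "mens" and "boys" are still caught.
    $quoted = array_map(function ($k) { return preg_quote($k, '/'); }, $negKeywords);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')/i';

    $in = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    $kept = 0;
    while (($line = fgets($in)) !== false) {
        if (preg_match($pattern, $line) === 1) {
            continue; // negative keyword found: skip this row
        }
        fwrite($out, $line);
        $kept++;
    }
    fclose($in);
    fclose($out);
    return $kept; // number of rows written
}

// Usage with a tiny made-up feed
$src = tempnam(sys_get_temp_dir(), 'feed');
$dst = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($src, "1|Women's Jacket\n2|Men's Jacket\n3|Boys Socks\n4|Silk Scarf\n");
echo filter_feed($src, $dst, ['men', 'boy', 'girl']), " rows kept\n";
echo file_get_contents($dst);
```

The trade-off is that a prefix match like this also drops rows containing words such as "menu" or "mention", so the keyword list may need tuning per merchant.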

  14. #14
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    I would just pick the parent categories that you want and do an if-elseif-elseif series of preg_matches to include only those categories. But if you want to do a negative match you could use the same code and make each if statement result in a continue; which will skip to the next row. So for example:

    Code:
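    // goes inside the while loop from the earlier snippet
    if (preg_match("/boy/", $data) == 1) { continue; }       // skip rows containing "boy"
    else if (preg_match("/girl/", $data) == 1) { continue; } // skip rows containing "girl"
    else
    {
        fwrite($fp1, $replace); // write every other row to the file
    }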
    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  15. #15
    Visual Artist & ABW Ambassador lostdeviant's Avatar
    Join Date
    September 7th, 2007
    Location
    Cuautitlán, Edo. de México
    Posts
    1,725
    Thank you Snib/Scott,

    That's exactly what I needed! Additionally, by seeing the code samples, I am starting to understand what the individual pieces do, not just the sections.

    I hope others also find this thread useful.
    Last edited by lostdeviant; March 11th, 2008 at 10:41 PM. Reason: Scott is super cool!

  16. #16
    Visual Artist & ABW Ambassador lostdeviant's Avatar
    Join Date
    September 7th, 2007
    Location
    Cuautitlán, Edo. de México
    Posts
    1,725
    Hi, I got it removing quite a few entries with the code. I have noticed that if the word is not by itself, the line isn't removed. Is there a more general match than "preg_match", so that a line would still be caught if there is a comma or pipe symbol at the beginning or end of a word, or an s at the end of the word?


    // read one line at a time
    while ($data=fgets($fp))
    {
        $replace = str_replace("YOURUSERID",$sasid,$data);
        if(preg_match("/pet/", $data) == 1) { continue; }
        else if(preg_match("/cat/", $data) == 1) { continue; }
        else if(preg_match("/kitten/", $data) == 1) { continue; }
        else
        {
            fwrite($fp1,$replace);
        }
    }
    fclose($fp);
    fclose($fp1);
    // get rid of the tempfile
    copy ("tempfile",$local_file);
    unlink("tempfile");
    In the above example, I found that cats, kittens, (pipe symbol)cat, kitten(comma), etc lines are not ignored. I am guessing that preg_match is for the word alone.

    Perhaps there is a delimiter after the / to check word sections?
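For what it's worth, a pattern like /cat/ has no anchors, so preg_match already finds it anywhere in the line, including inside "cats" or right after a pipe. It is case-sensitive, though, so "Cat" or "KITTEN" would slip through; that is one possible cause of the missed lines, although it can't be confirmed from the post alone. The sketch below compares three patterns on made-up sample lines; the i after the closing slash is the case-insensitive modifier, and the variable names are only illustrative.

```php
<?php
// Sketch: compare a plain pattern, a case-insensitive one, and a whole-word one.
// ASSUMPTION: the sample lines are invented; real feed rows will differ.
$plainPattern  = '/cat|kitten|pet/';              // case-sensitive, matches anywhere in the line
$nocasePattern = '/cat|kitten|pet/i';             // same, but ignores upper/lower case
$wordPattern   = '/\b(?:cat|kitten|pet)s?\b/i';   // whole words only, optional plural "s"

$lines = ["Toys|cat, 3 pack", "Cats & Dogs bundle", "KITTEN collar", "Concatenate tool", "Leather belt"];
foreach ($lines as $line) {
    printf(
        "%-20s plain=%d nocase=%d words=%d\n",
        $line,
        preg_match($plainPattern, $line),
        preg_match($nocasePattern, $line),
        preg_match($wordPattern, $line)
    );
}
```

The whole-word version is the stricter of the two case-insensitive options: it catches "cat," and "Cats" but leaves a word like "Concatenate" alone, which the plain substring patterns would remove.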
