Eric Enge: Let's move on to hidden text. There are a lot of legitimate ways people can use hidden text, and there are of course ways they can use it illegitimately. It strikes me that many of these kinds of hidden text are hard to tell apart. You can have someone who is using a simple CSS display:none scenario, and perhaps they are stuffing keywords, but maybe they do this with a certain amount of intelligence, making it much harder to detect than the site you recently wrote about. So, tell me how you deal with these various forms of hidden text?

Matt Cutts: Sure. I don't know if you saw the blog post recently where somebody tried to lay out many different ways to do hidden text and ended up coming up with 14 different techniques. It was kind of a fun blog post, and I forwarded it to somebody and said, "Hey, how many do we check for?" There were at least a couple that weren't strictly hidden text, but it was still an interesting post. Certainly there are some cases where people do deceptive or abusive things with hidden text, and those are the things that make our users angriest. If your web counter displays a single number, that's just a number, a single number. Users probably aren't going to complain to Google about that, but if you have 4,000 stuffed words down at the bottom of the page, that's clearly the sort of thing users get angry about once they realize it's there. Interestingly enough, they get angry about it whether it helped or not. I saw somebody do a blog post recently with a complaint about six words of hidden text and how the site showed up for the query "access panels". In fact, the hidden text didn't even include the phrase "access panels", just a variant of it.

Eric Enge: I am familiar with the post.

Matt Cutts: I thought it was funny that this person had gotten really offended at six words of hidden text and complained about a query which had only one word out of the two.
So, you do see a wide spectrum, where people really dislike a ton of hidden keyword stuffing. They might not mind a number, but even with as little as six words, we do see complaints. So, our philosophy has been to avoid false positives, and to try to detect stuff that would qualify as keyword stuffing, gibberish, page stitching, or scraping, especially when combined with hidden text. We use a combination of algorithmic and manual methods to find hidden text. I think Google is alone in notifying Webmasters about relatively small instances of hidden text; that is something where we'll try to drop an email to the Webmaster and alert them in Webmaster Central. Typically, you'd get a relatively short-term penalty from Google, maybe 30 days for something like that. But that can certainly go up over time if you continue to leave the text on your page.

Eric Enge: Right. So, a 30-day penalty in this sort of situation: is that removal from the index, or just a demotion in the rankings?

Matt Cutts: Typically with hidden text, a regular person can look at it and instantly tell that it is hidden text. There are certainly cases you could conjure up where that is not true, but the vast majority of the time it's relatively obvious. So, for that it would typically be a removal for 30 days. Then, if the site removes the hidden text or files a reconsideration request right after that, it could be shorter. But if they continue to leave that hidden text up, the penalty could get longer. We have to balance what we think is best for our users. We don't want to remove resources from our index longer than we need to, especially if they're relatively high quality. But at the same time, we do want to have a clean index and protect its relevance.

Eric Enge: Right. Note that Accesspanels.net has removed the hidden text, and the site is still ranked No. 1 in Google for the query "access panels".
I checked this a few days ago, and the hidden text had been removed. The site has a "last updated" indicator at the bottom of the page, and it showed the day before I checked.

Matt Cutts: We probably shouldn't get into too much detail about individual examples, but that one got our attention and is winding its way through the system.

Eric Enge: Right. When reporting web spam, writing a blog post on a very popular blog and drawing a lot of people's attention to it is fairly effective. But Webmaster Tools also allows you to submit reports, and those get looked at pretty quickly, don't they?

Matt Cutts: They do. We try to be pretty careful about the submissions we get through our spam report form. We've always been clear that the first and primary goal is to look at them to figure out how to improve our algorithmic quality. But it is definitely the case that we look at many of them manually as well, so you can imagine that if you had a complaint about a popular site because it had hidden text, we would certainly check that out. For example, with the incident we discussed a minute ago, someone checked it out earlier today and noticed that the hidden text is already gone. We probably won't bother to put a postmortem penalty on that one, but we definitely try to keep an open mind and look at spam reports, and at reports across the web, not just on big blogs but also on small blogs. We try to be pretty responsive and adapt relatively well. That particular incident was interesting, but I don't think the text involved was actually affecting that query, since it was different words.

Eric Enge: Right. Are there hidden text scenarios where it's harder for you to discern whether they are spam, versus something legitimate like showing just part of a site's terms and conditions, or dynamic menu structures? Are there any scenarios where it's really hard for you to tell whether it is spam or not?
Matt Cutts: I think Google handles the vast majority of idioms like dynamic menus very well. In almost all of these cases you can construct interesting examples of hidden text, but hidden text, like many techniques, is on a spectrum. The vast majority of the time, you can look and instantly tell that it is malicious, or that it's a huge amount of text, or that it's not designed for the user. Typically we focus our efforts on the things we consider high priority; keyword-stuffed pages with a lot of hidden text definitely get more attention.

Eric Enge: Right.

Matt Cutts: So, we do view a lot of different spam or quality techniques as being on a spectrum. And the best advice I can give your readers is probably to ask a friend to look at their site. It's easy to do a Ctrl+A, and it's easy to check things with cascading style sheets turned off. Stick to the more common idioms, the best practices that lots of sites follow, rather than trying some extremely unusual thing that could be misinterpreted, even by a regular person, as spamming.

Eric Enge: Fair enough. There was a scenario we reported a long while ago involving a site that was buying links, and none of those links were labeled. There was a very broad pattern of it, but the one thing we noticed and thought was a potential signal was that the links were segregated from the content: they were either in the right rail or the left rail, and the main content of the pages was in the middle. The links weren't integrated into the site and there was no labeling of them, but they were relevant. That's an example of a subtle signal, so it must be challenging to decide how much to make of that kind of signal.

Matt Cutts: We spend every day, all day, pretty much steeped in looking at high quality content and low quality content.
I think our engineers and others who are interested in web spam are relatively attuned to the things that are natural and organic. I think it's funny how you'll see a few people talking about how to fake being natural or fake being organic. It's really not that hard to actually be natural and organic, and sometimes the creativity you put into trying to look natural could be much better spent developing a great resource, a great guide, or a great hook that gets people interested. That will attract completely natural links by itself, and those are always going to be much higher quality links, because they are truly editorially chosen. Someone is linking to you because they think you've got a great site or really good content.

Eric Enge: I think you have a little bit of the Las Vegas gambling syndrome too. When someone discovers something that appears to have worked, they want to do more, and then they want to do more, and then they want to do more, and it's kind of hard to stop. Certainly you don't know where the line is, and there is only one way to find the line, which is to go over it.

Matt Cutts: Hopefully the guidance we give in the Webmaster Guidelines is relatively common sense. I thought it was kind of funny: we responded to community feedback and recommended that people avoid excessive reciprocal links (of course, some of these do happen naturally), but people started to worry and wonder about the definition of "excessive". It was funny because within one or two responses people were saying, "If you are using a ton of automated scripts to send out spam emails, that strikes me as excessive." People pretty reasonably and pretty quickly came to a fairly good definition of what is excessive, and that's the sort of thing where we try to give general guidance so that people can use their own common sense.
Sometimes people help other people know roughly where those lines are, so they don't have to worry about getting too close to them.

Eric Enge: The last question is a link-related question. You can get a thunderstorm of links in different ways. You can get on the front page of Digg, or be written up in the New York Times, and suddenly a whole bunch of links pour into your site. There are patents held by Google that talk about temporal analysis: for example, if you are acquiring links at a certain rate and suddenly it changes to a very high rate, that could be a spam signal, right? Correspondingly, if you are growing at a high rate and then that rate drops off significantly, that could be a poor quality signal. So, if you are a site owner and one of these things happens to you, do you need to be concerned about how it will be interpreted?

Matt Cutts: I would tell the average site owner not to worry, because in the same way that you spend all day thinking about links, and pages, and what's natural and what's not, it's very common for a few things to get on the front page of Digg. It happens dozens of times a day; getting there might be a really unique thing for your website, but it happens around the web all the time. And so, a good search engine needs to be able to distinguish different types of linking patterns, not just for regular content, but for breaking news and things like that. I think we do a pretty good job of distinguishing between real links and links that are maybe a little more artificial, and we are going to continue to work on improving that. We'll just keep working to get even smarter about how we process the links we see going forward.

Eric Enge: You might have a very aggressive guy who is out there, knows how to work the Digg system, and gets on the front page of Digg every week or so.
He would end up with very large link growth over a short period of time, and that's what some of the pundits would advise you to do.

Matt Cutts: That's an interesting case. I think at least in that situation you've still got to have something compelling enough that it gets people interested somehow.

Eric Enge: You have appealed to some audience.

Matt Cutts: Yeah. Whether it's a Digg technology crowd, a Techmeme crowd, or a Reddit crowd, different things appeal to different demographics. It was interesting that at Search Engine Strategies in San Jose you saw Greg Boser pick on the viral link builder approach. But I think one distinguishing factor is that with a viral link campaign, you still have to go viral. You can't guarantee that something will go viral, so the links you get with these campaigns have some editorial component to them, which is what we are looking for.

Eric Enge: People have to respond at some level or you are not going to get anywhere.

Matt Cutts: Right. I think it's interesting that it's relatively rare these days for people to do a query and find completely off-topic spam. It absolutely can still happen, and if you go looking for it you can find off-topic spam, but it's not a problem most people have on a day-to-day basis. Over time, in web spam, we start to think more about bias and skew and how to rank on-topic things appropriately, just as we used to think about how to return the most relevant pages instead of off-topic stuff. The fun thing about working at Google is that the challenges are always changing; you come in to work and there are always new and interesting situations to play with. So, I think we will keep working on improving search quality and how we handle links and reputation, and in turn we are trying to work with Webmasters who want to return the best content and make great sites so they can be successful.

Eric Enge: Thank you very much.
Matt Cutts: Always a pleasure to talk to you, Eric.